This content is taken from the University of Leeds & Institute of Coding's online course, Evidence and Data Collection for Problem Solving.
# What do we mean by data?

First things first: what IS data?

Data is a collection of facts, information or measurements held in a form that is suitable for processing (usually by computers).

To be useful, data needs to be processed: measured, collected, analysed, interpreted and reported. This allows you to discover patterns and draw meaning from the data as a way of understanding the world. All of those actions involve some element of human planning and intervention. People decide what to measure or what facts to collect, how to program the computers analysing the data, and how to structure that data to fit the purpose they’re collecting it for.

There are many different kinds of data.

• Quantitative data is numerical, and assumes a fixed and measurable reality. It is collected from sensors, measurements and counts, and tends to be structured; that is, it is held in systems like spreadsheets or databases and conforms to recognisable formats. For instance, a date and time has a format that tells you the year, month, day, hour and minute: 2020-02-17 14:34.

• Qualitative data is not easily expressed in numbers as it is descriptive and subjective. It is generally created by people and is unstructured, in that it does not follow a predictable format. For example, the contents of a photo can be described as unstructured qualitative data. Qualitative data is harder to analyse with computers, and often requires classification or categorisation by humans before it can be processed.

• Data often travels with descriptions or labels which describe what the data contains. This is metadata: data about data. Examples are the headings of columns in a spreadsheet, or the track and artist name stored in an audio file.

• Data is collected into data sets. These are collections of related data organised so you can easily process them. There are different ways of collecting and organising data, but one of the most common is a data table, made up of rows and columns of related data, like a spreadsheet. Databases are organised collections of data sets stored on computers, held within software which gives you ways of working with that data. Most websites have a database somewhere behind the scenes, storing and presenting information as you use them.

• The internet has given rise to ‘big data’: huge, often unstructured data sets that are hard to process with traditional data analysis methods. Big data sets might be split across many different servers, and they might be changing or being added to very fast. They could also be very large in size, and contain many different types of data, like images, video or text, often of very variable quality.
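The ideas of structured data, metadata and data tables from the list above can be sketched in a few lines of Python. This is only an illustration: the timestamp is the one quoted in the text, while the column names and temperature values are invented for the example.

```python
import csv
import io
from datetime import datetime

# Structured data: the timestamp from the text follows a fixed,
# recognisable format, so a computer can split it into its parts.
stamp = datetime.strptime("2020-02-17 14:34", "%Y-%m-%d %H:%M")
parts = (stamp.year, stamp.month, stamp.day, stamp.hour, stamp.minute)

# A tiny data table. The header row is metadata (data about the data).
# The column names and values are made up for illustration.
raw_table = """timestamp,temperature_c
2020-02-17 14:34,11.5
2020-02-17 15:34,12.0
"""

reader = csv.DictReader(io.StringIO(raw_table))
column_names = reader.fieldnames  # metadata: the column headings
rows = list(reader)               # the data itself, one dict per row
temperatures = [float(row["temperature_c"]) for row in rows]

print(parts)         # (2020, 2, 17, 14, 34)
print(column_names)  # ['timestamp', 'temperature_c']
print(temperatures)  # [11.5, 12.0]
```

Because every row follows the same format, the program needs no human help to read it. A photo or a free-text comment would not parse this way, which is why qualitative data usually needs categorising first.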

A good example of big data is everything that’s ever been posted on Facebook: probably over 100 petabytes of data, which is equivalent to more than 100,000 big laptop hard drives (at about 1 terabyte each). Big data needs special tools and advanced computing techniques to process it, including new kinds of database software and things like machine learning (artificial intelligence) algorithms, which are very good at spotting trends in data that a human might not see.
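The hard drive comparison is simple arithmetic, shown here as a quick check. It assumes decimal units (1 petabyte = 1,000 terabytes) and takes a "big laptop hard drive" to hold about 1 terabyte; the drive size is an assumption, not a figure from the course.

```python
# Decimal (SI) storage units, in bytes.
TERABYTE = 10**12
PETABYTE = 10**15

# Estimate quoted in the text: about 100 petabytes.
facebook_estimate_bytes = 100 * PETABYTE

# Assumption: one "big laptop hard drive" stores roughly 1 terabyte.
drive_capacity_bytes = 1 * TERABYTE

drives_needed = facebook_estimate_bytes // drive_capacity_bytes
print(drives_needed)  # 100000
```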

Processing raw data (that is, the numbers and figures stored in a database) to spot trends and patterns puts it into context and turns data into information, which a human can use to build their knowledge or understanding of a situation. That knowledge can help you solve problems: you could use data to identify the cause of a problem, or provide evidence to help inform solutions. Businesses collect, process and analyse data precisely because it helps them solve problems.
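As a minimal sketch of turning raw data into information, the Python below counts how often each cause appears in a handful of fault reports. The reports, field names and causes are all hypothetical, invented purely to show the step from individual records to a summary a person can act on.

```python
from collections import Counter

# Raw data: hypothetical fault reports (invented for illustration).
fault_reports = [
    {"day": "Mon", "cause": "payment timeout"},
    {"day": "Mon", "cause": "login error"},
    {"day": "Tue", "cause": "payment timeout"},
    {"day": "Wed", "cause": "slow page"},
    {"day": "Wed", "cause": "payment timeout"},
]

# Processing: count how many reports mention each cause.
cause_counts = Counter(report["cause"] for report in fault_reports)

# Information: the most frequent cause and its share of all reports.
most_common_cause, count = cause_counts.most_common(1)[0]
share = count / len(fault_reports)

print(f"{most_common_cause}: {count} of {len(fault_reports)} reports ({share:.0%})")
# payment timeout: 3 of 5 reports (60%)
```

On their own, the five records say little. The count puts them in context and points to where a problem-solver might look first.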