Big isn't always better

So far you’ve focused on big data and the vast computing resources needed to store and process it, but small data is just as important. This article investigates four different ways in which smaller data can add value and meaning.

Example one: data compression

Firstly, your data might be an image. Stored in a computer as 1s and 0s, the data that make up a single picture can add up to megabytes or even gigabytes, depending on the size and resolution of the image. One way to reduce the size of the data is to use a compression algorithm. A familiar example is saving an image as a JPEG: image-editing software often asks you to choose an image quality setting. Sacrificing some quality through compression allows for smaller file sizes.
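To put rough numbers on this, the Python sketch below (an illustration only, not part of the course material) first works out an uncompressed size: a hypothetical 6000 × 4000 pixel colour image at 8 bits per channel needs 72,000,000 bytes. It then imitates the quality/size trade-off on a made-up greyscale image by quantising pixel values, a crude lossy step that is much simpler than what JPEG actually does, and compresses the result with Python's standard `zlib` module. The function names and the test image are hypothetical.

```python
import random
import zlib

def raw_size_bytes(width, height, channels=3, bits_per_channel=8):
    """Uncompressed size: every pixel stores `channels` values of `bits_per_channel` bits."""
    return width * height * channels * bits_per_channel // 8

def make_test_image(width, height, seed=0):
    """Hypothetical greyscale image: a smooth diagonal gradient plus small random noise."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            base = (x + y) * 255 // (width + height - 2)
            pixels.append(max(0, min(255, base + rng.randint(-8, 8))))
    return bytes(pixels)

def quantize(pixels, levels):
    """Lossy step: snap each 0-255 value down to one of `levels` evenly spaced values."""
    step = 256 // levels
    return bytes((p // step) * step for p in pixels)

image = make_test_image(256, 256)
results = {}
for levels in (256, 32, 8):  # 256 levels keeps every value, so it loses nothing
    reduced = quantize(image, levels)
    compressed_size = len(zlib.compress(reduced, 9))
    max_error = max(abs(a - b) for a, b in zip(image, reduced))
    results[levels] = (compressed_size, max_error)
    print(f"{levels:3d} grey levels: {compressed_size} bytes, worst pixel error {max_error}")
```

Fewer grey levels compress to fewer bytes but introduce a larger worst-case pixel error, which is the same kind of trade-off a JPEG quality setting exposes.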

Example two: restricting data recorded

Another way to keep data small is to restrict the amount you collect from the outset, recording only the most important information. The METAR code, which you saw in the previous Step, is a good example of recording environmental data compactly: it includes only what a pilot needs to know and nothing more.

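EGBB 161150Z 27011KT 4000 RA OVC015 13/11 Q1003

EGBB: airfield identifier (Birmingham Airport)
161150Z: time (day 16 of the month, 11:50 UTC)
27011KT: wind direction and speed (from 270 degrees at 11 knots)
4000: horizontal visibility (4000 metres)
RA: weather (rain)
OVC015: cloud cover and cloud base (overcast at 1500 feet)
13/11: air temperature and dewpoint (13 °C and 11 °C)
Q1003: air pressure (1003 hPa)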
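Because each group in a METAR has a fixed, compact form, a short program can expand it again. The Python sketch below is a hypothetical decoder that handles only the group types in the example report above; real METARs can contain many other groups (gusts, variable winds, runway visual range, remarks), so treat it as an illustration rather than a complete parser.

```python
def decode_metar(report):
    """Hypothetical minimal decoder for the group types in the example report only."""
    groups = report.split()
    decoded = {"airfield": groups[0]}  # first group: ICAO airfield identifier
    for group in groups[1:]:
        if len(group) == 7 and group.endswith("Z") and group[:6].isdigit():
            # DDHHMMZ: day of month, then time in UTC
            decoded["day"] = int(group[:2])
            decoded["time_utc"] = f"{group[2:4]}:{group[4:6]}"
        elif group.endswith("KT") and group[:5].isdigit():
            # dddffKT: direction the wind blows from (degrees), then speed (knots)
            decoded["wind_from_deg"] = int(group[:3])
            decoded["wind_speed_kt"] = int(group[3:-2])
        elif len(group) == 4 and group.isdigit():
            decoded["visibility_m"] = int(group)
        elif group[:3] in ("FEW", "SCT", "BKN", "OVC") and group[3:].isdigit():
            # cloud amount, then cloud base height in hundreds of feet
            decoded["cloud_cover"] = group[:3]
            decoded["cloud_base_ft"] = int(group[3:]) * 100
        elif "/" in group:
            # temperature/dewpoint in degrees Celsius; "M" marks a negative value
            temp, dew = group.split("/")
            decoded["temperature_c"] = int(temp.replace("M", "-"))
            decoded["dewpoint_c"] = int(dew.replace("M", "-"))
        elif group.startswith("Q") and group[1:].isdigit():
            decoded["pressure_hpa"] = int(group[1:])  # pressure in hectopascals
        else:
            # anything else is treated as a present-weather code, e.g. RA = rain
            decoded.setdefault("weather", []).append(group)
    return decoded

report = "EGBB 161150Z 27011KT 4000 RA OVC015 13/11 Q1003"
decoded = decode_metar(report)
print(len(report), "characters")
for key, value in decoded.items():
    print(f"{key}: {value}")
```

The whole observation fits in 47 characters of text, yet it expands into nine separate pieces of information.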

These first two examples show that reducing large amounts of data to small data can leave the data ‘incomplete’. When you reduce the digital size of an image with lossy compression, some information is discarded; with the METAR code, only a limited amount of information is recorded in the first place. You therefore have to strike a balance between how much data you keep and which data your purpose requires, whether that is a digital image that looks good to the human eye or a weather report that preserves the observations you need.

The following examples describe two more ways environmental analysis thrives on getting the most out of small data.

Example three: visualisation

In Step 2.4 you looked at ways of visualising data on power usage in London and added your visualisations to the course Padlet Wall. Visualisations such as bar charts, pie charts and plots help you group the data, and therefore explore and analyse it. A visualisation can be as simple as a line plot of a value recorded each day: a time series that shows how something changes from day to day.

example of a line plot
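Grouping is often the first step before plotting. As a minimal sketch, the Python below averages a daily series into weekly blocks and draws them as a plain-text bar chart, so it runs without a plotting library. The electricity figures are invented for illustration (they are not the London data from Step 2.4), and the function names are hypothetical.

```python
from statistics import mean

def weekly_means(daily_values):
    """Group a daily series into consecutive 7-day blocks and average each block.
    A trailing partial week is averaged over however many days it contains."""
    return [mean(daily_values[i:i + 7]) for i in range(0, len(daily_values), 7)]

def text_bar_chart(values, width=40):
    """Render values as horizontal text bars, scaled so the largest value is `width` long."""
    top = max(values)
    return "\n".join(f"{v:8.1f} | " + "#" * round(width * v / top) for v in values)

# Hypothetical daily electricity use (kWh) over two weeks: invented numbers, not course data.
daily_kwh = [12, 11, 13, 12, 14, 18, 19, 11, 10, 12, 13, 13, 17, 20]
weekly = weekly_means(daily_kwh)
chart = text_bar_chart(weekly)
print(chart)
```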

Thoughtfully created visualisations offer several advantages. They can aid analysis by instantly highlighting how different factors vary together. For instance, the maximum temperature at a specific location each day may vary with the amount of daily sunshine, and plotting the two together might start to reveal a relationship between them. Visualisations can also make it easier to communicate data to others and to present your analysis in an appealing way. Professor Min Chen describes four levels of visualisation later this week. Transforming data into visualisations that make the information immediately accessible to a wide audience can have a significant impact, as with the global temperature spiral created by Ed Hawkins, which you saw in Step 1.7.

Ed Hawkins' spiral visualisation of global temperature change (1850-2017)

Example four: significant datapoints

When reflecting on the significance of small data, it’s important to remember how valuable individual pieces of data can be. Historic datasets, observations of key events and even single observations that challenge the rest of the record are all examples of how size isn’t everything. Measurements taken by scientists a hundred years ago, the last known sighting of a species and accounts of volcanic eruptions passed down through oral history all add to the data record while occupying virtually no digital space.

It’s also worth remembering that all the gathering, processing and analysis behind big data can be distilled into a single number. This ultimate in small data might be a threshold value behind a law or policy: the result of years of observation and analysis condensed into a guideline or maximum, such as an air quality standard to protect human health, a fishing quota to maintain sustainable marine populations, or a global temperature limit to prevent dangerous climate change.
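Once a threshold like this exists, checking a long run of observations against it reduces to a couple of small outputs. The sketch below assumes a hypothetical rule (a daily limit that may be exceeded only a set number of times) and invented readings; it does not reproduce any specific legal standard, and the function name is illustrative.

```python
def summarise_against_limit(readings, limit, allowed_exceedances):
    """Reduce a series of readings to two small outputs: how many readings are
    strictly above `limit`, and whether that count stays within `allowed_exceedances`."""
    exceedances = sum(1 for value in readings if value > limit)
    return exceedances, exceedances <= allowed_exceedances

# Hypothetical daily pollutant readings and rule, invented for illustration only.
daily_readings = [32, 55, 41, 60, 48, 50, 72]
count, within_rule = summarise_against_limit(daily_readings, limit=50, allowed_exceedances=2)
print(count, within_rule)
```

A reading exactly equal to the limit is not counted as an exceedance here; a real standard would define that boundary case itself.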

In this final week, you’ll explore how big and small data work together, through examples of data visualisations and how they help communicate with a wider audience. You’ll also discover that digitally tiny datasets can add important information alongside big data sources.

This article is from the free online course:

Big Data and the Environment

University of Reading