Sources of Big Data

Where do we encounter big data? Watch Vicky Lucas highlight a range of sources of big data and their potential use for research.
My name is Vicky Lucas and I’m the Training Manager for the Institute for Environmental Analytics. Along with Jon Blower, I will be introducing you to big data and the environment and telling you more about our work. This first week of the course focuses on big data. Let’s start by thinking about where we encounter big data. There are so many things around us that are continually processing and producing big data, from the mobile phone sitting in your pocket to satellites thousands of kilometres above our heads. The increasing connectivity of sensors is a rapidly growing source of big data, from instruments on roads to monitor traffic flow to intelligent bins that send a message when they need emptying.
Another example is smart meters. Electricity and gas meters have sat in draughty cobwebbed cupboards for decades, and you might have looked at them once in a while to send a reading to your energy provider. But now, smart meters can transmit to a monitor in your house and to the energy company, showing instantly and continually how much energy a home is consuming. And we’ll return to smart meters later in the course.
Sensors are also found on aircraft: not only are aeroplanes themselves bristling with sensors for tracking their performance and maintenance, each aeroplane’s position can be reported via satellite. This location information can be used to show real-time tracks of aeroplanes literally all over the world, and it’s fascinating to watch.
Health and the human body is an area of big data in itself. Exploring DNA requires biology, analysis and computing to come together; this is bioinformatics. The genome is all the genetic material in an organism, and the benefit of a greater understanding of genomes is finding new ways to treat or prevent diseases. It’s a truly big data challenge: the raw data for each human genome is 100 gigabytes. For research scientists, it’s the ability to process and analyse the big data easily which is important here.
Another everyday example of big data is the internet. Much of the backbone of the internet itself is about big data. You can type ‘Justin Bieber’ into a search engine and be returned 60 million results in two-thirds of a second. It’s the effective searching of the data and then quick access to the right data which is the challenge here. Or I can type ‘guitar’ into an online store and get 300,000 results, from instruments to books to novelty socks. So a big data challenge here is to help the user find the needle in the haystack. Sensible sorting and searching is vital to find the right information, and we’ll concentrate on data discovery as the theme for Week 2.
Of course, social media is an ever-expanding source of big data. As of June 2017, Facebook had reached two billion monthly active users, and five new profiles are created every second. It’s estimated that 300 million photos are uploaded to Facebook every day. Other sources of big data include fitness trackers, the stock market and census data.
But what about environmental data? The one I’ll talk about here is weather forecasts. Weather forecasts rely on massive computer simulations run every day. And weather forecasts have been testing the upper boundaries of big data since their inception.
In the early 1900s, as meteorology emerged as a separate subject to physics, Vilhelm Bjerknes, a Norwegian scientist, published a paper suggesting that it would be possible to predict future weather by solving the equations that describe the air as a fluid flowing around the atmosphere. In the 1920s, Lewis Fry Richardson, a British scientist, envisaged a forecast factory with thousands of ‘human computers’ solving the equations on paper; the people all sat in a vast circular hall with one conductor at the centre and everyone’s calculations simulating the evolving weather. Paper calculations weren’t the solution; computers were. ENIAC, one of the first electronic computers, was used by a team that included the Hungarian-American John von Neumann to create the first computer weather forecast in 1950.
The one-day forecast took more than 24 hours to execute, and yet by 1955 there were two computer forecasts issued by the US every day. And today, some of the most powerful supercomputers in the world are dedicated to weather and climate modelling, with their speed measured in petaflops; a petaflop is one million billion calculations per second.
And this last example brings us neatly to where environmental data meets big data needs, which we’ll talk about more throughout the course. The environment is a source of big data, because the Earth is so vast. There’s so much to measure, from air pressure, to the colour and temperature of oceans, to the land coverage of forests and crops.
The environment also requires big data solutions for our effective analysis and understanding of environmental systems. Only by understanding how the environment works can we hope to both use and protect it at the same time.
In this second activity, we explore sources of big data which can inform research by focusing on two examples: satellites and temperature change.
Where do we encounter big data? Listen to Vicky Lucas highlight a range of sources, from mobile phones and satellites to bioinformatics and the environment, and how they can be used to benefit research and business.
Did any of these sources of big data surprise you? Share your thoughts in the comments area below. Remember that you can ‘like’ or reply to comments made by others.
You may find the following infographic on byte sizes useful as you work through the course.
Byte - 1 character or symbol, kilobyte - a very short story, megabyte - 3.5-inch floppy disk, gigabyte - a mini disk, terabyte - modern-day drive, petabyte - 5 years of Earth Observation data, exabyte - Internet traffic per month in 2004, zettabyte - 250 million DVDs, yottabyte - 100 microbes
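To get a feel for how these units stack up, the short Python sketch below converts a raw byte count into the largest fitting unit from the infographic. It is only an illustration: the function name `humanize_bytes` is made up for this example, and it assumes decimal prefixes, where each unit is 1,000 times the previous one (some software uses 1,024 instead). The 100 gigabyte figure for a human genome comes from the video.

```python
# Unit names from the infographic, smallest first.
# Assumption: decimal prefixes, so each unit is 1000 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def humanize_bytes(n_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or more.

    Hypothetical helper written for this illustration only.
    """
    value = float(n_bytes)
    index = 0
    # Step up one unit at a time until the value drops below 1000
    # or we run out of unit names.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    plural = "" if value == 1 else "s"
    return f"{value:g} {UNITS[index]}{plural}"


# Raw data for one human genome, as quoted in the video: 100 gigabytes.
GENOME_BYTES = 100 * 1000**3

print(humanize_bytes(GENOME_BYTES))           # one genome
print(humanize_bytes(10_000 * GENOME_BYTES))  # ten thousand genomes
```

Under these assumptions, the raw data for ten thousand human genomes already adds up to one petabyte, the same unit the infographic uses for five years of Earth Observation data.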
Don’t forget that there’s a glossary available which includes definitions of key words and phrases used on the course.
In this video, Vicky mentions sensors that are connected via the internet, often referred to as the Internet of Things. Search engines, online shopping and social media are examples of how the World Wide Web requires big data capabilities to process and disseminate vast quantities of data in fractions of a second. If you’re interested in finding out more about the Internet of Things, the links in the ‘See also’ area below may be of interest.
This article is from the free online course Big Data and the Environment

Created by
FutureLearn - Learning For Life
