My name is Vicky Lucas and I’m the Training Manager for the Institute for Environmental Analytics, and along with Jon Blower, I’ll be introducing you to big data and the environment and telling you more about our work. This first week of the course focuses on big data. And let’s start by thinking about where we encounter big data. There are so many things around us that are continually processing and producing big data, from the mobile phone sitting in your pocket to satellites thousands of kilometres above our heads. The increasing connectivity of sensors is a rapidly growing source of big data, from instruments on roads that monitor traffic flow to intelligent bins that send a message when they need emptying.
Another example is smart meters. Electricity and gas meters have sat in draughty, cobwebbed cupboards for decades, and you might have looked at them once in a while to send a reading to your energy provider. But now smart meters can transmit to a monitor in your house and to the energy company, showing instantly and continually how much energy a home is consuming. And we’ll return to smart meters later in the course. Another place where sensors are found is on aircraft: not only are aeroplanes themselves bristling with sensors for tracking their performance and maintenance, but each aeroplane’s position can be reported via satellite.
This location information can be used to show real-time tracks of aeroplanes literally all over the world, and it’s fascinating to watch. And health and the human body is an area of big data in itself. Exploring DNA requires biology, analysis and computing to come together; this is bioinformatics. The genome is all the genetic material in an organism, and the benefit of a greater understanding of genomes is finding new ways to treat or prevent diseases. And it’s a truly big data challenge: the raw data for each human genome is around 100 gigabytes. For research scientists, it’s the ability to process and analyse that big data easily which is important here. Another everyday example of big data is the internet.
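To get a feel for that scale, here is a minimal back-of-the-envelope sketch, assuming only the roughly 100 gigabytes of raw data per genome mentioned above; the function name and cohort sizes are illustrative, not from any real study:

```python
# Rough storage estimate for a cohort of sequenced genomes,
# using ~100 GB of raw data per human genome.

GB_PER_GENOME = 100  # raw data per genome, in gigabytes

def cohort_storage_tb(num_genomes: int) -> float:
    """Total raw storage for a cohort, in terabytes (1 TB = 1000 GB)."""
    return num_genomes * GB_PER_GENOME / 1000

# A modest study of 1,000 participants already needs ~100 TB of raw data;
# 100,000 participants needs ~10,000 TB, i.e. 10 petabytes.
print(cohort_storage_tb(1000))    # 100.0
print(cohort_storage_tb(100000))  # 10000.0
```

Numbers like these are why easy processing and analysis, rather than just storage, become the real bottleneck for researchers.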
Much of the backbone of the internet itself is about big data. You can type ‘Justin Bieber’ into a search engine and be returned 60 million results in two-thirds of a second. It’s the effective searching of the data, and then quick access to the right data, which is the challenge here. Or I can type ‘guitar’ into an online store and get 300,000 results, from instruments to books to novelty socks. So a big data challenge here is to help the user find the needle in the haystack; sensible sorting and searching is vital to find the right information, and we’ll concentrate on data discovery as the theme for Week 2. Of course, social media is an ever-expanding source of big data.
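That sorting-and-searching task can be illustrated with a toy sketch. This is a hypothetical example with a made-up four-item catalogue; real search engines and online stores use far more sophisticated ranking, but the core idea of ordering results by relevance is the same:

```python
# Toy "needle in a haystack" search: rank catalogue entries by how many
# of the query's words they contain, best matches first.

catalogue = [
    "acoustic guitar, six strings",
    "guitar chord book for beginners",
    "novelty guitar socks",
    "piano tuning kit",
]

def search(query: str, items: list[str]) -> list[str]:
    words = query.lower().split()
    scored = [(sum(w in item.lower() for w in words), item) for item in items]
    # Keep only items matching at least one word, highest score first.
    return [item for score, item in sorted(scored, reverse=True) if score > 0]

print(search("guitar book", catalogue))
```

Here “guitar chord book for beginners” matches both query words, so it ranks above the single-word matches, and the piano kit is filtered out entirely.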
As of June 2017, Facebook had reached two billion monthly active users, and five new profiles are created every second. It’s estimated that around 300 million photos are uploaded to Facebook every day. And other sources of big data include fitness trackers, the stock market and census data. But what about environmental data? The one I’ll talk about here is weather forecasts. Weather forecasts rely on massive computer simulations run every day, and they have been testing the upper boundaries of big data since their inception.
In the early 1900s, as meteorology emerged as a subject separate from physics, the Norwegian scientist Vilhelm Bjerknes published a paper suggesting that it would be possible to predict future weather by solving the equations that describe the air as a fluid flowing around the atmosphere. In the 1920s, the British scientist Lewis Fry Richardson envisaged a forecast factory with thousands of ‘human computers’ solving the equations on paper: everyone seated in a vast circular hall with one conductor at the centre, their combined calculations simulating the evolving weather. Paper calculations weren’t the solution; computers were. ENIAC, one of the first general-purpose electronic computers, was built in the United States, and a team led by the Hungarian-American mathematician John von Neumann used it to create the first computer weather forecast in 1950.
The one-day forecast took more than 24 hours to execute, and yet by 1955 two computer forecasts were being issued in the US every day. And today, some of the most powerful supercomputers in the world are dedicated to weather and climate modelling, with their speed measured in petaflops; a petaflop is one million billion calculations per second. And this last example brings us neatly to where environmental data meets big data needs, which we’ll talk about more throughout the course. The environment is a source of big data because the Earth is so vast. There’s so much to measure, from air pressure, to the colour and temperature of the oceans, to the land coverage of forests and crops.
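The petaflop figure is easy to unpack as plain arithmetic. The comparison with a human calculator below is my own illustration, assuming one hand calculation per second:

```python
# A petaflop is one million billion (10**15) calculations per second.
petaflop = 1_000_000 * 1_000_000_000  # one million times one billion

# How long would one second of petaflop computing take a person
# working at one calculation per second, non-stop?
seconds_per_year = 60 * 60 * 24 * 365
years = petaflop / seconds_per_year

print(petaflop == 10**15)        # True
print(round(years / 1e6, 1))     # ~31.7 million years
```

That gap between human and machine arithmetic is exactly why Richardson’s forecast factory had to wait for electronic computers.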
The environment also requires big data solutions for the effective analysis and understanding of environmental systems. Only by understanding how the environment works can we hope to both use and protect it at the same time.