One of the most common challenges we find when working with data is simply identifying and getting hold of the right data to answer the particular question at hand. We can be quite creative in finding other data sources that contain information that helps us to solve whatever the problem might be. To pick one example, we recently worked with Highways England on the problem of identifying fog patches on motorway networks, which is a very difficult problem for various reasons. Unfortunately, we don’t have direct measurements of fog from sensors that we can trust and that are of sufficiently good quality.
But we were able to look at traffic patterns, that is, the movements and speeds of traffic in the different lanes of the motorway, and use them as a kind of proxy measurement that might indicate the presence of fog.
Understanding the quality of the data, by which I really mean the fitness for purpose of the data to address a particular problem, is extremely important. For example, shadows in satellite images might affect the quality of the image. We might be looking at gaps in sensor records caused by weather conditions or by interruptions in internet connectivity. And we really need to understand all those particular nuances of the data in order to be able to use them effectively.
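As a rough illustration of the point about gaps in sensor records, the sketch below flags stretches where consecutive readings are further apart than expected. It is a minimal example under stated assumptions, not a method described in the talk: the function name `find_gaps`, the ten-minute reporting interval, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (start, end) pairs where consecutive readings are
    further apart than expected_interval plus tolerance."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each reading with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > expected_interval + tolerance:
            gaps.append((earlier, later))
    return gaps


# Hypothetical readings, nominally every 10 minutes, with one outage.
readings = [
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 9, 10),
    datetime(2020, 1, 1, 9, 20),
    datetime(2020, 1, 1, 10, 0),
    datetime(2020, 1, 1, 10, 10),
]

gaps = find_gaps(readings, timedelta(minutes=10))
for start, end in gaps:
    print(f"No readings between {start:%H:%M} and {end:%H:%M}")
```

A check like this only locates the gaps. Deciding whether a gap matters for the question at hand, and whether it was caused by weather or by a connectivity problem, still needs someone who understands the data.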
In many of our projects, we need to combine data from lots of different sources. One of the key challenges is simply that data collected by different organisations may be registered differently. For example, they may use different names for the same place. They may use different ways of identifying a position on the Earth’s surface. Or there may be even more difficult challenges than that. So we need to really understand the nature of the data in order to be able to combine them successfully.
When getting to grips with the complexities of data from different sources, one of the most important techniques that we use is visualisation.
Visualisation creates a picture of the data that we can understand as humans, one that gives us a lot of insight into the nature of the data, what it’s good for and what it’s not so good for. It also helps us to communicate the results of our work to our customers and to the wider world.
We have a lot of technical and computing solutions for working with big data in all its various forms, but what’s really important to remember is that we need human experts, domain experts, who understand the nature of the data and how it can be applied in any given situation.
As you’ve worked through this course and watched industry experts discuss their big data projects, you’ve heard about many of the challenges when working with big data. Watch Dr Jon Blower summarise these and highlight some of the possible solutions.
© University of Reading and Institute for Environmental Analytics