Cloud computing encompasses a range of techniques and solutions that mean that we don’t necessarily have to own the computing resources that we need to solve a particular problem. We can access those resources in the cloud, so to speak. And they may be provided by some of the big IT providers that many of us will be familiar with. And that might be something like Dropbox, or Microsoft’s OneDrive, or Apple’s iCloud. And in this case, the data are being managed for you by this provider in the cloud, and you access them from your devices whenever you need them. So for handling big data, it’s really quite important to have enough computational resources to be able to handle the data efficiently.
But it’s not just about the hardware that’s involved. It’s also about understanding the algorithms, the software, and the mathematics behind handling the data in order to get the most out of the information that we have. And one aspect of that is having quite efficient and fast access to the underlying data. It’s no good having a big computer that can process data very quickly if it can’t read the data quickly enough to keep up with that processing. So it’s very important to have the right kind of computers, as well as making sure they are sufficiently powerful.
So with big data problems, we often find that we’re not just limited by how fast we can crunch the numbers, but simply by how fast we can move the data around. So when we’re designing computing architectures for big data solutions, we have to think very carefully about that, and design them for very fast data transfer. To take JASMIN as an example, it is designed specifically for high performance data processing as opposed to high performance simulation. And that means we need to design the computer slightly differently from how we would design a different kind of supercomputer.
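The point about being limited by data movement rather than number crunching can be illustrated with a rough calculation. The sketch below is only a back-of-the-envelope model: the function name `bottleneck` and all the rates used are hypothetical figures chosen for illustration, not measurements from JASMIN or any other system.

```python
def bottleneck(data_bytes, read_bytes_per_s, process_bytes_per_s):
    """Compare the time to read a dataset with the time to process it.

    All three arguments are assumed, illustrative values. The model ignores
    caching, parallel I/O and overlap between reading and computing.
    """
    read_time = data_bytes / read_bytes_per_s
    compute_time = data_bytes / process_bytes_per_s
    # Whichever stage takes longer is the one that limits the whole job.
    limit = "data transfer" if read_time > compute_time else "computation"
    return read_time, compute_time, limit


# Hypothetical example: 1 TB of data, storage that delivers 1 GB/s,
# and a processor that could work through 10 GB/s if it were fed fast enough.
read_s, compute_s, limit = bottleneck(10**12, 10**9, 10**10)
print(f"read: {read_s:.0f} s, compute: {compute_s:.0f} s, limited by {limit}")
```

With these assumed numbers the processor would sit idle most of the time waiting for data, which is why an architecture for data processing puts so much emphasis on fast transfer.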
For example, it means that we need computing power and high performance data transfer to be co-located in the same machine so that we can take advantage of both those capabilities. I think it’s very interesting to look at what some of the large IT providers are doing in this area of big data. And we might take examples like Netflix and Google, which make it very, very easy for us to access large amounts of data from a variety of devices. And I think we can learn a lot from that in terms of making data more usable, more accessible, and more relevant to the situation that the user finds themselves in.
One good example of technology that’s transferred over from the large providers is what’s called the “MapReduce” algorithm, which was published originally as a paper from Google and has now seen much wider adoption in solutions such as Hadoop, which is a classic big data solution that’s very widely used. It’s a generic algorithm that can be applied to many, many different kinds of problems, and it involves splitting up a large problem into a large number of small problems, and distributing those problems across a larger computer. But it’s important to realise that there is no one solution that will address all big data problems. And in this course, we’ll look at some of the variety of techniques that we can use.
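To make the idea of splitting a large problem into many small ones more concrete, here is a minimal sketch of the MapReduce pattern applied to counting words. It runs on a single machine and is not how Hadoop or Google implement it. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration. In a real system each chunk would be handled by a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values that share the same key (word) together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce over a list of text chunks."""
    # Each chunk is an independent small problem; here they run one after
    # another, but nothing in map_phase depends on any other chunk.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big computers"])
print(counts)
```

The design choice that matters is that the map step never looks at more than one chunk, so the chunks can be processed in any order and on any number of machines before the results are combined.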
Over the next five years, I see the biggest challenges in environmental data analytics being around not just technical solutions for how we physically deal with large amounts of data, but also usability solutions. How do we make sure that the right data gets into the hands of the right people and they can understand it in order to make a useful decision?
Big data computing
In this video, Dr Jon Blower explains why you need sufficiently powerful computers to process big data to ensure it is usable, accessible and relevant for users.
Jon mentions MapReduce. If you would like to learn more about this algorithm, the following article may be of interest.
You’ve already heard about ENIAC, one of the first general purpose computers, developed in the US in the 1940s, and explored the 17 petabytes of storage at JASMIN. Now we’d really like to see your own pictures of computers, data storage and transmission (both old and new). You can upload your pictures to our course picture wall. Please note: this link takes you to an external website, Padlet. You can find information on using Padlet here.
© University of Reading and Institute for Environmental Analytics