
Big data computing

Big data requires powerful computers to process it, so that it remains usable, accessible and relevant for users. Watch Jon Blower explain more.
Cloud computing encompasses a range of techniques and solutions that mean that we don’t necessarily have to own the resources, the computing resources, that we need to solve a particular problem. We can access those resources in the cloud, so to speak. And they may be provided by some of the big IT providers that many of us will be familiar with. And that might be something like Dropbox, or Microsoft’s OneDrive, or Apple’s iCloud. And in this case, the data are being managed for you by this provider in the cloud, and you access it from your devices whenever you need it. So for handling big data, it’s really quite important to have enough computational resources to be able to handle the data efficiently.
But it’s not just about the hardware that’s involved. It’s also about understanding the algorithms, the software, and the mathematics behind handling the data in order to get the most out of the information that we have. And one aspect of that is having efficient and fast access to the underlying data. It’s no good having a big computer that can process data very quickly if it can’t read the data quickly enough to keep up with that processing. So it’s very important to have the right kind of computers, not just sufficiently powerful ones.
So with big data problems, we often find that we’re not just limited by how fast we can crunch the numbers, but simply by how fast we can move the data around. So when we’re designing computing architectures for big data solutions, we have to think very carefully about that, and design them for very fast data transfer. To take JASMIN as an example, that’s designed specifically for high performance data processing as opposed to high performance simulation. And that means we need to design the computer slightly differently from how we would design a different kind of supercomputer.
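To make the data-transfer point concrete, here is a rough back-of-envelope calculation in Python. The transfer rate used is an assumed, illustrative figure, not a JASMIN specification; only the 17-petabyte archive size comes from this course.

```python
# Back-of-envelope: why data movement, not raw compute, often limits
# big-data systems. The read rate below is an illustrative assumption.
PETABYTE = 10**15  # bytes

archive_bytes = 17 * PETABYTE   # storage volume mentioned in this course
read_rate = 500 * 10**6         # assumed 500 MB/s from a single storage stream

seconds = archive_bytes / read_rate
days = seconds / 86_400
print(f"Reading 17 PB at 500 MB/s takes about {days:.0f} days "
      f"on a single stream")
# → about 394 days
```

Even with generous assumptions, a single data stream would take over a year, which is why architectures like JASMIN co-locate computing power with very fast, parallel access to storage.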
For example, it means that we need computing power and high performance data transfer to be co-located in the same machine so that we can take advantage of both those capabilities. I think it’s very interesting to look at what some of the large IT providers are doing in this area of big data. And we might take examples like Netflix, and Google, which make it very, very easy for us to access large amounts of data from a variety of devices. And I think we can learn a lot from that in terms of making data more usable, more accessible, and more relevant to the situation that the user finds themselves in.
One good example of technology that’s transferred over from the large providers is what’s called the “MapReduce” algorithm, which was originally published in a paper from Google and has since seen much wider adoption in solutions such as Hadoop, which is a classic big data solution that’s very widely used. It’s a generic algorithm that can be applied to many, many different kinds of problems, and it involves splitting up a large problem into a large number of small problems, and distributing those problems across a larger computer. But it’s important to realise that there is no one solution that will address all big data problems. And in this course, we’ll look at some of the variety of techniques that we can use.
Over the next five years, I see the biggest challenges in environmental data analytics being around not just technical solutions for how we physically deal with large amounts of data, but also usability solutions. How do we make sure that the right data gets into the hands of the right people and they can understand it in order to make a useful decision?
In this video, Dr Jon Blower explains why you need sufficiently powerful computers to process big data to ensure it is usable, accessible and relevant for users.
Jon mentions MapReduce; if you are interested in learning more about this algorithm, the following article may be of interest.
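The map-and-reduce pattern Jon describes can be sketched in a few lines of plain Python. This is a toy word count run in sequence on one machine, not Hadoop itself; in a real cluster each map task would run on a different node.

```python
from collections import defaultdict

def map_phase(document):
    # Map: turn one document into a list of (word, 1) pairs.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by word and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data needs big computers", "big data moves slowly"]

# Each map task is independent, which is what lets a framework like
# Hadoop distribute them across many machines; here we just loop.
pairs = []
for doc in documents:
    pairs.extend(map_phase(doc))

print(reduce_phase(pairs))
# → {'big': 3, 'data': 2, 'needs': 1, 'computers': 1, 'moves': 1, 'slowly': 1}
```

The key idea is that the map calls never depend on each other, so the large problem splits cleanly into many small ones that can run anywhere.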
You’ve already heard about ENIAC, one of the first general-purpose computers, developed in the US in the 1940s, and explored the 17 petabytes of storage at JASMIN. Now we’d really like to see your own pictures of computers, data storage and transmission (both old and new). You can upload your pictures to our course picture wall. Please note: this link takes you to an external website, Padlet – you can find information on using Padlet here.
This article is from the free online course Big Data and the Environment, created by FutureLearn.
