What is Big Data?

In the first step of this activity, Graeme gives us an overview of what qualifies as ‘big data’.

The term big data originally referred to the problem of supercomputers generating so much output that it was difficult to process and understand. There is still some debate about what qualifies as big data, but the term mostly applies when we try to draw meaning out of data and struggle to do so because of one or more of the Three Vs.

Three Vs: Volume, Velocity, and Variety

  • Volume – when there is so much data that it becomes difficult to sort through and manage. This can present as a logistical problem, where the computing power needed is extreme, or as a scoping problem, where it is hard to establish the boundaries of the data: where does it start, and where does it end? A common example is the historical data warehouse of a large organisation.
  • Velocity – when data arrives in your system so quickly that it is difficult to process, report on, or store. Common examples are handling social media activity for large or popular organisations, and aggregating readings from smart sensors built into internet of things (IoT) devices.
  • Variety – when the data you hold, or the data arriving, is so varied that it becomes hard to establish commonalities, relationships, or trends.

A big data problem is generally considered to exist when some combination of the above makes the data difficult for a traditional database system to handle.
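To make the idea of "some combination of the Three Vs" more concrete, here is a minimal Python sketch. It is only an illustration: the `Workload` and `SystemLimits` classes, their field names, and the numbers used with them are hypothetical, not taken from the course or from any real product. The sketch compares a workload against the assumed capacity of an existing system and reports which of the Three Vs are under strain.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of the data a system has to handle."""
    stored_bytes: int        # total data at rest (volume)
    arrival_per_sec: float   # records arriving per second (velocity)
    distinct_schemas: int    # number of different record shapes (variety)


@dataclass
class SystemLimits:
    """Assumed capacity of an existing system; illustrative values only."""
    max_stored_bytes: int
    max_processed_per_sec: float
    max_schemas: int


def stressed_vs(workload: Workload, limits: SystemLimits) -> list[str]:
    """Return which of the Three Vs exceed the system's assumed limits."""
    flags = []
    if workload.stored_bytes > limits.max_stored_bytes:
        flags.append("volume")
    if workload.arrival_per_sec > limits.max_processed_per_sec:
        flags.append("velocity")
    if workload.distinct_schemas > limits.max_schemas:
        flags.append("variety")
    return flags


def backlog_after(workload: Workload, limits: SystemLimits, seconds: float) -> float:
    """Unprocessed records after `seconds` if arrivals outpace processing."""
    surplus = workload.arrival_per_sec - limits.max_processed_per_sec
    return max(0.0, surplus * seconds)
```

For example, a workload of 5 TB arriving at 12,000 records per second, checked against a system assumed to hold 2 TB and process 10,000 records per second, would be flagged for both volume and velocity, and would build up a backlog of 120,000 records after one minute. In practice the limits are rarely this clear-cut, so treat the thresholds as a way of thinking about the problem, not as a formal test.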

Over the next few steps, we’ll explore techniques for dealing with high-volume and high-velocity data problems using tools within the Azure platform.

Note: The term big data is commonly credited to a 1997 paper by NASA researchers Michael Cox and David Ellsworth. If you’re interested, a copy of the original research paper where the term was first used can be found in the Downloads section below.

For more insight on the question, ‘What is big data?’, read this article by IBM.

In the following step, we’ll take a look at working with high volumes of data in storage, also known as big data at rest.

This article is from the free online course Microsoft Future Ready: Fundamentals of Big Data, created by FutureLearn.
