What is Big Data?

In the first step of this activity, Graeme gives us an overview of what classifies something as ‘big data’.

The term big data originally referred to the problem of supercomputers generating so much output that it was difficult to process and understand. While there is some debate about what qualifies as big data, the term now mostly describes situations where we try to draw meaning out of data and struggle to do so because of one or more of the Three Vs.

Three Vs: Volume, Velocity, and Variety

  • Volume – when there is so much data that it becomes difficult to sort through and manage. This can present as a logistical problem, where the amount of computing power needed is extreme, or as a scoping problem, where it is hard to establish the boundaries of the data: where does it start, and where does it end? A common example is the historical data warehouses of large organisations.
  • Velocity – when data arrives in your system so quickly that it is difficult to process, report on, and/or store. Common examples are handling social media activity for large or popular organisations, and aggregating data from the smart sensors built into Internet of Things (IoT) devices.
  • Variety – when the data, whether already stored or still arriving, is so varied that it becomes hard to establish commonalities, relationships, and/or trends.

A big data problem is generally considered to exist when some combination of the above makes the data difficult for a traditional database system to handle.
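
To make the Three Vs more concrete, here is a minimal Python sketch that computes one rough measure of each for a small batch of records. The sample records, their field names, and the `profile` function are hypothetical and exist only for illustration; real big data workloads are far too large to profile this way, and the choice of measures (bytes, records per second, distinct field sets) is an assumption rather than a standard definition.

```python
import json
from datetime import datetime

# Hypothetical sample batch; the sources and field names are illustrative only.
records = [
    {"ts": "2024-01-01T00:00:00", "source": "sensor", "temp_c": 21.5},
    {"ts": "2024-01-01T00:00:01", "source": "sensor", "temp_c": 21.7},
    {"ts": "2024-01-01T00:00:01", "source": "social", "text": "Great service!", "likes": 3},
    {"ts": "2024-01-01T00:00:02", "source": "web", "url": "/home", "status": 200},
]


def profile(batch):
    """Return rough volume, velocity, and variety measures for a batch."""
    # Volume: total size of the batch when serialised as JSON, in bytes.
    volume_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in batch)

    # Velocity: records per second over the time span the batch covers.
    times = [datetime.fromisoformat(r["ts"]) for r in batch]
    span_seconds = (max(times) - min(times)).total_seconds()
    # If every record shares one timestamp, treat the span as one second.
    velocity = len(batch) / span_seconds if span_seconds > 0 else float(len(batch))

    # Variety: number of distinct record shapes (sets of field names).
    shapes = {frozenset(r) for r in batch}

    return {
        "volume_bytes": volume_bytes,
        "records_per_second": velocity,
        "distinct_shapes": len(shapes),
    }


summary = profile(records)
print(summary)
```

In this sample, four records arrive over two seconds in three different shapes. Any one of the three measures growing beyond what a single traditional database can comfortably handle is what turns an ordinary data problem into a big data problem.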

Over the next few steps, we’ll explore techniques for dealing with high-volume and high-velocity data problems using tools within the Azure platform.

Note: The term big data was first coined in 1997 by astronomers and NASA researchers Michael Cox and David Ellsworth. If you’re interested, a copy of the original research paper in which the term was first used can be found in the Downloads section below.

For more insight on the question, ‘What is big data?’, read this article by IBM.

In the following step, we’ll take a look at working with high volumes of data in storage, also known as big data at rest.

This article is from the free online course Microsoft Future Ready: Fundamentals of Big Data, created by FutureLearn.
