What is big data?
No doubt you’ve heard the terms ‘big data’ and ‘analytics’ being thrown around in the media. Let’s look at what these concepts really mean.
Watch this short video for a quick introduction to what big data is and the possibilities that it holds.
Defining big data
‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.1
If you look closely at this definition, you can see that it is framed in terms of time. It uses the word ‘typical’ and thus refers to the current state of the art. So, what we called big data 10 years ago may not be big data now, because the ‘typical’ tools and technologies have changed; and what we call big data now may not be big data in 5 years.2 In the future, we may still use traditional data collection, storage, and processing systems, though most likely in conjunction with newer systems.
The V’s that characterise big data
To determine whether data is big data, we can also consider the V’s that characterise big data. The four most commonly defined V dimensions are volume, variety, velocity, and veracity.3
Volume refers to the quantity of data to be stored. For example, Walmart deals with big data. They handle more than 1 million customer transactions every hour, importing more than 2.5 petabytes of data into their database. This is about 167 times the amount of information contained in all the books in the US Library of Congress.
The following table lists the different storage capacity units. To put these in context, there are 8,000,000,000,000,000,000,000,000 bits (that’s an eight followed by 24 zeros) in one yottabyte.
| Unit | Definition | Abbreviation |
| --- | --- | --- |
| Bit | 0 or 1 value | b |
| Byte | 8 bits | B |
| Kilobyte | 1024 bytes | KB |
| Megabyte | 1024 KB | MB |
| Gigabyte | 1024 MB | GB |
| Terabyte | 1024 GB | TB |
| Petabyte | 1024 TB | PB |
| Exabyte | 1024 PB | EB |
| Zettabyte | 1024 EB | ZB |
| Yottabyte | 1024 ZB | YB |
* Note that because bits are binary in nature and are the basis on which all other storage values are built, all values for data storage units are defined in terms of powers of 2. For example, the prefix kilo typically means 1000; in data storage, however, a kilobyte = 2¹⁰ = 1024 bytes.2 (Table 14.1, Storage Capacity Units; p. 651)
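The powers-of-two convention in the note above can be sketched in a few lines of Python. Each unit is 2¹⁰ = 1024 times the previous one; the function name and unit list here are illustrative, not part of the course materials:

```python
# Storage units under the binary convention: each step multiplies by 2**10.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def bytes_in(unit):
    """Number of bytes in one of the given unit (binary convention)."""
    return 2 ** (10 * UNITS.index(unit))

print(bytes_in("kilobyte"))  # → 1024
print(bytes_in("petabyte"))  # → 1125899906842624
```

This makes it easy to see why Walmart's 2.5 petabytes per hour is such a striking figure: a single petabyte is already more than 10¹⁵ bytes.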
To manage big volumes of data, we have two options for handling additional load:2
- Scale up, meaning we keep the same number of systems, but migrate each system to a larger, more powerful one.
- Scale out, meaning we increase the number of systems, but do not migrate to larger systems.
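A minimal sketch of scaling out, with made-up record keys and node counts (not from the course materials): records are assigned to systems by hashing their key, so adding systems spreads the load across more machines rather than enlarging a single one.

```python
import hashlib

def assign_node(key, n_nodes):
    """Deterministically map a record key to one of n_nodes systems."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_nodes

records = ["txn-1001", "txn-1002", "txn-1003", "txn-1004"]
for n in (2, 4):  # scale out from 2 systems to 4
    print(n, {r: assign_node(r, n) for r in records})
```

Real distributed stores typically use consistent hashing instead of this simple modulo scheme, so that adding a node moves only a small fraction of the keys.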
Velocity refers to the speed at which data enters a system and must be processed. For example, Amazon captures every mouse click while shoppers browse its website,2 and these clicks arrive at an extremely high rate.
Velocity is important in stream processing. Think of all the data from radio-frequency identification (RFID), global positioning system (GPS), near-field communication (NFC), and Bluetooth sensors flooding into a system. Stream processing aims to aggregate individual data points from high-velocity data in order to trigger a high-level event when a certain pattern is detected. It also focuses on deciding which data to keep from a stream, since it is infeasible to retain all the data rushing in.
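The two ideas above, keeping only a window of the stream and triggering an event when a pattern appears, can be sketched as follows. The readings, window size, and threshold are all illustrative assumptions:

```python
from collections import deque

def detect_events(readings, window=3, threshold=30.0):
    """Trigger an event whenever the average over the last `window`
    readings exceeds `threshold`. Only the window is retained,
    not the whole stream."""
    recent = deque(maxlen=window)   # old readings are discarded automatically
    events = []
    for t, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            events.append(t)        # high-level event: pattern detected at time t
    return events

print(detect_events([10, 20, 35, 40, 45, 20]))  # → [3, 4, 5]
```

A real stream processor (e.g. in a sensor network) would apply the same idea continuously, emitting alerts instead of collecting them in a list.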
Variety refers to the complexity of data formats. Big data consists of different forms of data. For example, when a telecommunications company like Telstra records data on calls to its call centre, this data includes both:
- structured data, which conforms to a predefined data model (e.g., your customer ID, the timestamp of your call, your service type), and
- unstructured data (e.g., the recording of the call, notes that the call centre operator makes during the call, the problem history related to your call).
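The contrast between the two forms can be made concrete with a single call-centre record. The field names below are illustrative, not Telstra's actual schema:

```python
# One call-centre interaction, mixing structured and unstructured data.
call_record = {
    # Structured: each field conforms to a predefined data model.
    "customer_id": 48213,
    "timestamp": "2018-10-24T09:15:00",
    "service_type": "broadband",
    # Unstructured: free-form text with no predefined model.
    "operator_notes": (
        "Customer reports intermittent dropouts since Monday; "
        "advised modem restart, escalated to level 2."
    ),
}
```

The structured fields can be queried directly (e.g. all calls about a given service type), while the notes require text processing before they can be analysed at scale.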
Veracity refers to the trustworthiness of data. When data is collected and analysed automatically but cannot be captured in its entirety (due to its high volume and velocity), uncertainty about its accuracy increases. For example, it is particularly challenging to verify the truthfulness of posts on social media platforms, as we do not always know the posters’ backgrounds and intentions. In fact, detecting fake reviews, fake news, and fake friends is currently an active research area.
The four V’s as an infographic
The IBM Big Data & Analytics Hub provides an infographic which explains and gives examples of each of the four V’s.
Further V’s that are often mentioned as key characteristics of big data are:
- value: how meaningful the data is
- visualisation: graphical representations to assist humans in understanding big data.
Hopefully, you now have an idea of what big data is. In the next step we will discuss where all the data is coming from.
How would you define big data?
Share your thoughts in the comments.
Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH. Big data: The next frontier for innovation, competition, and productivity [Internet]. McKinsey Global Institute; 2011 [cited 2018 Oct 24]. 143 p. Available from: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation ↩
Elmasri R, Navathe SB. Fundamentals of database systems. 7th ed. Pearson; 2017. ↩
© Griffith University