
The ‘data’ story

What is data? What are its types and sources?

As the Cambridge Dictionary defines it, data is facts and statistics collected for reference, examination, or analysis and used to help decision-making. [1]

For example, at a manufacturing unit, the price of the product, the costs incurred, and the number of units manufactured would all be data. In the context of data science and computing, data is information that has been transformed into a format that is efficient for processing.

Data forms the foundation of data science, and organisations are looking at several ways to collect and process data effectively to make data-driven decisions.

So, where does all the data come from?

Sources of data

A data source is the location from which data originates. For instance, imagine a cosmetic brand selling products online. To display the stock of an item, the website sources its information from the brand's warehouse database. In this case, the warehouse database is the data source.
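The warehouse example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `stock` table, its columns, the item names, and the helper functions `build_warehouse_db` and `stock_for` are all hypothetical, and an in-memory SQLite database stands in for a real warehouse system.

```python
import sqlite3


def build_warehouse_db():
    """Create an in-memory stand-in for the warehouse database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (item TEXT PRIMARY KEY, units INTEGER NOT NULL)"
    )
    # Made-up sample rows, used purely for illustration.
    conn.executemany(
        "INSERT INTO stock (item, units) VALUES (?, ?)",
        [("lipstick", 120), ("face cream", 0), ("mascara", 45)],
    )
    conn.commit()
    return conn


def stock_for(conn, item):
    """Return the units in stock for an item, or None if the item is unknown."""
    row = conn.execute(
        "SELECT units FROM stock WHERE item = ?", (item,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    warehouse = build_warehouse_db()  # the data source
    for name in ("lipstick", "face cream", "perfume"):
        # The "website" reads its stock figure from the data source.
        print(name, stock_for(warehouse, name))
```

Here the website code never stores stock figures itself. It asks the data source each time, which is why knowing where the data comes from matters when judging its reliability.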

As a data scientist, you need to know the source of your data before analysis to ensure its authenticity, reliability, and validity.

Now, watch the video below to learn about the five common sources of data.

[Video: Common sources of data]

Types of data

Understanding the different types of data will equip data scientists to choose the right set of tools, methods, and technology to process it.

As seen earlier, data is of two broad types – structured and unstructured. The table below compares the characteristics of both these types in detail.

| Aspect | Structured data | Unstructured data |
| --- | --- | --- |
| Definition | Structured data adheres to a predefined data model that describes how the data is recorded, stored, queried, and analysed. | Unstructured data does not follow a predefined data model. |
| Storage | Structured data can be stored in traditional relational databases. | Unstructured data is typically stored in non-relational databases, data lakes, and data warehouses. |
| Accessibility | Structured data can be processed directly by computers/machines. | Unstructured data requires human intervention, artificial intelligence, or machine learning to process it. |
| Examples | Names of people, dates, phone numbers, product names, identification numbers, and so on. | Images, videos, social media posts, product reviews, text messages, blogs, and so on. |
| Format | Structured data is often arranged in a tabular format where columns represent attributes and rows represent records. | Unstructured data could take the form of a block of text or multimedia content. It does not follow a specified format. |
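The contrast in the table can be illustrated with a short Python sketch. Everything in it is made up for illustration: the `Order` record, the sample orders, the review text, and the helper functions `total_quantity` and `mentions` are assumptions, not part of the course material.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    """A structured record: every order has the same named, typed attributes."""
    customer: str
    product: str
    order_date: date
    quantity: int


# Structured data: rows are records, fields are attributes (hypothetical values).
orders = [
    Order("Asha", "lipstick", date(2023, 1, 5), 2),
    Order("Ben", "mascara", date(2023, 1, 6), 1),
    Order("Asha", "mascara", date(2023, 2, 1), 3),
]


def total_quantity(records, product):
    """Because the model is predefined, a query is a simple filter on a field."""
    return sum(r.quantity for r in records if r.product == product)


# Unstructured data: free text with no predefined fields (hypothetical review).
review = "Loved the mascara, but delivery took ages and the box arrived dented."


def mentions(text, keyword):
    """A naive keyword check; real analysis of free text usually needs far more."""
    return keyword.lower() in text.lower()


if __name__ == "__main__":
    print(total_quantity(orders, "mascara"))
    print(mentions(review, "mascara"))
```

The structured records can be queried by attribute straight away, while the review only yields a crude keyword match. Extracting sentiment or topics from such text would typically call for the human review or machine learning mentioned in the table.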

The rise of big data

‘Big data’ refers to extremely large and complex data sets. The graphic described below shows just how ‘big’ big data really is.

Graphic shows Big Data “Data Sizes”, using a letter and a book as reference sizes. They are: Bit = 1/8 of a letter. Nibble (1/2 byte) = ½ a letter. Byte (1 byte) = 1 letter. Megabyte (1,024 Kilobytes) = 1 book. Gigabyte (1,024 Megabytes) = 1,600 books. Terabyte (1,024 Gigabytes) = 1,600,000 books. Petabyte (1,024 Terabytes) = 160,000,000 books. Exabyte (1,024 Petabytes) = 1,600,000,000,000 books.
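The unit ladder in the graphic, where each step is 1,024 times the previous one, can be worked through in a short Python sketch. The names `UNITS`, `bytes_in`, and `describe` are hypothetical helpers introduced here for illustration. The sketch covers only the byte multiples, not the book equivalents from the graphic.

```python
# Units in ascending order; each one is 1,024 times the one before it.
UNITS = [
    "bytes", "kilobytes", "megabytes", "gigabytes",
    "terabytes", "petabytes", "exabytes",
]

BITS_PER_BYTE = 8
BITS_PER_NIBBLE = 4  # a nibble is half a byte


def bytes_in(unit):
    """Return how many bytes make up one of the given unit."""
    return 1024 ** UNITS.index(unit)


def describe(size_in_bytes):
    """Express a byte count in the largest unit that keeps the value below 1,024."""
    value = float(size_in_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


if __name__ == "__main__":
    for unit in UNITS:
        print(f"1 {unit[:-1]} = {bytes_in(unit):,} bytes")
    print(describe(1536))
```

Running the loop makes the growth concrete: one exabyte is 1,024 raised to the sixth power, which is more than a quintillion bytes.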

Big data can include any digitally stored data, content shared over the internet, social media, data recorded by sensors built into devices such as GPS units or fitness trackers, and data from anything connected to the Internet of Things. It could also be traditional data from point-of-sale transactions or product identification via barcodes or RFID.

Commonly used database software tools currently fall short in handling such high volumes and variety of data; therefore, many new tools, methods, frameworks, and technologies have emerged along with the rise of big data. Data science is the field of study that leverages the scientific method and frameworks to analyse big data, reveal patterns and trends, draw insights, build models, and create predictions to inform business decisions and practices.

Next, let’s learn more about the field of data science.

References

  1. data [Internet]. Cambridge Dictionary; [date unknown]. Available from: https://dictionary.cambridge.org/dictionary/english/data.

This article is from the free online course Introduction to Data Science for Business, created by FutureLearn.
