Data lives in tables

Data sets are organized in a tabular form. In this unplugged demo, Jeremy Singer explains how to operate on data in tables.
JEREMY: When you think of a table, you might imagine a piece of furniture with four legs and a wooden top, like this. However, in the context of data science, a table encodes a data set. It’s normally a two-dimensional grid of values, like you might see in a textbook or you might fill in when you’re doing a science experiment. We’ve got an example table here. This is just a toy data set. And you can see we’ve got the rows going across, with one special row, which is recording the headings for us, to tell us what each entry describes. The columns here are individual features for each of the entries.
You can see in this data set, we’ve got three entries here, 1, 2, 3. And they each record information about a country: Scotland, England, and China.
There are different operations we might want to do on our data set. So for instance, if we count the number of rows, excluding the header row, then we know how many elements or entries there are in the data set. Here, we’ve got three entries. If we count the number of columns, then we know how many measurements or pieces of information we’re recording about each country. 1, 2, 3, 4: we’re recording four pieces of information here about each member of the set. Other operations we might want to perform include finding the maximum value for one of the columns. So for instance, the maximum population belongs to China, with 1.386 billion people.
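As a rough sketch of these operations, the toy table can be written as a list of rows in plain Python. The transcript only gives China's population; the other figures are approximate, illustrative values, and the `capital` column is an assumption standing in for a fourth column that the transcript does not name.

```python
# Toy table as a list of rows; the first row is the header.
# Values are approximate and illustrative; the "capital" column is an
# assumption, since the transcript does not name all four columns.
table = [
    ["country", "capital", "population", "area_km2"],
    ["Scotland", "Edinburgh", 5_400_000, 78_000],
    ["England", "London", 56_000_000, 130_000],
    ["China", "Beijing", 1_386_000_000, 9_600_000],
]

header, rows = table[0], table[1:]

num_entries = len(rows)      # number of rows, excluding the header row
num_features = len(header)   # number of columns

# Find the maximum in the population column, then read across the row
# to see which country it belongs to.
pop_index = header.index("population")
largest = max(rows, key=lambda row: row[pop_index])

print(num_entries, num_features)
print(largest[0], largest[pop_index])
```

Splitting the header from the data rows first keeps the row count honest, which mirrors the "excluding the header row" step in the demo.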
So we find the maximum in the column, then we read across to find out which entry this is. Other summary statistics to compute might include the minimum values. Conveniently here, Scotland has the minimum value for both population and area. And we can also look at averages. Take the median, for instance, the middle value: here, England has the middle value for both population and area.
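The minimum and median can be sketched in the same way. This example uses Python's standard `statistics` module and the same approximate, illustrative figures as assumptions; only the ordering of the three countries matters here.

```python
from statistics import median

# Each entry is a dictionary keyed by column name.
# Population and area values are approximate and illustrative.
rows = [
    {"country": "Scotland", "population": 5_400_000, "area_km2": 78_000},
    {"country": "England", "population": 56_000_000, "area_km2": 130_000},
    {"country": "China", "population": 1_386_000_000, "area_km2": 9_600_000},
]

# Minimum: find the smallest value in a column, then read across.
smallest_pop = min(rows, key=lambda r: r["population"])
smallest_area = min(rows, key=lambda r: r["area_km2"])

# Median: with three entries, this is simply the middle value.
median_pop = median(r["population"] for r in rows)
median_country = next(
    r["country"] for r in rows if r["population"] == median_pop
)

print(smallest_pop["country"], smallest_area["country"])
print(median_country, median_pop)
```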
When you store a data set on disk, it might be saved as an Excel spreadsheet. Or you can store it as a comma-separated values (CSV) file. This is a plain text encoding, so it normally takes up much less space than the corresponding Excel spreadsheet. When you load the information into memory to do analysis and processing, you load the table into a data frame in Python. We’re going to look at data frames and explore them in more detail, and do different operations on them in the next practical exercise. Do have a go.
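To show what a plain text encoding looks like, here is a minimal sketch that reads CSV text into memory with Python's standard `csv` module. It is a library-free stand-in, not the data frame approach used in the course exercise; the CSV content is the same illustrative toy data, and the column names are assumptions.

```python
import csv
import io

# A CSV file is plain text: one line per row, values separated by commas.
# This string stands in for the contents of a file on disk.
csv_text = """country,capital,population,area_km2
Scotland,Edinburgh,5400000,78000
England,London,56000000,130000
China,Beijing,1386000000,9600000
"""

reader = csv.DictReader(io.StringIO(csv_text))

# CSV values are read as text, so numeric columns are converted explicitly.
rows = [
    {**row, "population": int(row["population"]), "area_km2": int(row["area_km2"])}
    for row in reader
]
columns = reader.fieldnames  # column names taken from the header row

print(columns)
for row in rows:
    print(row["country"], row["population"])
```

A data frame library typically handles the header row and the type conversion for you, which is one reason it is convenient for analysis.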

Individual data values, or elements, are organized into tables to describe larger sets of data.

A table might be represented as a spreadsheet, which is often saved as a CSV file.

In our coding class, a table is referred to as a data frame.

This article is from the free online

Getting Started with Teaching Data Science in Schools

Created by
FutureLearn - Learning For Life
