Skip main navigation

Data lives in tables

Data sets are organized in a tabular form. In this unplugged demo, Jeremy Singer explains how to operate on data in tables.
JEREMY: When you think of a table, you might imagine a piece of furniture with four legs and a wooden top, like this. However, in the context of data science, a table encodes a data set. It’s normally a two-dimensional grid of values, like you might see in a textbook or you might fill in when you’re doing a science experiment. We’ve got an example table here. This is just a toy data set. And you can see we’ve got the rows going across, with one special row, which is recording the headings for us, to tell us what each entry describes. The columns here are individual features for each of the entries.
You can see in this data set, we’ve got three entries here, 1, 2, 3. And they each record information about a country– Scotland, England, and China.
There are different operations we might want to do on our data set. So for instance, if we count the number of rows, excluding the header row, then we know how many elements or entries there are in the data set. Here, we’ve got three entries. If we count the number of columns, then we know what measurements or information we’re recording about each country. 1, 2, 3, 4– we’re recording four pieces of information here about each member of the set. Other operations we might want to perform include finding the maximum value for one of the columns. So for instance, the maximum population is China, with 1.386 billion people.
So we find the maximum in the column, then we read across to find out which entry this is. Other summary statistics to compute might include the minimum values. Conveniently here, Scotland seemed to have the minimum value for population and area. And we can also look at averages. The median, for instance, the middle value here, there’s England that has the middle value for both population and area.
When you store a data set on disk, it might be saved as an Excel spreadsheet. Or you can store it as a comma separated value, CSV, file. This is a plain text encoding, so it normally takes much less memory than the corresponding Excel spreadsheet. When you load the information into memory to do analysis and processing, you load the table into a data frame in Python. We’re going to look at data frames and explore them in more detail, and do different operations on them in the next practical exercise. Do have a go.

Individual data values, or elements, are organized into tables to describe larger sets of data.

A table might be represented as a spreadsheet, which is often saved as a CSV file.

In our coding class, a table is referred to as a data frame.

This article is from the free online

Getting Started with Teaching Data Science in Schools

Created by
FutureLearn - Learning For Life

Our purpose is to transform access to education.

We offer a diverse selection of courses from leading universities and cultural institutions from around the world. These are delivered one step at a time, and are accessible on mobile, tablet and desktop, so you can fit learning around your life.

We believe learning should be an enjoyable, social experience, so our courses offer the opportunity to discuss what you’re learning with others as you go, helping you make fresh discoveries and form new ideas.
You can unlock new opportunities with unlimited access to hundreds of online short courses for a year by subscribing to our Unlimited package. Build your knowledge with top universities and organisations.

Learn more about how FutureLearn is transforming access to education