Want to keep learning?

This content is taken from the The University of Glasgow's online course, Getting Started with Teaching Data Science in Schools. Join the course to learn more.

Skip to 0 minutes and 1 second JEREMY: When you think of a table, you might imagine a piece of furniture with four legs and a wooden top, like this. However, in the context of data science, a table encodes a data set. It’s normally a two-dimensional grid of values, like you might see in a textbook or you might fill in when you’re doing a science experiment. We’ve got an example table here. This is just a toy data set. And you can see we’ve got the rows going across, with one special row, which is recording the headings for us, to tell us what each entry describes. The columns here are individual features for each of the entries.

Skip to 0 minutes and 57 seconds You can see in this data set, we’ve got three entries here, 1, 2, 3. And they each record information about a country– Scotland, England, and China.

Skip to 1 minute and 10 seconds There are different operations we might want to do on our data set. So for instance, if we count the number of rows, excluding the header row, then we know how many elements or entries there are in the data set. Here, we’ve got three entries. If we count the number of columns, then we know what measurements or information we’re recording about each country. 1, 2, 3, 4– we’re recording four pieces of information here about each member of the set. Other operations we might want to perform include finding the maximum value for one of the columns. So for instance, the maximum population is China, with 1.386 billion people.

Skip to 2 minutes and 0 seconds So we find the maximum in the column, then we read across to find out which entry this is. Other summary statistics to compute might include the minimum values. Conveniently here, Scotland seemed to have the minimum value for population and area. And we can also look at averages. The median, for instance, the middle value here, there’s England that has the middle value for both population and area.

Skip to 2 minutes and 27 seconds When you store a data set on disk, it might be saved as an Excel spreadsheet. Or you can store it as a comma separated value, CSV, file. This is a plain text encoding, so it normally takes much less memory than the corresponding Excel spreadsheet. When you load the information into memory to do analysis and processing, you load the table into a data frame in Python. We’re going to look at data frames and explore them in more detail, and do different operations on them in the next practical exercise. Do have a go.

Data lives in tables

Individual data values, or elements, are organized into tables to describe larger sets of data.

A table might be represented as a spreadsheet, which is often saved as a CSV file.

In our coding class, a table is referred to as a data frame.

Share this video:

This video is from the free online course:

Getting Started with Teaching Data Science in Schools

The University of Glasgow