
Getting numerical data into Python

© Coventry University. CC BY-NC 4.0

Suppose we have a question, some data, and we know a little bit of Python. How do we get our data into Python in order to start exploring it?

Getting data into Python is usually quite easy. Here we’re going to focus on numerical data, rather than text, audio or video.

For tiny datasets, we can just type (or copy and paste) the data directly into our Python code. For larger datasets, we can import data from a text file, an Excel spreadsheet, a database or many other sources. For huge datasets, often referred to as Big Data, we need a different approach, because the data may not fit on a single computer.

Tiny datasets

The table below shows the number of births in the USA for each day of the week, totalled over the years 2000-2014.

Day of the week | Number of births
--------------- | ----------------
Monday          |          9316001
Tuesday         |         10274874
Wednesday       |         10109130
Thursday        |         10045436
Friday          |          9850199
Saturday        |          6704495
Sunday          |          5886889

Since this table is such a tiny dataset (even though the counts are quite large), we can type this data directly into Python. In the Python code below, we store the two columns separately as two Python lists.

# Store each column of the table as its own list, in matching order
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
births = [9316001, 10274874, 10109130, 10045436, 9850199, 6704495, 5886889]

# Display both lists
print(weekdays)
print(births)

A list is a data structure that holds several values and keeps them in the order in which they are given. We can recognise a list by its square brackets. The items in a list can be a mixture of strings, numbers and other data types (even other lists, so you can have lists of lists).
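To make these properties concrete, here is a short sketch using the two lists from above (repeated so the snippet runs on its own). The names `mixed` and `table` are our own, chosen just for illustration.

```python
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
births = [9316001, 10274874, 10109130, 10045436, 9850199, 6704495, 5886889]

# Lists keep their order, so index 0 is the first item
first_day = weekdays[0]      # "Monday"
# Negative indices count from the end of the list
last_count = births[-1]      # 5886889

# Items can be of mixed types, including another list
mixed = ["Monday", 9316001, 3.5, [1, 2, 3]]

# A list of lists: one inner list per row of the table
table = [[day, count] for day, count in zip(weekdays, births)]
print(table[1])              # ['Tuesday', 10274874]
print(len(table))            # 7
```

Pairing the two lists with `zip` relies on them being in the same order, which is exactly the guarantee a list gives.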

For tiny datasets, it’s fine to store the data within your Python code, but for anything larger it’s not a good idea. There’s always a danger of miscopying the data or forgetting where it came from. Also, if you want to update your dataset, you will have to change the values in your code.
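One way to keep even a small dataset out of your code is to store it in a CSV file and read it with Python’s built-in csv module. The sketch below is self-contained, so it first writes the table above to a temporary file; in practice you would create or download the file once and keep only the reading part in your script. The file name and the column names `day_of_week` and `births` are our own assumptions. Note that the csv module returns every field as a string, so the counts are converted with `int()`.

```python
import csv
import os
import tempfile

# Hypothetical file name, placed in the system's temporary directory
path = os.path.join(tempfile.gettempdir(), "births_by_weekday.csv")

# Write the tiny dataset to a CSV file, header row first
rows = [("Monday", 9316001), ("Tuesday", 10274874), ("Wednesday", 10109130),
        ("Thursday", 10045436), ("Friday", 9850199), ("Saturday", 6704495),
        ("Sunday", 5886889)]
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["day_of_week", "births"])
    writer.writerows(rows)

# Read it back; DictReader uses the header row as the keys of each record
weekdays, births = [], []
with open(path, newline="") as f:
    for record in csv.DictReader(f):
        weekdays.append(record["day_of_week"])
        births.append(int(record["births"]))  # fields arrive as strings

print(weekdays)
print(births)
```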

Getting a bit more advanced

Python is one of the most commonly used programming languages for data science. One reason for this is that it is easy to get started with data analysis using four Python libraries: NumPy, Pandas, SciPy and Matplotlib. They are part of a collection of Python libraries known as the SciPy ecosystem, and they provide data structures and functions for loading, storing, processing, analysing and plotting data.
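As a small taste of what these libraries offer, the sketch below puts the birth counts from the table into a NumPy array and computes each day’s share of the total in a single vectorised step, without writing a loop. The variable names are our own.

```python
import numpy as np

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
births = np.array([9316001, 10274874, 10109130, 10045436, 9850199, 6704495, 5886889])

total = births.sum()
# Element-wise division: every count is divided by the total at once
share = births / total

# argmax/argmin return the position of the largest/smallest value
busiest = weekdays[int(births.argmax())]
quietest = weekdays[int(births.argmin())]

for day, pct in zip(weekdays, share * 100):
    print(f"{day:<10} {pct:5.1f}%")
print("Busiest day:", busiest)
print("Quietest day:", quietest)
```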

The Pandas library provides a function, read_csv, for reading data directly from a comma-separated values (CSV) file. The file can be stored locally or on a website.
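Here is a minimal sketch of reading a local CSV file with `pd.read_csv`. To keep it self-contained, it first writes a three-row file to the system’s temporary directory; the file name and column names are hypothetical. Pandas uses the first line as the column names and infers that the `births` column holds integers.

```python
import os
import tempfile

import pandas as pd

# Hypothetical local file; a URL string could be passed to read_csv in the same way
path = os.path.join(tempfile.gettempdir(), "births_sample.csv")
with open(path, "w") as f:
    f.write("day_of_week,births\n"
            "Monday,9316001\n"
            "Tuesday,10274874\n"
            "Wednesday,10109130\n")

# The header line becomes the column names of the DataFrame
df = pd.read_csv(path)
print(df)
# Check the type pandas inferred for each column
print(df.dtypes)
```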

Have a look at the original dataset, US_births_2000-2014_SSA.csv, which the table above summarises. Notice that there is one header line (giving the names of the columns) followed by one row for each day. The Python code below reads the data from this CSV file directly into Python, ready for further processing.

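import pandas as pd

# Address of the raw CSV file in FiveThirtyEight's GitHub repository
url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv'
# read_csv fetches the file and uses its header line for the column names
mydata = pd.read_csv(url)
print(mydata)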
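To connect the full dataset back to the summary table, the daily counts can be totalled per weekday with `groupby`. This sketch assumes the CSV has columns named `day_of_week` and `births`, with `day_of_week` coded from 1 (Monday) to 7 (Sunday); check `mydata.columns` and the dataset’s documentation before relying on it. The function name `births_by_weekday` is our own, and the sketch runs on a tiny placeholder DataFrame (the counts are not real) so it works offline.

```python
import pandas as pd

# Assumed coding of day_of_week: 1 = Monday ... 7 = Sunday
DAY_NAMES = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
             5: "Friday", 6: "Saturday", 7: "Sunday"}

def births_by_weekday(df):
    """Total births per weekday, assuming 'day_of_week' and 'births' columns."""
    # Sum the births within each day_of_week group, then swap codes for names
    return df.groupby("day_of_week")["births"].sum().rename(index=DAY_NAMES)

# Placeholder rows with the assumed columns; the counts are made up
sample = pd.DataFrame({
    "year": [2000, 2000, 2000, 2000],
    "month": [1, 1, 1, 1],
    "date_of_month": [1, 2, 3, 8],
    "day_of_week": [6, 7, 1, 6],
    "births": [10, 20, 30, 40],
})
result = births_by_weekday(sample)
print(result)
```

With the real data loaded as above, `births_by_weekday(mydata)` should produce totals that can be compared with the table at the top of this article, provided the column assumptions hold.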

Pandas also has functions for reading from many other sources, for example read_excel for Excel spreadsheets, read_json for JSON files and read_sql for databases. CSV is commonly used because it’s easy to read and write using Microsoft Excel or any text editor. In the next step, we’ll go through an example of how you can use a CSV file in Python.
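As one example of a source other than CSV, pandas can run an SQL query against a database connection with `pd.read_sql_query`. The sketch below uses an in-memory SQLite database from Python’s standard library; the table name `births` and its columns are hypothetical. It then writes the result back out as CSV text with `to_csv`, the same format that Excel or a text editor can open.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE births (day_of_week TEXT, births INTEGER)")
conn.executemany(
    "INSERT INTO births VALUES (?, ?)",
    [("Monday", 9316001), ("Tuesday", 10274874), ("Sunday", 5886889)],
)

# pandas runs the query and returns the rows as a DataFrame
df = pd.read_sql_query("SELECT * FROM births ORDER BY births DESC", conn)
conn.close()
print(df)

# Convert the DataFrame to CSV text (without the row index)
csv_text = df.to_csv(index=False)
print(csv_text)
```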

In summary, for a very small amount of data, it is often quick and easy to type (or copy, paste and edit) data directly into a Python script, but for larger datasets, we can read the data directly from a file.

References

FiveThirtyEight. (2020, June 26). FiveThirtyEight / data. GitHub. https://github.com/fivethirtyeight/data

FiveThirtyEight. (n.d.). US_births_2000-2014_SSA.csv [Dataset]. GitHub. https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv

Hoffman, C. (2018). What is a CSV file, and how do I open it? How-To Geek. https://www.howtogeek.com/348960/what-is-a-csv-file-and-how-do-i-open-it/

This article is from the free online course Get ready for a Masters in Data Science and AI.
Created by FutureLearn - Learning For Life
