Using Pandas with Seaborn

Learn more about using Pandas with Seaborn

Numerical plots

Plots are a way of visualising the relationship between variables. These variables can be either numerical or categorical (categories such as a group, class, or division). We will unpack the former in this step.
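To make the distinction concrete, here is a minimal sketch of how you might separate numerical columns from categorical ones in a Pandas DataFrame. The sample values are hypothetical and only stand in for a real data set, and treating every non-numeric column as categorical is a simplifying assumption.

```python
import pandas as pd

# Hypothetical sample data: two numerical columns and one categorical column.
df = pd.DataFrame({
    "petal.length": [1.4, 4.7, 6.0],
    "petal.width": [0.2, 1.4, 2.5],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

# Columns with a numeric dtype hold numerical variables.
numerical_cols = df.select_dtypes(include="number").columns.tolist()

# Assumption for this sketch: every remaining column is categorical.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)
print(categorical_cols)
```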

Numerical variables are quantitative: they are numerical measures of the individuals or elements in a data set. To display such variables and data, we use numerical or continuous plots. In a nutshell, numerical plots:

  • involve numerical variables
  • compare trends and display outliers.

For example:

Scatter plots and line plots

Graphic shows 2 charts with the same information displayed differently. Left side chart: X-axis labelled "Time, s" reads from left to right 0, 10, and 20. Y-axis labelled "Response, %" reads from bottom to top 0, 10, 20, 30, 40. The left side chart contains a zigzag dotted line that trends upward. It starts at 0 on both axes, then rises and falls steadily until it ends just above 20 on the y-axis at 20 on the x-axis. The right side chart contains the same data, but with no line connecting the dots. Only the dots are plotted on the chart.

Scatter plots

Seaborn integrates with Pandas, so we can pass a DataFrame directly to one of its plot functions. You can either create a DataFrame from scratch, using the DataFrame syntax shown in the image here, or import an existing file and let Pandas load the data into a DataFrame for you.

Graphic shows a screenshot from Jupyter Notebook that shows creating a data frame from scratch by adding the set of code using data frame syntax. 3 steps included with 3 arrows next to each one, pointing to the code relevant to the step. First step reads "Import Seaborn and Pandas". Second step reads "Add you own Dataframe from scratch". Last step reads "Import an existing file and use the Dataframe that Pandas provides for you".
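The two options described above can be sketched as follows. The column names and values are hypothetical, and an in-memory string stands in for a CSV file on disk so that the sketch runs anywhere.

```python
import io

import pandas as pd

# Option 1: build a DataFrame from scratch (hypothetical values).
from_scratch = pd.DataFrame({
    "time_s": [0, 10, 20],
    "response_pct": [0.0, 12.5, 21.0],
})

# Option 2: read existing CSV data. With a real file you would pass its
# path instead, for example pd.read_csv("iris.csv").
csv_text = "time_s,response_pct\n0,0.0\n10,12.5\n20,21.0\n"
from_file = pd.read_csv(io.StringIO(csv_text))

print(from_scratch)
print(from_file)
```

Either way, the result is an ordinary DataFrame that can be handed to a Seaborn plot function.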

We will start with creating scatter plots in Seaborn and then move to other types of plots in the subsequent activities.

As you may remember, scatter plots are used to show the relationship between two continuous variables. Each point on the plot represents one observation, positioned by its value for each of the two variables, so the plot shows how one variable changes as the other changes.

Graphic with four graphs showing "strong, positive, linear", "moderate, negative, linear", "null/no relationship", and "moderate, negative, linear" respectively.
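If you would like a number to go with these visual impressions, one option is the Pearson correlation coefficient, which Pandas computes with the corr() method. The sketch below uses synthetic data; the column names and noise levels are illustrative assumptions, not part of the course material.

```python
import numpy as np
import pandas as pd

# Synthetic data with a fixed seed so the sketch is repeatable.
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 200)

df = pd.DataFrame({
    "x": x,
    # Small noise around a rising line: a strong, positive, linear relationship.
    "strong_positive": 2 * x + rng.normal(0, 0.5, x.size),
    # Larger noise around a falling line: a moderate, negative, linear relationship.
    "moderate_negative": -x + rng.normal(0, 4, x.size),
    # Pure noise: no relationship with x.
    "no_relationship": rng.normal(0, 1, x.size),
})

# Pearson correlation of each column with x (values range from -1 to 1).
corr_with_x = df.corr()["x"].drop("x")
print(corr_with_x.round(2))
```

A value close to 1 or -1 suggests a strong linear relationship, while a value close to 0 suggests little or no linear relationship.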

To draw a scatter plot in Seaborn, we will use the scatterplot() function. Do note that the scatterplot() function takes its arguments as keywords.

Some of the common arguments are as follows:

  • data: the Pandas DataFrame that you want to plot.
  • x: the name of the column in the DataFrame to source X values from.
  • y: the name of the column in the DataFrame to source Y values from.

For now, these arguments should suffice to draw a scatter plot. However, we will introduce you to other arguments as and when we use them.
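As a minimal sketch of these three arguments in use, the example below plots a small DataFrame. The column names and values are hypothetical. scatterplot() returns the Matplotlib Axes it drew on, which lets us check what was plotted.

```python
import pandas as pd
import seaborn as sns

# Hypothetical measurements, used only to illustrate the keyword arguments.
df = pd.DataFrame({
    "time_s": [0, 5, 10, 15, 20],
    "response_pct": [0, 8, 6, 15, 21],
})

# data is the DataFrame; x and y name the columns to plot.
ax = sns.scatterplot(data=df, x="time_s", y="response_pct")

# Seaborn uses the column names as the axis labels.
print(ax.get_xlabel(), ax.get_ylabel())
```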

Demonstration

Let’s look at an easy way to display the relationship between Iris petal length and width. Follow the steps given below in the Jupyter Notebook you downloaded in the previous step.

Step 1

First, we import Pandas and Seaborn. The convention is to alias Seaborn to sns.

import seaborn as sns
import pandas as pd

Step 2

Then, we’ll read the iris.csv file again. You can extract it from the zipped folder again, if you do not have it already from the previous steps.

Code:

data = pd.read_csv("iris.csv")
data

Output:

Screenshot of the Jupyter Notebook output displaying the contents of the Iris data set. It shows a table with 6 columns and 12 rows. The first row contains the headings for columns 2 to 6. Column 2 heading reads "sepal.length". Column 3 heading reads "sepal.width". Column 4 heading reads "petal.length". Column 5 heading reads "petal.width". Column 6 heading reads "variety". Row 2 reads from left to right: 0, 5.1, 3.5, 1.4, 0.2, Setosa. Row 3 reads from left to right: 1, 4.9, 3.0, 1.4, 0.2, Setosa. Row 4 reads from left to right: 2, 4.7, 3.2, 1.3, 0.2, Setosa. Row 5 reads from left to right: 3, 4.6, 3.1, 1.5, 0.2, Setosa. Row 6 reads from left to right: 4, 5.0, 3.6, 1.4, 0.2, Setosa. Row 7 shows ellipses across all 6 columns. Row 8 reads from left to right: 145, 6.7, 3.0, 5.2, 2.3, Virginica. Row 9 reads from left to right: 146, 6.3, 2.5, 5.0, 1.9, Virginica. Row 10 reads from left to right: 147, 6.5, 3.0, 5.2, 2.0, Virginica. Row 11 reads from left to right: 148, 6.2, 3.4, 5.4, 2.3, Virginica. Row 12 reads from left to right: 149, 5.9, 3.0, 5.1, 1.8, Virginica. Below the table it reads "150 rows x 5 columns".

Step 3

Next, we will filter the data to just the Setosa species.

Code:

setosa = data[data.variety == "Setosa"]
setosa

Output:

Screenshot of the Jupyter Notebook output displaying the rows filtered to the Setosa variety. It shows a table with 6 columns and 14 rows. The first row contains the headings for columns 2 to 6. Column 2 heading reads "sepal.length". Column 3 heading reads "sepal.width". Column 4 heading reads "petal.length". Column 5 heading reads "petal.width". Column 6 heading reads "variety". Row 2 reads from left to right: 0, 5.1, 3.5, 1.4, 0.2, Setosa. Row 3 reads from left to right: 1, 4.9, 3.0, 1.4, 0.2, Setosa. Row 4 reads from left to right: 2, 4.7, 3.2, 1.3, 0.2, Setosa. Row 5 reads from left to right: 3, 4.6, 3.1, 1.5, 0.2, Setosa. Row 6 reads from left to right: 4, 5.0, 3.6, 1.4, 0.2, Setosa. Row 7 reads from left to right: 5, 5.4, 3.9, 1.7, 0.4, Setosa. Row 8 reads from left to right: 6, 4.6, 3.4, 1.4, 0.3, Setosa. Row 9 reads from left to right: 7, 5.0, 3.4, 1.5, 0.2, Setosa. Row 10 reads from left to right: 8, 4.4, 2.9, 1.4, 0.2, Setosa. Row 11 reads from left to right: 9, 4.9, 3.1, 1.5, 0.1, Setosa. Row 12 reads from left to right: 10, 5.4, 3.7, 1.5, 0.2, Setosa. Row 13 reads from left to right: 11, 4.8, 3.4, 1.6, 0.2, Setosa. Row 14 reads from left to right: 12, 4.8, 3.0, 1.4, 0.1, Setosa.
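The filter in this step relies on a boolean mask: comparing a column with a value gives one True or False per row, and indexing the DataFrame with that mask keeps only the True rows. Here is a minimal sketch on a few hypothetical rows that stand in for the full data set.

```python
import pandas as pd

# Hypothetical rows standing in for the Iris data set.
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 6.0],
    "petal.width": [0.2, 0.2, 1.4, 2.5],
    "variety": ["Setosa", "Setosa", "Versicolor", "Virginica"],
})

# The comparison produces one boolean per row.
mask = data["variety"] == "Setosa"

# Indexing with the mask keeps only the rows where it is True.
setosa = data[mask]

print(mask.tolist())
print(len(setosa))
```

Bracket notation such as data["variety"] is equivalent to data.variety here. For column names that contain a dot, such as "petal.length", only the bracket form works.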

Step 4

Finally, we plot the data by using the following code snippet.

Code:

sns.scatterplot(
    data=setosa,
    x="petal.length",
    y="petal.width",
)

Output:

Screenshot of the Jupyter Notebook output displaying the relationship between Iris petal length and width. The image shows a scatter plot. X-axis labelled "petal.length" reads from left to right: 1.0, 1.2, 1.4, 1.6, 1.8. Y-axis labelled "petal.width" reads from bottom to top: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6. Most of the dots are located in the region of y-axis 0.2 to 0.4 and x-axis 1.2 to beyond 1.6.

In the code above, we simply tell the scatterplot() function to source the x-coordinate from the ‘petal.length’ column in the DataFrame and the y-coordinate from the ‘petal.width’ column.

The third variable

When we introduced you to scatter plots in Course 1, and again in this one, we mentioned that they are used to plot two variables. But what if a third variable also plays a role in the comparison? With Seaborn, displaying it can be automated.

You can display the third variable (in the example below, the variety of the Iris species) either by adding a hue or by adding a style to the points on your plot.

Adding hue

Seaborn can automatically help us compare data with more than two variables. One simple way is to plot different data series in different colours (or hues, as Seaborn refers to them). To illustrate, let us draw another scatter plot, but tell Seaborn to add a new colour (hue) for each variety of Iris.

You will be using the entire DataFrame that was read from the Iris dataset CSV for this plot, and not just Setosa.

Code:

sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
)

Output:

Screenshot of the Jupyter Notebook output. The image is a scatter plot that represents adding colour in Seaborn for the Iris flower varieties Setosa, Versicolor, and Virginica. There's a legend in the top left-hand corner. The blue dot represents Setosa. The orange dot represents Versicolor. The green dot represents Virginica. The x-axis is labelled "petal.length". It reads from left to right: 1, 2, 3, 4, 5, 6, 7. The y-axis is labelled "petal.width". It reads from bottom to top: 0.0, 0.5, 1.0, 1.5, 2.0, 2.5. Cluster of blue dots is in the region of x-axis 1 to 2 and y-axis 0.0 to 0.5. Cluster of orange dots is in the region of x-axis 3 to 5 and y-axis 1.0 to 1.5. Cluster of green dots is in the region of x-axis 4 to 7 and y-axis 1.5 to 2.5.

As you can see, the Iris variety is treated as the third variable in this resulting plot.
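One way to confirm how Seaborn treated the hue variable is to read back the legend it created. The sketch below uses a few hypothetical rows in place of the full Iris data set and lists the legend entries, which should include one entry per variety.

```python
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for the Iris data set.
data = pd.DataFrame({
    "petal.length": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "petal.width": [0.2, 0.3, 1.5, 1.4, 2.2, 2.5],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

# hue assigns one colour to each distinct value in the "variety" column.
ax = sns.scatterplot(data=data, x="petal.length", y="petal.width", hue="variety")

# Collect the text of every legend entry that Seaborn added.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```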

Adding style

Another simple way to plot the third variable would be to depict different data series in different styles of scatter icons. To illustrate, let us draw another scatter plot using the same data set, but tell Seaborn to add a new style for each variety of Iris.

For demonstration, you will be again using the entire DataFrame that was read from the Iris data set for this plot.

Code:

sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
    style="variety",
)

Output:

Screenshot of the Jupyter Notebook output. The image is a scatter plot that represents adding styles to the colours in your scatterplot in Seaborn. There's a legend in the top left-hand corner. The blue dot represents Setosa. The orange crossmark represents Versicolor. The green square represents Virginica. The x-axis is labelled "petal.length". It reads from left to right: 1, 2, 3, 4, 5, 6, 7. The y-axis is labelled "petal.width". It reads from bottom to top: 0.0, 0.5, 1.0, 1.5, 2.0, 2.5. Cluster of blue dots is in the region of x-axis 1 to 2 and y-axis 0.0 to 0.5. Cluster of orange crossmarks is in the region of x-axis 3 to 5 and y-axis 1.0 to 1.5. Cluster of green squares is in the region of x-axis 4 to 7 and y-axis 1.5 to 2.5.

As you can see, each Iris variety is now distinguished by both its colour and its marker style in the resulting plot.
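If you would rather choose the marker shapes yourself, scatterplot() also accepts a markers argument. The sketch below uses style on its own, without hue, and maps each variety to a Matplotlib marker code. The rows and the marker choices are hypothetical and only illustrate the idea.

```python
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for the Iris data set.
data = pd.DataFrame({
    "petal.length": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "petal.width": [0.2, 0.3, 1.5, 1.4, 2.2, 2.5],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

# Illustrative choice of markers: circle, filled cross, and square.
variety_markers = {"Setosa": "o", "Versicolor": "X", "Virginica": "s"}

ax = sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    style="variety",
    markers=variety_markers,
)

# Collect the text of every legend entry that Seaborn added.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Using style without hue can be handy when a plot will be printed in black and white.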

To add more features and arguments to your plots, refer to the Seaborn documentation linked below.

Refer to: Seaborn documentation [1]

Did you notice?

In the previous example, Seaborn has gone ahead and labelled the axes automatically for you, which saves you having to do it with extra code. What else did you notice was different in plotting using Seaborn as compared to Matplotlib?
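For comparison, here is a rough sketch of a similar scatter plot drawn with Matplotlib alone, where the axis labels are added by hand. The rows are hypothetical stand-ins for the Iris data set.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the Iris data set.
data = pd.DataFrame({
    "petal.length": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "petal.width": [0.2, 0.3, 1.5, 1.4, 2.2, 2.5],
})

fig, ax = plt.subplots()

# With Matplotlib, we pass the column values themselves rather than column names.
ax.scatter(data["petal.length"], data["petal.width"])

# The axis labels are set explicitly here.
ax.set_xlabel("petal.length")
ax.set_ylabel("petal.width")

print(ax.get_xlabel(), ax.get_ylabel())
```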

References

  1. Seaborn.scatterplot [Document]. Seaborn; [date unknown]. Available from: https://seaborn.pydata.org/generated/seaborn.scatterplot.html

This article is from the free online course Data Visualisation with Python: Seaborn and Scatter Plots, created by FutureLearn.
