
Using Pandas with Seaborn


Numerical plots

Plots are a way of visualising the relationship between variables. These variables can be either numerical (quantitative measures) or categorical (categories such as a group, class, or division). We will unpack the former in this step.

Numerical variables are quantitative. Such variables are numerical measures of individuals or elements used in a data set. To display such variables and data, we use numerical or continuous plots. In a nutshell, numerical plots:

  • involve numerical variables
  • compare trends and display outliers.

For example:

Scatter plots and line plots

Scatter plots

Seaborn integrates closely with Pandas, so we can pass a DataFrame directly to one of its plot functions. You can either create a DataFrame from scratch, using the DataFrame syntax shown in the image below, or import an existing file, in which case Pandas reads the data into a DataFrame for you.

Graphic shows a screenshot from Jupyter Notebook of creating a DataFrame from scratch using the DataFrame syntax. It includes 3 steps, each with an arrow pointing to the relevant code. The first step reads "Import Seaborn and Pandas". The second step reads "Add your own DataFrame from scratch". The last step reads "Import an existing file and use the DataFrame that Pandas provides for you".
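A minimal sketch of the two routes described above, using a few made-up petal measurements: the first DataFrame is built from a dict of columns, and the second is read from CSV text. Here io.StringIO stands in for a real file so the example runs on its own; in your notebook you would pass a file path to pd.read_csv() instead.

```python
import io

import pandas as pd

# Route 1: build a DataFrame from scratch from a dict of columns.
# The measurements are made up for illustration.
scratch = pd.DataFrame({
    "petal.length": [1.4, 4.7, 6.0],
    "petal.width": [0.2, 1.4, 2.5],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

# Route 2: read an existing CSV file. io.StringIO stands in for a file on
# disk; in a notebook you would write pd.read_csv("iris.csv") instead.
csv_text = """petal.length,petal.width,variety
1.4,0.2,Setosa
4.7,1.4,Versicolor
6.0,2.5,Virginica
"""
from_file = pd.read_csv(io.StringIO(csv_text))

# Either way, Pandas gives you a DataFrame with the same columns,
# ready to pass to a Seaborn plot function.
print(scratch)
print(from_file)
```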

We will start with creating scatter plots in Seaborn and then move to other types of plots in the subsequent activities.

As you may remember, scatter plots are used to show the relationship between two continuous variables, one plotted on the x-axis and the other on the y-axis. Each point represents one observation in the data set, so the plot shows how the values of one variable change as the other changes.

Graphic with four graphs showing "strong, positive, linear", "moderate, negative, linear", "null/no relationship", and "moderate, negative, linear" respectively.
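The labels in the graphic describe the direction (positive or negative) and strength of a linear trend. As an optional aside, not something this step relies on, Pearson's correlation coefficient puts a number on that pattern, and Pandas computes it with DataFrame.corr(). The toy columns below are hypothetical and were chosen to give a perfect positive and a perfect negative relationship.

```python
import pandas as pd

# Hypothetical toy data: x rises steadily from 1 to 10.
df = pd.DataFrame({"x": range(1, 11)})
df["rises"] = 2 * df["x"] + 1   # rises with x: perfect positive linear relationship
df["falls"] = 20 - df["x"]      # falls as x rises: perfect negative linear relationship

# Pearson's r: close to +1 = positive, close to -1 = negative,
# close to 0 = no linear relationship.
r = df.corr()
print(r.loc["x", "rises"], r.loc["x", "falls"])
```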

To draw a scatter plot in Seaborn, we use the scatterplot() function. Note that scatterplot() takes its arguments as keywords.

Some of the common arguments are as follows:

  • data: the Pandas DataFrame that you want to plot.
  • x: the name of the column in the DataFrame to source X values from.
  • y: the name of the column in the DataFrame to source Y values from.
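A minimal sketch of scatterplot() with only the three arguments listed above. The DataFrame and its column names (height_cm, weight_kg) are hypothetical; ax= is an optional Seaborn argument used here only to draw onto a fresh set of axes. In a Jupyter Notebook the figure is displayed automatically.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A small made-up DataFrame; any two numeric columns would do.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 66, 75, 85],
})

fig, ax = plt.subplots()
# data, x and y are passed as keywords; x and y name columns of df.
sns.scatterplot(data=df, x="height_cm", y="weight_kg", ax=ax)
```

Seaborn draws one point per row and uses the column names as the axis labels.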

For now, these arguments should suffice to draw a scatter plot. However, we will introduce you to other arguments as and when we use them.

Demonstration

Let’s look at an easy way to display the relationship between Iris petal width and petal length. Follow the steps below in the Jupyter Notebook you downloaded in the previous step.

Step 1

First, we import Pandas and Seaborn. The convention is to alias Seaborn to sns.

import seaborn as sns
import pandas as pd

Step 2

Then, we’ll read the iris.csv file again. If you no longer have it from the previous steps, extract it from the zipped folder again.

Code:

data = pd.read_csv("iris.csv")
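Before plotting, it helps to check what read_csv() loaded. The sketch below assumes iris.csv has the columns sepal.length, sepal.width, petal.length, petal.width and variety; only the last three are confirmed by the code in this step. A few sample rows in a string stand in for the file so the example runs without it.

```python
import io

import pandas as pd

# Stand-in for iris.csv: a few rows in the layout this sketch assumes.
# In the notebook, use pd.read_csv("iris.csv") instead.
sample_csv = """sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.3,3.3,6.0,2.5,Virginica
"""
data = pd.read_csv(io.StringIO(sample_csv))

print(data.shape)                        # (rows, columns)
print(data.columns.tolist())             # names to use for x, y, hue, ...
print(data["variety"].value_counts())    # rows per species
```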

Step 3

Next, we will filter the data to just the Setosa species.

Code:

setosa = data[data.variety == "Setosa"]
setosa
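The filter in Step 3 works in two stages, sketched here on a hypothetical four-row version of the data: the comparison produces a boolean Series, and indexing the DataFrame with that Series keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical mini version of the iris DataFrame.
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 6.0],
    "petal.width": [0.2, 0.2, 1.4, 2.5],
    "variety": ["Setosa", "Setosa", "Versicolor", "Virginica"],
})

# Stage 1: the comparison yields a boolean Series, True where the row is Setosa.
mask = data["variety"] == "Setosa"
print(mask.tolist())

# Stage 2: indexing with the mask keeps only the True rows.
setosa = data[mask]

# data.variety works because "variety" is a valid Python identifier;
# columns such as "petal.length" need bracket notation instead.
print(setosa["petal.length"].tolist())
```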

Step 4

Finally, we plot the data by using the following code snippet.

Code:

sns.scatterplot(
    data=setosa,
    x="petal.length",
    y="petal.width",
)

Output:

Screenshot of the Jupyter Notebook output displaying the relationship between Iris petal width and petal length. The image shows a scatter plot. The x-axis, labelled "petal.length", reads from left to right: 1.0, 1.2, 1.4, 1.6, 1.8. The y-axis, labelled "petal.width", reads from bottom to top: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6. Most of the dots lie between 0.2 and 0.4 on the y-axis and between 1.2 and just beyond 1.6 on the x-axis.

In the code, we simply tell the scatterplot() function to take the x-values from the ‘petal.length’ column of the DataFrame and the y-values from the ‘petal.width’ column.
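To confirm that scatterplot() really reads x from petal.length and y from petal.width, the sketch below compares the plotted point positions with the DataFrame columns. The rows are made up; get_offsets() is Matplotlib's accessor on the collection that draws the points, and reading ax.collections[0] assumes the scatter layer is the first collection on the axes, which holds for a fresh Axes like this one.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical Setosa-like rows (made-up values for illustration).
setosa = pd.DataFrame({
    "petal.length": [1.4, 1.3, 1.5, 1.7],
    "petal.width": [0.2, 0.2, 0.4, 0.3],
})

fig, ax = plt.subplots()
sns.scatterplot(data=setosa, x="petal.length", y="petal.width", ax=ax)

# Each point's (x, y) position comes from one row of the two columns.
points = np.asarray(ax.collections[0].get_offsets())
expected = setosa[["petal.length", "petal.width"]].to_numpy()
print(np.allclose(points, expected))

# Seaborn also used the column names as axis labels.
print(ax.get_xlabel(), ax.get_ylabel())
```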

The third variable

When we introduced scatter plots in Course 1, and again in this course, we said that they are used to plot two variables. But what if a third variable also plays a role in comparing the values? Seaborn can show it automatically. How is that possible?

You can display the third variable (in the example below, the variety of Iris) by adding either a hue (colour) or a style (marker shape) to the points on your plot.

Adding hue

Seaborn can automatically help us compare data with more than two variables. One simple way is to plot different data series in different colours (or hues, as Seaborn refers to them). To illustrate, let us draw another scatter plot, but tell Seaborn to add a new colour (hue) for each variety of Iris.

This time, use the entire DataFrame read from the Iris data set CSV for this plot, not just the Setosa subset.

Code:

sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety"
)

Output:

Screenshot of the Jupyter Notebook output. The image is a scatter plot in Seaborn with a different colour for each Iris variety: Setosa, Versicolor and Virginica. There's a legend in the top left-hand corner. The blue dot represents Setosa. The orange dot represents Versicolor. The green dot represents Virginica. The x-axis is labelled "petal.length". It reads from left to right: 1, 2, 3, 4, 5, 6, 7. The y-axis is labelled "petal.width". It reads from bottom to top: 0.0, 0.5, 1.0, 1.5, 2.0, 2.5. The cluster of blue dots is in the region of x-axis 1 to 2 and y-axis 0.0 to 0.5. The cluster of orange dots is in the region of x-axis 3 to 5 and y-axis 1.0 to 1.5. The cluster of green dots is in the region of x-axis 4 to 7 and y-axis 1.5 to 2.5.

As you can see, the Iris variety is treated as the third variable in this resulting plot.
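Seaborn picks the hue colours from its default palette. If you want to control them, scatterplot() also accepts hue_order (the order of the levels in the legend) and palette (here a dict from level to colour). The sketch below uses a few made-up rows in place of the full data set, and the specific colours are an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical rows covering all three varieties (made-up values).
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
    # Optional: fix the legend order and choose the colour for each level.
    hue_order=["Setosa", "Versicolor", "Virginica"],
    palette={"Setosa": "tab:blue", "Versicolor": "tab:orange", "Virginica": "tab:green"},
    ax=ax,
)

# One distinct face colour per variety.
colours = np.unique(ax.collections[0].get_facecolors(), axis=0)
print(len(colours))
```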

Adding style

Another simple way to show the third variable is to draw each data series with a different marker style. To illustrate, let us draw another scatter plot using the same data set, but tell Seaborn to add a new style for each variety of Iris. The code below keeps hue="variety" and adds style="variety", so each variety gets both its own colour and its own marker shape.

Again, use the entire DataFrame read from the Iris data set for this plot.

Code:

sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
    style="variety",
)

Output:

Screenshot of the Jupyter Notebook output. The image is a scatter plot in Seaborn with a different colour and marker style for each Iris variety. There's a legend in the top left-hand corner. The blue dot represents Setosa. The orange crossmark represents Versicolor. The green square represents Virginica. The x-axis is labelled "petal.length". It reads from left to right: 1, 2, 3, 4, 5, 6, 7. The y-axis is labelled "petal.width". It reads from bottom to top: 0.0, 0.5, 1.0, 1.5, 2.0, 2.5. The cluster of blue dots is in the region of x-axis 1 to 2 and y-axis 0.0 to 0.5. The cluster of orange crossmarks is in the region of x-axis 3 to 5 and y-axis 1.0 to 1.5. The cluster of green squares is in the region of x-axis 4 to 7 and y-axis 1.5 to 2.5.

Again, the Iris variety is treated as the third variable, this time shown by both colour and marker shape.
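Style can also be used on its own, without hue. This sketch, again on made-up rows, drops hue and passes the optional markers argument to choose one marker per variety. The particular markers are an assumption; if you leave markers out, Seaborn picks its own defaults.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical rows covering all three varieties (made-up values).
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    style="variety",
    # Optional: one filled marker per variety (circle, X, square).
    markers={"Setosa": "o", "Versicolor": "X", "Virginica": "s"},
    ax=ax,
)

# Without hue, every point shares a single colour; only the shape varies.
colours = np.unique(ax.collections[0].get_facecolors(), axis=0)
print(len(colours))
```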

To explore more features and arguments for your plots, refer to the Seaborn documentation for scatterplot().

Refer to: Seaborn documentation [1]

Did you notice?

In the previous examples, Seaborn labelled the axes automatically, which saves you from writing extra code. What else did you notice that differs between plotting with Seaborn and plotting with Matplotlib?
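For comparison, here is one way to draw the hue plot with plain Matplotlib, sketched on a few made-up rows. Without Seaborn, we loop over the groups ourselves to get one colour per variety, and we add the axis labels and the legend by hand.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows covering all three varieties (made-up values).
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

fig, ax = plt.subplots()
# One ax.scatter call per variety; Matplotlib takes the next colour
# from its default colour cycle for each call.
for variety, group in data.groupby("variety"):
    ax.scatter(group["petal.length"], group["petal.width"], label=variety)

# Labels and legend that Seaborn would have added for us.
ax.set_xlabel("petal.length")
ax.set_ylabel("petal.width")
ax.legend(title="variety")
```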

References

  1. Seaborn.scatterplot [Document]. Seaborn; [date unknown]. Available from: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
This article is from the free online

Data Visualisation with Python: Seaborn and Scatter Plots

Created by
FutureLearn - Learning For Life
