# Using Pandas with Seaborn

## Numerical plots

Plots are a way of visualising the relationship between variables. These variables can be either numerical or categorical (categories such as a group, class, or division). We will unpack the former in this step.

Numerical variables are quantitative: their values are numerical measurements of the individuals or elements in a data set. To display such variables, we use numerical (or continuous) plots. In a nutshell, numerical plots:

• involve numerical variables

• compare trends and display outliers.
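Pandas can select a DataFrame's columns by data type, so you can check which columns hold numerical variables. The sketch below uses a small made-up DataFrame with Iris-style column names. select_dtypes is a standard Pandas method. Treating every non-numeric column as categorical is a simplifying assumption made for this example.

```python
import pandas as pd

# A tiny, made-up DataFrame: two numerical columns and one categorical column
df = pd.DataFrame({
    "petal.length": [1.4, 4.7, 6.0],
    "petal.width": [0.2, 1.4, 2.5],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

# Numerical (quantitative) columns: candidates for numerical plots
numerical = df.select_dtypes(include="number").columns.tolist()

# Assumption for this sketch: every remaining column holds categories
categorical = df.select_dtypes(exclude="number").columns.tolist()

print(numerical)    # ['petal.length', 'petal.width']
print(categorical)  # ['variety']
```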

For example:

### Scatter plots

Seaborn integrates with Pandas, so we can pass a DataFrame directly to its plot functions. You can either create a DataFrame from scratch with the pd.DataFrame constructor or import an existing file, such as a CSV file. Seaborn then pulls the columns you name out of the DataFrame for you.
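Here is a minimal sketch of both routes, using made-up values. io.StringIO stands in for a CSV file on disk so that the example runs on its own. With a real file, you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Route 1: build a DataFrame from scratch (made-up values)
from_scratch = pd.DataFrame({
    "petal.length": [1.4, 4.7],
    "petal.width": [0.2, 1.4],
    "variety": ["Setosa", "Versicolor"],
})

# Route 2: import an existing file; StringIO stands in for the file here
csv_text = """petal.length,petal.width,variety
1.4,0.2,Setosa
4.7,1.4,Versicolor
"""
from_file = pd.read_csv(io.StringIO(csv_text))

# Both routes give the same DataFrame
print(from_scratch.equals(from_file))  # True
```

Either DataFrame can then be passed to a Seaborn function through its data argument.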

We will start with creating scatter plots in Seaborn and then move to other types of plots in the subsequent activities.

As you may remember, scatter plots are used to show the relationship between two continuous variables, one plotted on each axis. Each point represents one observation in the data set, so the plot shows how one variable tends to change as the other varies.

To draw a scatter plot in Seaborn, we use the scatterplot() function. Note that scatterplot() takes its arguments as keywords.

Some of the common arguments are as follows:

• data: the Pandas DataFrame that you want to plot.

• x: the name of the column in the DataFrame to source X values from.

• y: the name of the column in the DataFrame to source Y values from.

For now, these arguments should suffice to draw a scatter plot. However, we will introduce you to other arguments as and when we use them.
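The sketch below passes the three arguments by keyword and then checks that each DataFrame row became one point. The values are illustrative. Two extras go beyond the arguments listed above:

- ax is an optional scatterplot() argument. It is used here to draw on a fresh Matplotlib Axes created with plt.subplots().
- ax.collections[0].get_offsets() is a Matplotlib call that reads back the plotted point coordinates.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Illustrative measurements: one row per flower
df = pd.DataFrame({
    "petal.length": [1.4, 1.3, 1.5, 1.7],
    "petal.width": [0.2, 0.2, 0.4, 0.3],
})

# A fresh Axes, so points from earlier plots do not accumulate
fig, ax = plt.subplots()

# data is the DataFrame; x and y are column names given as strings
sns.scatterplot(data=df, x="petal.length", y="petal.width", ax=ax)

# Each row became one point whose coordinates match the two columns
points = np.asarray(ax.collections[0].get_offsets())
print(np.allclose(points, df[["petal.length", "petal.width"]].to_numpy()))  # True
```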

### Demonstration

Let’s look at an easy way to display the relationship between Iris petal length and petal width. Follow the steps given below in the Jupyter Notebook you downloaded in the previous step.

#### Step 1

First, we import Pandas and Seaborn. The convention is to alias Seaborn to sns.

```python
import seaborn as sns
import pandas as pd
```

#### Step 2

Next, we read the iris.csv file. If you do not already have it from the previous steps, extract it from the zipped folder again.

Code:

```python
data = pd.read_csv("iris.csv")
```


#### Step 3

Next, we will filter the data to just the Setosa species.

Code:

```python
setosa = data[data.variety == "Setosa"]
setosa  # in Jupyter, a bare name on the last line displays the DataFrame
```
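To see what the filter in Step 3 does, the sketch below applies it to a few illustrative rows shaped like iris.csv (not the real file). The expression data.variety == "Setosa" produces a Boolean Series. Indexing the DataFrame with that Series keeps only the rows marked True.

```python
import pandas as pd

# A few illustrative rows in the same format as iris.csv
data = pd.DataFrame({
    "petal.length": [1.4, 4.7, 6.0, 1.3],
    "petal.width": [0.2, 1.4, 2.5, 0.2],
    "variety": ["Setosa", "Versicolor", "Virginica", "Setosa"],
})

# Comparing a column with a value gives one True/False per row
mask = data.variety == "Setosa"
print(mask.tolist())  # [True, False, False, True]

# Indexing with the mask keeps only the rows where it is True
setosa = data[mask]

# Bracket access selects the same column as attribute access
same = data[data["variety"] == "Setosa"]
print(setosa.equals(same))  # True
print(len(setosa))  # 2
```

Bracket access is required for column names containing a dot. Attribute access such as data.petal.length does not work for these names, because Python reads each dot as attribute access.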


#### Step 4

Finally, we plot the data by using the following code snippet.

Code:

```python
sns.scatterplot(
    data=setosa,
    x="petal.length",
    y="petal.width",
)
```


In the code, we simply tell the scatterplot() function to take the x-coordinates from the ‘petal.length’ column of the DataFrame and the y-coordinates from the ‘petal.width’ column.
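Putting Steps 1 to 4 together, here is a self-contained sketch that runs without the downloaded file. It makes two substitutions:

- A few illustrative rows in the iris.csv format are read from an io.StringIO string, in place of pd.read_csv("iris.csv").
- The plot is drawn on a fresh Axes via the optional ax argument.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Step 2 stand-in: illustrative rows in the iris.csv format.
# With the real file, use pd.read_csv("iris.csv") instead.
csv_text = """sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
4.7,3.2,1.3,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.3,3.3,6.0,2.5,Virginica
"""
data = pd.read_csv(io.StringIO(csv_text))

# Step 3: keep only the Setosa rows
setosa = data[data.variety == "Setosa"]

# Step 4: plot on a fresh Axes
fig, ax = plt.subplots()
sns.scatterplot(data=setosa, x="petal.length", y="petal.width", ax=ax)

print(ax.get_xlabel(), ax.get_ylabel())  # petal.length petal.width
print(len(ax.collections[0].get_offsets()))  # 3
```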

## The third variable

When we introduced scatter plots in Course 1, and again in this course, we said that they are used to plot two variables. But what if a third variable also plays a role in comparing the values? Seaborn can show it automatically. How is that possible?

You can display the third variable (in the examples below, the variety of Iris) by adding either a hue (colour) or a style (marker shape) to the points on your plot.

Seaborn can automatically help us compare data with more than two variables. One simple way is to plot different data series in different colours (or hues, as Seaborn refers to them). To illustrate, let us draw another scatter plot, but tell Seaborn to add a new colour (hue) for each variety of Iris.

For this plot, you will use the entire DataFrame read from the Iris CSV file, not just the Setosa subset.

Code:

```python
sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
)
```


As you can see, the Iris variety is treated as the third variable in this resulting plot.
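The sketch below confirms what hue adds: a legend entry for each variety and a distinct face colour per variety. It uses illustrative rows rather than the full data set. Reading the legend and face colours back through Matplotlib (ax.get_legend(), get_facecolors()) is only for inspection. You do not need either call to draw the plot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative rows in the same format as iris.csv (not the real data)
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor",
                "Virginica", "Virginica"],
})

fig, ax = plt.subplots()
sns.scatterplot(data=data, x="petal.length", y="petal.width",
                hue="variety", ax=ax)

# Legend entries added by hue (inspection only)
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)

# One RGBA face colour per point; collect the distinct ones
colours = {tuple(c) for c in ax.collections[0].get_facecolors()}
print(len(colours))
```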

Another simple way to show the third variable is to draw each data series with a different marker style. To illustrate, let us draw another scatter plot of the same data set and tell Seaborn to use a different style for each variety of Iris. The code below also keeps hue="variety", so each variety gets both its own colour and its own marker.

Again, you will use the entire DataFrame read from the Iris data set for this plot.

Code:

```python
sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
    style="variety",
)
```


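As you can see, the Iris variety is again treated as the third variable, this time encoded by both colour and marker shape. Using marker shape as well as colour keeps the varieties distinguishable even when the plot is printed in greyscale.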
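To choose the markers yourself, you can pass scatterplot() a markers argument. One accepted form is a dictionary that maps each level of the style variable to a Matplotlib marker code. The marker choices below are arbitrary (circle, square, X), and the rows are illustrative. All three markers are filled shapes, because Seaborn does not allow filled and line-art markers to be mixed in one plot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative rows in the same format as iris.csv (not the real data)
data = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor",
                "Virginica", "Virginica"],
})

# Hypothetical marker choice: one filled Matplotlib marker per variety
marker_map = {"Setosa": "o", "Versicolor": "s", "Virginica": "X"}

fig, ax = plt.subplots()
sns.scatterplot(
    data=data,
    x="petal.length",
    y="petal.width",
    hue="variety",
    style="variety",
    markers=marker_map,  # maps each style level to a marker code
    ax=ax,
)

# Legend entries, read back for inspection only
labels = [text.get_text() for text in ax.get_legend().get_texts()]
```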

Refer to: Seaborn documentation [1]

## Did you notice?

In the previous examples, Seaborn labelled the axes automatically using the column names, which saves you writing extra code to do it. What else did you notice that was different when plotting with Seaborn compared with Matplotlib?
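You can see the labelling difference by drawing the same illustrative data with both libraries side by side. Matplotlib's ax.scatter() receives the column values and leaves the axis labels empty. Seaborn receives the column names and uses them as labels. With Matplotlib, you add the labels yourself.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative values
df = pd.DataFrame({
    "petal.length": [1.4, 4.7, 6.0],
    "petal.width": [0.2, 1.4, 2.5],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2)

# Matplotlib: pass the values; the axes are left unlabelled
ax_mpl.scatter(df["petal.length"], df["petal.width"])
mpl_label_before = ax_mpl.get_xlabel()
print(repr(mpl_label_before))  # ''

# Seaborn: pass column names; they become the axis labels
sns.scatterplot(data=df, x="petal.length", y="petal.width", ax=ax_sns)
print(ax_sns.get_xlabel())  # petal.length

# With Matplotlib, the labels have to be set explicitly
ax_mpl.set_xlabel("petal.length")
ax_mpl.set_ylabel("petal.width")
```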

## References

1. Seaborn.scatterplot [Document]. Seaborn; [date unknown]. Available from: https://seaborn.pydata.org/generated/seaborn.scatterplot.html