Comparing groups of data

Learn about comparing groups of data

Using different plots to compare groups of data

We can use selective highlighting to compare groups of data on the same plot. We do not need to go into this in too much detail since it involves reusing and applying techniques you have already seen!

In this step, we will use Matplotlib in Python and see examples and demonstrations of how to compare groups of data on different plots.

Demonstration: Comparing in scatter plots

To highlight points for comparison, observe the demonstration below.

Step 1

Use the same iris.csv file data set that we used in the scatter plot previously.
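The code later in this demonstration refers to three DataFrames named setosa, versicolor and virginica. As a reminder, here is a minimal sketch of how they might be created with pandas. The column name "variety" and the sample rows are assumptions made for illustration, so check them against your own copy of iris.csv.

```python
import pandas as pd

# Illustrative rows only; with the real file you would use
# iris = pd.read_csv("iris.csv") instead.
# The "variety" column name is an assumption about the file layout.
iris = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# One DataFrame per species, selected with a boolean mask
setosa = iris[iris["variety"] == "Setosa"]
versicolor = iris[iris["variety"] == "Versicolor"]
virginica = iris[iris["variety"] == "Virginica"]

print(len(setosa), len(versicolor), len(virginica))
```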

Step 2

Once again, the data has already been loaded, so we can create the figure and axes directly. We will set the figure size to 8 by 8 inches.

Code:

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)

Step 3

Draw the groups you want to compare in colour. Leave the groups that you don’t want to compare in grey.

Code:

# Background group: drawn first and in grey so that it recedes
ax.scatter(versicolor["petal.length"], versicolor["petal.width"], label="Versicolor", marker="x", color="lightgrey")
# Groups being compared: drawn in contrasting colours
ax.scatter(setosa["petal.length"], setosa["petal.width"], label="Setosa", marker="x", color="blue")
ax.scatter(virginica["petal.length"], virginica["petal.width"], label="Virginica", marker="x", color="red")

Output:

Screenshot from Jupyter Notebook that shows data displayed on a scatter plot with two highlighted iris species for comparison.

The resulting figure compares the two highlighted species, Setosa in blue and Virginica in red, while Versicolor stays in grey.
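If you want to change which species are compared without editing each scatter call, one option is to loop over the groups and look up each colour in a dictionary. The following is a sketch under assumptions rather than the course's own code: the "variety" column name, the sample rows and the highlight_colors name are all hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rows only; the "variety" column name is an assumption.
iris = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# Species to emphasise and their colours; every other species is grey.
highlight_colors = {"Setosa": "blue", "Virginica": "red"}

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)

for species, group in iris.groupby("variety"):
    highlighted = species in highlight_colors
    ax.scatter(
        group["petal.length"],
        group["petal.width"],
        marker="x",
        label=species,
        color=highlight_colors.get(species, "lightgrey"),
        # A higher zorder keeps highlighted points on top of grey ones
        zorder=2 if highlighted else 1,
    )

ax.set_xlabel("petal.length")
ax.set_ylabel("petal.width")
ax.legend()
```

To compare a different pair of species, only the highlight_colors dictionary needs to change.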

Demonstration: Comparing in line plots

For a line plot, we follow a similar method: draw the lines you want to compare in a stronger colour and the remaining lines in a pale one.

Step 1

Use the same flight.csv file data set from the zipped file that we used previously.

Step 2

Set the axis and size of the subplots as 12 by 8 inches (the same as earlier).

Code:

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)

Step 3

Plot one line per year, highlighting the years you want to compare. Here, we compare the number of passengers who travelled in 1955 and 1958.

Code:

for year in flights["year"].unique():
    flights_for_year = flights[flights["year"] == year]
    # Strong blue for the two years being compared, pale blue for the rest
    line_color = "#4472C4" if year in (1955, 1958) else "#EFF3F9"
    ax.plot(flights_for_year["month"], flights_for_year["passengers"], color=line_color)
fig

Output:

Screenshot from Jupyter Notebook that shows a line plot with two blue lines highlighted; the rest of the lines are lighter in colour to put less emphasis on them. Y axis is labelled 100, 200, 300, 400, 500, 600. X axis is Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec.

The figure shows a comparison between the number of passengers who travelled in 1955 and 1958 as the result of the code.
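When several lines overlap, a pale line drawn later can cover a highlighted one. A possible refinement, sketched here with made-up passenger numbers, is to give the highlighted lines a higher zorder and to label only those lines so that the legend stays short. The highlight_years name and the sample data are assumptions, not part of the course material.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers in the assumed layout of the flights data
flights = pd.DataFrame({
    "year": [1954] * 3 + [1955] * 3 + [1958] * 3,
    "month": ["Jan", "Feb", "Mar"] * 3,
    "passengers": [10, 12, 15, 14, 15, 18, 20, 19, 24],
})

highlight_years = {1955, 1958}

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)

for year, flights_for_year in flights.groupby("year"):
    highlighted = year in highlight_years
    ax.plot(
        flights_for_year["month"],
        flights_for_year["passengers"],
        color="#4472C4" if highlighted else "#EFF3F9",
        # Highlighted lines are drawn above the pale background lines
        zorder=2 if highlighted else 1,
        # Labels that start with an underscore are left out of the legend
        label=str(year) if highlighted else "_nolegend_",
    )

ax.legend()
```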

Demonstration: Comparing in bar plots

We saw in the last step how to highlight a single bar. To compare multiple bars, highlight them all by setting their colours. Do this by calling set_color on each bar. Look at the demonstration below to understand the process.

Step 1

Use the same flight.csv file data set that we used just now for the line plot.

Step 2

Keep the axis and size of the subplots as 12 by 8 (the same as earlier).

Code:

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)

Step 3

Draw the bars in grey, then set the colour of the bars you want to compare. Here, we compare the number of passengers who travelled in 1953 and 1956.

Code:

# Draw every bar in grey first
bars = ax.bar(august_flights["year"], august_flights["passengers"], color="lightgrey")
# The years start at 1949, so index 4 is 1953 and index 7 is 1956
bars[4].set_color("#4472C4")
bars[7].set_color("#4472C4")

Output:

Screenshot from Jupyter Notebook that shows a bar graph with two bars highlighted in blue. Y axis is 100, 200, 300, 400, 500, 600. X axis is 1949 to 1960 in one-year increments. The years 1953 (270) and 1956 (400) are highlighted.

The figure shows a comparison between the number of passengers travelled in 1953 and 1956 (highlighted bars) as the result of the code.
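Indexing bars by position works as long as you know which row holds which year. If the data is filtered or reordered, the same index can point at a different bar. As an alternative sketch, you could build the list of colours from the year values and pass it to ax.bar. The passenger numbers and the highlight_years name below are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers in the assumed layout of august_flights
august_flights = pd.DataFrame({
    "year": [1952, 1953, 1954, 1955, 1956],
    "passengers": [10, 14, 13, 17, 20],
})

highlight_years = [1953, 1956]

# One colour per bar, chosen from the year value rather than its position
colors = [
    "#4472C4" if year in highlight_years else "lightgrey"
    for year in august_flights["year"]
]

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
bars = ax.bar(august_flights["year"], august_flights["passengers"], color=colors)
```

Both approaches draw the same chart for the same data. This one keeps the choice of years in a single place.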

In the next step, we will add annotations to the charts. Before that, check your understanding of the activity so far.

How would you approach this?

What problems do you hope to solve by comparing groups of data?

Which plot will you most likely use and why?

Share your thoughts in the comment section below.

This article is from the free online

Data Visualisation with Python: Seaborn and Scatter Plots

Created by
FutureLearn - Learning For Life
