Highlighting data

Learn about highlighting data

Sometimes you might want to draw attention to selected data points on a plot by giving them a different colour from the rest. Other times, you might want to present data points in different colours and annotate them with text.

Graphic depicting various generic examples of charts.

Now, we’ll use Matplotlib in Python to work through examples of highlighting selected data points on a plot with a different colour.

Let’s see how we can highlight data in our plots, to draw attention to it; this is a way of using the preattentive attributes of colour.

There are three different types of plots that we will look into for highlighting data.

  1. Scatter plot
  2. Line plot
  3. Bar plot

Scatter plots

A scatter plot (also known as a scatter diagram or x-y graph) is a type of data visualisation that shows the relationship between two variables. The data points look scattered across the graph, which gives this type of visualisation its name. Each point is placed according to its values on the x- and y-axes.

Screenshot from Jupyter Notebook that shows a scatter plot. X and Y axis both show 0, 10, 20, 30, 40, 50 to 80. Setosa (blue) dots are around 25-30: 25-30 and 50:50-60; Versicolor (green) dots are around 20:20-30; and Virginica (red) dots are between 30:25-35.

When should you use a scatter plot?

Scatter plots are generally used to observe and show relationships between two numeric variables. The dots in a scatter plot show the values of individual data points and, taken as a whole, reveal patterns in the data. Use a scatter plot to:

  • identify correlations in the data
  • identify patterns in the data
  • analyse unexpected gaps in the data
  • identify any outlier points.

Demonstration: Highlighting scatter plots

Let’s look at how to highlight particular data points in a scatter plot. For this example, we will return to the Iris data set. Follow the steps to proceed.

Step 1

First, import the Pandas and Matplotlib libraries.

Code:

import pandas as pd
import matplotlib.pyplot as plt

Step 2

Then use the following code to load the Iris data set, which you might already have extracted from the zipped folder, into a Pandas DataFrame.

Code:

iris_data = pd.read_csv("iris.csv")
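Before plotting, it can help to confirm that the file loaded with the column names the later steps rely on. The following is a minimal sketch, not part of the original steps: it reads a three-row inline stand-in for iris.csv (the name sample and the rows are for illustration only, and only the three columns used in this article are assumed) so that it runs without the file.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for iris.csv, so the sketch runs without the file.
# Only the columns used in this article are included.
sample_csv = io.StringIO(
    "petal.length,petal.width,variety\n"
    "1.4,0.2,Setosa\n"
    "4.7,1.4,Versicolor\n"
    "6.0,2.5,Virginica\n"
)
sample = pd.read_csv(sample_csv)

# Columns that the later steps depend on.
required = {"petal.length", "petal.width", "variety"}
missing = required - set(sample.columns)
if missing:
    raise KeyError(f"Missing expected columns: {sorted(missing)}")

# List the distinct variety names found in the data.
print(sample["variety"].unique())
```

With the real file, the same check can be applied to iris_data in place of sample.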

Step 3

Next, split the Iris data into separate DataFrames for the Versicolor, Setosa, and Virginica varieties.

Code:

versicolor = iris_data[iris_data.variety == "Versicolor"]
setosa = iris_data[iris_data.variety == "Setosa"]
virginica = iris_data[iris_data.variety == "Virginica"]
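The three filters above can also be written as a single grouping step. This is a sketch of one possible alternative, assuming the same column names; the miniature DataFrame sample and the dictionary by_variety are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical miniature data set with the same column names as iris.csv.
sample = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# Build one DataFrame per variety, keyed by the variety name.
by_variety = {name: group for name, group in sample.groupby("variety")}

for name, group in by_variety.items():
    print(name, len(group))
```

A dictionary like this avoids repeating the filter once per variety, at the cost of looking each one up by name (for example, by_variety["Versicolor"]).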

Step 4

Create the figure and axes with plt.subplots(), then set the figure size. Here we will use eight by eight inches.

Code:

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)

Output:

Screenshot from Jupyter Notebook that shows a blank plot. X and Y axis both show 0, 0.2, 0.4, 0.6, 0.8, 1.

The figure shows a blank plot as the result of the code.
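The size can also be passed when the figure is created. A minimal sketch of this equivalent form, using the figsize parameter of plt.subplots (width and height in inches):

```python
import matplotlib.pyplot as plt

# Create the figure and axes at 8 x 8 inches in a single call.
fig, ax = plt.subplots(figsize=(8, 8))

# Confirm the size that was set (width, height in inches).
print(fig.get_size_inches())
```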

Step 5

Now, let’s plot petal length against petal width for each variety, giving each variety its own colour and label, and then add axis labels, a title, and a legend.

Code:

ax.scatter(versicolor["petal.length"], versicolor["petal.width"], marker="x", label="Versicolor", facecolor="green")
ax.scatter(setosa["petal.length"], setosa["petal.width"], label="Setosa", marker="x", facecolor="blue")
ax.scatter(virginica["petal.length"], virginica["petal.width"], label="Virginica", marker="x", facecolor="red")
ax.set_xlabel("Petal Length (cm)")
ax.set_ylabel("Petal Width (cm)")
ax.set_title("Iris Petal Sizes")
ax.legend()

Output:

Screenshot from Jupyter Notebook that shows iris petal sizes, labelled with 3 different colours for categorisation. Y axis is petal width cm (0.0, 0.5, 1.0, 1.5, 2.0, 2.5) and x axis is petal length cm (1, 2, 3, 4, 5, 6, 7). Setosa (blue) is lower left, Versicolor (green) is mid range, Virginica (red) is high right.

The figure shows petal length and width and species highlighted with different colour markers as the result of the code.
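The three nearly identical scatter calls in this step can be folded into a loop over a colour mapping. The sketch below is one way to do it, not the course's own code; the dictionary colours and the miniature DataFrame sample are hypothetical, and color= is used in place of facecolor= as the more common way to colour an "x" marker.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature data set with the same column names as iris.csv.
sample = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# One colour per variety; changing the plot's palette means editing only this mapping.
colours = {"Setosa": "blue", "Versicolor": "green", "Virginica": "red"}

fig, ax = plt.subplots(figsize=(8, 8))
for variety, colour in colours.items():
    subset = sample[sample["variety"] == variety]
    ax.scatter(subset["petal.length"], subset["petal.width"],
               marker="x", color=colour, label=variety)

ax.set_xlabel("Petal Length (cm)")
ax.set_ylabel("Petal Width (cm)")
ax.set_title("Iris Petal Sizes")
ax.legend()
```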

Step 6

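Highlight a specific variety of Iris. Redraw the other species with facecolor set to lightgrey so that only the Versicolor data points stand out. Because these calls reuse the same axes, the new markers are drawn over the earlier ones, and evaluating fig on the last line redisplays the figure in the notebook.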

Code:

ax.scatter(versicolor["petal.length"], versicolor["petal.width"], marker="x", label="Versicolor", facecolor="green")
ax.scatter(setosa["petal.length"], setosa["petal.width"], label="Setosa", marker="x", facecolor="lightgrey")
ax.scatter(virginica["petal.length"], virginica["petal.width"], label="Virginica", marker="x", facecolor="lightgrey")
fig

Output:

Screenshot from Jupyter Notebook that shows iris petal sizes, labelled with 3 different colours for categorisation. Y axis is petal width cm (0.0, 0.5, 1.0, 1.5, 2.0, 2.5) and x axis is petal length cm (1, 2, 3, 4, 5, 6, 7). Setosa (blue) is lower left, Versicolor (green) is mid range, Virginica (red) is high right. Green is highlighted.

The figure shows petal length and width and Versicolor species highlighted with green markers as the result of the code.
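The same idea can be wrapped in a small helper that starts from a fresh figure, so that no earlier markers sit underneath the grey ones. This is a sketch under stated assumptions: plot_highlight is a hypothetical function name, the miniature DataFrame sample stands in for the real data, and color= is used instead of facecolor=.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_highlight(data, highlight, colour="green"):
    """Draw every variety in light grey except `highlight`, which gets `colour`."""
    fig, ax = plt.subplots(figsize=(8, 8))
    for variety, group in data.groupby("variety"):
        point_colour = colour if variety == highlight else "lightgrey"
        ax.scatter(group["petal.length"], group["petal.width"],
                   marker="x", color=point_colour, label=variety)
    ax.set_xlabel("Petal Length (cm)")
    ax.set_ylabel("Petal Width (cm)")
    ax.set_title("Iris Petal Sizes")
    ax.legend()
    return fig, ax


# Hypothetical miniature data set with the same column names as iris.csv.
sample = pd.DataFrame({
    "petal.length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

fig, ax = plot_highlight(sample, "Versicolor")
```

Calling plot_highlight(sample, "Virginica", colour="red") would highlight a different variety without any other changes.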

Step 7

Now, let’s highlight specific points in the Iris data set. Most points in the Iris plot clearly belong to a specific variety; however, a few points between Versicolor and Virginica could belong to either of those varieties.

To highlight just these points, we need to first identify them and put them into their own DataFrames. Here is how the points were selected for the next example:

Code:

vs_overlaps = versicolor[versicolor["petal.length"] > 4.9]
vg_overlaps = virginica[(virginica["petal.width"] < 1.75) & (virginica["petal.length"] < 5.2)]

(A Versicolor point is treated as overlapping if its petal length is greater than 4.9 cm; a Virginica point is treated as overlapping if its petal length is less than 5.2 cm and its petal width is less than 1.75 cm.)
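The Virginica filter combines two conditions. The sketch below breaks it into named boolean masks to show how the selection works; the rows and the names virginica_sample, narrow, and short are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical Virginica-like rows, for illustration only.
virginica_sample = pd.DataFrame({
    "petal.length": [6.0, 5.0, 4.9, 5.1],
    "petal.width": [2.5, 1.5, 1.8, 1.6],
})

# Each comparison yields one True/False value per row.
narrow = virginica_sample["petal.width"] < 1.75
short = virginica_sample["petal.length"] < 5.2

# "&" keeps only the rows where both conditions hold.
overlaps = virginica_sample[narrow & short]
print(overlaps)
```

When the comparisons are written inline, as in the step above, each one needs its own parentheses because & binds more tightly than < and > in Python.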

Step 8

Then, we’ll draw the entire data set in light grey on a new figure. This will look familiar:

Code:

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)
ax.scatter(versicolor["petal.length"], versicolor["petal.width"], marker="x", facecolor="lightgrey")
ax.scatter(setosa["petal.length"], setosa["petal.width"], label="Setosa", marker="x", facecolor="lightgrey")
ax.scatter(virginica["petal.length"], virginica["petal.width"], marker="x", facecolor="lightgrey")

Output:

Screenshot from Jupyter Notebook that shows iris petal sizes with no data highlighted. All in grey.

The figure shows the entire data set as light grey as the result of the code.

Note: The Versicolor and Virginica points are not labelled here. This is so that they do not appear in the legend twice once the highlighted points are added.

Step 9

Next, we plot the filtered points. Since we are plotting these after the other points, they are drawn on top of the existing grey points, so we do not have to remove the filtered points from the original DataFrames.

Code:

ax.scatter(vs_overlaps["petal.length"], vs_overlaps["petal.width"], label="Versicolor", marker="x", facecolor="green")
ax.scatter(vg_overlaps["petal.length"], vg_overlaps["petal.width"], label="Virginica", marker="x", facecolor="red")
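Relying on call order works here, but the stacking can also be made explicit with the zorder parameter, which Matplotlib artists accept: higher values are drawn on top. The following is a sketch with hypothetical coordinates, not the course's own code; the highlighted points are deliberately plotted first to show that zorder, rather than call order, decides which markers end up on top.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 8))

# Hypothetical coordinates, for illustration only.
# Highlighted points: plotted first, but given the higher zorder.
highlight = ax.scatter([5.0, 5.1], [1.5, 1.6], marker="x",
                       color="green", label="Versicolor", zorder=3)

# Background points: plotted second, unlabelled so they stay out of the legend.
background = ax.scatter([4.5, 5.0, 5.1, 6.0], [1.4, 1.5, 1.6, 2.5],
                        marker="x", color="lightgrey", zorder=2)

ax.legend()
```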

Step 10

And lastly, add the axis labels and title, then call ax.legend() so that the labelled data sets appear in the legend with the correct colours.

Code:

ax.set_xlabel("Petal Length (cm)")
ax.set_ylabel("Petal Width (cm)")
ax.set_title("Iris Petal Sizes")
ax.legend()

Output:

Screenshot from Jupyter Notebook that shows iris petal sizes, labelled with 3 different colours for categorisation. Y axis is petal width cm (0.0, 0.5, 1.0, 1.5, 2.0, 2.5) and x axis is petal length cm (1, 2, 3, 4, 5, 6, 7). Setosa (blue) is lower left, Versicolor (green) is mid range, Virginica (red) is high right. Two green and three red points in the centre are highlighted.

The figure shows the filtered points and labels as the result of the code.

Learn more about highlighting points on scatter plots by reading this forum thread, in which a user asks how to highlight three points on a scatter plot and label them with their coordinates. The thread concerns MATLAB rather than Matplotlib, but the question and the detailed response will help improve your knowledge of highlighting data.

Read: Highlight 3 points in scatter plot with label on it [1]
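In Matplotlib, a comparable effect (highlighted points labelled with their coordinates) can be sketched with ax.annotate. This is an illustrative example rather than a translation of the thread's answer; the three points are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical points to highlight, for illustration only.
points = [(5.0, 1.7), (5.1, 1.6), (5.0, 1.5)]

fig, ax = plt.subplots(figsize=(8, 8))
xs, ys = zip(*points)
ax.scatter(xs, ys, marker="x", color="red")

# Label each point with its coordinates, offset slightly so the text
# does not sit directly on top of the marker.
for x, y in points:
    ax.annotate(f"({x}, {y})", xy=(x, y),
                xytext=(5, 5), textcoords="offset points")
```

The offset is given in points (a unit of length on the page) rather than data units, so the gap between marker and label stays the same if the axis limits change.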

Next, we have line plots and bar plots.

Share with us!

Which plot did you find most interesting to highlight? Why?

Share your thoughts in the comment section below.

References

  1. Phoenix. Highlight 3 points in scatter plot with label on it [Forum]. MATLAB Answers. MathWorks; 2019 Jun 30. Available from: https://www.mathworks.com/matlabcentral/answers/469556-highlight-3-points-in-scatter-plot-with-label-on-it

This article is from the free online course Data Visualisation with Python: Seaborn and Scatter Plots, created by FutureLearn.
