
Confidence bands

Learn about Confidence Bands
Scatter plots are used to show the values of discrete data, and line plots are used to show the values of continuous data where ‘in-between’ values can be interpolated.

Similarly, error bars show the uncertainty at individual points, while confidence bands show the uncertainty over a continuous range of points. For line plots, confidence bands are therefore preferable. We’ve already seen Seaborn draw them for us automatically.
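To see the difference, here is a minimal sketch with made-up data (not the course dataset) that draws the same uncertainty both ways: error bars with Axes.errorbar on the left and a band with Axes.fill_between on the right. The ±0.2 uncertainty is an arbitrary assumption chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a smooth curve sampled at 11 points, with an assumed uncertainty of ±0.2
x = np.linspace(0, 10, 11)
y = np.sin(x)
err = 0.2

fig, (ax_points, ax_band) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)

# Left: error bars show the uncertainty at each individual point
bars = ax_points.errorbar(x, y, yerr=err, fmt="o", capsize=3)
ax_points.set_title("Error bars (per point)")

# Right: a band shows the uncertainty over the whole continuous range
ax_band.plot(x, y)
band = ax_band.fill_between(x, y - err, y + err, alpha=0.2)
ax_band.set_title("Confidence band (over a range)")
```

The band suits a line plot because the reader can interpolate between samples, exactly as they do with the line itself.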

To draw confidence bands in Matplotlib, we use the Axes.fill_between method. It takes at least two arguments, but to draw error bands we’ll use it with three:

  • The first argument is a sequence of x-positions along the curve.
  • The second and third are the corresponding lower and upper y-positions at each x-position. The area between these two y-positions is filled in.

As an aside, if we call fill_between with only two arguments, the third (y2) defaults to 0, so the area between the curve and the horizontal line y = 0 is filled.
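The two calling forms can be compared in a short sketch. The curve below is purely illustrative; it dips below zero so that the two-argument fill visibly stops at y = 0 rather than at the bottom of the axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative curve that crosses y = 0
x = np.linspace(-1, 1, 50)
y = x ** 2 - 0.5

fig, (ax2, ax3) = plt.subplots(1, 2, figsize=(10, 4))

# Two arguments: y2 defaults to 0, so the region between the curve and y = 0 is filled
ax2.fill_between(x, y)
ax2.set_title("fill_between(x, y)")

# Three arguments: the region between the lower curve y - 0.2 and the upper curve y + 0.2 is filled
ax3.plot(x, y)
ax3.fill_between(x, y - 0.2, y + 0.2, alpha=0.3)
ax3.set_title("fill_between(x, y - 0.2, y + 0.2)")
```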

Demonstration: Errors on line plots

To demonstrate this, let us bring back a dataset we used previously: the time series of hourly temperatures in New York for the first week of June 2016.

Step 1

First, let us import pandas, which we use to read the data, and the date class from the ‘datetime’ module.

Code:

from datetime import date
import pandas as pd

Step 2

Next, we import Matplotlib’s pyplot interface and create the figure and axes for the plot, sized at 12 × 8 inches.

Code:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
fig.set_size_inches(12, 8)

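Step 3

We then read New_York_Hourly.csv into the notebook, combining its date and TimeEST columns into a single datetime column and keeping only the columns we need. Finally, we select the first week of June 2016 and sort the rows by time.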

Code:

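weather_data = pd.read_csv(
    "New_York_Hourly.csv",
    # Combine "date" and "TimeEST" into one datetime column named "date_TimeEST"
    # (this nested-list form of parse_dates is deprecated in recent pandas releases)
    parse_dates=[["date", "TimeEST"]],
    usecols=["date", "TimeEST", "TemperatureF", "Dew PointF", "Humidity"],
)
# Keep only 1 to 7 June 2016, in time order
june_weather = weather_data[
    (weather_data["date_TimeEST"] >= "2016-06-01")
    & (weather_data["date_TimeEST"] < "2016-06-08")
].sort_values("date_TimeEST")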
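Because combining columns through parse_dates is deprecated in newer pandas versions, here is a sketch of one alternative that builds the date_TimeEST column explicitly with pd.to_datetime. It runs on a few hypothetical rows held in memory, and it assumes the file stores dates as YYYY-MM-DD and times in 12-hour form such as "12:51 AM". Check the real file and adjust the format string if its layout differs.

```python
import io
import pandas as pd

# Hypothetical rows in the ASSUMED layout of New_York_Hourly.csv; real values and formats may differ
csv_text = """date,TimeEST,TemperatureF,Dew PointF,Humidity
2016-05-31,11:51 PM,70.0,55.0,59
2016-06-01,1:51 AM,68.0,57.0,68
2016-06-01,12:51 AM,69.1,55.9,63
2016-06-08,12:51 AM,72.0,60.1,66
"""

weather_data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "TimeEST", "TemperatureF", "Dew PointF", "Humidity"],
)

# Join the two text columns and parse them into one datetime column ourselves
weather_data["date_TimeEST"] = pd.to_datetime(
    weather_data["date"] + " " + weather_data["TimeEST"],
    format="%Y-%m-%d %I:%M %p",  # assumed format: 12-hour clock with AM/PM
)

# Keep only 1 to 7 June 2016, in time order
june_weather = weather_data[
    (weather_data["date_TimeEST"] >= "2016-06-01")
    & (weather_data["date_TimeEST"] < "2016-06-08")
].sort_values("date_TimeEST")
```

With these sample rows, only the two readings from 1 June survive the filter, and sorting puts 12:51 AM (00:51) before 1:51 AM.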

Step 4

We don’t have any actual data for what sort of error there might be in the data, so we’ll generate some by assuming there might be a ±5% error for any data point.

Code:

error_min = june_weather["TemperatureF"] * 0.95
error_max = june_weather["TemperatureF"] * 1.05
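The ±5% rule above has one subtlety: multiplying by 0.95 and 1.05 only yields a lower and an upper bound when the reading is positive. June temperatures in New York are well above 0 °F, so it does not matter for this dataset. The sketch below uses a hypothetical reading of -4 °F to show the swap, and a sign-safe variant that takes the margin from each reading’s magnitude. The shaded region from fill_between is the same either way, because it fills between the two curves whichever one is larger. Only the names error_min and error_max would become misleading.

```python
import pandas as pd

# Hypothetical readings in °F; the last one is below zero
temps = pd.Series([75.0, 68.0, -4.0])

# The approach used above: scale each reading by 0.95 and 1.05
error_min = temps * 0.95
error_max = temps * 1.05  # for -4.0 this gives -4.2, which is BELOW error_min (-3.8)

# Sign-safe variant: a ±5% margin based on the magnitude of each reading
margin = temps.abs() * 0.05
lower = temps - margin
upper = temps + margin
```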

Step 5

Next, we plot the temperature line and label the axes.

Code:

ax.plot(june_weather["date_TimeEST"], june_weather["TemperatureF"])
ax.set_xlabel("Date")
ax.set_ylabel("Temperature (°F)")
fig

Output:

[Figure: line chart of hourly temperature. X-axis "Date", 2016-06-01 to 2016-06-08; y-axis "Temperature (°F)", 65 to 85. A single jagged blue line runs from just below 75 °F to between 75 and 80 °F.]

Step 6

Then, we fill the region between the lower and upper error curves (error_min and error_max), using the x-values from the original data (the date and time).

Code:

ax.fill_between(june_weather["date_TimeEST"], error_min, error_max, color="red", alpha=0.1)
fig

The colour of the band is set with the color argument, and we set 10% opacity by setting the alpha to 0.1.

Output:

[Figure: the same line chart after calling fill_between. The jagged blue temperature line now sits on top of a translucent pink band.]

In the output, the translucent red band covers the assumed ±5% range around every reading, so the uncertainty is visible along the whole line rather than only at isolated points.
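Putting the steps together, here is a self-contained sketch you could run without the CSV file. Because New_York_Hourly.csv is not reproduced here, the sketch substitutes a synthetic hourly series for the real readings: a made-up daily cycle around 75 °F. Only the band calculation and the plotting calls mirror the steps above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# SYNTHETIC stand-in for the real data: 7 days x 24 hourly timestamps from 1 June 2016
times = pd.date_range("2016-06-01", periods=7 * 24, freq="h")
hours = np.arange(len(times))
temps = 75 + 5 * np.sin(2 * np.pi * hours / 24)  # made-up daily temperature cycle
june_weather = pd.DataFrame({"date_TimeEST": times, "TemperatureF": temps})

# Assumed ±5% error, as in Step 4
error_min = june_weather["TemperatureF"] * 0.95
error_max = june_weather["TemperatureF"] * 1.05

fig, ax = plt.subplots(figsize=(12, 8))
line, = ax.plot(june_weather["date_TimeEST"], june_weather["TemperatureF"])
# Translucent red band between the lower and upper error curves
band = ax.fill_between(june_weather["date_TimeEST"], error_min, error_max,
                       color="red", alpha=0.1)
ax.set_xlabel("Date")
ax.set_ylabel("Temperature (°F)")
```

In a notebook, evaluating fig at the end of the cell displays the result.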

There are more ways to use the fill_between method, and you can read about them in the official documentation:

Read: fill_between documentation [1]

Reflect and share

Here you wrote a program using the fill_between method on a line plot. Can you think of examples or use cases where you might want to use this method in a combined plot of both scatter and line?

References

  1. matplotlib.pyplot.fill_between [Document]. Matplotlib; 2020. Available from: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.fill_between.html

This article is from the free online course Data Visualisation with Python: Seaborn and Scatter Plots, created by FutureLearn.
