Beyond 95%

Learn about confidence intervals other than 95%
So far we’ve generally used 95% confidence intervals. However, 95% is only a convention, and we can choose any confidence level we want. We’ve seen that the error bar sizes are calculated separately from the plot, so to change how the error bars are drawn we only need to change how we calculate their sizes.
We won’t go into much detail on how to change the confidence level. You should be able to adjust it by changing an argument to whichever function you use to calculate the interval.
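To get a feel for how much the confidence level affects the error bars, here is a minimal sketch that computes how many standard errors the interval extends either side of the mean for a few levels. It assumes a normal approximation via Python's built-in statistics.NormalDist, standing in for the t-distribution used by st.t.interval (the two give similar values for samples of the size used here, with t slightly larger for small samples). The helper name ci_multiplier is hypothetical.

```python
from statistics import NormalDist

def ci_multiplier(confidence: float) -> float:
    """Standard errors either side of the mean for a two-sided interval.

    Uses the standard normal distribution as an approximation; the
    t-distribution behind st.t.interval gives slightly larger values
    for small samples.
    """
    tail = (1 - confidence) / 2            # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)  # critical value for the upper tail

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: mean ± {ci_multiplier(level):.3f} × standard error")
```

The 80% multiplier (about 1.28) is roughly two-thirds of the 95% one (about 1.96), so lowering the confidence level to 80% should shorten the error bars by about a third.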

Plotting with mean and standard error

For example, in our average MPG plots, we can switch to an 80% confidence interval by changing one argument to our interval function call.
Code:
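import matplotlib.pyplot as plt
import scipy.stats as st

# mpg is the DataFrame loaded earlier in the course
series_names = []  # one x-axis label per origin
means = []         # bar heights
errors = []        # error bar sizes

confidence = 0.80  # confidence level (previously 0.95)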
Next, we loop over each origin and filter the MPG data to that origin:
Code:
for origin in mpg["origin"].unique():
    mpg_for_origin = mpg[mpg["origin"] == origin]
Still inside the loop, we calculate the mean, count, and standard error of the data:
Code:
    mean = mpg_for_origin["mpg"].mean()
    count = len(mpg_for_origin)
    std_error = mpg_for_origin["mpg"].sem()
Next, we calculate the confidence interval with st.t.interval, passing the confidence level we set to 0.80 above. The error bar size is the distance from the mean to the upper end of the interval, and we store it along with the origin name and the mean:
Code:
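    ci = st.t.interval(confidence, count - 1, loc=mean, scale=std_error)
    series_names.append(origin)
    means.append(mean)
    errors.append(ci[1] - mean)  # half-width of the interval = error bar size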
Lastly, we draw the plot, setting the figure size, the bars with their error bars, and the axis labels.
Code:
fig, ax = plt.subplots()
fig.set_size_inches(8, 8)
ax.bar(series_names, means, yerr=errors, facecolor="lightgreen", ecolor="red", capsize=3)
ax.set_xlabel("Origin")
ax.set_ylabel("MPG")
Output:
Screenshot of a bar chart of mean MPG by origin with 80% confidence-interval error bars. X-axis "Origin": usa, japan, europe. Y-axis "MPG": 0 to 30. The usa bar reaches about 20, japan about 30, and europe between 25 and 30, each topped by a short red error bar.
Notice that the error bars are shorter than in the previous plot, which used 95% confidence intervals.
This is what we would expect. An 80% interval is constructed so that, across many repeated samples, about 80% of such intervals would contain the true mean (compared with 95% before), so it does not need to extend as far from the sample mean.
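The "80% of intervals" interpretation can be checked with a small simulation. This sketch assumes a hypothetical normally distributed population (mean 25, standard deviation 6, not the mpg data), repeatedly draws samples from it, builds an interval around each sample mean, and counts how often the interval contains the true mean. For simplicity it uses the normal approximation (statistics.NormalDist) rather than the t-distribution, so coverage may come out slightly below the nominal level.

```python
import random
import statistics

random.seed(0)                    # reproducible draws
TRUE_MEAN, TRUE_SD = 25.0, 6.0    # hypothetical population, not the mpg data
N, TRIALS = 50, 2000              # sample size and number of repeated samples

def coverage(confidence: float) -> float:
    """Fraction of intervals, each built from a fresh sample, that contain TRUE_MEAN."""
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(TRIALS):
        sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
        mean = statistics.mean(sample)
        sem = statistics.stdev(sample) / N ** 0.5  # standard error of the mean
        if mean - z * sem <= TRUE_MEAN <= mean + z * sem:
            hits += 1
    return hits / TRIALS

cov80 = coverage(0.80)
cov95 = coverage(0.95)
print(f"80% intervals contained the true mean {cov80:.1%} of the time")
print(f"95% intervals contained the true mean {cov95:.1%} of the time")
```

Both printed figures should land close to their nominal levels, with the narrower 80% intervals missing the true mean noticeably more often.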

Plotting with mean, standard error, and standard deviation

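We could also plot error bars showing the standard deviation of the data. Unlike a confidence interval, this describes how spread out the individual values are rather than how precisely the mean is estimated. Reading it in the usual way (about 68% of values lie within one standard deviation of the mean) assumes the data are normally distributed, which they probably won’t be, so we’ll only show it here as a demonstration.
Be sure to follow these steps in your notebook.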

Step 1

We begin the same way as before:
Code:
import numpy as np  # needed for np.sqrt in Step 2

series_names = []
means = []
errors = []

confidence = 0.80  # not used for standard deviation error bars

Step 2

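The standard error is the standard deviation divided by the square root of the sample size, so we can recover the standard deviation by multiplying the standard error (calculated with the sem method) by the square root of the count.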
Code:
for origin in mpg["origin"].unique():
    mpg_for_origin = mpg[mpg["origin"] == origin]
    mean = mpg_for_origin["mpg"].mean()
    count = len(mpg_for_origin)
    std_error = mpg_for_origin["mpg"].sem()
    # SE = SD / sqrt(n), so SD = SE * sqrt(n)
    # (with no missing values this matches mpg_for_origin["mpg"].std())
    sd = std_error * np.sqrt(count)
    series_names.append(origin)
    means.append(mean)
    errors.append(sd)
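The relationship used in Step 2 can be checked with a few numbers. This minimal sketch uses a made-up list of MPG values (hypothetical, for illustration only) and Python's statistics module, whose stdev uses the same n - 1 denominator as the pandas std and sem defaults. It also shows why the standard deviation error bars come out much longer: they are √n times the length of standard error bars.

```python
import math
import statistics

# Hypothetical MPG values for one origin (made up for illustration)
values = [18.0, 15.0, 16.0, 17.0, 14.0, 24.0, 22.0, 21.0, 27.0, 26.0]

n = len(values)
sd = statistics.stdev(values)      # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(n)            # standard error of the mean, as .sem() computes it
recovered_sd = sem * math.sqrt(n)  # the Step 2 calculation

print(f"n={n}, sd={sd:.3f}, sem={sem:.3f}, recovered sd={recovered_sd:.3f}")
print(f"SD error bars are sqrt(n) = {math.sqrt(n):.2f} times longer than SE bars")
```

Because the standard error shrinks as the sample grows while the standard deviation does not, this gap widens for larger groups such as the usa cars.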

Step 3

With the standard deviations stored in the errors list, we plot them on the bar chart as before.
Code:
fig, ax = plt.subplots()
fig.set_size_inches(8, 8)
ax.bar(series_names, means, yerr=errors, facecolor="lightgreen", ecolor="red", capsize=3)
ax.set_xlabel("Origin")
ax.set_ylabel("MPG")
Output:
Screenshot of the same bar chart with standard deviation error bars. X-axis "Origin": usa, japan, europe. Y-axis "MPG": 0 to 35. The red error bars are much longer: the usa bar (about 20) has one running from about 12.5 to 27.5, the japan bar (about 30) from 25 to beyond 35, and the europe bar (between 25 and 30) from about 22.5 to 35.
It’s not advisable to stray far from 95% confidence intervals, because readers usually assume that is what an error bar represents. If you use something else, such as a different confidence level or the standard deviation, say so clearly, for example in the figure caption or axis label.

Additional learning: Bootstrapping

In the examples above we calculated the intervals ourselves and passed them to Matplotlib. Seaborn, by contrast, uses bootstrapping to calculate 95% confidence intervals by default. In essence, bootstrapping repeatedly resamples, with replacement, from the one sample we have, recalculates the mean for each resample, and uses the spread of those means to estimate the confidence interval without assuming the data follow a particular distribution.
We won’t go into the details here, but if you’d like to learn more, The University of Auckland’s Professor Chris Wild gives a great step-by-step introduction to the technique in this video.
Watch: Confidence Intervals from Bootstrap resampling (8:23) [1]
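The resampling idea can be sketched in a few lines of plain Python. This is a basic percentile bootstrap of the mean, not Seaborn's own code; it only mirrors Seaborn's default of 1000 resamples. The function name bootstrap_ci and the sample values are hypothetical.

```python
import random
import statistics

def bootstrap_ci(sample, confidence=0.95, n_boot=1000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(sample)
    # Resample with replacement n_boot times and record each resample's mean
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    # Keep the central `confidence` share of the bootstrap means
    # (nearest-rank percentiles, roughly the 2.5th and 97.5th for 95%)
    tail = (1 - confidence) / 2
    lower = boot_means[round(tail * (n_boot - 1))]
    upper = boot_means[round((1 - tail) * (n_boot - 1))]
    return lower, upper

# Hypothetical MPG values, made up for illustration
values = [18.0, 15.0, 16.0, 17.0, 14.0, 24.0, 22.0, 21.0, 27.0, 26.0]
low, high = bootstrap_ci(values)
print(f"mean = {statistics.fmean(values):.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

Because no distribution is assumed, the interval comes entirely from the resampled means, which is why it need not be perfectly symmetric around the sample mean.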

Do you see any difference?

What differences do you notice between the last two outputs? (You should see the same outputs in your own Jupyter notebook.)
Share your observations with your fellow learners in the comments.

References

  1. Confidence Intervals from Bootstrap re-sampling [Video]. Wild About Statistics; 2015 Apr 1. Available from: https://www.youtube.com/watch?v=iN-77YVqLDw
This article is from the free online course Data Visualisation with Python: Seaborn and Scatter Plots, created by FutureLearn - Learning For Life.
