Beyond 95%

Learn about 95% intervals

So far we’ve generally used 95% for confidence intervals. However, this is only a convention, and we can choose any confidence level we want. We’ve seen that the values for error displays are calculated independently of the plots, so to change how error bars are drawn we only need to change how their sizes are calculated.

Do note that we will not go into too much detail on how to change the confidence interval. You should be able to adjust it by tweaking the function you’re using to calculate it.

Plotting with mean and standard error

For example, in our average MPG plots, we can switch to an 80% confidence interval by changing one argument to our interval function call.

This content is taken from the FutureLearn online course Data Visualisation with Python: Seaborn and Scatter Plots.

Code:

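import scipy.stats as st
import matplotlib.pyplot as plt

# One entry per origin for the bar chart
series_names = []
means = []
errors = []

confidence = 0.80  # 80% confidence level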

Next, we loop over each origin and filter the MPG data to that origin.

Code:

for origin in mpg["origin"].unique():
    mpg_for_origin = mpg[mpg["origin"] == origin]

Then we calculate the mean, count, and standard error of the data:

Code:

    # Still inside the for loop
    mean = mpg_for_origin["mpg"].mean()
    count = len(mpg_for_origin)
    std_error = mpg_for_origin["mpg"].sem()

Next, calculate the confidence interval using the snippet below. Because we set confidence to 0.80 earlier, this gives an 80% interval in this example.

Code:

    # Still inside the for loop: t-based interval with count - 1 degrees of freedom
    ci = st.t.interval(confidence, count - 1, loc=mean, scale=std_error)
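The snippets above stop once ci has been computed, but the bar chart needs one name, one mean, and one error value per origin. Here is a minimal sketch of the complete loop. It assumes the error bar length is the distance from the mean to the upper bound of the interval, and it uses a small made-up DataFrame in place of the course's mpg data so that it runs on its own.

```python
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for the course's mpg DataFrame (values are made up).
mpg = pd.DataFrame({
    "origin": ["usa"] * 5 + ["japan"] * 5 + ["europe"] * 5,
    "mpg": [18.0, 15.0, 21.0, 19.5, 20.0,
            31.0, 29.0, 33.5, 30.0, 28.0,
            26.0, 27.5, 25.0, 29.0, 28.5],
})

series_names = []
means = []
errors = []

confidence = 0.80

for origin in mpg["origin"].unique():
    mpg_for_origin = mpg[mpg["origin"] == origin]
    mean = mpg_for_origin["mpg"].mean()
    count = len(mpg_for_origin)
    std_error = mpg_for_origin["mpg"].sem()
    # ci is a (lower, upper) pair centred on the mean
    ci = st.t.interval(confidence, count - 1, loc=mean, scale=std_error)
    series_names.append(origin)
    means.append(mean)
    # Assumption: the error bar length is the half-width of the interval
    errors.append(ci[1] - mean)

print(series_names)
```

With the real mpg data, the three lists can be passed to ax.bar as names, heights, and yerr respectively.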

Lastly, let us draw the plot, setting up the figure, axes, and labels.

Code:

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)
ax.bar(series_names, means, yerr=errors, facecolor="lightgreen", ecolor="red", capsize=3)
ax.set_xlabel("Origin")
ax.set_ylabel("MPG")

Output:

Screenshot of a bar chart with error bars for a confidence interval other than 95%. A short red vertical line sits on top of each bar. The y-axis, labelled "MPG", runs from 0 to 30 in steps of 5. The x-axis, labelled "Origin", shows usa, japan, and europe. The "usa" bar reaches about 20, the "japan" bar about 30, and the "europe" bar between 25 and 30.

Do you notice that the error bars are smaller than in the previous plot?

This is what we would expect: an 80% interval is narrower than a 95% one, because we are now only 80% confident that the true mean lies between those points, rather than 95% confident as before.
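To see the narrowing numerically, here is a small sketch that computes the half-width of the t-based interval at both confidence levels. The mean, standard error, and count are made-up values rather than figures from the mpg data, and half_width is a hypothetical helper introduced only for this illustration.

```python
import scipy.stats as st

# Hypothetical sample summary (not taken from the mpg data)
mean = 20.0
std_error = 0.5
count = 50


def half_width(confidence):
    """Distance from the mean to the upper bound of the t-interval."""
    lower, upper = st.t.interval(confidence, count - 1, loc=mean, scale=std_error)
    return upper - mean


width_80 = half_width(0.80)
width_95 = half_width(0.95)
print(f"80%: +/-{width_80:.2f}, 95%: +/-{width_95:.2f}")
```

The 80% half-width is always the smaller of the two, which is why the error bars shrink when the confidence level is lowered.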

Plotting with mean, standard error, and standard deviation

We could also just plot the error bars using the standard deviation of the data. This assumes the data is normally distributed though, which it probably won’t be, so we’ll just show it as a demonstration here.

Be sure to follow along with these steps in your notebook.

Step 1

We will begin just the way we did previously with the following code.

Code:

import numpy as np

series_names = []
means = []
errors = []

confidence = 0.80

Step 2

The standard deviation can be recovered from the standard error of the data, which is calculated with the sem method, by multiplying it by the square root of the sample size.

Code:

for origin in mpg["origin"].unique():
    mpg_for_origin = mpg[mpg["origin"] == origin]
    mean = mpg_for_origin["mpg"].mean()
    count = len(mpg_for_origin)
    std_error = mpg_for_origin["mpg"].sem()
    # sem = sd / sqrt(count), so multiply back to recover the standard deviation
    sd = std_error * np.sqrt(count)
    series_names.append(origin)
    means.append(mean)
    errors.append(sd)
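As a quick sanity check on the relationship used above, this sketch compares the standard deviation recovered from sem with the one pandas computes directly. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up MPG values for a single origin
values = pd.Series([18.0, 15.0, 21.0, 19.5, 20.0])

# Standard deviation recovered from the standard error
sd_from_sem = values.sem() * np.sqrt(len(values))

# Sample standard deviation computed directly (ddof=1, the same default sem uses)
sd_direct = values.std()

print(np.isclose(sd_from_sem, sd_direct))
```

In practice, calling std directly would be the shorter route; going through sem simply reuses the value already calculated in the loop.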

Step 3

With the standard deviations added to the errors list, we then plot them on the bar chart.

Code:

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)
ax.bar(series_names, means, yerr=errors, facecolor="lightgreen", ecolor="red", capsize=3)
ax.set_xlabel("Origin")
ax.set_ylabel("MPG")

Output: bar chart of mean MPG by origin with standard deviation error bars (image not included).

It’s not advisable to deviate too far from the 95% confidence interval, as readers usually assume that is what an error bar represents. If you use a different confidence level, or standard deviations instead, you should state this clearly somewhere.

Additional learning: Bootstrapping

The examples above calculated the intervals by hand for Matplotlib. Seaborn, by contrast, uses bootstrapping to calculate the 95% confidence interval of data. In essence, bootstrapping repeatedly resamples, with replacement, from a sample of the population, which gives a good estimate of the true mean and its 95% confidence interval.

We will not go into the details here. If you want to learn more, the video below from the University of Auckland’s Professor Chris Wild is a great introduction to the technique and explains the process step by step.

Watch: Confidence Intervals from Bootstrap resampling (8:23) [1]
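As a rough illustration of the resampling idea, here is a minimal percentile-bootstrap sketch for the mean. It uses made-up values and an arbitrary choice of 2000 resamples; it shows the general technique only and is not necessarily how Seaborn implements it internally.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Made-up sample standing in for one origin's MPG values
sample = np.array([18.0, 15.0, 21.0, 19.5, 20.0, 22.0, 17.5, 16.0, 19.0, 23.0])

n_resamples = 2000  # arbitrary choice for this sketch
boot_means = np.empty(n_resamples)
for i in range(n_resamples):
    # Draw a new sample of the same size, with replacement
    resample = rng.choice(sample, size=len(sample), replace=True)
    boot_means[i] = resample.mean()

# Percentile method: take the middle 95% of the bootstrap means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={sample.mean():.2f}, 95% bootstrap CI=({lower:.2f}, {upper:.2f})")
```

Unlike the t-based interval, this approach makes no assumption about the shape of the underlying distribution, at the cost of some extra computation.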

Do you see any difference?

What differences do you find between the last two outputs? You should see the same outputs in your own Jupyter Notebook.

Share your observations with your fellow learners in the comments.

References

[1] Confidence Intervals from Bootstrap re-sampling [Video]. Wild About Statistics; 2015 Apr 1. Available from: https://www.youtube.com/watch?v=iN-77YVqLDw
