
SD Part 1

Imagine you have a variable with a mean of 13.25. Imagine we know that it's roughly a bell curve, not very skewed, so the mean is a good measure of the average score. But that doesn't really tell you where most scores are. After all, most people aren't exactly at the average. So if I want to know where most scores are, I need a little more information. I need to know: 13.25 plus or minus what? This is the standard deviation. The standard deviation tells us where scores are relative to the mean. Do most scores hug the mean pretty closely, or are scores pretty spread out?
If I have this information along with the mean, I can know where most of my scores are. In fact, I can visualise the histogram in my head. Here's an example: a histogram with a mean of 13.25. It illustrates visually what I just said: most scores are not at the mean. In fact, if you look at how far scores are spread out, scores in this data set are typically about 1.11 points away from the mean. So what I'm saying is, if I take every one of those scores and look at how far it is from the mean, the typical distance from the mean is about 1.11.
That is what the standard deviation measures. Loosely speaking, it is the typical distance from the mean. Strictly, it is the square root of the average squared distance from the mean, which comes out a little larger than the plain average distance. So this tells me a typical score is about 1.11 units from the mean, and I know that most scores are going to hug that mean pretty closely. Here's another example. I've doubled that distance, and you can see how much more spread out these scores are. In this case, the standard deviation is 2.22. So just by knowing how far scores are spread out from the mean, I can get a good sense of what the data is doing. Now just another fun fact: in a typical bell curve variable, about 68% of scores are within one standard deviation of the mean.
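As a rough illustration of "distance from the mean", the sketch below computes both the plain average distance and the standard deviation for a small set of scores. The scores are hypothetical, invented so that the mean works out to 13.25; they are not the data behind the lecture's histogram. It also stretches every score twice as far from the mean to show that the standard deviation doubles with it.

```python
import math
import statistics

# Hypothetical scores, made up for illustration (not the lecture's data set).
scores = [11.5, 12.0, 12.5, 13.0, 13.0, 13.5, 13.5, 14.0, 14.5, 15.0]

mean = statistics.fmean(scores)

# Plain average distance from the mean (mean absolute deviation).
avg_distance = statistics.fmean([abs(x - mean) for x in scores])

# Population standard deviation: square root of the average squared distance.
sd = math.sqrt(statistics.fmean([(x - mean) ** 2 for x in scores]))

# Move every score twice as far from the mean; the mean itself stays put.
wide_scores = [mean + 2 * (x - mean) for x in scores]
wide_sd = statistics.pstdev(wide_scores)

print(f"mean = {mean:.2f}")
print(f"average distance from mean = {avg_distance:.2f}")
print(f"standard deviation = {sd:.2f}")
print(f"standard deviation after doubling the spread = {wide_sd:.2f}")
```

The two measures of spread are close but not identical: the standard deviation is never smaller than the average distance, because squaring gives extra weight to scores far from the mean. In Excel, `STDEV.P` and `AVEDEV` compute these two quantities.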
So roughly, if I know the mean is, say, 13.25 and the standard deviation is 1.11, I know that about 70% of scores are within 13.25 plus or minus 1.11, that is, between about 12.14 and 14.36. So it's just a useful way to quantify where my scores are and what my variable is doing.
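The 68% figure can be checked with a quick simulation. This is a minimal sketch that assumes the scores follow a normal (bell curve) distribution; the mean of 13.25 and standard deviation of 1.11 are the lecture's figures, but the scores themselves are randomly generated here rather than taken from the course data set.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Simulated bell-curve scores (an assumption: real data may be less tidy).
scores = [random.gauss(13.25, 1.11) for _ in range(100_000)]

mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)

# Share of scores that fall within one standard deviation of the mean.
inside = sum(mean - sd <= x <= mean + sd for x in scores)
within_one_sd = inside / len(scores)

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"share within one SD of the mean = {within_one_sd:.1%}")
```

The share should land close to 68%. For strongly skewed data the same calculation can give a noticeably different share, which is why the rule is stated for bell curve variables.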

Lesson 2: SD vs. IQR

There are a couple of different ways to measure the variation or “spread” in a data set: the standard deviation (which we also touched on in Module 1) and the interquartile range. In this lesson, we’ll break down how each of these works and find out when to use one over the other.

Lab: SD and IQR

In the same way that the different measures of center are more or less useful depending on the type of data, the standard deviation and the interquartile range can each be more or less effective with different data types. In this lab, we’ll use Excel formulas to calculate both, and we’ll explore why the IQR is more useful with skewed data.
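To see why the IQR holds up better with skewed data, here is a minimal sketch in Python rather than Excel. Both data sets are hypothetical and are not the lab's data; the helper `iqr` is a name introduced only for this example. The second set is the first with its top score replaced by one extreme value.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical, roughly symmetric scores.
symmetric = [10, 11, 12, 13, 13, 14, 15, 16]

# Same scores, but the top one is replaced by an extreme value (right skew).
skewed = symmetric[:-1] + [60]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name}: SD = {statistics.pstdev(data):.2f}, IQR = {iqr(data):.2f}")
```

The single extreme score inflates the standard deviation several times over, while the IQR does not move, because the IQR depends only on the middle half of the data. Quartile conventions differ between tools, so an Excel formula may give a slightly different IQR than this sketch.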

The lab instructions can be downloaded as a PDF file here.

The data set for this lab can be viewed here. From the link, copy and paste all the data into a new worksheet in Excel Online.

(Note: This lab uses the same data set you used in the labs for the previous module.)

This article is from the free online

Essential Mathematics for Data Analysis in Microsoft Excel

Created by
FutureLearn - Learning For Life
