Mean Pros and Cons

0
[LOGO MUSIC PLAYING]
16.9
Next, I want to talk about the pros and cons of the mean: when you would want to use it, and some situations where you might want to avoid it. Here’s an example histogram of a variable. I’ve got the mean graphed here as a red line. There are some advantages to the mean. The first is that every score in this distribution contributes. Right? I add up every score. I divide by the number of scores. Every score gets a vote. And in that sense, the mean is representing your whole data set very well. Also, in a sample of this size, a single outlier, say a single extreme score, isn’t going to influence the mean very much. Look at this histogram.
53.8
I do have, at the upper end, maybe a little bit of positive skew. I’ve got some really high values up there. But they’re not swaying or biassing the mean very much, because it’s just a couple of scores and every score gets to influence it. So a single outlier, or a small number of them, typically doesn’t have that big of an impact. However, these same strengths can turn into weaknesses in some situations. In this example, I’ve got a small sample. And now, every score is counting. And I’ve got one really big score up there at 14.
89.1
And the problem now, of course, is that this one score is a decent percentage of the sample, so the mean has been really biassed off the centre. Like, a typical score here is in the two to three range, and yet the average is going over four. So that one big score now is having a big sway over the results. So yes, the mean is great in that every score counts. But if extreme scores make up enough of the sample, they can bias the mean to the point that it’s simply not useful. A common situation where this might happen is when you have a lot of skew, for instance if we’re working with count data.
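To put rough numbers on that, here is a minimal Python sketch. The data are made up for illustration and are not the values behind the histograms in the video: six typical scores in the two to three range for the small sample, and one hundred of them for the large sample. The helper name `mean_shift` is also just a label chosen for this example.

```python
from statistics import mean


def mean_shift(scores, extreme):
    """Return how far adding one extreme score moves the mean."""
    return mean(scores + [extreme]) - mean(scores)


# Hypothetical data: typical scores sit in the two-to-three range.
small_sample = [2, 2, 2, 3, 3, 3]   # 6 typical scores
large_sample = [2, 3] * 50          # 100 typical scores

# Add the same extreme score of 14 to each sample.
small_mean = mean(small_sample + [14])
large_mean = mean(large_sample + [14])

print(f"small sample mean with the 14: {small_mean:.2f}")
print(f"large sample mean with the 14: {large_mean:.2f}")
print(f"shift in small sample: {mean_shift(small_sample, 14):.2f}")
print(f"shift in large sample: {mean_shift(large_sample, 14):.2f}")
```

In this made-up data the same score of 14 pushes the small-sample mean above four, while the large-sample mean stays close to 2.5, because there the extreme score is only one vote out of 101.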
128
In this extreme skew example, I’ve got a small number of very large scores. And my mean, that dashed red line, is really being pulled off of where a typical score is. And so when I’ve got a lot of skew, I might want to avoid the mean. In this case, and people differ on this, I would say a skewness cutoff of around one is a good heuristic for knowing when you might want to avoid the mean and consider using a backup measure. [LOGO MUSIC PLAYING]
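The cutoff heuristic could be sketched in Python along these lines. Several things here are assumptions rather than anything stated in the video: the skewness formula (the simple moment-based version, which differs slightly from the adjusted statistic that some spreadsheet functions report), taking the absolute value so negative skew is treated the same way, using the median as the backup measure, and the function names and count data, which are invented for illustration.

```python
from statistics import mean, median


def skewness(scores):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(scores)
    centre = mean(scores)
    m2 = sum((x - centre) ** 2 for x in scores) / n
    m3 = sum((x - centre) ** 3 for x in scores) / n
    if m2 == 0:
        return 0.0  # all scores identical, so there is no skew to measure
    return m3 / m2 ** 1.5


def central_value(scores, cutoff=1.0):
    """Use the mean unless |skewness| exceeds the cutoff; otherwise fall back to the median.

    The median as the fallback is an assumption made for this sketch.
    """
    if abs(skewness(scores)) > cutoff:
        return "median", median(scores)
    return "mean", mean(scores)


# Hypothetical count data: many small counts and a few very large ones.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 25, 40]
symmetric = [1, 2, 3, 4, 5]

print(f"skewness of counts: {skewness(counts):.2f}")
print(central_value(counts))
print(central_value(symmetric))
```

For the made-up counts, the mean is 7.5 even though most scores are 3 or below, and the skewness is well above one, so the sketch switches to the median. For the symmetric data it keeps the mean.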
This article is from the free online

Essential Mathematics for Data Analysis in Microsoft Excel

Created by
FutureLearn - Learning For Life
