Mean Pros and Cons

0
[LOGO MUSIC PLAYING]
16.9
Next, I want to talk about the pros and cons of the mean, when you would want to use it, and some situations where you might want to avoid it. Here’s an example histogram of a variable. I’ve got the mean graphed here as a red line. There are some advantages to the mean. The first is that every score in this distribution contributes. Right? I add up every score. I divide by the number of scores. Every score gets a vote. And in that sense, the mean is representing your whole data set very well. Also, in a large sample like this one, a single outlier, say a single extreme score, isn’t going to influence the mean very much. Look at this histogram.
53.8
I do have, at the upper end, maybe a little bit of positive skew. I’ve got some really high values up there. But they’re not swaying or biassing the mean very much, because it’s just a couple of scores and every score gets to influence it. So a single outlier, or a small number of outliers, typically doesn’t have that big of an impact. However, these same strengths can turn into weaknesses in some situations. In this example, I’ve got a small sample. And now, every score is counting. And I’ve got one really big score up there at 14.
89.1
And the problem now, of course, is that once the extreme scores make up a sufficient percentage of the sample, and here one score is a decent percentage of the sample, the mean gets really biassed off the centre. Like, a typical score here is in the two to three range, and yet the average is going over four. So that one big score now is having a big sway over the results. So yes, the mean is great in that every score counts. But if you have enough extreme scores, it can bias the mean to the point that it’s simply not useful. A common situation when this might happen is if you just have a lot of extreme skew. For instance, if we’re working with count data.
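To make the small-sample point concrete, here is a minimal Python sketch. The scores are hypothetical, made up to resemble the situation described (typical scores of two to three, one big score of 14); they are not the data behind the histogram in the video. The median is used here only as a stand-in for "a typical score", and `outlier_pull` is an illustrative helper, not something from the course.

```python
from statistics import mean, median

# Hypothetical small sample: typical scores are 2 or 3, plus one big score of 14.
small_sample = [2, 2, 3, 3, 2, 3, 14]

# Hypothetical larger sample: the same typical scores many times over,
# plus the same single big score.
large_sample = [2, 3] * 50 + [14]


def outlier_pull(scores):
    """Return how far the mean moves when the single largest score is dropped."""
    without_max = sorted(scores)[:-1]
    return mean(scores) - mean(without_max)


print("small sample mean:  ", round(mean(small_sample), 2))
print("small sample median:", median(small_sample))
print("pull in small sample:", round(outlier_pull(small_sample), 2))
print("pull in large sample:", round(outlier_pull(large_sample), 2))
```

In the small sample the one big score drags the mean above four even though a typical score is two or three. In the larger sample the same score barely moves the mean, because it is a much smaller share of the data.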
128
In this extreme skew example, I’ve got a small number of very large scores. And my mean, that dashed red line, is really being pulled away from where a typical score is. And so when I’ve got a lot of skew, I might want to avoid the mean. In this case, and people differ on this, I would say a skewness cutoff of around one is a good heuristic for knowing when you might want to avoid the mean and consider using a backup measure. [LOGO MUSIC PLAYING]
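The skewness heuristic can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's own method: the data are hypothetical, the function names are made up, and the skewness formula is the adjusted sample skewness, which to my understanding matches what Excel's SKEW function computes. Comparing the absolute value against the cutoff, so that strong negative skew is flagged too, is also an assumption.

```python
from statistics import mean, stdev


def sample_skewness(scores):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three scores")
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in scores)


def mean_is_questionable(scores, cutoff=1.0):
    """Apply the rough heuristic: flag the mean when |skewness| exceeds the cutoff."""
    return abs(sample_skewness(scores)) > cutoff


# Hypothetical count data: mostly small counts, one very large one.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 25]
# Hypothetical symmetric data for comparison.
symmetric = [1, 2, 3, 4, 5]

print(round(sample_skewness(counts), 2), mean_is_questionable(counts))
print(round(sample_skewness(symmetric), 2), mean_is_questionable(symmetric))
```

The cutoff of one is a rule of thumb, as noted in the talk, so `cutoff` is left as a parameter rather than fixed.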
This article is from the free online course Essential Mathematics for Data Analysis in Microsoft Excel.

Created by
FutureLearn - Learning For Life