

Sometimes we can’t use the mean, either because we have too much skew or because we have extreme scores. In those situations we’re going to use the median. The median is the point at which half of the data are below you and half of the data are above you. Functionally, this is going to tell you where the middle of your data is and what a typical score would be, but it’s less biased by skew, and therefore it forms a good backup for the mean. Now our symbol for this is mdn, which just stands for median. So when you see this, you know that somebody is using the median instead of the mean.
Let’s take a look at this data, for example. In this case, we have age distributions, and you can see some pretty strong positive skew: most scores are down at the lower end of the distribution. Rather than using the mean in this example, the median would be the best measure of the average. There’s so much skew here that those extreme scores are going to pull the mean up toward that long tail. The median will give you a better sense of what the middle of the data is doing, because it doesn’t care what those extreme values are.
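To make the pull of the long tail concrete, here is a minimal sketch in Python. The ages below are hypothetical values invented for illustration (they are not the data set shown in the video), chosen so that most scores sit at the low end with a couple of extreme high scores.

```python
from statistics import mean, median

# Hypothetical ages, made up for illustration only: most values are low,
# with two extreme scores forming a long tail on the right (positive skew).
ages = [18, 19, 19, 20, 21, 22, 23, 25, 60, 95]

age_mean = mean(ages)      # dragged upward by the 60 and the 95
age_median = median(ages)  # average of the two middle scores, 21 and 22

print("mean:", age_mean)
print("median:", age_median)
```

With these made-up numbers the mean lands above every score except the two extreme ones, while the median stays in the middle of the bulk of the data.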
I could have somebody who was 100 years old in this data set. I could have a tonne of people who are up there. It doesn’t matter whether they’re up there or whether they’re in their 50s or 60s; it doesn’t matter what those values are. This is the beauty of the median. It just doesn’t care what the extreme scores are, so they can’t bias your statistic. So how do we do the median? How do we calculate it? Well, we’re going to organise our scores in order. We’re going to sort them from smallest to largest. Then we’ll just pick the middle score, or if you’ve got two middle scores, the average of those.
So for example, say I’ve got the numbers 2, 4, and 90. It doesn’t matter that 90 is an extremely large score. The middle score is 4. So here we see the median in all its glory, being unbiased by that score of 90. Or if we’ve got a couple of middle scores, as in 2, 4, 5, 90, we’re going to have a median of 4.5, because we just average those two middle scores, 4 and 5. So there we go. It’s a beautiful backup for the mean. It doesn’t care what the extreme scores are, and it is therefore a useful substitute when we’ve got too much skew or too many extreme scores to use the mean.
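The sort-then-pick-the-middle procedure can be sketched in a few lines of Python. This is only an illustration of the steps described above, not part of the course material; the function name `median_of` is a hypothetical name chosen for this example.

```python
def median_of(scores):
    """Return the median of a non-empty list of numbers.

    `median_of` is a hypothetical helper written for this example.
    """
    if not scores:
        raise ValueError("the median of an empty list is undefined")

    ordered = sorted(scores)  # sort from smallest to largest
    n = len(ordered)
    mid = n // 2

    if n % 2 == 1:
        # Odd number of scores: there is a single middle score.
        return ordered[mid]
    # Even number of scores: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 4, 90]))     # 4
print(median_of([2, 4, 5, 90]))  # 4.5
```

Replacing the 90 with any larger value leaves both results unchanged, which is the sense in which the median does not care about extreme scores.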
This article is from the free online course Essential Mathematics for Data Analysis in Microsoft Excel, created by FutureLearn.
