
Examples of over-extrapolation

Dr Melissa Humphries explains real-world examples of over-extrapolation, where predictions were made outside the range of the available data.

Now that you understand the basics of what over-extrapolation is, it’s time to consider some examples of where this has occurred in real-life situations.

One example is Moore’s Law, which has been extrapolated for over 50 years. Moore’s Law states that the number of transistors on a chip, and with it computing power, roughly doubles every two years. However, many computer scientists argue that this trend cannot continue indefinitely because of fundamental limits. In fact, it can’t continue, because it is bounded by the speed of light and by the physical limit on how many transistors can be put onto a chip. Gordon Moore himself, who came up with the law in the first place, did not predict that it would last more than another five years or so.
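To see why an unchecked doubling rule must eventually break, here is a minimal Python sketch of the arithmetic. It assumes a clean doubling every two years; the function name `projected_count` and its parameters are illustrative, not part of any library. Fifty years of doubling multiplies the starting count by 2^25, roughly 33.5 million, and nothing in the formula itself stops at a physical limit.

```python
def projected_count(start_count: float, years: float, doubling_period: float = 2.0) -> float:
    """Project a quantity forward assuming it doubles every `doubling_period` years.

    This is the naive exponential rule behind Moore's Law:
    N(t) = N0 * 2 ** (t / T). It has no built-in ceiling.
    """
    if doubling_period <= 0:
        raise ValueError("doubling_period must be positive")
    return start_count * 2 ** (years / doubling_period)


if __name__ == "__main__":
    # Growth factor implied by 50 years of doubling every two years
    print(f"After 50 years: x{projected_count(1, 50):,.0f}")
    # Extrapolating the same rule another 50 years squares that factor
    print(f"After 100 years: x{projected_count(1, 100):,.0f}")
```

The second line shows the problem with pushing the rule further: the formula happily keeps compounding, while real chips run into hard physical limits.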

Another example comes from plotting transatlantic crossing times, that is, the time it takes to travel across the Atlantic Ocean.

In 1620, a ship called the Mayflower crossed the Atlantic in around 1000 hours. Later, in the 1700s, Benjamin Franklin also crossed the Atlantic in around 1000 hours. In 1750, if you reviewed these two data points, you would probably conclude that the Atlantic could not be crossed any faster than around 1000 hours.

Transatlantic crossing times plot with the Mayflower (MF) and Benjamin Franklin (BF) data points highlighted.

In 1750, however, aeroplanes didn’t exist. Once Charles Lindbergh took to the skies in the 1900s, transatlantic travel times started to decline rapidly. If, at the start of the 1900s, you had made a travel-time prediction based on Benjamin Franklin’s crossing and a few plane trips, and extrapolated it 100 years into the future, you would probably have predicted travel times of around 100 hours. But you can see from the data that once we actually had plane travel, travel times fell much further and faster than that.

Transatlantic crossing times plot with data points for plane travel added, showing a reduction in travel time that was not conceivable before planes existed.

If you were to take the travel-time data from the 1900s and extrapolate it out to the year 2000, you would predict that crossing the Atlantic takes around 1 hour. In reality, that didn’t happen, because we reached the physical limits of jet engines, which could not keep getting faster. So in each example here, the prediction was made beyond the domain for which data existed. This produced bad predictions because we predicted outside our sample: we over-extrapolated from what had been observed before.

Transatlantic crossing times plot showing both ship and plane data, indicating sections where over-extrapolation occurred.

Let me present a real-world instance of over-extrapolation from the journal Nature. In the interest of full disclosure, the original authors wrote this as a somewhat tongue-in-cheek, lighthearted piece, but it serves as a great example for us to consider in this course.

A brief communication fitted linear models to 100-metre sprint times and extrapolated them into the future, predicting that by 2156 women sprinters would be running the 100-metre sprint faster than men. Criticism of the article came in thick and fast.

One response pointed out that the authors had omitted to mention that if you continue this extrapolation even further, then by the year 2636 times of less than zero seconds will be recorded in the 100-metre sprint. The authors used a domain of 104 years to extrapolate to a domain of 252 years: they took a relatively short domain and then predicted well beyond it.
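Both headline claims fall out of simple algebra once you have fitted straight lines. The minimal Python sketch below finds where two lines cross (the "women overtake men" point) and where a declining line reaches zero (the "negative times" point). The function names are illustrative, and the trend numbers in the usage example are made up; they are not the coefficients fitted in the Nature paper.

```python
def zero_crossing(intercept: float, slope: float) -> float:
    """Return the x at which the line y = intercept + slope * x reaches zero."""
    if slope == 0:
        raise ValueError("a flat line never crosses zero")
    return -intercept / slope


def intersection(a1: float, b1: float, a2: float, b2: float) -> float:
    """Return the x at which y = a1 + b1*x and y = a2 + b2*x meet.

    Setting a1 + b1*x = a2 + b2*x gives x = (a2 - a1) / (b1 - b2).
    """
    if b1 == b2:
        raise ValueError("parallel lines never meet")
    return (a2 - a1) / (b1 - b2)


if __name__ == "__main__":
    # Hypothetical trends, with x = years after the first observation:
    # men start at 10.0 s and improve by 0.01 s per year,
    # women start at 11.0 s and improve by 0.02 s per year.
    men = (10.0, -0.01)
    women = (11.0, -0.02)
    print("Lines cross after about", round(intersection(*men, *women)), "years")
    print("Women's line reaches zero after about", round(zero_crossing(*women)), "years")
```

With these made-up numbers the lines cross after about 100 years and the women's line reaches zero seconds after about 550 years. A straight-line model has no notion that a sprint time cannot be negative; both answers are just algebra applied far outside the observed data.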

The point is that extrapolation is a perfectly reasonable statistical method to use, but we should only do it within the domain of the data and avoid over-extrapolation. It’s fine to predict how fast people will run the 100-metre sprint within the 104 years these authors used: because I have data on either side, I can be relatively confident about that prediction. Outside the domain of the data, however, it is a very different story, and things can go horribly wrong.
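One way to build this advice into code is to have a fitted model remember the range of the data it was fitted on and refuse, by default, to predict outside it. The sketch below is a minimal Python illustration using an ordinary least-squares line; the class name `DomainCheckedLine` and the `allow_extrapolation` flag are hypothetical names chosen for this example, not part of any library.

```python
class DomainCheckedLine:
    """Ordinary least-squares line y = intercept + slope * x that, by default,
    refuses to predict outside the range of x values it was fitted on."""

    def __init__(self, xs, ys):
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need two equal-length sequences with at least two points")
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("x values must not all be identical")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        # Remember the observed domain so predictions can be checked against it
        self.x_min, self.x_max = min(xs), max(xs)

    def predict(self, x, allow_extrapolation=False):
        outside = not (self.x_min <= x <= self.x_max)
        if outside and not allow_extrapolation:
            raise ValueError(
                f"x={x} is outside the observed range [{self.x_min}, {self.x_max}]"
            )
        return self.intercept + self.slope * x


if __name__ == "__main__":
    # Hypothetical data: x = time index, y = some measured quantity
    model = DomainCheckedLine([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
    print(model.predict(1.5))  # inside the observed range
    try:
        model.predict(10)  # well outside the observed range
    except ValueError as err:
        print("Refused:", err)
```

Passing `allow_extrapolation=True` still lets you predict beyond the data, but it makes that an explicit, visible choice rather than a silent default.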

This article is from the free online

Critical Evaluation in Data Science: Data, the World, and You

Created by
FutureLearn - Learning For Life
