
Bias and error in data collection

© Coventry University. CC BY-NC 4.0

In a statistical sense, bias at the collection stage means that the data you have gathered is not representative of the group or activity you want to say something about.

Shortcuts and mistakes of various kinds are part of what makes us human. As the author and psychologist Daniel Levitin (2016) says:

“Remember, people gather statistics. People choose what to count, and how to go about counting. There are a host of errors and biases that can enter into the collection process and these can lead millions of people to draw the wrong conclusions.”
Bias and error may be unintentional, but sometimes we know they have happened and ignore them for the sake of an easier life.
Some common biases or error sources to look out for in your own and others’ work include:

Sampling

An unbiased sampling method should mean that every individual data item within the whole set of items, e.g. a list of potential participants in an experiment, has a fair chance of being included. This is why researchers often try to randomise the sampling process (simple random sampling) or sample randomly from within a set of pre-defined groups (stratified sampling). If some aspect of your method prevents certain items from being selected, you are risking bias.
For instance, if we decide to base a study of citizen digital skills on the results of an internet survey, we’re excluding people without internet access, who may have significantly lower digital skills. Our data is therefore not representative of all citizens.
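The internet survey example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the share of people online (85%) and the skill-score distributions are made-up numbers chosen to show the effect, not real figures, and the function names are hypothetical.

```python
import random
import statistics


def simulate_population(n=10_000, seed=42):
    """Build a hypothetical population of (has_internet, skill_score) pairs."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        has_internet = rng.random() < 0.85  # assumed: 85% of people are online
        # Assumed: people online score higher on average (illustrative only).
        mean = 60 if has_internet else 30
        skill = min(100.0, max(0.0, rng.gauss(mean, 15)))
        population.append((has_internet, skill))
    return population


def mean_skill(people):
    """Average skill score for a list of (has_internet, skill_score) pairs."""
    return statistics.mean(skill for _, skill in people)


population = simulate_population()
rng = random.Random(0)

# Simple random sample: every person has the same chance of selection.
random_sample = rng.sample(population, 500)

# Internet survey: people without access can never be selected.
online_only = [person for person in population if person[0]]
internet_sample = rng.sample(online_only, 500)

true_mean = mean_skill(population)
random_mean = mean_skill(random_sample)
internet_mean = mean_skill(internet_sample)

print(f"Whole population: {true_mean:.1f}")
print(f"Random sample:    {random_mean:.1f}")
print(f"Internet sample:  {internet_mean:.1f}")
```

Under these assumptions the random sample should land close to the population average, while the internet-only sample overestimates it, because the lower-scoring offline group is missing entirely. Taking a larger internet sample would not fix this; the error comes from who can be selected, not from how many are.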

Measuring and calculating

Data collected from instruments or digital systems may be prone to error if, for example, a sensor is broken or a log file is corrupted. Often people will incorporate a sense-check that picks up anomalies like these.
Similarly, it is easy for researchers themselves to make a simple error in calculation, especially when complex code is used to determine results. This is where open source and open data can help, by letting other people reproduce the calculations and check them.
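A sense-check on instrument data can be as simple as flagging missing values and readings outside a plausible range. The sketch below assumes a hypothetical room-temperature log and an arbitrary plausible range of -10 to 50 degrees Celsius; both the data and the `sense_check` helper are invented for illustration.

```python
import math


def sense_check(readings, low, high):
    """Split readings into plausible values and flagged anomalies.

    A reading is flagged if it is missing (None or NaN) or falls outside
    the plausible range [low, high].
    """
    clean, flagged = [], []
    for index, value in enumerate(readings):
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing or not (low <= value <= high):
            flagged.append((index, value))  # keep the position for follow-up
        else:
            clean.append(value)
    return clean, flagged


# Hypothetical log: includes a gap, a sentinel value and an implausible spike.
readings = [21.4, 21.6, None, 21.5, -999.0, 21.7, float("nan"), 85.2]
clean, flagged = sense_check(readings, low=-10.0, high=50.0)

print("Plausible readings:", clean)
print("Flagged positions: ", [index for index, _ in flagged])
```

Flagging anomalies rather than silently dropping them keeps a record of what was excluded, which matters if someone later tries to reproduce the calculation.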

Participation/responding

In an opinion survey, you may be more likely to get participation from people who feel strongly about the subject one way or another, whereas those who are less opinionated may be less likely to respond. One way to reduce this is to advertise the survey under a more general theme. For example, you might say your survey is about shopping in general rather than views on Marmite.
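The effect of opinionated people responding more often can also be sketched as a simulation. Everything here is assumed for illustration: the 1 to 5 opinion scale, the mostly neutral population, and the response probabilities (5% for neutral people, rising to 45% for those with the strongest views).

```python
import random

rng = random.Random(1)

# Hypothetical population: opinion on a 1-5 scale, most people near neutral (3).
opinions = rng.choices([1, 2, 3, 4, 5], weights=[5, 20, 50, 20, 5], k=20_000)


def responds(opinion, rng):
    """Assumed response model: stronger feelings make a reply more likely."""
    strength = abs(opinion - 3)            # 0 = neutral, 2 = strongest feeling
    probability = 0.05 + 0.20 * strength   # 5%, 25% or 45% chance of replying
    return rng.random() < probability


respondents = [opinion for opinion in opinions if responds(opinion, rng)]


def share_strong(values):
    """Fraction of values at either extreme of the scale (1 or 5)."""
    return sum(value in (1, 5) for value in values) / len(values)


print(f"Strong views in population:  {share_strong(opinions):.0%}")
print(f"Strong views in respondents: {share_strong(respondents):.0%}")
```

With these assumed probabilities, strong views make up a much larger share of the responses than of the population, so a reader looking only at the survey results would overestimate how polarised people are.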

Reporting accuracy

People may tell you something about their own behaviour that is different to the way they behave in practice. This is why it is sometimes good to combine spoken reports with behavioural data.
It’s important for a data scientist to sharpen their understanding of statistical evidence and the claims or decisions that it may or may not support.
To sharpen your own understanding, we recommend you practise spotting biases and errors in your daily life – as you read newspapers, social media and research studies.
Going forward, you will gain confidence in designing your own analyses and applying appropriate statistical tests. You’ll also become adept at explaining what the results imply and understanding their limitations or the potential sources of bias that might have been introduced during data collection.

Your task

Have a look at the report Bouncing Back: Consumer Views on Traveling Again (Flywire, 2020), which presents a survey of people’s intentions to fly again post-COVID-19.
If the report is meant to describe all potential consumers, what bias might its sampling strategy introduce?
Think about some of the possible issues with bias and error and how the survey could have been presented to the respondents. Share your ideas with your fellow learners in the comments area.

Further Reading

Levitin, D. (2016). A field guide to lies and statistics. Viking.

References

Flywire. (2020). Bouncing back: Consumer views on traveling again. https://flywire.foleon.com/report/bouncing-back-consumer-views-on-traveling-again/cover/?

This article is from the free online course Get ready for a Masters in Data Science and AI.
