
Bias and error in data collection

In a statistical sense, bias at the collection stage means that the data you have gathered is not representative of the group or activity you want to say something about.

Shortcuts and mistakes of various kinds are part of what makes us human. As the author and psychologist Daniel Levitin (2016) says:

Remember, people gather statistics. People choose what to count, and how to go about counting. There are a host of errors and biases that can enter into the collection process and these can lead millions of people to draw the wrong conclusions.

Bias and error may be unintentional, but sometimes we know they have happened and ignore them for the sake of an easier life.


This content is taken from the Coventry University online course Get ready for a Masters in Data Science and AI.


Some common biases or error sources to look out for in your own and others’ work include:

Sampling

An unbiased sampling method should mean that every individual data item within a whole set of items, eg a list of potential participants in an experiment, has a known and usually equal chance of being included. This is why researchers often try to randomise the sampling process or sample randomly from a set of pre-defined groups. If some aspect of the method limits who or what can be selected, you are risking bias.

For instance, if we decide to base a study of citizen digital skills on the results of an internet survey, we’re excluding people without internet access, who may have significantly lower digital skills. Therefore, our data is not representative of all citizens.
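The internet survey example can be sketched with a small simulation. This is a minimal illustration, not real survey data: the population size, the 80% internet access rate and the skill scores are all invented assumptions, and the function names are hypothetical. It compares a simple random sample of everyone with a sample drawn only from people who have internet access.

```python
import random
import statistics


def make_population(n=10_000, seed=0):
    """Build a synthetic population; every number here is invented for illustration."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        has_internet = rng.random() < 0.8  # assumed 80% internet access
        # Assumption: people with access score higher on a 0-100 skills scale.
        centre = 60 if has_internet else 30
        skill = min(100.0, max(0.0, rng.gauss(centre, 10)))
        population.append({"has_internet": has_internet, "skill": skill})
    return population


def mean_skill(people):
    """Average skill score for a list of people."""
    return statistics.mean(p["skill"] for p in people)


population = make_population()
rng = random.Random(1)

# Simple random sample: every person has the same chance of selection.
random_sample = rng.sample(population, 500)

# Internet survey: only people with access can be reached at all.
online_only = [p for p in population if p["has_internet"]]
online_sample = rng.sample(online_only, 500)

print(f"Population mean skill:    {mean_skill(population):.1f}")
print(f"Random sample mean skill: {mean_skill(random_sample):.1f}")
print(f"Online sample mean skill: {mean_skill(online_sample):.1f}")
```

Under these assumptions the random sample should land close to the population mean, while the online-only sample overstates it, because the people it cannot reach are exactly those with lower scores.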

Measuring and calculating

Data collected from instruments or digital systems may be prone to error if, for example, a sensor is broken or a log file is corrupted. Often people will incorporate a sense-check that picks up anomalies like these.
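A sense-check of this kind can be very simple. The sketch below assumes a hypothetical temperature sensor whose plausible range is -40 to 60 degrees; the readings, the range and the function name are made up for illustration. It flags values that are missing or physically implausible so that a person can inspect them before analysis.

```python
import math


def find_suspect_readings(readings, low, high):
    """Return (index, value) pairs for readings that are missing or outside [low, high]."""
    suspects = []
    for i, value in enumerate(readings):
        if value is None or math.isnan(value) or not (low <= value <= high):
            suspects.append((i, value))
    return suspects


# Hypothetical readings: a dropped value, a sentinel error code and a spike.
temperatures = [18.2, 18.4, None, 18.9, -999.0, 19.1, 250.0]

suspects = find_suspect_readings(temperatures, low=-40.0, high=60.0)
print(suspects)  # [(2, None), (4, -999.0), (6, 250.0)]
```

Flagging rather than silently deleting suspect values keeps the decision about what to do with them visible and reviewable.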

Similarly, it is easy for researchers themselves to make a simple calculation error, especially when complex code is used to determine results. This is where open source and open data can help, by letting other people reproduce the calculations and check them.

Participation/responding

In an opinion survey, you may be more likely to get participation from people who feel strongly about the subject one way or another, whereas those who are less opinionated may be less likely to respond. One way to reduce this is to advertise the survey with a more general theme. For example, you might say your survey is about shopping in general rather than views on Marmite.
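The effect of this kind of self-selection can be illustrated with a rough simulation. The opinion scale and the response rates below are assumptions chosen only to make the point visible; they are not taken from any real survey.

```python
import random

rng = random.Random(42)

# Opinions on a five-point scale from -2 (strongly dislike) to 2 (strongly like).
population = [rng.choice([-2, -1, 0, 1, 2]) for _ in range(10_000)]


def responds(opinion):
    """Decide whether a person answers the survey.

    Assumed response rates: 10% for neutral people, rising to 50% for
    those with the strongest views.
    """
    probability = 0.1 + 0.2 * abs(opinion)
    return rng.random() < probability


respondents = [opinion for opinion in population if responds(opinion)]


def share_strong(opinions):
    """Fraction of opinions at either extreme of the scale."""
    return sum(abs(o) == 2 for o in opinions) / len(opinions)


print(f"Strong opinions in population:  {share_strong(population):.0%}")
print(f"Strong opinions in respondents: {share_strong(respondents):.0%}")
```

With these assumed rates, strong opinions make up a noticeably larger share of the responses than of the population, even though nobody was deliberately excluded.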

Reporting accuracy

People may tell you something about their own behaviour that differs from the way they behave in practice. This is why it is sometimes good to combine what people report with behavioural data.

It’s important for a data scientist to sharpen their understanding of statistical evidence and the claims or decisions that it may or may not support.

To sharpen your own understanding, we recommend you practise spotting biases and errors in your daily life – as you read newspapers, social media and research studies.

Going forward, you will gain confidence in designing your own analyses and applying appropriate statistical tests. You’ll also become adept at explaining what the results imply and understanding their limitations or the potential sources of bias that might have been introduced during data collection.

Your task

Have a look at the report Bouncing Back: Consumer Views on Traveling Again (Flywire, 2020), which covers a survey of people’s intentions to fly again post-COVID-19.

If the report is about all potential consumers, what bias might there be from their sampling strategy?

Think about some of the possible issues with bias and error and how the survey could have been presented to the respondents. Share your ideas with your fellow learners in the comments area.

References

Flywire. (2020). Bouncing back: Consumer views on travelling again. https://flywire.foleon.com/report/bouncing-back-consumer-views-on-traveling-again/cover/?
