Bias and randomisation

The roles of bias and randomisation in effective data collection.

Considering what we learned in the last step, if we want the sample to be sufficiently informative about the total population, it must be representative of that population. To ensure this, we need an awareness of the concepts of bias and randomisation.

Read the following two examples that introduce these concepts and the effects they have on the accuracy of the data analysis. 

Example 1: Age distribution

Suppose we are interested in learning about the age distribution of people residing in a city, and we record the age of the first 100 people to enter the town library on a particular day. If the average age of these 100 people is 46.2 years, is it justified to conclude that this is approximately the average age of the entire city population? More generally, is the age distribution in the sample similar to that in the city?

Perhaps not, since the sample chosen in this case does not seem to represent the total population: for example, students and senior citizens use the library more than working-age citizens. Thus, the way the data are collected in this example is likely to introduce bias into the statistical inference, making it unreliable. This means that the design of the age data collection needs to be changed so that the sample reflects the city population more accurately.
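The effect can be illustrated with a small simulation. The sketch below does not use real data: the city's age distribution and the assumption that students and senior citizens are five times as likely to visit the library are both invented purely for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical city: 100,000 residents with ages spread evenly from 0 to 89.
population = [random.randint(0, 89) for _ in range(100_000)]


def is_frequent_visitor(age):
    """Assumed 'student or senior citizen' age bands (illustrative only)."""
    return 16 <= age <= 24 or age >= 65


# Assumed relative chance of walking into the library on a given day.
weights = [5.0 if is_frequent_visitor(age) else 1.0 for age in population]

# Stand-in for "the first 100 people through the door": a draw weighted by
# visiting habits (with replacement, which is harmless at this scale).
library_sample = random.choices(population, weights=weights, k=100)


def share_of_frequent_visitors(ages):
    return sum(is_frequent_visitor(a) for a in ages) / len(ages)


population_share = share_of_frequent_visitors(population)
library_share = share_of_frequent_visitors(library_sample)

print(f"City mean age:           {sum(population) / len(population):.1f}")
print(f"Library sample mean age: {sum(library_sample) / len(library_sample):.1f}")
print(f"Students and seniors in the city:   {population_share:.0%}")
print(f"Students and seniors in the sample: {library_share:.0%}")
```

Under these assumptions the library sample over-represents the two frequent-visitor groups, so its age distribution differs systematically from that of the city, however carefully the 100 ages are recorded.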

Returning to the earlier Exit poll example, the selection of the polling stations as well as the protocol for asking questions should be carefully considered to ensure good representation. Similarly, in the Teaching methods example, it is important that the students are divided into groups in such a manner that neither group is more likely to contain the students with greater natural aptitude for programming. For instance, the instructor should not make the male class members one group and the female members the other. If the women then scored significantly higher than the men, it would not be clear whether this was due to the method used to teach them, or to the possibility that women are inherently better than men at learning programming skills.

However, reflecting upon such scenarios, it becomes clear that it is quite difficult to avoid bias by selecting subjects for the study according to some explicit rules set in advance. In practice, a sample generally cannot be assumed to be representative of a population unless that sample has been chosen in a random manner. This is because any specific non-random rule for selecting a sample may lead to one that is inherently biased towards some data values as opposed to others.

Although it may seem paradoxical, the best way to avoid bias and to obtain a representative sample is to choose its members in a totally random fashion, without any prior considerations. This is called randomisation.
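As a minimal sketch of randomisation in sampling, the code below repeatedly draws a simple random sample of 100 people, in which every resident has the same chance of being chosen, and compares the sample means with the population mean. The population of ages is invented for illustration, and the number of repetitions (500) is an arbitrary choice.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of ages (not real data).
population = [random.randint(0, 89) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Draw 500 simple random samples of 100 people each (without replacement
# within a sample) and record the mean age of every sample.
sample_means = [
    statistics.mean(random.sample(population, k=100))
    for _ in range(500)
]

average_of_means = statistics.mean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print(f"Population mean age:        {population_mean:.1f}")
print(f"Average of the sample means: {average_of_means:.1f}")
print(f"Spread of the sample means:  {spread_of_means:.1f}")
```

Individual random samples still vary from one draw to the next, but they do not lean systematically in either direction: across many draws the sample means centre on the population mean.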

Example 2: Clinical trials

A trial of a new drug is usually designed with two groups of subjects, ‘cases’ and ‘controls’, where the first group receives the new drug and the second does not. Subjects are allocated to the groups randomly and are not told which group they are in (‘blind’ allocation). To make this possible, the control subjects are given a ‘placebo’ instead of the drug (a fake pill made, for example, of chalk and sugar). Moreover, to ensure the integrity of the testing, those running the study are also unaware of the specific allocation of subjects (‘double-blind’ study).
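One way such an allocation could be organised is sketched below. It is an illustration only, not a description of any real trial protocol: the function name allocate_blind, the subject identifiers and the "KIT-" code format are all hypothetical. Subjects are shuffled into two equal-sized groups, and each subject is given a neutral kit code; the table linking codes to 'drug' or 'placebo' is kept separately, so that neither the subjects nor the staff handing out the kits can tell the groups apart.

```python
import random


def allocate_blind(subject_ids, seed=None):
    """Randomly split subjects into two groups and hide each group behind
    a neutral kit code (hypothetical helper, for illustration only).

    Returns (labels, key):
      labels: subject id -> kit code, all that subjects and study staff see
      key:    kit code -> 'drug' or 'placebo', held back until unblinding
    """
    rng = random.Random(seed)

    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # random order decides who gets what
    half = len(shuffled) // 2
    arms = ["drug"] * half + ["placebo"] * (len(shuffled) - half)

    # Kit codes are shuffled too, so the code number itself reveals nothing.
    codes = [f"KIT-{n:03d}" for n in range(1, len(shuffled) + 1)]
    rng.shuffle(codes)

    labels = dict(zip(shuffled, codes))
    key = dict(zip(codes, arms))
    return labels, key


subjects = [f"S{n:02d}" for n in range(1, 21)]  # 20 hypothetical subjects
labels, key = allocate_blind(subjects, seed=42)

# What the study staff see: subject ids and kit codes only.
for subject in subjects[:3]:
    print(subject, labels[subject])
```

Only when the trial is over would the key be combined with the labels to reveal which subjects received the drug.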

Before moving on you may wish to engage with your peers in the following Share area. 

Consider the question:

How do you see statistics helping bridge the gap between data and the information it contains in your own field or interests?

This article is from the free online course Statistical Methods.
