
Dealing with bias in AI

Bias occurs when an algorithm produces prejudiced results as a consequence of biased data or model choices. This article describes common examples of bias and ways to prevent them.
[Image: one brown egg in a basket of white eggs. © AIProHealth Project]

Bias is a phenomenon that occurs when a machine learning model systematically produces prejudiced results. It can be caused by poor-quality or unrepresentative example data, which is called representational bias, or by choices made during algorithm development, called procedural bias. Both sources of bias can lead to incorrect predictions by the AI model, which in turn can lead to dangerous situations, such as patients receiving the wrong treatment.

Representational bias

In machine learning, the general rule is: “Garbage In, Garbage Out”. This means that if your model is trained on flawed data, it will not be able to produce accurate results. For this reason, it is extremely important to consider whether your data contains any possible biases. A few of the most common biases are discussed below, along with solutions to prevent them from occurring.

  • Historical bias. This type of bias is a consequence of existing biases in society and is therefore also known as cultural bias: the data is filled with stereotypes that exist in real life. For example, Google Translate learns from existing translations on the web, but these translations are often heavily biased with regard to gender: “doctor” would usually be assumed male, whereas “nurse” would be assumed female. This type of bias can be prevented by examining the data first and looking for existing prejudices. If they exist, more examples may be required to reflect society more accurately. Google’s solution for this particular case was to return both a masculine and a feminine translation.
  • Sample bias. This occurs when the collected data is unbalanced and does not accurately represent the population the model is supposed to be used for. When a machine learning model is supposed to recognize both benign and malignant nodules in a thoracic X-ray, it is not sufficient to train it only with X-rays containing benign nodules. A solution is to examine the data for an even distribution of cases across features and to check whether your model performs well on an evenly distributed test set. If it does not, more training examples may be required. These can also be created artificially through data augmentation: a set of techniques that synthetically enlarge your dataset by adding slightly modified copies of existing examples (a minimal augmentation sketch is given after this list).
  • Exclusion bias. This happens when the developer of the algorithm removes features or particular instances from the dataset because they believe them to be irrelevant to the problem at hand, even though they are of value. For example, a developer might believe that a feature recording the patient’s blood pressure is irrelevant for predicting the likelihood that the patient will develop Alzheimer’s disease, while it is actually a good indicator, especially in combination with other factors such as cholesterol levels. Prematurely removing such valuable information can be prevented by properly investigating the features and data points, and their relation to the prediction to be made, before removing anything, and by asking someone else to review the use of the features and data points beforehand.
  • Measurement bias. This happens when the values of particular features are poorly measured. For example, measuring instruments might be faulty, which might result in skewed data. Solutions include calibrating the instruments before use and using multiple measuring devices.
  • Labeling bias. This type of bias happens when the annotator does not label the data accurately due to subjective perceptions. For example, one might want to detect lung nodules in CT scans. Whereas one radiologist might classify a particular growth shown in these scans as a nodule, another might not, because they apply different criteria for what counts as a nodule (such as the minimum diameter). Common ways to solve this problem are the use of labeling guidelines and/or having multiple experts provide the labels and reach a consensus when they disagree. When a large number of experts is available, a majority vote for the right label can also be used (a small majority-vote sketch is given after this list).
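
As a small illustration of the data augmentation mentioned under sample bias, the sketch below creates slightly modified copies of an image using random flips, rotations and crops. It assumes the torchvision and Pillow libraries and grayscale X-ray image files; the file path and number of copies are purely illustrative.

    # Minimal data augmentation sketch (assumes torchvision and Pillow are installed).
    from torchvision import transforms
    from PIL import Image

    # Label-preserving random modifications used to synthetically enlarge
    # an under-represented class (e.g., X-rays with malignant nodules).
    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=10),
        transforms.RandomResizedCrop(size=224, scale=(0.9, 1.0)),
    ])

    def augmented_copies(image_path, n_copies=5):
        """Return n slightly modified copies of one image."""
        image = Image.open(image_path).convert("L")  # X-rays are grayscale
        return [augment(image) for _ in range(n_copies)]

    # Hypothetical usage: expand the minority class before training.
    # extra_malignant = augmented_copies("scans/malignant_001.png", n_copies=10)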
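
For the majority vote mentioned under labeling bias, the following sketch resolves disagreement between annotators; the scan identifier, label values and escalation message are hypothetical.

    # Minimal sketch of resolving annotator disagreement by majority vote.
    from collections import Counter

    def majority_label(labels):
        """Return the most common label, or None if there is no strict majority."""
        label, count = Counter(labels).most_common(1)[0]
        return label if count > len(labels) / 2 else None

    # Hypothetical example: three radiologists label the same growth.
    annotations = {"scan_042": ["nodule", "nodule", "no nodule"]}
    for scan, labels in annotations.items():
        decision = majority_label(labels)
        if decision is None:
            print(f"{scan}: no majority, resolve in a consensus meeting")
        else:
            print(f"{scan}: labeled as '{decision}'")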

Procedural bias

The choices a developer makes during algorithm development can also significantly affect the output.

  • Confirmation bias. Developers tend to choose models and hyperparameters that align with their preconceived beliefs or hypotheses, even though these might not be the most suitable choices. An example is a developer who previously saw that a decision tree predicted very well whether or not a doctor should apply antibiotics in case of a fever, and who therefore uses a decision tree for every subsequent problem without even considering other algorithms that might be better suited to the data or problem at hand. Confirmation bias can be prevented by involving independent critics, or by making the dataset openly available so that models can be compared directly (a brief comparison sketch is given after this list).
  • Association bias. This occurs when a machine learning model amplifies an existing bias in the data. A well-known example is PredPol’s drug crime prediction algorithm, which was trained on data biased by housing segregation and police bias. Because of that, it would send police more frequently to neighborhoods where many minorities live, resulting in more drug arrests there. That arrest data was fed back into the algorithm, which was then trained on these new examples, creating a positive feedback loop. This can be prevented by closely monitoring how the data is collected and processed.
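
As a rough sketch of the direct model comparison mentioned under confirmation bias, the code below evaluates several candidate models with cross-validation instead of defaulting to a familiar one. It assumes scikit-learn is available and uses a synthetic dataset purely for illustration.

    # Minimal sketch: compare candidate models with cross-validation
    # rather than assuming a decision tree is always best.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for a clinical dataset.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    candidates = {
        "decision tree": DecisionTreeClassifier(random_state=0),
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(random_state=0),
    }

    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")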

These examples cover only a small part of the full range of possible biases in machine learning. For this reason, you should always be critical of both your data and the algorithm development when implementing artificial intelligence. Several methodologies have been developed in recent years to help critically assess the dataset used (Datasheets for Datasets) and to provide proper information so that clinical end users can assess models (Model Facts Labels or Model Cards).
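
To give a feel for what such documentation contains, the sketch below records a few “model facts” as plain data. The field names and values are illustrative only and do not follow any official Model Card or Datasheet schema.

    # Illustrative, hypothetical "model facts" record; not an official schema.
    model_facts = {
        "model_name": "nodule-detector",  # hypothetical model
        "intended_use": "Flag suspicious lung nodules on thoracic CT for radiologist review",
        "training_data": "Description of the cohorts, sites and time period used for training",
        "known_limitations": [
            "Not validated on populations outside the training cohorts",
            "Performance may drop on scans from other scanner vendors",
        ],
        "evaluation": "Report sensitivity and specificity per site and patient subgroup",
    }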

© AIProHealth Project
This article is from the free online course How Artificial Intelligence Can Support Healthcare on FutureLearn.
