This content is taken from Coventry University's online course, Get ready for a Masters in Data Science and AI.

Bias and unfairness in data-informed decisions

We have seen that bias and error can affect the quality and representativeness of the data collected. But bias also enters into how data is subsequently used in decision-making, and how this translates into the treatment of different groups of people.

Bias and unfairness are related topics, but it is important to distinguish them. The International Organization for Standardization (ISO) is trying to bring some clarity to this area, so we will use their terminology.

Bias is a systematic difference in treatment of certain objects, people or groups in comparison to others. We would suspect that a coin that comes down ‘heads’ nine times and ‘tails’ once in ten tosses is ‘biased against tails’, for example.

Unfairness is the presence of bias where we believe that there should be no systematic difference.

Bias is a statistical property, whereas fairness is generally an ethical issue, depending on what we believe to be ‘acceptable’ differences. Different people may hold different beliefs, and even legal definitions of what is acceptable may change over time. For example, it is now illegal to discriminate on the basis of gender in insurance (European Court of Justice, 2011).

Example of bias (insurance)

This example is inspired by McDonald (2015). For the data set from which the following tables have been drawn, see the downloads section below.

Suppose we are a car insurance provider and offer insurance to four different professions (known as A, B, C, D).

For the sake of simplicity, we have 10,000 customers in each profession. These professions have different accident costs, given in Table 1. The ‘correct’ premium is (in the absence of profit etc.) the accident cost divided by the number of insured.

Table 1

Profession            A      B      C      D
Population (‘000s)    10     10     10     10
Accident cost (£M)    1.1    1.9    2.4    3.6
Premium (£)           110    190    240    360
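As a quick check, the premiums in Table 1 follow directly from the cost-per-insured rule; a minimal Python sketch, with all figures taken from the table:

```python
# The 'correct' premium (ignoring profit etc.) is the accident cost
# divided by the number of insured, per Table 1.
populations = {"A": 10_000, "B": 10_000, "C": 10_000, "D": 10_000}
accident_costs = {"A": 1.1e6, "B": 1.9e6, "C": 2.4e6, "D": 3.6e6}

premiums = {p: accident_costs[p] / populations[p] for p in populations}
print(premiums)  # {'A': 110.0, 'B': 190.0, 'C': 240.0, 'D': 360.0}
```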

However, the different genders are not equally divided among the professions. We can work out the average insurance premium for male and female insured, as in Table 2. These are not equal. Hence, we may be seen to be discriminating (unfairly biased) against male drivers, which is now illegal (European Court of Justice, 2011). The data is telling us that men cost more than women in accidents, but because of fairness we’re not allowed to apply this to premiums.

Table 2

Profession            A      B      C      D
Male (‘000s)          1      9      2      8
Female (‘000s)        9      1      8      2
Accident cost (£M)    1.1    1.9    2.4    3.6
Premium (£)           110    190    240    360

                      Average premium (£)
Male                  259
Female                191
Discrepancy           68
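The averages in Table 2 are just population-weighted means of the profession premiums; a small sketch of the calculation, using the head-counts from the table:

```python
# Head-counts per profession (thousands), from Table 2.
male   = {"A": 1, "B": 9, "C": 2, "D": 8}
female = {"A": 9, "B": 1, "C": 8, "D": 2}
premium = {"A": 110, "B": 190, "C": 240, "D": 360}

def average_premium(counts):
    # Weighted average: total premium income from the group / group size.
    total = sum(counts[p] * premium[p] for p in counts)
    return total / sum(counts.values())

print(average_premium(male), average_premium(female))  # 259.0 191.0
```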

The obvious solution is to distribute the male ‘excess’ equally. This leads to Table 3, where we can see that male and female are now, on average, charged the same: £225.

Table 3

Profession            A      B      C      D
Male (‘000s)          1      9      2      8
Female (‘000s)        9      1      8      2
Accident cost (£M)    1.1    1.9    2.4    3.6
Raw premium (£)       110    190    240    360
Adjusted premium (£)  164.4  135.6  280.8  319.2

                      Raw average (£)    Adjusted average (£)
Male                  259                225
Female                191                225
Discrepancy           68                 0

The solution above is not the only one. We have four premiums to set but only two constraints: cost recovery (i.e. since each profession has 10,000 insured, the four premiums must sum to £900) and gender equality, so there are many possible solutions.
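Both constraints are linear in the four premiums, so the size of the solution set can be checked with a little linear algebra; a minimal NumPy sketch, where the gender-equality coefficients are derived from the Table 2 head-counts:

```python
import numpy as np

# Constraint 1 (cost recovery): each profession has 10,000 insured, so
# total revenue meets the £9M total cost exactly when the four premiums
# sum to £900.
# Constraint 2 (gender equality): equal average male and female premiums;
# with the Table 2 head-counts this reduces to 4pA - 4pB + 3pC - 3pD = 0.
A = np.array([[1.0,  1.0, 1.0,  1.0],
              [4.0, -4.0, 3.0, -3.0]])

# Two independent linear constraints on four unknowns leave a
# two-parameter family of solutions.
print(np.linalg.matrix_rank(A))  # 2

for p in ([164.4, 135.6, 280.8, 319.2],   # Table 3
          [150.0, 150.0, 300.0, 300.0]):  # Table 4
    total, balance = A @ np.array(p)
    # Both schedules satisfy both constraints (sum ~ 900, balance ~ 0).
    assert abs(total - 900) < 1e-9 and abs(balance) < 1e-9
```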

The solution in Table 3 could be held to be unfair to profession A: it has lower accident costs than profession B, yet is now charged more. We can fix this by insisting that A and B be charged the same, which in this case also means that C and D are charged the same, as seen in Table 4.

Table 4

Profession            A      B      C      D
Male (‘000s)          1      9      2      8
Female (‘000s)        9      1      8      2
Accident cost (£M)    1.1    1.9    2.4    3.6
Raw premium (£)       110    190    240    360
Adjusted premium (£)  150    150    300    300

                      Raw average (£)    Adjusted average (£)
Male                  259                225
Female                191                225
Discrepancy           68                 0

However, gender is not the only issue. Let’s further suppose that there are two ethnic groups, P and Q, and the makeup among the genders and professions is as given in Table 5. Alas, even though Table 4 did not discriminate between male and female overall, we find that it discriminates between male and female within group P, and also within group Q; male P drivers are also charged considerably more than male Q drivers. In this case, the only solution satisfying all the constraints is a flat-rate premium of £225 for everyone.

Table 5

Profession                A      B      C      D
Male P (‘000s)            1      4      1      8
Female P (‘000s)          5      1      4      2
Male Q (‘000s)            0      5      1      0
Female Q (‘000s)          4      0      4      0
Accident cost (£M)        1.1    1.9    2.4    3.6
Raw premium (£)           110    190    240    360
Adjusted premium (£)      150    150    300    300
2nd adjusted premium (£)  225    225    225    225

                          Raw average (£)   Adjusted average (£)   2nd adjusted average (£)
Male P                    285.00            246.43                 225
Female P                  201.67            225.00                 225
Male Q                    198.33            175.00                 225
Female Q                  175.00            225.00                 225
P discrepancy             83.33             21.43                  0
Q discrepancy             23.33             -50.00                 0
Male discrepancy (P − Q)  86.67             71.43                  0
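The group averages in Table 5 can be recomputed for all three premium schedules; a short sketch, with head-counts and premiums taken from the table:

```python
# Head-counts ('000s) per profession A-D, from Table 5.
groups = {
    "Male P":   [1, 4, 1, 8],
    "Female P": [5, 1, 4, 2],
    "Male Q":   [0, 5, 1, 0],
    "Female Q": [4, 0, 4, 0],
}
schedules = {
    "raw":          [110, 190, 240, 360],
    "adjusted":     [150, 150, 300, 300],
    "2nd adjusted": [225, 225, 225, 225],
}

# Population-weighted average premium for each group under each schedule.
averages = {
    name: {g: round(sum(n * p for n, p in zip(c, prices)) / sum(c), 2)
           for g, c in groups.items()}
    for name, prices in schedules.items()
}
for name, by_group in averages.items():
    print(name, by_group)
```

Only the flat £225 rate gives every gender–ethnicity group the same average premium; under the Table 4 premiums, male P drivers pay £246.43 on average against £175 for male Q.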

Note that at no point did we intend to discriminate on the basis of gender or ethnicity; initially, we did not even know the breakdown. Nevertheless, we did discriminate. This was a simple example, designed to illustrate the key point without requiring any Artificial Intelligence as such.

The problem in car insurance is real, as shown by McDonald (2015). Another example, where differential pricing discriminates, is highlighted in the article Uber and Lyft Pricing Algorithms Charge More in Non-white Areas (Lu, 2020).

Your task

Other ways you might group drivers are:

  • Claim history
  • Age
  • Postcode
  • Years of experience

What impact do you think this will have on their vehicle insurance? Are any of these groupings fairer than using gender?


References

European Court of Justice. (2011). Association Belge des Consommateurs Test-Achats ASBL and Others v Conseil des ministres. Curia. http://curia.europa.eu/juris/liste.jsf?td=ALL&language=en&jur=C,T,F&parties=test%20achats

Lu, D. (2020, June 18). Uber and Lyft pricing algorithms charge more in non-white areas. New Scientist. https://www.newscientist.com/article/2246202-uber-and-lyft-pricing-algorithms-charge-more-in-non-white-areas/#ixzz6PmrztgUH

McDonald, S. (2015). Indirect gender discrimination and the ‘test-achats ruling’: An examination of the UK motor insurance market (presentation to Royal Economic Society, April 2015). Techno Luddites. https://editorialexpress.com/cgi-bin/conference/download.cgi?db_name=RES2015&paper_id=791
