Failure of Data Anonymisation (With Examples)

Learn more about the failure of data anonymisation using specific examples.
© University of Strathclyde

Data anonymisation aims to process a dataset containing personal data so that the resulting dataset cannot be used to recreate that personal data. That is the aim, but anonymisation is not an exact science. In some cases, even datasets that have been tested statistically can be processed or combined with other data to identify individuals.

Personal data is normally protected in this way before it is used operationally or for research, or released more widely as shared or open data. This allows the data asset to be useful beyond its core collection purpose whilst protecting the individual(s) concerned.

When anonymisation fails, the consequences can include damaging media coverage, fines under the General Data Protection Regulation (GDPR) and, most importantly, detriment to the individuals concerned, which is the outcome to avoid above all.


One example, involving Netflix and the Internet Movie Database (IMDb), illustrates the mosaic effect, in which separately harmless pieces of information are combined to reveal something sensitive: an anonymous Netflix dataset was de-anonymised by correlating it with IMDb. This is an example of statistical de-anonymisation against high-dimensional micro-data, such as individual preferences, recommendations and transaction records.

The techniques were applied to the Netflix Prize dataset, which contained the anonymous movie ratings of 500,000 subscribers of Netflix, at the time the world’s largest online movie rental service. The researchers showed that an adversary who knows only a little about an individual subscriber can easily identify that subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, they successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
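The idea behind this kind of linkage can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' actual algorithm: the data, the pseudonyms, the film titles and the helper names `similarity` and `best_match` are all invented for the example, and the scoring rule (count ratings that agree within one star) is an assumption chosen for brevity.

```python
# Toy "anonymised" ratings: pseudonymous ID -> {film: star rating}.
# All records here are invented for illustration.
anonymised = {
    "user_001": {"Film A": 5, "Film B": 2, "Film C": 4, "Film D": 1},
    "user_002": {"Film A": 3, "Film B": 2, "Film E": 5},
    "user_003": {"Film C": 4, "Film D": 5, "Film F": 2},
}

# Background knowledge: a few ratings a named person posted publicly.
public_profile = {"Film A": 5, "Film C": 4, "Film D": 1}


def similarity(record, known, tolerance=1):
    """Count the known ratings that the record matches within `tolerance` stars."""
    return sum(
        1
        for film, rating in known.items()
        if film in record and abs(record[film] - rating) <= tolerance
    )


def best_match(dataset, known):
    """Return (pseudonym, best score, runner-up score) for the closest record."""
    scores = sorted(
        ((similarity(rec, known), uid) for uid, rec in dataset.items()),
        reverse=True,
    )
    (top_score, top_uid), (second_score, _) = scores[0], scores[1]
    return top_uid, top_score, second_score


uid, score, runner_up = best_match(anonymised, public_profile)
print(uid, score, runner_up)
```

With only three known ratings, one pseudonymous record stands out clearly from the rest. A large gap between the best score and the runner-up is what gives an adversary confidence in the match, and every other rating in the matched record is then exposed.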

A more widely known example is Facebook and Cambridge Analytica. Users give their data to Facebook in exchange for a service that allows social interaction and networking. In this example, firms obtained users’ private information from the social media network to develop “political propaganda campaigns” in the UK and the US.

Although users may not have been aware of it, their Facebook data was used to understand, and ultimately influence, the behaviour and voting choices of individuals. By understanding what people respond to positively, parties could, for example, tailor campaigns to be more effective.

Another recent example reported in the press showed that sensitive information about the location and staffing of military bases and spy outposts around the world had been revealed by a fitness tracking company, Strava, through a data visualisation map showing all the activity of its tracked users. Whilst online mapping services often obfuscate sensitive areas like these, the publication of Strava data could enable the identification not just of the location of bases, but of the roads and movements within them.

Anonymisation should be tested in a systematic way, and the process should be documented, for example in a Privacy Impact Assessment. To reach an acceptable level of risk, the organisation should also consider the likelihood that someone would try to recreate personal data from the dataset, the possibility of integrating it with other datasets that are already available, and the technologies available to undertake these tasks.
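One simple statistical test that could form part of such an assessment is a k-anonymity check: count how many records share each combination of indirectly identifying fields (quasi-identifiers). The article does not name this technique, so treat the sketch below as one possible test rather than a prescribed method. The records, the field names and the `smallest_group` helper are invented for illustration.

```python
from collections import Counter

# Invented sample records; "diagnosis" is the sensitive value to protect.
records = [
    {"age_band": "30-39", "postcode_area": "G1", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode_area": "G1", "sex": "F", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode_area": "G2", "sex": "M", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode_area": "G2", "sex": "M", "diagnosis": "angina"},
    {"age_band": "50-59", "postcode_area": "G3", "sex": "F", "diagnosis": "angina"},
]

# Assumed quasi-identifiers: fields an outsider might already know.
QUASI_IDENTIFIERS = ("age_band", "postcode_area", "sex")


def smallest_group(rows, quasi_identifiers):
    """Return (k, unique_groups): the smallest group size and the
    quasi-identifier combinations that occur exactly once."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    unique_groups = [combo for combo, n in groups.items() if n == 1]
    return min(groups.values()), unique_groups


k, unique_groups = smallest_group(records, QUASI_IDENTIFIERS)
print(k, unique_groups)
```

A result of k = 1 means at least one person is unique on the chosen fields, so anyone who knows those three facts about them can read off their diagnosis. Such records would normally need to be generalised or suppressed before release. Note that a good k value only covers the quasi-identifiers chosen; it does not rule out linkage through other fields.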

However, an acceptable level of risk cannot always be reached. Remember that sharing data enables significant value but, as big data technologies become more advanced, anonymisation failures are increasingly likely to occur.

This article is from the free online course The Power of Data in Health and Social Care.
