Can data be anonymised?

Is removing 'personal data' from datasets enough? And can we hide in the sea of data around us?

Governments and companies claim that effective data anonymisation is possible. On closer inspection, however, this claim starts to fall apart.

Let's take a fictional example: a woman named Renata who lives in Rio de Janeiro.

Of course, anyone who knows Renata's name can easily identify her. But someone who does not know her name, yet knows that she lives in Rio de Janeiro, is female, was born on 7 July 1994, likes coffee, spends time at the Universidad Cafe and has red hair, can still identify her.

"Anonymising data" means stripping a dataset of any kind of personal data that can identify an individual. But as the Renata example shows, identification can also be done by combining individual data traces to create a profile.
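To make the Renata example concrete, here is a minimal sketch in Python. The records, field names and the narrow_down helper are all invented for illustration; they are not taken from any real dataset or tool. It simply counts how many nameless records still match after each known attribute is applied.

```python
# Hypothetical records with names already removed ("anonymised").
# All values are invented for illustration.
records = [
    {"city": "Rio de Janeiro", "sex": "F", "born": "1994-07-07", "hair": "red"},
    {"city": "Rio de Janeiro", "sex": "F", "born": "1994-07-07", "hair": "brown"},
    {"city": "Rio de Janeiro", "sex": "F", "born": "1991-03-12", "hair": "red"},
    {"city": "Rio de Janeiro", "sex": "M", "born": "1994-07-07", "hair": "black"},
    {"city": "São Paulo", "sex": "F", "born": "1994-07-07", "hair": "red"},
]


def narrow_down(records, known):
    """Return (attribute, remaining candidates) after each known attribute is applied."""
    candidates = list(records)
    counts = []
    for key, value in known.items():
        # Keep only the records that agree with this known attribute.
        candidates = [r for r in candidates if r[key] == value]
        counts.append((key, len(candidates)))
    return counts


# What an observer might know about Renata without knowing her name.
known = {"city": "Rio de Janeiro", "sex": "F", "born": "1994-07-07", "hair": "red"}

steps = narrow_down(records, known)
for key, remaining in steps:
    print(f"after {key}: {remaining} candidate(s)")
```

In this toy data, each extra attribute removes some candidates until a single record is left, even though no name was ever stored. Real datasets are far larger, but the same narrowing effect applies whenever the combination of attributes is rare enough.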

Removing 'personal data' from datasets is not enough, because re-identification depends on how many data traces are available and on what other data the dataset can be linked to.
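Linking datasets can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular real-world attack: both datasets, the field names, the SHARED tuple and the link function are hypothetical. One dataset has had names removed; the other still carries names and happens to share a few attributes with the first.

```python
# Hypothetical "anonymised" dataset: names removed, other attributes kept.
anonymised = [
    {"city": "Rio de Janeiro", "sex": "F", "born": "1994-07-07", "purchase": "coffee"},
    {"city": "Rio de Janeiro", "sex": "M", "born": "1988-01-30", "purchase": "tea"},
    {"city": "Rio de Janeiro", "sex": "F", "born": "1990-11-02", "purchase": "coffee"},
]

# Hypothetical second dataset that still carries names.
named = [
    {"name": "Renata", "city": "Rio de Janeiro", "sex": "F", "born": "1994-07-07"},
    {"name": "Paulo", "city": "Rio de Janeiro", "sex": "M", "born": "1975-05-19"},
]

# Attributes present in both datasets (an assumption of this sketch).
SHARED = ("city", "sex", "born")


def link(anonymised, named, shared=SHARED):
    """Pair each anonymised row with a name when exactly one named person matches."""
    index = {}
    for person in named:
        key = tuple(person[k] for k in shared)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in anonymised:
        names = index.get(tuple(row[k] for k in shared), [])
        if len(names) == 1:  # a unique match re-identifies the row
            matches.append((names[0], row))
    return matches


reidentified = link(anonymised, named)
for name, row in reidentified:
    print(name, "->", row["purchase"])
```

Neither dataset reveals much on its own: one has no names, the other has no purchases. Joined on the shared attributes, they attach a name to a supposedly anonymous record.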

The amount of data currently available about us, combined with advances in data analysis, has significantly increased the likelihood that an individual can be re-identified from 'anonymised' data.

Can we hide in the sea of data?

The notion of 'hiding in a sea of data' rests on the idea that data sorting and analysis are done by people. This is not the case: the analysis is done by computers, which can process large volumes of data to find patterns and correlations, and to make inferences and predictions.

 
