Nothing to Hide? Stories our Data Tells About Us (part 2)

03 Jun. 2015

Almost everything we do online is constantly being collected, analysed and processed by algorithms. Is it really up to us to decide if we have “something to hide”? Or is it up to data analysts...and their data mining software?

Author: Maria Xynou, Researcher at Tactical Tech

This is the 6th blog of the MyShadow series: “Why shrugging at the Snowden revelations is a bad idea” and Part 2 of its “nothing to hide” sub-series.

Many companies, such as Google and Facebook, profit from tracking our online activity. By tracking our online behaviour, data analysts are able to map out our interests, social networks, location and much more. Individual pieces of data, such as our favourite website, band or drink, might seem harmless, and we might not care about concealing each and every one of them. However, when such bits of data are combined, they can reveal information that we would otherwise want to conceal. Once our data is aggregated and profiles are created about us, we have almost no say in whether such profiles are accurate or in what subsequently happens to them. This is something that the “nothing to hide” argument fails to address.

Data analysts use algorithms to collect and analyse the digital traces that we leave every time we use a digital device, as well as to process and correlate them and match patterns across time. These patterns are important because data analysts look at what we have done in the past in order to predict what we are likely to do in the future. This is part of data mining, which tries to forecast our future actions through the use of predictive statistics. In the world of data mining, individuals who match a certain profile are assumed to have a high probability of engaging in similar patterns of behaviour.

If our digital traces, for example, show that we are Spanish speakers and enjoy watching scary movies, we will likely be grouped together with other Spanish speakers who enjoy watching scary movies in a group profile. But if a few people within our group also illegally download movies, predictive analytics will tell the data analyst that likely all of us within the group download movies illegally – regardless of whether that is actually the case. Since profiles are created by algorithms, our data tells stories about us, which may or may not be true.
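To make the logic concrete, here is a minimal sketch in Python of how a group profile can spread a label to everyone in the group. It is not a description of any real company's or agency's system: the names, the attributes, the `pirates` flag and the 30% threshold are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical records; every name and attribute here is invented.
people = [
    {"name": "Ana",   "language": "es", "genre": "horror", "pirates": True},
    {"name": "Luis",  "language": "es", "genre": "horror", "pirates": True},
    {"name": "Marta", "language": "es", "genre": "horror", "pirates": False},
    {"name": "Pablo", "language": "es", "genre": "horror", "pirates": False},
    {"name": "Sofia", "language": "es", "genre": "horror", "pirates": False},
    {"name": "Tom",   "language": "en", "genre": "comedy", "pirates": False},
]


def build_group_profiles(records):
    """Group people who share the same language and favourite genre."""
    groups = defaultdict(list)
    for person in records:
        groups[(person["language"], person["genre"])].append(person)
    return groups


def predict_from_group(records, threshold=0.3):
    """Flag every member of a group once the share of known
    downloaders in that group reaches the (assumed) threshold."""
    predictions = {}
    for members in build_group_profiles(records).values():
        rate = sum(p["pirates"] for p in members) / len(members)
        for p in members:
            predictions[p["name"]] = rate >= threshold
    return predictions


predictions = predict_from_group(people)

# People the profile flags even though they never downloaded anything.
wrongly_flagged = [
    p["name"] for p in people if predictions[p["name"]] and not p["pirates"]
]
print(wrongly_flagged)
```

In this toy dataset, two of the five Spanish-speaking horror fans download illegally, which is enough for the whole group to be flagged. The other three are labelled by association, not by anything they did themselves.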

Intelligence agencies, such as the NSA, don't have to collect data about citizens themselves. The internet's corporate giants are doing it already and intelligence agencies just need to tap into their datasets. Documents leaked by Snowden illustrate that the NSA has used a programme, code-named PRISM, to collect and mine data in bulk from Microsoft, Google, Yahoo, Facebook, PalTalk, YouTube, Skype, AOL and Apple. Other programmes have been used by intelligence agencies to intercept images from Yahoo webcam chats and to monitor individuals' activity on YouTube, Facebook, Twitter and Blogger in real time. Leaked documents also reveal that once intelligence agencies have collected huge volumes of data from the online platforms that we commonly use, they use tools, such as the NSA's Marina, to create “pattern-of-life” profiles on targeted individuals.

It is disturbing that our private emails, Facebook photos and general online activity were collected in real time by intelligence agencies without our knowledge or consent. It is also disturbing that we have no real say in how our data was correlated, aggregated, processed and stored once collected. As we were not involved in the data aggregation process, we cannot know what types of profiles were created about us, who they were shared with and whether they are even accurate. And even if they are accurate, how can we determine whether they tell a story about us that we would want to disclose if we aren't even fully aware that they are being created in the first place?

This brings to mind Kafka's The Trial, where the protagonist is denied access to information held about him. Kafka describes how the protagonist suffers from a sense of powerlessness and vulnerability, created by the court system's use of his personal data to build a case against him in secret and without his participation in the process. Similarly, today we are largely excluded from the collection and aggregation of our own data, and as a result we are deprived of knowledge about what type of information is being collected about us and how it is being used or misused. The very fact that countless NSA programmes, which involve the collection of data from citizens around the world, were kept secret until recently illustrates that we have no real say in how our data is collected, aggregated, processed, retained, shared and disclosed, or by whom and to whom.

Corporations and intelligence agencies around the world justify our exclusion from the data management process in the name of security. However, the freedom to access (and correct) withheld information can empower individuals and minimise potential abuse, which is an essential component of a democratic society and of security at large. Our exclusion from the collection and aggregation of our data deepens the power imbalance between individuals and data collectors, a relationship that is often already unequal.

And if algorithms determine that we actually do have “something to hide” based on data traces that we might not even be aware of, then it's our word against software.

Source of image: http://blog.klout.com/wp-content/uploads/2014/04/Fotolia_43695416_Subscription_Monthly_M.jpg