12 September 2011
How you have no real privacy on the internet, thanks to ad networks
I've discovered something quite disturbing: Essentially, you have no privacy when you browse the internet (let alone actively communicate with it) - and you can mitigate this only by means which severely impact usability. Here's how it happens, what you can do to protect yourself, and why the onus will always be on you to do so.
The problem
Ad companies claim that online tracking is anonymous. It's not. The above article by a researcher at Stanford is a great explanation of what is probably the biggest problem with browsing the internet: Your visit to almost every popular website is tracked by ad networks. This interactive infographic from the Wall Street Journal demonstrates how each visit to the 50 most popular websites in the US is tracked by up to hundreds of elements, and the case with the most popular kids' websites is even worse. Here's a list of the top 100 webpage elements used to track you.
Companies often claim the data they collect is "anonymous" because they don't record your name or other data directly identifying you. This is false - the data is more than enough to uniquely identify you (I'll explain how below). If desired, they can link that data to your "real-world" information - name, address etc - thereby generating a detailed profile of you and your history of browsing, purchasing, and other online interactions. There's a growing market for such services, called "de-anonymizing", a kind of data-mining that turns supposedly anonymous information into real identities.
This is just part of the larger issue of increasingly widespread privacy violations by private companies that have very little accountability.
Customer data is valued immensely by corporations, and you're giving it away constantly just by loading webpages. Imagine if someone read through your browser history every day. Major ad networks have the capability to do that, for the sites their scripts run on - that is, almost all the sites you're likely to visit. Do marketing companies and random websites really deserve your trust - that they won't use your data in an undesirable way, or hand it on to third parties? And if they're trustworthy for that (which is doubtful, since they have little or no accountability for how they use your data), do you also trust that they won't be hacked, or subverted by a rogue employee?
As a quick aside: Why should you care? The most common objection at this point is "only people with something to hide (i.e. criminals) need privacy". A lot of people seem to really think that it's okay to criminalize privacy, and to look at someone with suspicion because they don't share all their photos with the world on Facebook. This view is misguided, naive, hypocritical, and ultimately terrifying. This article in The Chronicle addresses it well. There is a basic human need for privacy, whether online or not, and it's not primarily about hiding bad things, but about reducing misunderstanding and abuse. The Urewera terror raids in New Zealand were an excellent, albeit extreme, example of how a lack of privacy can result in the abuse of many innocent people.
How the networks track you, and what you can do about it
You might think that your privacy is protected by virtue of sharing a connection (IP address) with others, or being with an ISP that gives you a dynamic IP address (an address which sometimes changes). Firstly, there are statistical methods to separate users with a known probability of correctness; more importantly, all such protection will disappear under IPv6, where there are enough addresses for every machine to have a permanent address.
But in any case, tracking companies don't even need your IP address to uniquely identify you. They can use your browser. Even if you block cookies and hide your IP address through a proxy / VPN (you should be using a VPN anyway for public WiFi), you can still be uniquely identified through Javascript in your browser, in two ways: One, websites can re-create any of their cookies that you remove and block. Two, your browser provides a huge amount of information to websites. EFF's Panopticlick project demonstrates how that information is enough to uniquely identify you. The only proper protection is to completely disable Javascript - which stops most websites from displaying properly, and some from being readable or functional at all. Torbutton does all of the above and is widely considered the best way to protect your privacy online - but expect a frustrating experience as your browsing is much slower and websites depending on Javascript fail to work properly. So in practice, you can't properly protect yourself from a large proportion of websites, because they rely on Javascript.
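To make the fingerprinting point concrete, here's a rough Python sketch of the idea behind browser fingerprinting. It is not Panopticlick's code or any ad network's: the attribute values and the share of browsers reporting each one are made up for illustration, and adding the bits together assumes the attributes are independent, which real ones are not (so the total is an upper bound at best).

```python
import hashlib
import math


def fingerprint(attributes: dict) -> str:
    """Combine browser attributes into one short, stable identifier."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def surprisal_bits(share_of_browsers: float) -> float:
    """Bits of identifying information in a value seen in this share of browsers."""
    return -math.log2(share_of_browsers)


# Hypothetical values a script could read from one browser.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20100101 Firefox/6.0",
    "screen": "1680x1050x24",
    "timezone": "UTC+12",
    "fonts": "Arial,DejaVu Sans,Liberation Mono,Ubuntu",
}

# Assumed (invented) share of all browsers reporting the same value.
assumed_shares = {
    "user_agent": 1 / 1024,
    "screen": 1 / 16,
    "timezone": 1 / 8,
    "fonts": 1 / 4096,
}

# Summing is only valid if the attributes are independent (an assumption).
total_bits = sum(surprisal_bits(share) for share in assumed_shares.values())

print("fingerprint:", fingerprint(browser))
print(f"identifying information: {total_bits:.0f} bits")
print(f"roughly one browser in {2 ** total_bits:,.0f}")
```

With these invented numbers, four ordinary-looking attributes add up to 29 bits, or about one browser in half a billion. No cookie and no IP address is involved, which is why blocking those alone doesn't help.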
I use a raft of browser add-ons and custom settings to make me more difficult to track (some of which make browsing more complicated and frustrating, but they also increase security). I also use PeerBlock to block loading content from known ad-network IPs (PeerBlock is very ineffective at stopping anti-piracy detection, which is what most people use it for, but it can be a minor help in increasing privacy), and Scroogle (scraped Google) to search without being logged.
Because of the issue with leaky Javascript, none of that is enough. And we're only talking about browsing - if you actively submit any information, privacy gets much more difficult. I'll leave that to another article.
The big picture
So while technical defenses may plug some of the holes, other holes are left wide open, and such measures are difficult, frustrating, and only usable by technically savvy people. The market will never fix itself because private information is a lemon market: consumers have no information about privacy practices by which to make judgments. The only way for us to have privacy on the internet is legal protection. Companies need to disclose how they use our information and who they give it to. They need to be prevented from overriding the requests of users not to be tracked. They need to be held accountable for the abuses that happen (never mind their abysmal security which results in huge data sets being stolen on a weekly basis).
While government privacy bodies do some great work, it's like trying to stop the tide. They have few resources, pitted against the standard operating practice of the knowledge economy: an almost complete lack of transparency or accountability around personal data usage. Companies are not going to willingly give up the lucrative benefits of pervasive data-mining, technical tricks to track users against their explicit wishes, secret sharing of data with third parties, and insecure storage (good security is expensive).
And even within governments, the privacy protectors are hopelessly outmatched. Governments are responsible for the greatest privacy abuses of all - particularly military and police, but most departments, because of their wide access, and especially when they share their data - and they are consistently pushing for ever-more invasive ways to collate data and surveil the populace. To adopt the terminology of the Chronicle article: In some countries that data is used to capture and torture activists, promulgate opposing propaganda, or shut down dissent in other ways (Orwellian privacy abuse), but in all countries the Kafkaesque abuses of bureaucracy, mistakes, and lack of transparency represent a real problem to your privacy. Just because you haven't seen the effect yet doesn't mean there isn't a problem. In addition, digital storage means that your data is generally kept forever - the issues with this are numerous, including changes in government leadership, changes in laws (e.g. data-mining to identify potential criminals), changes to officials and consultants managing the data, and changes in society (e.g. some behaviours society considers acceptable now will be shocking in the future). I'm writing more about this for an upcoming article.
Preventing the government from spying on your online activity is harder still - it's possible but you need good technical knowledge and careful awareness of your exposure. Full hard-drive encryption is a minimum requirement, as is Torbutton, but if you don't want people or firewalls knowing you're using Tor you'll need a traffic shaper like SkypeMorph which makes your traffic look like a Skype video call. Against governments, full hard-drive encryption isn't enough - you need deniable encryption, for instance with a tool like TrueCrypt. For communication you need to use Off-the-Record Messaging, friend-to-friend networking and steganography tools like OpenPuff. And of course you can't log in to websites like Google and Facebook which provide your private data to governments via an automatic interface without requiring a search warrant - so you'll also need an alternative email provider, and alternative social networking software like Diaspora or some other distributed program (although you need to be very conscious about who you're sharing with).
I don't think there will ever be a solution to the general problem of privacy in a world with computers. You can only do your best to minimize your exposure and educate your friends.

