Privacy, at least from search-engine profiling?
My colleague Helen Nissenbaum helped create this cool Firefox extension called TrackMeNot.
How It Works
TrackMeNot runs in Firefox as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and MSN. It hides users' actual search trails in a cloud of 'ghost' queries, significantly increasing the difficulty of aggregating such data into accurate or identifying user profiles. TrackMeNot integrates into the Firefox 'Tools' menu and includes a variety of user-configurable options.
*Note: TrackMeNot employs a static list of search terms from which it can generate millions of unique queries. While that is a sizeable number, it is unlikely to deter serious data-profiling by those aware of the system. As a first step toward addressing this concern, the current version of TMN allows users to supply their own query lists. Future versions are likely to include larger (distributed) query databases, dynamically generated and/or web-harvested queries, as well as grammar-generated natural-language queries. Suggestions for other ways of improving TMN are always welcome!
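To make the idea concrete, here is a rough sketch in Python of what "issuing a randomized query from a static term list" might look like. It is not TrackMeNot's actual code: the term list, the engine URLs (placeholder `.example` addresses), and the `ghost_query` function are all invented for illustration, and the sketch only builds a URL rather than sending anything or running on a timer.

```python
import random
from urllib.parse import urlencode

# Hypothetical stand-ins: the real extension ships its own term list and
# talks to real search engines; neither is reproduced here.
SEARCH_TERMS = ["gopher", "glass", "guru", "grind", "golden", "glitch"]
ENGINES = {
    "aol": "https://aol.example/search",
    "yahoo": "https://yahoo.example/search",
    "google": "https://google.example/search",
    "msn": "https://msn.example/search",
}


def ghost_query(terms, engines, words_per_query=2, rng=random):
    """Build one decoy query: distinct random words, one random engine.

    Returns (engine_name, url). Nothing is sent over the network.
    """
    words = rng.sample(terms, words_per_query)   # distinct words from the list
    name = rng.choice(sorted(engines))           # pick an engine at random
    url = engines[name] + "?" + urlencode({"q": " ".join(words)})
    return name, url


if __name__ == "__main__":
    engine, url = ghost_query(SEARCH_TERMS, ENGINES)
    print(engine, url)
```

A user-supplied query list, as the note above describes, would simply replace `SEARCH_TERMS` in a sketch like this.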
Why We Created TrackMeNot
The practice of logging user search activities and creating individual search profiles – sometimes identifiable – has received attention in mainstream press, e.g. the recent front-page New York Times article on AOL's release of collected data on individual searchers; also this front-page New York Times Business Section article describing the User-Profiling Practices of Yahoo!, AOL, MSN & Google.
We are disturbed by the idea that search inquiries are systematically monitored and stored by corporations like AOL, Yahoo!, Google, etc. and may even be available to third parties. Because the Web has grown into such a crucial repository of information and our search behaviors profoundly reflect who we are, what we care about, and how we live our lives, there is reason to feel they should be off-limits to arbitrary surveillance. But what can be done? ...
UPDATE: As Derek Slater points out in the comments, Bruce Schneier has already offered his strong criticism of this attempt to protect your privacy:
... Let's count the ways this doesn't work.
One, it doesn't hide your searches. If the government wants to know who's been searching on "al Qaeda recruitment centers," it won't matter that you've made ten thousand other searches as well -- you'll be targeted.
Two, it's too easy to spot. There are only 1,673 search terms in the program's dictionary. Here, as a random example, are the program's "G" words:
gag, gagged, gagging, gags, gas, gaseous, gases, gassed, gasses, gassing, gen, generate, generated, generates, generating, gens, gig, gigs, gillion, gillions, glass, glasses, glitch, glitched, glitches, glitching, glob, globed, globing, globs, glue, glues, gnarlier, gnarliest, gnarly, gobble, gobbled, gobbles, gobbling, golden, goldener, goldenest, gonk, gonked, gonking, gonks, gonzo, gopher, gophers, gorp, gorps, gotcha, gotchas, gribble, gribbles, grind, grinding, grinds, grok, grokked, grokking, groks, ground, grovel, groveled, groveling, grovelled, grovelling, grovels, grue, grues, grunge, grunges, gun, gunned, gunning, guns, guru, gurus
The program's authors claim that this list is temporary, and that there will eventually be a TrackMeNot server with an ever-changing word list. Of course, that list can be monitored by any analysis program -- as could any queries to that server.
In any case, every twelve seconds -- exactly -- the program picks a random pair of words and sends it to either AOL, Yahoo, MSN, or Google. My guess is that your searches contain more than two words, you don't send them out in precise twelve-second intervals, and you favor one search engine over the others.
Three, some of the program's searches are worse than yours. The dictionary includes:
HIV, atomic, bomb, bible, bibles, bombing, bombs, boxes, choke, choked, chokes, choking, chain, crackers, empire, evil, erotics, erotices, fingers, knobs, kicking, harier, hamster, hairs, legal, letterbomb, letterbombs, mailbomb, mailbombing, mailbombs, rapes, raping, rape, raper, rapist, virgin, warez, warezes, whack, whacked, whacker, whacking, whackers, whacks, pistols
Does anyone really think that searches on "erotic rape," "mailbombing bibles," and "choking virgins" will make their legitimate searches less noteworthy?
And four, it wastes a whole lot of bandwidth. A query every twelve seconds translates into 2,400 queries a day, assuming an eight-hour workday. A typical Google response is about 25K, so we're talking 60 megabytes of additional traffic daily. Imagine if everyone in the company used it.
I suppose this kind of thing would stop someone who has a paper printout of your searches and is looking through them manually, but it's not going to hamper computer analysis very much. Or anyone who isn't lazy. But it wouldn't be hard for a computer profiling program to ignore these searches. ...
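Two of Schneier's points are easy to check for yourself. The sketch below is mine, not his: it assumes a profiler already knows the extension's word list and that every decoy is exactly two dictionary words, as he describes. The function names and the tiny sample dictionary are made up for illustration, and the bandwidth helper uses his round figures (one query every twelve seconds, an eight-hour day, 25K per response, 1,000K to the megabyte).

```python
def looks_like_decoy(query, dictionary, words_per_query=2):
    """True if the query is exactly N words, all from the known dictionary."""
    words = query.lower().split()
    return len(words) == words_per_query and all(w in dictionary for w in words)


def filter_log(queries, dictionary):
    """Drop every query that matches the decoy pattern; keep the rest."""
    return [q for q in queries if not looks_like_decoy(q, dictionary)]


def daily_overhead(interval_s=12, hours=8, response_kb=25):
    """Return (queries per day, megabytes per day) for the quoted figures."""
    queries = hours * 3600 // interval_s
    return queries, queries * response_kb / 1000


if __name__ == "__main__":
    # A handful of words taken from the "G" list quoted above.
    sample_dictionary = {"gopher", "glass", "guru", "grind", "golden"}
    log = [
        "gopher glass",
        "cheap flights to boston",
        "guru grind",
        "golden retriever puppies",
    ]
    print(filter_log(log, sample_dictionary))
    print(daily_overhead())
```

With these inputs the filter keeps only the two queries that are not dictionary pairs, and the arithmetic reproduces the 2,400 queries and 60 megabytes a day from the excerpt. A filter this crude would also throw away a genuine two-word search that happened to use dictionary words, so it shows how cheap the attack is rather than how a real profiler would be built.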