Archives

Date
  • 01
  • 02
  • 03
  • 04
  • 05
  • 06
  • 07
  • 08
  • 09
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31

The peril of anonymized data

The blogosphere is justifiably abuzz with the release by AOL of “anonymized” search query histories for over 500,000 AOL users, trying to be nice to the research community. After the fury, they pulled it and issued a decently strong apology, but the damage is done.

Many people have pointed out obvious risks, such as the fact that searches often contain text that reveal who you are. Who hasn’t searched on their own name? (Alas, I’m now the #7 “brad” on Google, a shadow of my long stint at #1.)

But some other browsers have discovered something far darker. There are searches in there for things like “how to kill your wife” and child porn. Once that’s discovered, isn’t that now going to be sufficient grounds for a court order to reveal who that person was? It seems there is probable cause to believe user 17556639 is thinking about killing his wife. And knowing this very specific bit of information, who would impede efforts to investigate and protect her?

But we can’t have this happening in general. How long before sites are forced to look for evidence of crimes in “anonymized” data and warrants then nymize it. (Did I just invent a word?)

After all, I recall a year ago, I wanted to see if Google would sell adwords on various nasty searches, and what adwords they would be. So I searched for “kiddie porn” and other nasty things. (To save you the stigma, Google clearly has a system designed to spot such searches and not show ads, since people who bought the word “kiddie” may not want to advertise on those results.)

So had my Google results been in such a leak, I might have faced one of those very scary kiddie porn raids, which in the end would find nothing after tearing apart my life and confiscating my computers. (I might hope they would have a sanity check on doing this to somebody from the EFF, but who knows. And you don’t have that protection even if somebody would accord it to me.)

I expect we’ll be seeing the reprecussions from this data spill for some time to come. In the end, if we want privacy from being data mined, deletion of such records is the only way to go.