<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://ideas.4brad.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>Brad Ideas - The peril of anonymized data - Comments</title>
 <link>http://ideas.4brad.com/node/441</link>
 <description>Comments for &quot;The peril of anonymized data&quot;</description>
 <language>en</language>
<item>
 <title>BIG privacy risk</title>
 <link>http://ideas.4brad.com/node/441#comment-1992</link>
 <description>&lt;p&gt;Shlomo Argamon:&lt;br /&gt;
&quot;It appears to me that there was little real privacy risk&lt;br /&gt;
in the data released by AOL&quot;&lt;/p&gt;
&lt;p&gt;Are you outta your freakin&#039; gourd?&lt;/p&gt;
&lt;p&gt;Take a look at:&lt;/p&gt;
&lt;p&gt;news.com.com/AOL+offers+glimpse+into+users+lives/2100-1030_3-6103098.html&lt;/p&gt;
&lt;p&gt;No real privacy risk???  The New York Times has already located&lt;br /&gt;
and interviewed one 62-year-old woman who verified her search data.  And&lt;br /&gt;
no doubt they picked her because she was less self-incriminating&lt;br /&gt;
than others.&lt;/p&gt;
</description>
 <pubDate>Fri, 11 Aug 2006 00:45:53 -0700</pubDate>
 <dc:creator>Anon Y. Mouse</dc:creator>
 <guid isPermaLink="false">comment 1992 at http://ideas.4brad.com</guid>
</item>
<item>
 <title>Perhaps one of the early</title>
 <link>http://ideas.4brad.com/node/441#comment-1991</link>
 <description>&lt;p&gt;Perhaps one of the early research projects can be towards better anonymisation tools, which preserve privacy but don&#039;t destroy the information content.  Clearly, what was used here didn&#039;t get the job done.&lt;/p&gt;
</description>
 <pubDate>Thu, 10 Aug 2006 18:52:00 -0700</pubDate>
 <dc:creator>Paul O</dc:creator>
 <guid isPermaLink="false">comment 1991 at http://ideas.4brad.com</guid>
</item>
<item>
 <title>AOL released to the public</title>
 <link>http://ideas.4brad.com/node/441#comment-1984</link>
 <description>&lt;p&gt;Not just to researchers.  Release to researchers can be done, but the researchers should sign confidentiality contracts, keep the data on secured machines (not connected to the internet), and destroy it after use (they can always get it again in a pinch).&lt;/p&gt;
</description>
 <pubDate>Wed, 09 Aug 2006 19:19:34 -0700</pubDate>
 <dc:creator>brad</dc:creator>
 <guid isPermaLink="false">comment 1984 at http://ideas.4brad.com</guid>
</item>
<item>
 <title>Data sharing is critical</title>
 <link>http://ideas.4brad.com/node/441#comment-1983</link>
 <description>&lt;p&gt;In all the protests against AOL&#039;s sharing of the query-log data, there&lt;br /&gt;
has been little discussion of the importance of such data to research&lt;br /&gt;
on information retrieval.  In addition to the real privacy concerns, a&lt;br /&gt;
key point that must be considered is the fact that if useable data is&lt;br /&gt;
not made available to the wider research community, only the big&lt;br /&gt;
search companies will be able to analyze that data.  We academic&lt;br /&gt;
researchers are increasingly dependent upon industry for this sort of&lt;br /&gt;
data to do research; the sort of small-scale data that can be gathered&lt;br /&gt;
in a university-based setting is simply insufficient for obtaining&lt;br /&gt;
reliable experimental results.&lt;/p&gt;
&lt;p&gt;Should companies be prevented from sharing data with the research&lt;br /&gt;
community (either by law or public outcry), research progress will be&lt;br /&gt;
greatly reduced, as it will be impossible to compare different studies&lt;br /&gt;
with one another, since each study&#039;s data will be proprietary, and&lt;br /&gt;
thus no one will be able to trust any research result from another&lt;br /&gt;
lab.  All non-industrial research in this area will more-or-less dry&lt;br /&gt;
up, and search technology will tend more and more to be developed in&lt;br /&gt;
&quot;closed-shop&quot; efforts within the large firms; innovative startups and&lt;br /&gt;
open-source hacking will not exist, since the research projects that&lt;br /&gt;
serve as launching pads for such technological innovation will not&lt;br /&gt;
exist.  This prospect should disturb us all, as search technology&lt;br /&gt;
(broadly construed) is more and more the vehicle that people use to&lt;br /&gt;
gain information about their society and the world.&lt;/p&gt;
&lt;p&gt;All of this is not meant to ignore the real privacy issues that can be&lt;br /&gt;
involved in the preparation and release of such data.  It appears to&lt;br /&gt;
me that there was little real privacy risk in the data released by&lt;br /&gt;
AOL, but it is clear that policies and practices need to be debated&lt;br /&gt;
and developed that accomplish two essential goals: (a) to protect the&lt;br /&gt;
privacy of individuals in any sharing of research data, and (b) to&lt;br /&gt;
ensure that as much useful data as possible can be shared by companies with the&lt;br /&gt;
greater research community.&lt;/p&gt;
&lt;p&gt;Shlomo Argamon, Associate Professor&lt;br /&gt;
Department of Computer Science&lt;br /&gt;
Illinois Institute of Technology&lt;br /&gt;
Chicago, IL 60616&lt;/p&gt;
</description>
 <pubDate>Wed, 09 Aug 2006 14:41:05 -0700</pubDate>
 <dc:creator>Shlomo Argamon</dc:creator>
 <guid isPermaLink="false">comment 1983 at http://ideas.4brad.com</guid>
</item>
<item>
 <title>&quot;Nothing&quot; raid</title>
 <link>http://ideas.4brad.com/node/441#comment-1982</link>
 <description>&lt;p&gt;If the Kiddieporn Police were stupid enough to raid you based on your search, you can be sure they wouldn&#039;t find &quot;nothing.&quot; An &quot;unproductive&quot; raid is very embarrassing to those crusading officers and their politician-prosecutor bosses, so they&#039;d do whatever it takes to avoid coming up empty handed. If they scoured your hard disk and found no child pornography, they&#039;d probably just plant some of what they had lying around the lab and commit serial perjury (they&#039;d get away with that because it&#039;s your word against theirs-- a statement by an esteemed vice officer is always believable while that of a presumed pedophile never is). Failing that, they can easily trump up some other &quot;genuine&quot; violations sufficient to justify the raid and persuade you to accept their plea bargain so they can get credit for both the raid and the conviction on this month&#039;s status report. With the proliferation of laws, a prosecutor who wants to get you can always find some convincing evidence of a violation, even if it&#039;s a law you never knew existed (or something Alberto Gonzales dreamed up at Dick Cheney&#039;s request). That&#039;s how Justice works in Bushi Amerika.&lt;/p&gt;
</description>
 <pubDate>Tue, 08 Aug 2006 21:19:13 -0700</pubDate>
 <dc:creator>Anonymous</dc:creator>
 <guid isPermaLink="false">comment 1982 at http://ideas.4brad.com</guid>
</item>
<item>
 <title>The peril of anonymized data</title>
 <link>http://ideas.4brad.com/node/441</link>
 <description>&lt;p&gt;The blogosphere is justifiably abuzz over the release by AOL of &amp;#8220;anonymized&amp;#8221; search query histories for over 500,000 AOL users, an attempt to be nice to the research community.  After the fury, they pulled it and issued a decently strong apology, but the damage is done.&lt;/p&gt;

&lt;p&gt;Many people have pointed out obvious risks, such as the fact that searches often contain text that reveals who you are.  Who hasn&amp;#8217;t searched on their own name?  (Alas, I&amp;#8217;m now the #7 &amp;#8220;brad&amp;#8221; on Google, a shadow of my long stint at #1.)&lt;/p&gt;

&lt;p&gt;But some other browsers have discovered something far darker.   There are searches in there for things like &amp;#8220;how to kill your wife&amp;#8221; and child porn.   Once that&amp;#8217;s discovered, isn&amp;#8217;t that now going to be sufficient grounds for a court order to reveal who that person was?  It seems there is probable cause to believe user 17556639 is thinking about killing his wife.  And knowing this very specific bit of information, who would impede efforts to investigate and protect her?&lt;/p&gt;

&lt;p&gt;But we can&amp;#8217;t have this happening in general.  How long before sites are forced to look for evidence of crimes in &amp;#8220;anonymized&amp;#8221; data and warrants then nymize it?  (Did I just invent a word?)&lt;/p&gt;

&lt;p&gt;After all, I recall a year ago, I wanted to see if Google would sell adwords on various nasty searches, and what adwords they would be.   So I searched for &amp;#8220;kiddie porn&amp;#8221; and other nasty things.   (To save you the stigma, Google clearly has a system designed to spot such searches and not show ads, since people who bought the word &amp;#8220;kiddie&amp;#8221; may not want to advertise on those results.)&lt;/p&gt;

&lt;p&gt;So had my Google results been in such a leak, I might have faced one of those very scary kiddie porn raids, which in the end would find nothing after tearing apart my life and confiscating my computers.  (I might hope they would have a sanity check on doing this to somebody from the EFF, but who knows.  And you don&amp;#8217;t have that protection even if somebody would accord it to me.)&lt;/p&gt;

&lt;p&gt;I expect we&amp;#8217;ll be seeing the repercussions from this data spill for some time to come.  In the end, if we want privacy from being data mined, deletion of such records is the only way to go.&lt;/p&gt;
</description>
 <comments>http://ideas.4brad.com/node/441#comments</comments>
 <category domain="http://ideas.4brad.com/archives/cat_privacy.html">Privacy</category>
 <pubDate>Mon, 07 Aug 2006 13:51:46 -0700</pubDate>
 <dc:creator>brad</dc:creator>
 <guid isPermaLink="false">441 at http://ideas.4brad.com</guid>
</item>
</channel>
</rss>
