Archives

Date
  • 01
  • 02
  • 03
  • 04
  • 05
  • 06
  • 07
  • 08
  • 09
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31

Anonymizing is hard to do

One of the few positive things over the recent giant AOL data spill (which we have asked the FTC to look into) is it has hopefully taught a few lessons about just how hard it is to truly anonymize data. With luck, the lesson will be “don’t be fooled into thinking you can do it” and not “Just avoid what AOL did.”

There is some Irony that in general, AOL is one of the better performers. They don’t keep a permanent log of searches tied to userid, though it is tied, reports say, to a virtual ID. (I have seen other reports to suggest even this is erased after a while.) AOL also lets you turn off short term logging of the association with your real ID. Google, MSN, Yahoo and others keep the data effectively forever.

Everybody has pointed out that for many people, just the search queries themselves can be enough to identify a person, because people search for things that relate to them. But many people’s searches will not be trackable back to them.

However, the AOL records maintain the exact time of the search, to the second or perhaps more accurately. They also maintain the site the user clicked on after doing the search. AOL may have wiped logs, but most sites don’t. Let’s say you go through the AOL logs and discover an AOL user searched and clicked on your site. You can go into your own logs and find that search, both from the timestamp, and the fact the “referer” field will identify that the user came via an AOL search for those specific terms.

Now you can learn the IP address of the user, and their cookies or even account with your site, if your site has accounts.

If you’re a lawyer, however, doing a case where you can subpoena information, you could use that tool to identify almost any user in the AOL database who did a modest volume of searches. And the big sites with accounts could probably identify all their users who are in the database, getting their account id (and thus often name and email and the works.)

So even if AOL can’t uncover who many of these users are due to an erasure policy, the truth is that’s not enough. Even removing the site does not stop the big sites from tracking their own users, because their own logs have the timestamped searches. And an investigator could look for a query, do the query, see what sites you would likely click on, and search the logs of those sites. They would still find you. Even without the timestamp this is possible for an uncommon query. And uncommon queries are surprisingly common. :-)

I have a static IP address, so my IP address links directly to me. Broadband users who have dynamic IP addresses may be fooled — if you have a network gateway box or leave your sole computer on, your address may stay stable for months at a time — it’s almost as close a tie as a static IP.

The point here is that once the data are collected, making them anonymous is very, very hard. Harder than you think, even when you take into account this rule about how hard it is.

So tell me again why you need a stay on the order stopping the wiretapping?

You probably heard yesterday’s good news that the ACLU prevailed in their petition for an injunction against the NSA warrentless wiretapping. (Our case against AT&T to hold them accountable for allegedly participating in this now-ruled-unlawful program continues in the courts.)

However, the ruling was appealed (no surprise) and the government also asked for, and was granted a stay of the injunction. So the wiretaps won’t stop unless the appeal is won.

But this begs the question, “Why do you need a stay?”

The line from the White House has been that the government engaged in this warrantless wiretapping because the the President had the authority to do that, both inherently and under the famous AUMF. And they wanted to use that authority because they complained the official system mandated by law, requiring process before the FISA court, was just too cumbersome. Even though the FISA law allows immediate emergency wiretaps without a warrant as long as a retroactive application is made soon.

We’ve all wondered just why that’s too cumbersome. But they seemed to be saying that since the President haud the authority to bypass the FISA court, why should they impede the program with all that pesky judicial oversight?

But now we have a ruling that the President does not have that authority. Perhaps that will change on appeal, but for now it is the ruling. So surely this should mean that they just go back to doing it the way the FISA regulations require it? What’s the urgent need for a stay? Could they not have been ready with the papers to get the warrants they need if they lost?

Well, I think I know the answer. Many people suspect that the reason they don’t go to FISA is not because it’s too much paperwork. It’s because they are trying to do things FISA would not let them do. So of course they don’t want to ask. (The FISA court, btw, has only told them no once, and even that was overturned. That’s about all the public knows about all its rulings.) I believe there is a more invasive program in place, and we’ve seen hints of that in press reports, with data mining of call records and more.

By needing this stay, the message has come through loud and clear. They are not willing to get the court’s oversight of this program, no way, no how. And who knows how long it will be until we learn what’s really going on?

Infrared patterns and paint to screw with tourist video/photos

Last week at ZeroOne in San Jose, one of the art pieces reminded me of a sneaky idea I had a while ago. As you may know, many camcorders, camera phones and cheaper digital cameras respond to infrared light. You can check this out pretty easily by holding down a button on your remote control while using the preview screen on your camera. If you see a bright light, you’re camera shoots in infrared.

Anyway, the idea is to find techniques, be they arrays of bright infrared LEDs, or paints that shine well in infrared but are not obvious in visible light, and create invisible graffiti that only shows up in tourist photos and videos. Imagine the tourists get home from their trip to fisherman’s wharf, and the side of the building says something funny or rude that they are sure wasn’t there when they filmed it.

The art piece at ZeroOne used this concept to put up a black monolith to the naked eye. If you pulled out your camera phone or digital camera, you could see words scrolling down the front. Amusing to watch people watch it. Another piece by our friends at .etoy also had people pulling out cameraphones to watch it. They displayed graphics made of giant pixels on a wall just a few feet from you. Up close, it looked like random noise. If you found a way to widen your field of view (which the screen on a camera can do) allowed you to see the big picture, and you could see the images of talking faces. (My SLR camera’s 10mm lens through the optical viewfinder worked even better.)

That piece only really worked at night, though with superbright LEDs I think it could be done in the day. I don’t know if there are any paints to coatings to make this work well. It would be amusing to tag the world with tags that can only be seen when you pull out your camera.

I remember IBM

Everybody’s pulling out IBM PC stories on the 25th anniversary so I thought I would relate mine. I had been an active developer as a teen for the 6502 world — Commodore Pet and Apple ][ and Atari 800 and the like, and sold my first game to Personal Software Inc. back in 1979. PSI was just starting out, but the founders hired me on as their first employee to do more programming. The company became famous shortly thereafter by publishing VisiCalc, which was the first serious PC application, and the program that helped make Apple as a computer company outside the hobby market.

In 1981, I came back for a summer job from school. Mitch Kapor, who had worked for Personal Software in 1980 (and had been my manager at the time) had written a companion for VisiCalc, called VisiPlot. VisiPlot did graphs and charts, and a module in it (VisiTrend) did statistical analysis. Mitch had since left, and was on his way to founding Lotus. Mitch had written VisiPlot in Apple ][ Basic, and he won’t mind if I say it wasn’t a masterwork of code readability, and indeed I never gave it more than a glance. Personal Software, soon to be renamed VisiCorp, asked me to write VisiPlot from scratch, in C, for an un-named soon to be released computer.

I didn’t mention this, but I have never coded in C before. I picked up a copy of the Kernighan and Ritchie C manual, and read it as my girlfriend drove us over the plains on my trip from Toronto to California.

I wasn’t told much about the computer I would be coding for. Instead, I defined an API for doing I/O and graphics, and wrote to a generalized machine. Bizarrely (for 1981) I did all this by dialing up by modem to a unix computer time sharing service called CCA on the east coast. I wrote and compiled in C on unix, and defined a serial protocol to send graphics back to, IIRC an Apple computer acting as a terminal. And, in 3 months, I made it happen.

(Very important side note: CCA-Unix was on the arpanet. While I had been given some access to an Arpanet computer in 1979 by Bob Frankston, the author of VisiCalc, this was my first day to day access. That access turned out to be the real life-changing event in this story.)

There was a locked room at the back of the office. It contained the computer my code would eventually run on. I was not allowed in the room. Only a very small number of outside companies were allowed to have an IBM PC — Microsoft, UCSD, Digital Research, VisiCorp/Software Arts and a couple of other applications companies.

On this day, 25 years ago, IBM announced their PC. In those days, “PC” mean any kind of personal computer. People look at me strangely when I call an Apple computer a PC. But not long after that, most people took “PC” to mean IBM. Finally I could see what I was coding for. Not that the C compilers were all that good for the 8088 at the time. However, 2 weeks later I would leave to return to school. Somebody else would write the library for my API so that the program would run on the IBM PC, and they released the product. The contract with Mitch required they pay royalties to him for any version of VisiPlot, including mine, so they bought out that contract for a total value close to a million — that helped Mitch create Lotus, which would, with assistance from the inside, outcompete and destroy VisiCorp.

(Important side note #2: Mitch would use the money from Lotus to found the E.F.F. — of which I am now chairman.)

The IBM PC was itself less exciting than people had hoped. The 8088 tried to be a 16 bit processor but it was really 8 bit when it came to performance. PC-DOS (later MS-DOS) was pretty minimal. But it had an IBM name on it, so everybody paid attention. Apple bought full page ads in the major papers saying, “Welcome IBM, Seriously.” Later they would buy ads with lines like Steve Jobs saying, “When I invented the personal computer…” and most of us laughed but some of the press bought it. And of course there is a lot more to this story.

And I was paid about $7,000 for the just under 4 months of work, building almost all of an entire software package. I wish I could program like that today, though I’m glad I’m not paid that way today.

So while most people today will have known the IBM-PC for 25 years, I was programming for it before it released. I just didn’t know it!

Why do people put angle brackets around <urls>

Quite frequently in non-HTML documents, such as E-mails, people will enclose their URLs in angle brackets, such as <http://foo.com> What is the origin of this? For me, it just makes cutting and pasting the URLs much harder (it’s easier if they have whitespace around them and easiest if they are on a line by themselves.) It’s not any kind of valid XML or HTML in fact it would cause a problem in any document of that sort.

There’s lot of software out there that parses URLs out of text documents of course, but they all seem to do fine with whitespace and other punctuation. They handle the angle bracket notation, but don’t need it. Is there any software out there that needs it? If not, why do so many people use this form?

Broadcast lectures on cell phones

Many universities are now setting up to broadcast lectures over their LANs, often in video. Many students simply watch from their rooms, or even watch later. There are many downsides to this (fewer show up in class) but the movement is growing.

Here’s a simple addition that would be a bonanza for the cell companies. Arrange to offer broadcast of lectures to student cell phones. In this case, I mean live, and primarily for those who are running late to class. They could call into the number, put on their bluetooth headset and hear the start of the lecture on the way in. All the lecture hall has to do is put the audio into a phone that calls a conference bridge (standard stuff all the companies have already) and then students can call into the bridge to hear the lecture. In fact, the cell company should probably pay the school for all the minutes they would bill.

This need not apply only to lectures at universities. All sorts of talks and large meetings could do the same, including sessions at conferences.

Perhaps it would encourage tardyness, but you could also make the latecomers wait outside (listening) for an appropriate pause at which to enter.

The peril of anonymized data

The blogosphere is justifiably abuzz with the release by AOL of “anonymized” search query histories for over 500,000 AOL users, trying to be nice to the research community. After the fury, they pulled it and issued a decently strong apology, but the damage is done.

Many people have pointed out obvious risks, such as the fact that searches often contain text that reveal who you are. Who hasn’t searched on their own name? (Alas, I’m now the #7 “brad” on Google, a shadow of my long stint at #1.)

But some other browsers have discovered something far darker. There are searches in there for things like “how to kill your wife” and child porn. Once that’s discovered, isn’t that now going to be sufficient grounds for a court order to reveal who that person was? It seems there is probable cause to believe user 17556639 is thinking about killing his wife. And knowing this very specific bit of information, who would impede efforts to investigate and protect her?

But we can’t have this happening in general. How long before sites are forced to look for evidence of crimes in “anonymized” data and warrants then nymize it. (Did I just invent a word?)

After all, I recall a year ago, I wanted to see if Google would sell adwords on various nasty searches, and what adwords they would be. So I searched for “kiddie porn” and other nasty things. (To save you the stigma, Google clearly has a system designed to spot such searches and not show ads, since people who bought the word “kiddie” may not want to advertise on those results.)

So had my Google results been in such a leak, I might have faced one of those very scary kiddie porn raids, which in the end would find nothing after tearing apart my life and confiscating my computers. (I might hope they would have a sanity check on doing this to somebody from the EFF, but who knows. And you don’t have that protection even if somebody would accord it to me.)

I expect we’ll be seeing the reprecussions from this data spill for some time to come. In the end, if we want privacy from being data mined, deletion of such records is the only way to go.

Patient's room phone with basic presence

Those who know about my phone startup Voxable will know I have far more ambitious goals regarding presence and telephony, but during my recent hospital stay, I thought of a simple subset idea that could make hospital phone systems much better for the patient, namely a way to easily specifiy whether it’s a good time to call the patient or not. Something as simple as a toggle switch on the phone, or with standard phones, a couple of magic extensions they can dial to set whether it’s good or not.

When you’re in the hospital, your sleep schedule is highly unusual. You sleep during the day frequently, you typically sleep much more than usual, and you’re also being woken up regularly by medical staff at any time of the day for visits, medications, blood pressure etc.

At Stanford Hospital, outsiders could not dial patient phones after 10pm, even if you might be up. On the other hand even when the calls can come through, people are worried if it’s a good time. So a simple switch on the phone would cause the call to be redirected to voice mail or just a recording saying it’s not a good time. Throw it to take a nap or do something else where you want peace and quiet. If you throw it at night, it stays in sleep mode until 8 or 9 hours. Then it beeps and reverts to available mode. If you throw it in the day, it will revert in a shorter amount of time (because you might forget) however a fancier interface would let you specify the time on an IVR menu. Nurses would make you available when they wake you in the morning, or you could put up a note saying you don’t want this. (Since it seems to be the law you can’t get the same nurse two days in a row.)

In particular, when doctors and nurses come in to do something with you, they would throw the switch, and un-throw it when they leave, so you don’t get a call while in the middle of an examination. The nurse’s RFID badge, which they are all getting, could also trigger this.

Now people who call would know they got you at a good time, when you’re ready to chat. Next step — design a good way for the phone to be readily reachable by people in pain, such as hanging from the ceiling on a retractable cord, or retractable into the rail on the side of the bed. Very annoying when in pain to begin the slow process of getting to the phone, just to have them give up when you get to it.

Anti-Phishing -- warn if I send a password somewhere I've never sent it

There are many proposals out there for tools to stop Phishing. Web sites that display a custom photo you provide. “Pet names” given to web sites so you can confirm you’re where you were before.

I think we have a good chunk of one anti-phishing technique already in place with the browser password vaults. Now I don’t store my most important passwords (bank, etc.) in my password vault, but I do store most medium importance ones there (accounts at various billing entities etc.) I just use a simple common password for web boards, blogs and other places where the damage from compromise is nil to minimal.

So when I go to such a site, I expect the password vault to fill in the password. If it doesn’t, that’s a big warning flag for me. And so I can’t easily be phished for those sites. Even skilled people can be fooled by clever phishes. For example, a test phish to bankofthevvest.com (Two “v”s intead of a w, looks identical in many fonts) fooled even skilled users who check the SSL lock icon, etc.

The browser should store passwords in the vault, and even the “don’t store this” passwords should have a hash stored in the vault unless I really want to turn that off. Then, the browser should detect if I ever type a string into any box which matches the hash of one of my passwords. If my password for bankofthewest is “secretword” and I use it on bankofthewest.com, no problem. “secretword” isn’t stored in my password vault, but the hash of it is. If I ever type in “secretword” to any other site at all, I should get an alert. If it really is another site of the bank, I will examine that and confirm to send the password. Hopefully I’ll do a good job of examining — it’s still possible I’ll be fooled by bankofthevvest.com, but other tricks won’t fool me.

The key needs in any system like this is it warns you of a phish, and it rarely gives you a false warning. The latter is hard to do, but this comes decently close. However, since I suspect most people are like me and have a common password we use again and again at “who-cares” sites, we don’t want to be warned all the time. The second time we use that password, we’ll get a warning, and we need a box to say, “Don’t warn me about re-use of this password.”

Read on for subtleties…  read more »