Portable identity as vaseline
Earlier I wrote an essay on the paradox of identity management describing some counter-intuitive perils that arise from modern efforts at federated identity. Now it's time to expand these ideas to efforts for portable personal data, especially portable social networks.
Partly as a reaction to Facebook's popular applications platform, other social networking players are seeking a way to work together to stop Facebook from taking the entire pie. The Google-led OpenSocial effort is the leading contender, but there are a variety of related technologies, including OpenID, hCard and other microformats. The primary goal is to make it easy for users, as they move from one system to another, or run sub-applications on one platform, to provide all sorts of data, including the map of their social network, to the other systems.
Some are also working on a better version of this goal, which is to allow platforms to interoperate. As I wrote a year ago, interoperation seems the right long-term goal, but a giant privacy challenge emerges. We may not get very many chances to get this right. We may only get one.
The paradox I identified goes against how most developers think. When it comes to greasing the skids of data flow, "features" such as portability, ease of use and user control, may not be entirely positive, and may in fact be on the whole negative. The easier it is for data to flow around, the more it will flow around, and the more that sites will ask, and then demand that it flow. There is a big difference between portability between applications -- such as OpenOffice and MS Word reading and writing the same files -- and portability between sites. Many are very worried about the risks of our handing so much personal data to single 3rd party sites like Facebook. And then Facebook made it super easy -- in fact mandatory with the "install" of any application -- to hand over all that data to hundreds of thousands of independent application developers. Now work is underway to make it super easy to hand over this data to every site that dares to ask or demand it.
Sites, unlike programs, are not at all under your control. And they are almost all greedy. They see no reason not to ask for as much data as they might want to use in the future. And they also often have business plans which need user data they can "monetize" to work. Their interests are not aligned with the users when it comes to privacy. Once data is handed over to a site, it's generally permanently out of your control. Even if you know enough about the site to trust it -- something we surely don't know about all Facebook app providers -- all sites undergo management changes or simply changes of thought. While some portable data advocates think of the portability systems as "vaseline" that will grease the skids of smooth interoperation, the truth is it may assist another function of vaseline.
Just as OpenID will cause millions of sites to now demand a login when before they had no need for one, portable personal information formats will cause millions of new sites to demand this highly detailed personal data, simply because they can, and it's easy for the user to provide.
Admittedly this may be better than what's happening now -- where new sites, in order to get data from other, bigger sites, ask you for your userid and password so they can log in as you and scrape the data. That's a huge security risk and teaches users to be phishable. But making this transportation of data so greased that it happens all the time is not the answer either.
Control is not the answer
The primary answer I receive from developers is to give the user complete control over what data is handed over. The OpenSocial and OAuth systems hope to give this control. For example, they are taking the generally good approach that for many types of data, the application must make a special query, and that query may result in a dialog box which confirms with the user whether she wants to hand that particular data to that application. This seems good, but intuition predicts and testing reveals that users don't want to be asked 100 questions in dialog boxes about handing over data. They just want to answer one question, if that.
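To make the per-query model concrete, here is a minimal sketch of a consent gate that asks the user once per app-and-data-type pair before releasing anything. All the names here (ConsentGate, the callback, the data types) are invented for illustration; this is not the actual OpenSocial or OAuth API, just the shape of the dialog-per-query approach described above.

```python
# A hypothetical per-query consent gate: each class of personal data
# triggers a user confirmation before it is released to an app.

class ConsentGate:
    def __init__(self, ask_user):
        self.ask_user = ask_user   # callback standing in for a dialog box
        self.granted = {}          # remembered (app, data_type) decisions

    def request(self, app_id, data_type, payload):
        key = (app_id, data_type)
        if key not in self.granted:
            # The dialog the text describes: one question per data type.
            self.granted[key] = self.ask_user(
                f"Allow '{app_id}' to read your {data_type}?")
        return payload if self.granted[key] else None

# The common case the essay worries about: a user who approves everything.
gate = ConsentGate(ask_user=lambda question: True)
print(gate.request("quiz_app", "contact list", ["alice", "bob"]))
```

The friction is visible even in the sketch: every new data type means another `ask_user` call, which is exactly the pile of dialog boxes users refuse to read.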
The system also includes the good step of expecting programs to ask the server which does have the data to perform operations on it, never revealing it to the app. For example, say an app wishes to E-mail everybody on your contact list. While you could let the app have the whole contact list, and let it mail to it, you could also provide an API where it asks to mail everybody on the list (or matching other profiles.) It never sees the list, but it can arrange mail to it, if that's approved.
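A sketch of that idea, with all names invented for illustration: the server holding the contact list exposes a mail-everyone operation, and the app learns only how many messages went out, never the addresses themselves.

```python
# Hypothetical "operate on the data without revealing it" API:
# the data host performs the mailing; the app never sees the list.

class ContactServer:
    def __init__(self, contacts):
        self._contacts = contacts   # private: never returned to apps
        self.outbox = []            # stand-in for real mail delivery

    def mail_contacts(self, message, matching=None):
        """Mail every contact (optionally filtered by a predicate),
        reporting only a count back to the calling app."""
        recipients = [c for c in self._contacts
                      if matching is None or matching(c)]
        for addr in recipients:
            self.outbox.append((addr, message))
        return len(recipients)      # the app learns a count, not addresses

server = ContactServer(["alice@example.com", "bob@example.com"])
sent = server.mail_contacts("Try this new app!")
print(sent)  # the app knows 2 mails went out, but not to whom
```

The design choice is that the return value is deliberately coarse; the moment the API returns addresses, even "just for display," the data has left the host.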
That's good, but it's hard to do, because you cannot do anything the API designers didn't think of. As soon as somebody has a new app that requires an ability nobody thought of -- and all good apps do that -- the app is forced to go deeper, and ask for all the data so it can do its newfangled thing with it. I can imagine the construction of a bytecode language for operations on the data, but it's hard to make that secure -- a rogue app could easily extract the personal data and find ways to send it elsewhere. True protection would require that all parties making use of private data know how to receive it from the private data host, and again it's hard to plan for the future.
It may be possible to provide an infrastructure where trusted code modules can be prepared that will run on the machine that holds your data. These modules must be vetted by the community to see that they are behaving. This vetting would stop overt attacks but would not stop clever hidden ones. The source code to all code would have to be public, I think.
In addition, you would need to have trusted "combiner" systems. A combiner system would be given the private data from several users who trust one another, and try to do useful things combining it. Again, we would need a way to trust all that code, as well as the host. That's hard.
Negotiation is key
However, the real reason that control is not a sufficient answer is that it suggests a world of individual decisions. This doesn't work nearly so well with privacy as people think, as shown by the massive number of people who have left the mandatory "know who I am and access my information" box checked on Facebook. Fine-grained control is not enough, because single individuals have almost no power to negotiate with applications and sites, especially big ones. It's going to be an all or nothing proposition for so many reasons, and the only real choice will be "use the app or don't" for most users. As we've seen, if the app is cool or gets a buzz, users take the "all" approach, because it's easy, and because of the fundamental theorem of privacy -- people don't worry about their privacy until after they've suffered an invasion.
When a site/app wants data, only somebody representing a very large group of users can negotiate the privacy terms. Only that powerful player can say, "we think you're asking for more than you really need -- tell us why you actually need that." And if the answer is not adequate, only a powerful player can say, "Too bad, you're not getting it." The player has to be powerful enough to not only say no, but to mean it -- their "no" will mean that a vast body of users will never use the application until it fixes its demands. That requires a strong trust by the users, who otherwise would often ignore the warnings that say "This site wants too much information, don't join it."
The negotiation problem creates another part of the paradox. A central identity aggregator may be more useful than personal control. However, there is another middle solution, namely protocols that allow a 3rd party to get in the way of these private data exchanges. These trusted third parties, which might be consumer rights advocates like, say, Consumer Reports or EPIC, might publish a service providing opinions on applications and the data they are asking for. Each data request beyond a certain profile would need to be vetted and negotiated. Before handing over my data, my system would check with these services to find out if it's wise to do so. And if not, it would simply refuse. No UI.
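The "check with the vetting services and simply refuse" step could look something like the following sketch. The services, their verdicts, and the request format are all made up for illustration; the point is that refusal happens automatically, with no dialog shown to the user.

```python
# Hypothetical check against trusted third-party vetting services
# before any personal data is released. No UI: a refusal is final.

def approve_request(app_id, requested_fields, services):
    """Release data only if every trusted service approves."""
    for service in services:
        verdict = service(app_id, requested_fields)
        if verdict != "ok":
            return False    # simply refuse, as described above
    return True

# Two imaginary vetting services with simple policies.
def strict_service(app_id, fields):
    excessive = {"full contact list", "location history"}
    return "refuse" if excessive & set(fields) else "ok"

def lenient_service(app_id, fields):
    return "ok"

services = [strict_service, lenient_service]
print(approve_request("quiz_app", ["name"], services))                       # True
print(approve_request("quiz_app", ["name", "full contact list"], services))  # False
```

In practice the hard part is not this check but what the essay argues next: the services must be trusted enough that users accept a silent "no" instead of overriding it.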
In fact, there even has to be some debate about whether a warning like "Your privacy service has refused to grant personal data demanded by the application you are using. If you wish to override this, at risk to your personal data, indicate your agreement" is the right choice. Warnings that fire frequently get ignored, and apps may try to do an end-run around the system by making the warnings so frequent that users just give up and approve them wholesale. The warnings must be rare enough to be noticed, but right now they would not be. It makes sense to make them as frequent as the Google badware warnings, not as frequent as the warnings that come before running any app you download. This is a tough problem.
It is one that needs solving, however.