The paradox of identity management


Since the dawn of the web, there has been a call for a "single sign-on" facility. The web consists of millions of independently operated web sites, many of which ask users to create "accounts" and sign-on to use the site. This is frustrating to users.

Today the general single sign-on concept has morphed into what is now called "digital identity management" and is considerably more complex. The most recent project to generate excitement is OpenID, a standard that allows users to log on using an identifier which can be the URL of an identity service, possibly even one they run themselves.

Many people view OpenID as positive for privacy because of what came before it. The first major single sign-on project was Microsoft Passport, which came under criticism both because all your data was managed by a single company and because that single company was a fairly notorious monopoly. To counter that, the Liberty Alliance project was brewed by Sun, AOL and many other companies, offering a system not run by any single company. OpenID is simpler and even more distributed.

However, I feel many of the actors in this space are not considering an inherent paradox that surrounds the entire field of identity management. On the surface, privacy-conscious identity management puts control over who gets identity information in the hands of the user. You decide who to give identity info to, and when. Ideally, you can even revoke access, and push for minimal disclosure. Kim Cameron summarized a set of laws of identity outlining many of these principles.

In spite of these laws, one of the goals of most identity management systems has been ease of use. And who, on the surface, can argue with ease of use? Managing individual accounts at a thousand web sites is hard. Creating new accounts for every new web site is hard. We want something easier.

The paradox

However, here is the contradiction. If you make something easy to do, it will be done more often. It's hard to see how this can't be true. The easier it is to give somebody ID information, the more often it will be done. And the easier it is to give ID information, the more palatable it is to ask for, or demand it. Consider the magstripe found on most driver's licences. This seems like personal identity management. That card is physically under your control, in your wallet. Nobody, except a police officer who suspects you of something, can demand you produce it. You control whether they can just look at it or can scan it.

Yet the very existence of the stripe makes it easy to read all the data on the card. Sure, they could also look at the card and slowly type it all in, or photograph it, but as you know this is rare. If somebody is demanding this card for ID, it's faster for them and for you to have them swipe it rather than type in the number and/or your other information. As a result it seems more "reasonable" for them to ask to swipe it, even if they don't demand it. And thus far more data is collected. (So much that there are legal efforts to limit such scanning.)

This applies even to "ideal" digital identity management systems which let you tweak what information they provide to a web site. In such a system, you can control whether your system offers up a pseudonym or your full name and address. You want that, because if you're buying a book you want to easily tell them where to send it.

However, at the same time this easy ability to offer your address makes it easy to ask. Today, a site that wants to ask for extra information it doesn't really need has a disincentive -- it has to push you to a form where you have to type it in manually. This makes it far more likely they will ask for this only if they really need it. It makes it really unlikely that they will demand it unless they truly need it. It still happens (I routinely see sites asking for phone numbers they don't need) but it happens less often than if providing this information required merely a click.

That's because once you make it trivial to hand over your information, you quickly get to the state where only the privacy zealots put up a fight. And thanks to the fundamental theorem of privacy advocacy -- most people don't care about their privacy until after it's invaded -- this means most people will hand over far more information than needed, and in many cases the few who complain are few enough that companies can safely decide to refuse to deal with them if they won't hand over the information that's so easy to hand over.

It's against our intuition to think of ease of use as a bug, rather than a feature, but sometimes this can be the case.

In addition, single sign-on systems tend to make correlation of user data easier, in spite of their many efforts to try to address this problem. If you use the same ID to sign on at many web sites, it's hard to stop them from correlating that fact if they get together. Of course, most people use the same login on many sites today, but this is less reliable. (When a site demands an E-mail from me I give a different E-mail to each site, which among other things allows me to see if they pass the E-mail address to any 3rd party.) One of the common identity attributes that will be requested with OpenID is an E-mail address, and this becomes harder to vary if you're getting the benefit of the single sign-on.

Needless accounts

Identity management also encourages the creation of "accounts" when they are not strictly needed at all. Should OpenID become a success, every site will want to use it. Sites that would not have troubled users to create an account to use them will now find it trivial to do so. Their current easy alternative -- cookies -- is stored on the user's machine, much more under user control, and much harder to correlate with other sites.

Fully implemented, I predict we'll see "one click account creation" and "one click login" through the use of browser add-ons. This will result in sites that were perfectly usable without an account suddenly demanding them. Why not, after all? Sites with commercial interest are keenly interested in accounts in order to learn about their users and present information to advertisers or investors.

Use in authoritarian regimes

It is also important to consider how the identity management technology we build will be used in places like China, Saudi Arabia or North Korea. Whatever global standards we adopt, especially with open source or free software, will be readily available for use in these countries.

Unfortunately, these countries will not follow the same principles of user control and consent on identity collection that we do. However, we will save them the time and trouble of building their own ID and surveillance infrastructure. They can readily adapt ours.

We may have to ask ourselves what ethical duty we have to the people of those countries. How would we design our systems if we lived in those places? What software would we give away to those governments? Is our own convenience and ease of use worth so much to us that we want to give these technologies to China where they will help restrict the activities of a billion people? This is not an easy question. The real villains are the oppressors, not the authors of the technology, but that doesn't stop us from considering how what we build will be used.

No solution?

There may be no solution to this paradox. Identity disclosure is, in a sense, the opposite of privacy. Any system that assists in identity disclosure is unlikely to help protect privacy. There are technologies, such as secure pseudonyms and anonymity, and non-correlatable identifiers, which can help, but they are tricky.

It may be, oddly, that one thing which can stop the over-collection of information harkens back to the bad centralized systems OpenID and Liberty Alliance wished to displace. A centralized system is able to track what sort of information sites are requesting, and may have superior power to negotiate what is disclosed with those sites. While a single user can't do much about a site that asks for extra information it doesn't need, users banded together can both notice and document this activity, and fight it in concert.

For example, when you go to a site to create an account, it would be nice to be able to complain that it's asking for too much information, and to have people who will investigate that. Those same people, if they agree, could try to negotiate the amount of collected information downwards, speaking not just for themselves but potentially for millions of users of their service. If the site refuses, when their users attempt to hand over the superfluous information, the tools should offer warnings, and indicate boycotts or even the presence of alternatives.

Of course, having one master site like Passport is not the answer to that. But it may make sense to pair distributed ID systems with centralized ID negotiators from the start.

Being a platform for something better

OpenID could become a platform for more sophisticated tools if users adopt them and use them through nice interfaces such as browser add-ons. For example, users could maintain thousands of different OpenIDs all tied to a hidden master ID the user maintains, and then provide a different one to each site. Roaming however can be an issue with this.
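
As a rough sketch of how one hidden master ID could yield a different identifier for each site, the example below derives each per-site value with a keyed hash (HMAC). This is only one possible approach and not something OpenID itself specifies; the function name, the site names and the 32-character length are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def per_site_identifier(master_secret: bytes, site: str) -> str:
    """Derive a stable pseudonym for one site from a hidden master secret.

    Hypothetical helper: the same secret and site always give the same
    identifier, while identifiers for different sites share no visible link.
    """
    site_key = site.strip().lower().encode("utf-8")
    digest = hmac.new(master_secret, site_key, hashlib.sha256).hexdigest()
    return digest[:32]  # truncated only to keep the identifier short


# The master secret never leaves the user's own tools.
master = secrets.token_bytes(32)
id_a = per_site_identifier(master, "books.example")
id_b = per_site_identifier(master, "news.example")
```

Two sites comparing id_a and id_b cannot tell they belong to the same person without the master secret. The roaming problem shows up here too: the secret has to be available on every machine the user signs in from.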

How much do they really need?

It's also worth noting that sites tend to vastly overestimate how much data they truly need about users in order to do their job, or even to please their sponsors. In many cases, all sites really need to know is that you are unique -- for example that you're not a duplicate of a user they banned from their system. They may simply need to know you are an adult without knowing anything else about you. Some identity systems have plans to implement this sort of attribute information, in contrast with the personally identifiable information such as names, ID numbers, E-mail addresses, phone numbers, zip codes and the like that we so often see.

Later I will offer a proposal for a system that provides unique anonymous identifiers that should be all that most sites need. In this case, you offer an identifier which can't be correlated with any other identifier, or traced back to the user, but which is the only such identifier possible for a given real person. In this case, if such an identifier gets a negative reputation (such as being banned from a site for spamming or trolling) the user can't shake that negative rap by just getting another ID.


Hi Brad,

Great post, lots of stuff to think about here. Are you aware of the directed identity stuff going into OpenID 2.0? Essentially it will let you enter the URL of your OpenID provider (rather than your specific OpenID) - your provider will then generate a one-time OpenID that only works for you on that particular site, preventing that site from correlating your accounts. Your provider will still know which accounts you are using, but at least you can choose your provider based on their privacy policy.

Those of us in the tin-foil hat community (and I include myself) can do a lot with the more advanced identity control tools in OpenID. Though we already can do that, and many of us already do, with the old fashioned password reminder tool in the browser or various plugins.

Again, as I said, the easier you make it to hand over identity info, the easier it is to demand it.

You and I understand the value of privacy, but my son is just out of college . . . he's immortal still. He's been on a computer since he was three, and he's never been in danger in any world. The young men and women he went to university with and the young men and women who visit my blog are flattered when programs offer to track their slightest preferences.

The folks at CVS don't understand what they are giving away when they get that 10% discount, so why should my son with no experience understand?

It's not only that it's easy to go along with giving away the information. It's also that often an ego boost comes with the giving -- look there's my picture on the MyBlogLog widget, isn't that cool? Watch them track me all over the Internet.

It seems that many folks just can't see below the surface. Some days I feel that it could be too late for most of us already.

If you need authenticity without identification ("only the real Mr Anonymous could have written this comment") there is a solution called "trip-codes", used in parts of the web. Basically, the user enters an arbitrary password which is hashed and the hash value is displayed alongside the user name. It's hard to fake, because you'd have to guess the password. However it makes no intrusive assertions about real identity.
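
A simplified sketch of the trip-code idea described above might look like the following. It uses SHA-256 from the standard library, which is an assumption for illustration; sites that use trip-codes have their own hashing schemes, and the function names and the "!" separator here are made up for the example.

```python
import hashlib


def tripcode(password: str, length: int = 10) -> str:
    """Return a short, stable code derived from the user's secret password."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return digest[:length]


def display_name(name: str, password: str) -> str:
    """Show the chosen name alongside the code, as a comment system might."""
    return f"{name} !{tripcode(password)}"


# The same password always yields the same code, so readers can recognise
# a returning author without learning anything about who that author is.
first_post = display_name("Anonymous", "correct horse battery staple")
second_post = display_name("Anonymous", "correct horse battery staple")
impostor = display_name("Anonymous", "a wrong guess")
```

Here first_post and second_post carry the same code, while impostor does not. A plain hash like this is only as strong as the password behind it, since a weak password can be found by guessing.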

Although I can agree that the way the OpenID folk are heading is wrong, I cannot agree with everything. I do not agree that making information sharing easier will inevitably lead to the massive disclosure of personal data. I think that proper legislation and a good reputation mechanism can lead to a suitable equilibrium between privacy and disclosure.
I strongly recommend the books by Daniel Solove, especially "The Digital Person". You can find many answers there.

Legislation is not impotent, but it's very far from omnipotent. Information flows around the rules, if you let it out. Regularly we see stories of how you can buy all sorts of records from corrupt government employees. I doubt we can stop corruption.

In addition, we can't solve the problem of future surveillance technologies that most people have not dreamed of. Today there is not adequate AI to understand all this data we've let out, but there will be, and laws won't stop it.
