Data Hosting architectures and the safe deposit box


With Facebook seeming to declare some sort of war on privacy, it's time to expand the concept I have been calling "Data Hosting" -- encouraging users to have some personal server space where their data lives, and bringing the apps to the data rather than sending your data to the companies providing interesting apps.

I think of this as something like a "safe deposit box" that you can buy from a bank. While not as sacrosanct as your own home when it comes to privacy law, it's pretty protected. The bank's role is to protect the box -- to let others into it without a warrant would be a major violation of the trust relationship implied by such boxes. While the company owning the servers that you rent could violate your trust, that's far less likely than 3rd party web sites like Facebook deciding to do new things you didn't authorize with the data you store with them. In the case of those companies, it is in fact their whole purpose to think up new things to do with your data.

Nonetheless, building something like Facebook using one's own data hosting facilities is more difficult than the way it's done now. That's because you want to do things with data from your friends, and you may want to combine data from several friends to do things like search your friends.

One way to do this is to develop a "feed" of information about yourself that is relevant to friends, and to authorize friends to "subscribe" to this feed. Then, when you update something in your profile, your data host would notify all your friends' data hosts about it. You need not notify all your friends, or tell them all the same thing -- you might authorize closer friends to get more data than you give to distant ones. This lets each user perform friend-based operations (like reading the stream of updated status notes) on their own machine without going out to other machines. However, this is also more dangerous than the Facebook approach in some ways, as you send the data out in advance and you can't easily "unsend" it. On a 3rd party site like Facebook, you can "un-friend" somebody and this will actually stop them from accessing even changes from your past. (At least in practice. In theory, they could be constantly checking and logging such things, offering you no way to delete information.)
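To make the feed idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DataHost` class, the two trust tiers, and the in-memory "push" all stand in for whatever real protocol data hosts would use.

```python
from dataclasses import dataclass, field

# Hypothetical trust tiers; a higher number means the friend sees more.
DISTANT, CLOSE = 1, 2


@dataclass
class DataHost:
    owner: str
    # friend's name -> (friend's data host, tier granted to that friend)
    subscribers: dict = field(default_factory=dict)
    # updates pushed to us by friends, kept on our own machine
    inbox: list = field(default_factory=list)

    def authorize(self, friend_host, tier):
        """Let a friend's data host subscribe to this owner's feed."""
        self.subscribers[friend_host.owner] = (friend_host, tier)

    def publish(self, item, min_tier):
        """Push an update to every subscriber trusted at min_tier or above."""
        for host, tier in self.subscribers.values():
            if tier >= min_tier:
                host.inbox.append((self.owner, item))

    def read_stream(self):
        """Read friends' updates locally, without contacting other machines."""
        return list(self.inbox)


alice, bob, carol = DataHost("alice"), DataHost("bob"), DataHost("carol")
alice.authorize(bob, CLOSE)
alice.authorize(carol, DISTANT)
alice.publish("status: at the beach", min_tier=DISTANT)  # everyone gets this
alice.publish("phone: 555-0100", min_tier=CLOSE)         # close friends only
```

Note that once `publish` has run, the item sits in the friend's inbox for good, which is exactly the "can't unsend" weakness of a pushed feed.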

One can move closer to that level of protection for actions that are specific to you. For example, if a friend wants to look at your profile, your photos, your videos and other such information specific to you, there is no need to have them do that in advance. The code on your data host can show their web browser what you have authorized the friend to see, and that authorization can be turned off or changed as desired. Again, the friend could program a system to constantly fetch everything and remember it, but most people would not, and in this case it would be obvious if they were doing that.
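A rough sketch of the fetch-on-request side might look like the following. The `ProfileHost` class and its method names are hypothetical; the point is only that the check happens at request time, so a revoked grant takes effect immediately, and that every request passes through your own host, where it can be logged.

```python
class ProfileHost:
    """Serves the owner's data on request, checking permissions each time."""

    def __init__(self, profile):
        self.profile = dict(profile)  # field name -> value
        self.grants = {}              # friend -> set of fields they may see
        self.access_log = []          # (friend, field) for every request made

    def grant(self, friend, fields):
        self.grants[friend] = set(fields)

    def revoke(self, friend):
        # Takes effect on the very next request; nothing was sent in advance.
        self.grants.pop(friend, None)

    def fetch(self, friend, field_name):
        # Log first, so denied requests and bulk scraping are visible too.
        self.access_log.append((friend, field_name))
        if field_name not in self.grants.get(friend, set()):
            raise PermissionError(f"{friend} may not see {field_name}")
        return self.profile[field_name]


host = ProfileHost({"photos": ["beach.jpg"], "email": "me@example.com"})
host.grant("bob", ["photos"])
photos = host.fetch("bob", "photos")  # allowed while the grant stands
host.revoke("bob")                    # later requests from bob will fail
```

A friend who fetched everything on a schedule would show up as a long run of entries in `access_log`, which is the sense in which such behavior would be obvious.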

Ideally we work out a system of what to send out as a feed, in advance, and what needs to be fetched on request. What is fetched on request will be slower, especially if your friend's server is far away.

Most people will not want to spend time tuning how much information they give to friends in the feed. Instead, most people will turn to a company they trust, one that manages social applications, to fine-tune what needs to be sent and what doesn't. As you use new social applications, they may request that more information be sent so they can run efficiently, and they will need to get permission from either you or the company to which you delegate management of social applications. It is important that the interface not become too complex, with too many checkboxes, or users will reject it.

The feeds need not be sent peer-to-peer, though that generally would be pretty efficient. Aggregation of feeds can be even more efficient, especially if data hosts want to "poll" rather than get information pushed at them. Such aggregation can be reasonably secure, in that the update packets can be encrypted in a form that the friend's server can decrypt but the aggregator can't. We might end up with an architecture that's mostly central like Twitter, without the privacy risks.
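Here is a small sketch of an aggregator that relays packets it cannot read. The `Aggregator` class, the shared key, and especially the `seal`/`unseal` functions are assumptions for illustration. The cipher is a toy built from SHA-256 so the example runs with only the standard library; it is not secure and a real system would use a vetted encryption library.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """TOY keystream from SHA-256 in counter mode. Illustration only."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random nonce; returns nonce + ciphertext."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def unseal(key: bytes, packet: bytes) -> bytes:
    """Reverse of seal(); only someone holding the key can do this."""
    nonce, body = packet[:16], packet[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class Aggregator:
    """Central relay: queues opaque packets per recipient, holds no keys."""

    def __init__(self):
        self.queues = {}

    def push(self, recipient: str, packet: bytes):
        self.queues.setdefault(recipient, []).append(packet)

    def poll(self, recipient: str):
        """Hand over everything queued for a recipient and clear the queue."""
        return self.queues.pop(recipient, [])


# Hypothetical key that Alice's and Bob's data hosts agreed on beforehand.
alice_bob_key = b"secret shared by alice and bob"
hub = Aggregator()
hub.push("bob", seal(alice_bob_key, b"alice: new photo album"))
updates = [unseal(alice_bob_key, p) for p in hub.poll("bob")]
```

The aggregator still learns who is sending to whom and how often, so this protects the content of updates but not the shape of the social graph.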

As I've noted before, "friend of a friend" applications that are useful are actually quite rare. They are hard to do in this architecture. You don't want to have to be sending a feed out to all the friends of friends, as that explodes in cost very quickly. For such applications, it may make sense that the company that builds the application is authorized to get feeds from the whole network. This company would be held to a higher standard as far as what protections it puts on the data, and what its privacy policy is. Because those applications would be rare, it will be easier to hold them to a higher standard, and watch them more closely.

Consider a typical Facebook session. Your "feed" would already be present on your own data host and be shown to you directly and locally. If you clicked on things in the feed that related to a specific user, the frame in your page would be built by that user's data host, according to what permissions you have to see things.

Visits to shared things like pages, groups and events might be on a central server, or might be on the data host of the person creating them -- this would be up to the people writing that software. Your data host would subscribe to a feed of notifications for it -- new emails, new events, new invitations and the like.

Even advertising could be supported, but in this case your data host would be given code, updated on a regular basis, to calculate what ad to show you based on your data. To fully protect privacy, the ad code could even be pre-cached and served from your own host, so that nobody learns -- unless you click on an ad -- what ads you saw. (Advertisers might not like this, of course, as they want to learn click-through percentages, but even if the ads are centrally served it still is better than today's situation.)
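As a sketch of how locally selected ads might work, assume the ad network ships a selection function and a batch of ads to your data host ahead of time. The `choose_ad` scoring rule, the `AdRunner` class, and the ad fields below are all invented for illustration.

```python
def choose_ad(profile, ads):
    """Stand-in for advertiser-supplied code: pick the ad whose keywords
    overlap most with the user's interests. Runs on the user's own host."""
    best, best_score = None, -1
    for ad in ads:
        score = len(set(ad["keywords"]) & set(profile["interests"]))
        if score > best_score:
            best, best_score = ad, score
    return best


class AdRunner:
    def __init__(self, profile, cached_ads):
        self.profile = profile
        self.cached_ads = cached_ads  # downloaded in bulk, in advance
        self.outbound = []            # the only messages that leave the host

    def show(self):
        # Selection and display are local, so no one learns what was shown.
        return choose_ad(self.profile, self.cached_ads)

    def click(self, ad):
        # Clicking is the one event that is revealed to the outside.
        self.outbound.append({"clicked": ad["id"]})


ads = [
    {"id": "a1", "keywords": ["skiing", "travel"]},
    {"id": "a2", "keywords": ["cooking"]},
]
runner = AdRunner({"interests": ["travel", "skiing", "books"]}, ads)
shown = runner.show()  # picks the skiing/travel ad; nothing sent out yet
```

Because the whole batch of ads is downloaded regardless of which one is chosen, the download itself reveals nothing about your interests.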

Of course, editing your profile or other personal data would be done on your data host, which would then publish only the appropriate changes that others should know about to the others that should know it.

It would of course be possible to offer feeds of your changes to sites, and not just friends. This offers a greater privacy risk, and should not actually be necessary most of the time. The idea is those sites should be offering you code which will run on your data host and decide what to do based on that knowledge of your data, rather than ever sending your data out. A site that truly needs to aggregate your data with other people's would need to convince you -- or the proxy you trust -- to authorize such an outgoing feed. The default would be that your data stays home; the exception would be that it goes out. That's a reversal of how sites like Facebook work today.
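The "stays home by default" rule can be sketched as a data host that runs site code locally and refuses any outgoing feed that has not been explicitly authorized. The `HomeHost` class and its methods are hypothetical, and the sketch ignores sandboxing, which a real host would need before running code supplied by someone else.

```python
class HomeHost:
    def __init__(self, data):
        self._data = dict(data)
        # Sites allowed to receive an outgoing feed. Empty by default.
        self.authorized_feeds = set()

    def run_app(self, app):
        """Run site-supplied code here; the result is shown to the owner,
        not sent back to the site. The app gets a copy of the data."""
        return app(dict(self._data))

    def authorize_feed(self, site):
        """The owner, or a proxy the owner trusts, opts in to an export."""
        self.authorized_feeds.add(site)

    def export_to(self, site, keys):
        """Send selected fields out, but only to an authorized site."""
        if site not in self.authorized_feeds:
            raise PermissionError(f"no outgoing feed authorized for {site}")
        return {k: self._data[k] for k in keys if k in self._data}


host = HomeHost({"city": "Toronto", "birthday": "05-17"})

# The app comes to the data: this function stands in for code from a site.
greeting = host.run_app(lambda d: f"Events near {d['city']}")

# The data goes to the app only after an explicit opt-in.
host.authorize_feed("stats.example")
sent = host.export_to("stats.example", ["city"])
```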


"’s time to expand the concept I have been calling “Data Hosting” — encouraging users to have some personal server space where their data lives, and bringing the apps to the data rather than sending your data to the companies providing interesting apps."

I've already got one of those, it's called a "desktop computer" and I've had it for years.
