data hosting

The peril of the Facebook anti-privacy pattern

There’s been a well-justified storm about Facebook’s recent privacy changes. The EFF has a nice post outlining the changes in privacy policies at Facebook, which inspired this popular graphic showing those changes.

But the deeper question is why Facebook wants to do this. The answer, of course, is money, but in particular it’s because the market is assigning a value to revealed data. This force seems to push Facebook, and services like it, into wanting to remove privacy from their users in a steadily rising trend. Social network services often will begin with decent privacy protections, both to avoid scaring users (when gaining users is the only goal) and because they have little motivation to do otherwise. The old world of PC applications tended to have strong privacy protection (by comparison) because data stayed on your own machine. Software that exported it got called “spyware” and tools were created to rout it out.

Facebook began as a social tool for students. It even promoted that those not at a school could not see in, could not even join. When this changed (for reasons I will outline below) older members were shocked at the idea their parents and other adults would be on the system. But Facebook decided, correctly, that excluding them was not the path to being #1.

Data Hosting architectures and the safe deposit box

With Facebook seeming to declare some sort of war on privacy, it’s time to expand the concept I have been calling “Data Hosting” — encouraging users to have some personal server space where their data lives, and bringing the apps to the data rather than sending your data to the companies providing interesting apps.

I think of this as something like a “safe deposit box” that you can buy from a bank. While not as sacrosanct as your own home when it comes to privacy law, it’s pretty protected. The bank’s role is to protect the box — to let others into it without a warrant would be a major violation of the trust relationship implied by such boxes. While the company owning the servers that you rent could violate your trust, that’s far less likely than 3rd party web sites like Facebook deciding to do new things you didn’t authorize with the data you store with them. In the case of those companies, it is in fact their whole purpose to think up new things to do with your data.

Nonetheless, building something like Facebook using one’s own data hosting facilities is more difficult than the way it’s done now. That’s because you want to do things with data from your friends, and you may want to combine data from several friends to do things like search your friends.

One way to do this is to develop a “feed” of information about yourself that is relevant to friends, and to authorize friends to “subscribe” to this feed. Then, when you update something in your profile, your data host would notify all your friends’ data hosts about it. You need not notify all your friends, or tell them all the same thing; you might authorize closer friends to get more data than you give to distant ones.
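The feed idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the `FeedHost` class, the two trust tiers and the in-memory `inbox` are hypothetical stand-ins, and a real data host would deliver notifications over the network rather than by direct method calls.

```python
# Hypothetical trust tiers: a higher number means a closer friend.
DISTANT, CLOSE = 1, 2


class FeedHost:
    """Toy data host that pushes profile updates to subscribed friends' hosts."""

    def __init__(self, owner):
        self.owner = owner
        self.subscribers = []  # (friend's host, tier granted by the owner)
        self.inbox = []        # (sender, field, value) notifications received

    def authorize(self, friend_host, tier):
        # The owner decides who may subscribe and how much they may see.
        self.subscribers.append((friend_host, tier))

    def update(self, field, value, required_tier):
        # Notify only the friends whose granted tier is high enough.
        for host, tier in self.subscribers:
            if tier >= required_tier:
                host.receive(self.owner, field, value)

    def receive(self, sender, field, value):
        self.inbox.append((sender, field, value))


alice, bob, carol = FeedHost("alice"), FeedHost("bob"), FeedHost("carol")
alice.authorize(bob, CLOSE)
alice.authorize(carol, DISTANT)
alice.update("status", "at the conference", required_tier=DISTANT)
alice.update("phone", "555-0100", required_tier=CLOSE)
```

Here Bob’s host receives both updates while Carol’s receives only the status, so different friends get different views of the same profile.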

Why Facebook wants you to open up your profile

There is some controversy, including a critique from our team at the EFF of Facebook’s new privacy structure, and their new default and suggested policies that push people to expose more of their profile and data to “everyone.”

I understand why Facebook finds this attractive. “Everyone” means search engines like Google, and also fully 3rd party apps like those that sprang up around Twitter.

On Twitter, I tried to have a “protected” profile, open only to friends, but that’s far from the norm there. And it turns out it doesn’t work particularly well. Because Twitter is mostly exposed to public view, all sorts of tools started appearing that treat Twitter more as a microblogging platform than as a way to share short missives with friends. None of these new functions worked on a protected account. With a protected account, you could not even publicly reply to people who did not follow you. Even the Facebook app that imports your tweets to Facebook doesn’t work on protected accounts, though it certainly could.

Worse, many people try to use Twitter as a “backchannel” for comments about events like conferences. I think it’s dreadful as a backchannel, and conferences encourage it mostly as a form of spam: when people tweet to one another about the conference, they are also flooding the outside world with constant reminders about the conference. To use the backchannel though, you put in tags and generally this is for the whole world to see, not just your followers. People on Twitter want to be seen.

Not so on Facebook and it must be starting to scare them. On Facebook, for all its privacy issues, mainly you are seen by your friends. Well, and all those annoying apps that, just to use them, need to know everything about you. You disclose a lot more to Facebook than you do to Twitter and so it’s scary to see a push to make it more public.

Being public means that search engines will find material, and that’s hugely important commercially, even to a site as successful as Facebook. Most sites in the world are disturbed to learn they get a huge fraction of their traffic from search engines. Facebook is an exception but doesn’t want to be. It wants to get all the traffic it gets now, plus more.

And then there’s the cool 3rd party stuff. Facebook of course has its platform, and that platform has serious privacy issues, but at least Facebook has some control over it, and makes the “apps” (really embedded 3rd party web sites) agree to terms. But you can’t beat the innovation that comes from having less controlled entrepreneurs doing things, and that’s what happens on Twitter. Facebook doesn’t want to be left behind.

What’s disturbing about this is the idea that we will see sites starting to feel that abandoning or abusing privacy gives them a competitive edge. We used to always hope that sites would see protecting their users’ privacy as a competitive edge, but the reverse could take place, which would be a disaster.

Is there an answer? It may be to try to build applications in more complex ways that still protect privacy. Though in the end, you can’t do that if search engines are going to spider your secrets in order to do useful things with them; at least not the way search engines work today.

Data hosting could let me make Facebook faster

I’ve written about “data hosting/data deposit box” as an alternative to “cloud computing.” Cloud computing is timesharing — we run our software and hold our data on remote computers, and connect to them from terminals. It’s a swing back from personal computing, where you had your own computer, and it erases the 4th amendment by putting our data in the hands of others.

Lately, the more cloud computing applications I use, the more I realize one other benefit that data hosting could provide as an architecture. Sometimes the cloud apps I use are slow. It may be because of bandwidth to them, or it may simply be because they are overloaded. One of the advantages of cloud computing and timesharing is that it is indeed cheaper to buy a cluster mainframe and have many people share it than to have a computer for everybody, because those computers sit idle most of the time.

But when I want a desktop application to go faster, I can just buy a faster computer. And I often have. But I can’t make Facebook faster that way. Right now there’s no way I can do it. If it weren’t free, I could complain, and perhaps pay for a larger share, though that’s harder to solve with bandwidth.

In the data hosting approach, the user pays for the data host. That data host would usually be on their ISP’s network, or perhaps (with suitable virtual machine sandboxing) it might be the computer on their desk that has all those spare cycles. You would always get good bandwidth to it for the high-bandwidth user interface stuff. And you could pay to get more CPU if you need more CPU. That can still be efficient, in that you could possibly be in a cloud of virtual machines on a big mainframe cluster at your ISP. The difference is, it’s close to you, and under your control. You own it.

There’s also no reason you couldn’t allow applications that have some parallelism to them to try to use multiple hosts for high-CPU projects. Your own PC might well be enough for most requests, but perhaps some extra CPU would be called for from time to time, as long as there is bandwidth enough to send the temporary task (or sub-tasks that don’t require sending a lot of data along with them.)

And, as noted before, since the users own the infrastructure, this allows new, innovative free applications to spring up because they don’t have to buy their infrastructure. You can be the next youtube, eating that much bandwidth, with full scalability, without spending much on bandwidth at all.

Data Deposit Box pros and cons

Recently, I wrote about the data deposit box, an architecture where applications come to the data rather than copying your personal data to all the applications.

Let me examine some more of the pros and cons of this approach:

The biggest con is that it does make things harder for application developers. The great appeal of the Web 2.0 “cloud” approach is that you get to build, code and maintain the system yourself. No software installs, and much less portability testing (browser versions) and local support. You control the performance and how it scales. When there’s a problem, it’s in your system so you can fix it. You design it how you want, in any language you want, for any OS you want. All the data is there, there are no rules. You can update the software any time, other than the user’s browser and plugins.

The next con is the reliability of users’ data hosts. You don’t control them. If a user’s data host is slow or down, you can’t fix that. If you want the host to serve data to their friends, it may be slow for other people. The host may not be located in the same country as the person getting data from it, making things slower.

The last con is also the primary feature of data hosting. You can’t get at all the data. You have to get permissions, and do special things to get at data. There are things you just aren’t supposed to do. It’s much easier, at least right now, to convince the user to just give you all their data with few or no restrictions, and just trust you. Working in a more secure environment is always harder, even if you’re playing by the rules.

Those are pretty big cons. Especially since the big “pro” — stopping the massive and irrevocable spread of people’s data — is fairly abstract to many users. It is the fundamental theorem of privacy that nobody cares about it until after it’s been violated.

But there’s another big pro: cheap scalability. If users are paying for their own data hosting, developers can make applications with minimal hosting costs. Today, building a large cloud app that will get a lot of users requires a serious investment in providing enough infrastructure for it to work. YouTube grew by spending money like water for bandwidth and servers, and so have many other sites. If you have VCs, it’s relatively inexpensive, but if you’re a small time garage innovator, it’s another story. In the old days, developers wrote software that ran on users’ PCs. Running the software didn’t cost the developer anything, but trying to support it on a thousand different variations of the platform did.

With a data hosting architecture, we can get the best of both worlds. A more stable platform (or so we hope) that’s easy to develop for, but no duty to host most of its operations. Because there is no UI in the data hosting platform, it’s much simpler to make it portable. People joked that Java became write-once, debug everywhere for client apps but for server code it’s much closer to its original vision. The UI remains in the browser.

For applications with money to burn, we could develop a micropayment architecture so that applications could pay for your hosting expenses. Micropayments are notoriously hard to get adopted, but they do work in more restricted markets. Applications could send payment tokens to your host along with the application code, allowing your host to give you bandwidth and resources to run the application. It would all be consolidated in one bill to the application provider.

Alternately, we could develop a system where users allow applications to cache results from their data host for limited times. That way the application providers could pay for reliable, globally distributed resources to cache the results.

For example, say you wanted to build Flickr in a data hosting world. Users might host their photos, comments and resized versions of the photos in their data host, much of it generated by code from the data host. Data that must be aggregated, such as a search index based on tags and comments, would be kept by the photo site. However, when presenting users with a page filled with photo thumbnails, those thumbnails could be served by the owner’s data host, but this could generate unreliable results, or even missing results. To solve this, the photo site might get the right to cache the data where needed. It might cache only for users who have poor hosting. It might grant those who provide their own premium hosting with premium features since they don’t cost the site anything.
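A minimal sketch of that time-limited caching right, in Python. The `ThumbnailCache` class, the `fetch_from_host` callback and the 60-second limit are hypothetical choices made for illustration; the point is only that the photo site may serve its copy while the user’s permission is fresh and must go back to the owner’s data host once it lapses.

```python
import time


class ThumbnailCache:
    """Toy cache the photo site keeps, honoring a user-granted time limit."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the example can control time
        self._entries = {}   # photo_id -> (data, expires_at)

    def get(self, photo_id, fetch_from_host):
        now = self.clock()
        entry = self._entries.get(photo_id)
        if entry is not None and entry[1] > now:
            return entry[0]  # permission still valid: serve the cached copy
        # Permission lapsed or nothing cached: ask the owner's data host.
        # If the host is down, the exception propagates and nothing stale is served.
        self._entries.pop(photo_id, None)
        data = fetch_from_host(photo_id)
        self._entries[photo_id] = (data, now + self.ttl)
        return data


# Usage with a fake clock and a stand-in for the owner's data host.
now = [0.0]
host_calls = []


def owner_host(photo_id):
    host_calls.append(photo_id)
    return b"thumb-" + photo_id.encode()


cache = ThumbnailCache(ttl_seconds=60, clock=lambda: now[0])
first = cache.get("p1", owner_host)   # fetched from the owner's host
second = cache.get("p1", owner_host)  # served from the site's cache
now[0] = 61.0
third = cache.get("p1", owner_host)   # limit expired, fetched again
```

A site could apply this only to users with poor hosting, as suggested above, by skipping the cache for hosts it has measured as reliable.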

As such, well-funded startups could provide well-funded quality of service, while no-funding innovators could get going relying on their users. If they became popular, funding would no doubt become available. At the same time, if more users buy high quality data hosting, it becomes possible to support applications that don’t have and never will have a “business model.” These would, in effect, be fee-paid apps rather than advertising or data harvesting funded apps, but the fees would be paid because the users would be covering their own hosting costs.

And that’s a pretty good pro.

Data Deposit Box instead of data portability

I’ve been ranting of late about the dangers inherent in “Data Portability” which I would like to rename as BEPSI to avoid the motherhood word “portability” for something that really has a strong dark side as well as its light side.

But it’s also important to come up with an alternative. I think the best alternative may lie in what I would call a “data deposit box” (formerly “data hosting.”) It’s a layered system, with a data layer and an application layer on top. Instead of copying the data to the applications, bring the applications to the data.

A data deposit box approach has your personal data stored on a server chosen by you. That server’s duty is not to exploit your data, but rather to protect it. That’s what you’re paying for. Legally, you “own” it, either directly, or in the same sense as you have legal rights when renting an apartment — or a safety deposit box.

Your data box’s job is to perform actions on your data. Rather than giving copies of your data out to a thousand companies (the Facebook and Data Portability approach) you host the data and perform actions on it, programmed by those companies who are developing useful social applications.

As such, you don’t join a site like Facebook or LinkedIn. Rather, companies like those build applications and application containers which can run on your data. They don’t get the data, rather they write code that works with the data and runs in a protected sandbox on your data host — and then displays the results directly to you.

To take a simple example, imagine a social application wishes to send a message to all your friends who live within 100 miles of you. Using permission tokens provided by you, it is able to connect to your data host and ask it to create that subset of your friend network, and then e-mail a message to that subset. It never sees the friend network at all.
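Here is a rough Python sketch of that exchange. Everything named here is an assumption for illustration: the `FriendQueryHost` class, the string token, the stored coordinates and the in-memory `outbox` standing in for real e-mail delivery. Distances use the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def miles_between(a, b):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


class FriendQueryHost:
    """Toy data host: it runs the query itself and never hands out the friend list."""

    def __init__(self, home, friends, valid_tokens):
        self.home = home                       # owner's (lat, lon)
        self.friends = friends                 # name -> (email, (lat, lon))
        self.valid_tokens = set(valid_tokens)  # tokens the owner has issued
        self.outbox = []                       # stand-in for sent e-mail

    def message_nearby_friends(self, token, radius_miles, text):
        if token not in self.valid_tokens:
            raise PermissionError("token was not granted by the owner")
        for email, location in self.friends.values():
            if miles_between(self.home, location) <= radius_miles:
                self.outbox.append((email, text))
        # Deliberately returns nothing, so the caller learns nothing about the network.


host = FriendQueryHost(
    home=(37.77, -122.42),  # San Francisco
    friends={
        "ann": ("ann@example.com", (37.34, -121.89)),  # San Jose
        "ben": ("ben@example.com", (40.71, -74.01)),   # New York
    },
    valid_tokens={"token-from-owner"},
)
host.message_nearby_friends("token-from-owner", 100, "Party on Friday")
```

The application supplies only the token, the radius and the message text; the filtering and delivery happen on the data host.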
