data deposit box

The Personal Cloud and Data Deposit Box

Last night I gave a short talk at the 3rd “Personal Clouds” meeting in San Francisco. The term “personal clouds” is a bit vague at present, but in part it describes what I had proposed in 2008 as the “data deposit box”: a means to achieve the various benefits of corporate-hosted cloud applications in computing space owned and controlled by the user. Other people are interpreting the phrase “personal clouds” to mean mechanisms for the user to host, control or monetize their own data, or to control their relationships with vendors and others who will use that data. In the simplest form, some people are using it to refer to personal resources hosted in the cloud, such as cloud disk drive services like Dropbox.

I continue to focus on the vision of providing the advantages of cloud applications closer to the user, bringing the code to the data (as was the case in the PC era) rather than bringing the data to the code (as is now the norm in cloud applications.)

Consider the many advantages of cloud applications for the developer:

  • You write and maintain your code on machines you build, configure and maintain.
    • That means none of the immense support headaches of trying to write software to run on multiple OSs, with many versions and thousands of variations. (Instead you do have to deal with all the browsers but that’s easier.)
    • It also means you control the uptime and speed
    • Users are never running old versions of your code and facing upgrade problems
    • You can debug, monitor, log and fix all problems with access to the real data
  • You can sell the product as a service, either getting continuing revenue or advertising revenue
  • You can remove features, shut down products
  • You can control how people use the product and even what steps they may take to modify it or add plug-ins or 3rd party mods
  • You can combine data from many users to make compelling applications, particularly in the social space
  • You can track many aspects of single and multiple user behaviour to customize services and optimize advertising, learning as you go

Some of those are disadvantages for the user of course, who has given up control. And there is one big disadvantage for the provider, namely they have to pay for all the computing resources, and that doesn’t scale — 10x users can mean paying 10x as much for computing, especially if the cloud apps run on top of a lower level cloud cluster which is sold by the minute.

But users see advantages too:

Speaking on Personal Clouds in SF, and Robocars in Phoenix

Two upcoming talks:

Tomorrow (April 4) I will give a very short talk at the meeting of the personal clouds interest group. As far as I know, I was among the first to propose the concept of the personal cloud in my essays on the Data Deposit Box back in 2007, and while my essays are not the reason for it, the idea is gaining some traction now as more and more people think about the consequences of moving everything into the corporate clouds.

My lightning talk will cover what I see as the challenges in getting the public to accept a system where the computing resources are responsible to them rather than to various web sites.

On April 22, I will be at the 14th International Conference on Automated People Movers and Automated Transit speaking in the opening plenary. The APM industry is a large, multi-billion dollar one, and it’s in for a shakeup thanks to robocars, which will allow automated people moving on plain concrete, with no need for dedicated right-of-way or guideways. APMs have traditionally been very high-end projects, costing hundreds of millions of dollars per mile.

The best place to find me otherwise is at Singularity University Events. While schedules are being worked on, with luck you will see me this year in Denmark, Hungary and a few other places overseas, in addition to here in Silicon Valley of course.

The "Forgetful Broker" is needed for Data Deposit Box

For some time I’ve been advocating a concept I call the Data Deposit Box as an architecture for providing social networking and personal data based applications in a distributed way that tries to find a happy medium between the old PC (your data live on your machine) and the modern cloud (your data live on 3rd party corporate machines) approach. The basic concept is to have a piece of cloud that you legally own (a data deposit box) where your data lives, and code from applications comes and runs on your box, but displays to your browser directly. This is partly about privacy, but mostly about interoperability and control.

This concept depends on the idea of publishing and subscribing to feeds from your friends (and other sources.) Your friends are updating data about themselves, and you might want to see it, i.e. things like the Facebook wall or Twitter feed. Feeds themselves would go through brokers just for the sake of efficiency, but would be encrypted so the brokers can’t actually read them.

There is a need for brokers which do see the data in certain cases, and in fact there’s a need for some types of data never to be shown to your friends.

Crush

One classic example is the early social networking application the “crush” detector. In this app you get to declare a crush on a friend, but this is only revealed when both people have a mutual crush. Clearly you can’t just be sending your crush status to your friends. You need a 3rd party who gets the status of both of you, and only alerts you when the crush is mutual. (In some cases applications like this can be designed to work without the broker knowing your data through the process known as blinding (cryptography).)
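The mutual-reveal rule can be sketched in a few lines of Python. This is only an illustration under assumptions: `CrushBroker` and `declare` are hypothetical names, the broker keeps everything in memory, and it sees the declarations in the clear (no blinding is attempted here).

```python
class CrushBroker:
    """Hypothetical third party that sees declarations but reveals only mutual ones."""

    def __init__(self):
        # Maps each person to the set of people they have declared a crush on.
        self._declared = {}

    def declare(self, person, target):
        """Record person's crush on target; return True only if it is mutual."""
        self._declared.setdefault(person, set()).add(target)
        return person in self._declared.get(target, set())


broker = CrushBroker()
first = broker.declare("alice", "bob")    # bob has not declared, nothing is revealed
second = broker.declare("bob", "alice")   # now mutual, so both can be alerted
print(first, second)
```

Neither friend ever receives the other's status directly; each only learns the outcome once the broker holds both declarations.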

The peril of the Facebook anti-privacy pattern

There’s been a well justified storm about Facebook’s recent privacy changes. The EFF has a nice post outlining the changes in privacy policies at Facebook which inspired this popular graphic showing those changes.

But the deeper question is why Facebook wants to do this. The answer, of course, is money, but in particular it’s because the market is assigning a value to revealed data. This force seems to push Facebook, and services like it, into wanting to remove privacy from their users in a steadily rising trend. Social network services often will begin with decent privacy protections, both to avoid scaring users (when gaining users is the only goal) and because they have little motivation to do otherwise. The old world of PC applications tended to have strong privacy protection (by comparison) because data stayed on your own machine. Software that exported it got called “spyware” and tools were created to rout it out.

Facebook began as a social tool for students. It even promoted that those not at a school could not see in, could not even join. When this changed (for reasons I will outline below) older members were shocked at the idea their parents and other adults would be on the system. But Facebook decided, correctly, that excluding them was not the path to being #1.

Data Hosting architectures and the safe deposit box

With Facebook seeming to declare some sort of war on privacy, it’s time to expand the concept I have been calling “Data Hosting” — encouraging users to have some personal server space where their data lives, and bringing the apps to the data rather than sending your data to the companies providing interesting apps.

I think of this as something like a “safe deposit box” that you can buy from a bank. While not as sacrosanct as your own home when it comes to privacy law, it’s pretty protected. The bank’s role is to protect the box — to let others into it without a warrant would be a major violation of the trust relationship implied by such boxes. While the company owning the servers that you rent could violate your trust, that’s far less likely than 3rd party web sites like Facebook deciding to do new things you didn’t authorize with the data you store with them. In the case of those companies, it is in fact their whole purpose to think up new things to do with your data.

Nonetheless, building something like Facebook using one’s own data hosting facilities is more difficult than the way it’s done now. That’s because you want to do things with data from your friends, and you may want to combine data from several friends to do things like search your friends.

One way to do this is to develop a “feed” of information about yourself that is relevant to friends, and to authorize friends to “subscribe” to this feed. Then, when you update something in your profile, your data host would notify all your friends’ data hosts about it. You need not notify all your friends, or tell them all the same thing: you might authorize closer friends to get more data than you give to distant ones.
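As a rough sketch of that subscription idea, assuming a toy in-memory model: the `DataHost` class, the `close` and `distant` tier names, and the `inbox` list are all hypothetical, and real hosts would exchange notifications over a network rather than by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class DataHost:
    """Hypothetical personal data host that publishes a tiered feed."""
    owner: str
    # Maps a subscriber's name to (their host, the tier the owner granted).
    subscribers: dict = field(default_factory=dict)
    # Updates received from friends' hosts, as (sender, update) pairs.
    inbox: list = field(default_factory=list)

    def authorize(self, friend_host, tier):
        """Let a friend's host subscribe at tier 'close' or 'distant'."""
        self.subscribers[friend_host.owner] = (friend_host, tier)

    def publish(self, update, visible_to=("close", "distant")):
        """Notify only the subscribed hosts whose tier may see this update."""
        for friend_host, tier in self.subscribers.values():
            if tier in visible_to:
                friend_host.inbox.append((self.owner, update))


me, ann, bob = DataHost("me"), DataHost("ann"), DataHost("bob")
me.authorize(ann, "close")
me.authorize(bob, "distant")
me.publish("new profile photo")                        # goes to both tiers
me.publish("new phone number", visible_to=("close",))  # close friends only
print(ann.inbox)
print(bob.inbox)
```

Here the closer friend's host receives both updates while the distant friend's host receives only the first.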

Why Facebook wants you to open up your profile

There is some controversy, including a critique from our team at the EFF of Facebook’s new privacy structure, and their new default and suggested policies that push people to expose more of their profile and data to “everyone.”

I understand why Facebook finds this attractive. “Everyone” means search engines like Google, and also total 3rd party apps like those that sprang up around Twitter.

On Twitter, I tried to have a “protected” profile, open only to friends, but that’s far from the norm there. And it turns out it doesn’t work particularly well. Because Twitter is mostly exposed to public view, all sorts of things started appearing to treat Twitter as more a micro blogging platform than a way to share short missives with friends. None of these new functions worked on a protected account. With a protected account, you could not even publicly reply to people who did not follow you. Even the Facebook app that imports your tweets to Facebook doesn’t work on protected accounts, though it certainly could.

Worse, many people try to use Twitter as a “backchannel” for comments about events like conferences. I think it’s dreadful as a backchannel, and conferences encourage it mostly as a form of spam: when people tweet to one another about the conference, they are also flooding the outside world with constant reminders about the conference. To use the backchannel though, you put in tags and generally this is for the whole world to see, not just your followers. People on Twitter want to be seen.

Not so on Facebook and it must be starting to scare them. On Facebook, for all its privacy issues, mainly you are seen by your friends. Well, and all those annoying apps that, just to use them, need to know everything about you. You disclose a lot more to Facebook than you do to Twitter and so it’s scary to see a push to make it more public.

Being public means that search engines will find material, and that’s hugely important commercially, even to a site as successful as Facebook. Most sites in the world are disturbed to learn they get a huge fraction of their traffic from search engines. Facebook is an exception but doesn’t want to be. It wants to get all the traffic it gets now, plus more.

And then there’s the cool 3rd party stuff. Facebook of course has its platform, and that platform has serious privacy issues, but at least Facebook has some control over it, and makes the “apps” (really embedded 3rd party web sites) agree to terms. But you can’t beat the innovation that comes from having less controlled entrepreneurs doing things, and that’s what happens on Twitter. Facebook doesn’t want to be left behind.

What’s disturbing about this is the idea that we will see sites starting to feel that abandoning or abusing privacy gives them a competitive edge. We used to always hope that sites would see protecting their users’ privacy as a competitive edge, but the reverse could take place, which would be a disaster.

Is there an answer? It may be to try to build applications in more complex ways that still protect privacy. Though in the end, you can’t do that if search engines are going to spider your secrets in order to do useful things with them; at least not the way search engines work today.

Data hosting could let me make Facebook faster

I’ve written about “data hosting/data deposit box” as an alternative to “cloud computing.” Cloud computing is timesharing — we run our software and hold our data on remote computers, and connect to them from terminals. It’s a swing back from personal computing, where you had your own computer, and it erases the 4th amendment by putting our data in the hands of others.

Lately, the more cloud computing applications I use, the more I realize one other benefit that data hosting could provide as an architecture. Sometimes the cloud apps I use are slow. It may be because of bandwidth to them, or it may simply be because they are overloaded. One of the advantages of cloud computing and timesharing is that it is indeed cheaper to buy a cluster mainframe and have many people share it than to have a computer for everybody, because those computers sit idle most of the time.

But when I want a desktop application to go faster, I can just buy a faster computer. And I often have. But I can’t make Facebook faster that way. Right now there’s no way I can do it. If it weren’t free, I could complain, and perhaps pay for a larger share, though that’s harder to solve with bandwidth.

In the data hosting approach, the user pays for the data host. That data host would usually be on their ISP’s network, or perhaps (with suitable virtual machine sandboxing) it might be the computer on their desk that has all those spare cycles. You would always get good bandwidth to it for the high-bandwidth user interface stuff. And you could pay to get more CPU if you need more CPU. That can still be efficient, in that you could possibly be in a cloud of virtual machines on a big mainframe cluster at your ISP. The difference is, it’s close to you, and under your control. You own it.

There’s also no reason you couldn’t allow applications that have some parallelism to them to try to use multiple hosts for high-CPU projects. Your own PC might well be enough for most requests, but perhaps some extra CPU would be called for from time to time, as long as there is bandwidth enough to send the temporary task (or sub-tasks that don’t require sending a lot of data along with them.)

And, as noted before, since the users own the infrastructure, this allows new, innovative free applications to spring up because they don’t have to buy their infrastructure. You can be the next YouTube, eating that much bandwidth, with full scalability, without spending much on bandwidth at all.

Data Deposit Box pros and cons

Recently, I wrote about the data deposit box, an architecture where applications come to the data rather than copying your personal data to all the applications.

Let me examine some more of the pros and cons of this approach:

The biggest con is that it does make things harder for application developers. The great appeal of the Web 2.0 “cloud” approach is that you get to build, code and maintain the system yourself. No software installs, and much less portability testing (browser versions) and local support. You control the performance and how it scales. When there’s a problem, it’s in your system so you can fix it. You design it how you want, in any language you want, for any OS you want. All the data is there, there are no rules. You can update the software any time, other than the user’s browser and plugins.

The next con is the reliability of users’ data hosts. You don’t control them. If a user’s data host is slow or down, you can’t fix that. If you want the host to serve data to their friends, it may be slow for other people. The host may not be located in the same country as the person getting data from it, making things slower.

The last con is also the primary feature of data hosting. You can’t get at all the data. You have to get permissions, and do special things to get at data. There are things you just aren’t supposed to do. It’s much easier, at least right now, to convince the user to just give you all their data with few or no restrictions, and just trust you. Working in a more secure environment is always harder, even if you’re playing by the rules.

Those are pretty big cons. Especially since the big “pro” — stopping the massive and irrevocable spread of people’s data — is fairly abstract to many users. It is the fundamental theorem of privacy that nobody cares about it until after it’s been violated.

But there’s another big pro: cheap scalability. If users are paying for their own data hosting, developers can make applications with minimal hosting costs. Today, building a large cloud app that will get a lot of users requires a serious investment in providing enough infrastructure for it to work. YouTube grew by spending money like water for bandwidth and servers, and so have many other sites. If you have VCs, it’s relatively inexpensive, but if you’re a small time garage innovator, it’s another story. In the old days, developers wrote software that ran on users’ PCs. Running the software didn’t cost the developer anything, but trying to support it on a thousand different variations of the platform did.

With a data hosting architecture, we can get the best of both worlds. A more stable platform (or so we hope) that’s easy to develop for, but no duty to host most of its operations. Because there is no UI in the data hosting platform, it’s much simpler to make it portable. People joked that Java became write-once, debug everywhere for client apps but for server code it’s much closer to its original vision. The UI remains in the browser.

For applications with money to burn, we could develop a micropayment architecture so that applications could pay for your hosting expenses. Micropayments are notoriously hard to get adopted, but they do work in more restricted markets. Applications could send payment tokens to your host along with the application code, allowing your host to give you bandwidth and resources to run the application. It would all be consolidated in one bill to the application provider.

Alternately, we could develop a system where users allow applications to cache results from their data host for limited times. That way the application providers could pay for reliable, globally distributed resources to cache the results.

For example, say you wanted to build Flickr in a data hosting world. Users might host their photos, comments and resized versions of the photos in their data host, much of it generated by code from the data host. Data that must be aggregated, such as a search index based on tags and comments, would be kept by the photo site. However, when presenting users with a page filled with photo thumbnails, those thumbnails could be served by the owner’s data host, but this could generate unreliable results, or even missing results. To solve this, the photo site might get the right to cache the data where needed. It might cache only for users who have poor hosting. It might grant those who provide their own premium hosting with premium features since they don’t cost the site anything.
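A time-limited cache of this kind might look roughly like the following. It is a sketch under assumptions: `ThumbnailCache`, `fetch_from_host` and `ttl_seconds` are hypothetical names, the time limit stands in for whatever caching right the user grants, and the fetch callable stands in for a network request to the owner's data host.

```python
import time


class ThumbnailCache:
    """Hypothetical cache a photo site keeps, with a user-granted time limit."""

    def __init__(self, fetch_from_host, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_host   # callable: photo_id -> bytes
        self._ttl = ttl_seconds         # how long the user lets us keep a copy
        self._clock = clock             # injectable so the demo can fake time
        self._entries = {}              # photo_id -> (expiry_time, data)

    def get(self, photo_id):
        now = self._clock()
        entry = self._entries.get(photo_id)
        if entry and entry[0] > now:
            return entry[1]                 # still within the permitted window
        self._entries.pop(photo_id, None)   # expired copies are not kept
        data = self._fetch(photo_id)        # go back to the owner's data host
        self._entries[photo_id] = (now + self._ttl, data)
        return data


calls = []

def fetch(photo_id):
    calls.append(photo_id)              # count trips to the owner's host
    return f"thumb-{photo_id}".encode()

fake_now = [0.0]
cache = ThumbnailCache(fetch, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("p1")
cache.get("p1")          # served from the cache, no second fetch
fake_now[0] = 61.0
cache.get("p1")          # permission window is over, so fetch again
print(len(calls))
```

A site could apply such a cache only to users with poor hosting, as the paragraph above suggests, by choosing per user whether to call the cache or the host directly.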

As such, well-funded startups could provide well-funded quality of service, while no-funding innovators could get going relying on their users. If they became popular, funding would no doubt become available. At the same time, if more users buy high quality data hosting, it becomes possible to support applications that don’t have and never will have a “business model.” These would, in effect, be fee-paid apps rather than advertising or data harvesting funded apps, with the fees paid indirectly because the users take on their own hosting costs.

And that’s a pretty good pro.

Data Deposit Box instead of data portability

I’ve been ranting of late about the dangers inherent in “Data Portability” which I would like to rename as BEPSI to avoid the motherhood word “portability” for something that really has a strong dark side as well as its light side.

But it’s also important to come up with an alternative. I think the best alternative may lie in what I would call a “data deposit box” (formerly “data hosting.”) It’s a layered system, with a data layer and an application layer on top. Instead of copying the data to the applications, bring the applications to the data.

A data deposit box approach has your personal data stored on a server chosen by you. That server’s duty is not to exploit your data, but rather to protect it. That’s what you’re paying for. Legally, you “own” it, either directly, or in the same sense as you have legal rights when renting an apartment — or a safety deposit box.

Your data box’s job is to perform actions on your data. Rather than giving copies of your data out to a thousand companies (the Facebook and Data Portability approach) you host the data and perform actions on it, programmed by those companies who are developing useful social applications.

As such, you don’t join a site like Facebook or LinkedIn. Rather, companies like those build applications and application containers which can run on your data. They don’t get the data, rather they write code that works with the data and runs in a protected sandbox on your data host — and then displays the results directly to you.

To take a simple example, imagine a social application wishes to send a message to all your friends who live within 100 miles of you. Using permission tokens provided by you, it is able to connect to your data host and ask it to create that subset of your friend network, and then e-mail a message to that subset. It never sees the friend network at all.
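To make that example concrete, here is a minimal sketch, assuming a toy data host: the `DataBox` class, its `message_nearby_friends` method, the token string and the sample friends are all hypothetical, and an `outbox` list stands in for actually sending e-mail. The point it illustrates is that the caller gets back only a status, never the friend list.

```python
import math


def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))   # 3958.8 = Earth radius in miles


class DataBox:
    """Hypothetical data host: acts on the owner's data, returns no friend list."""

    def __init__(self, home, friends, valid_tokens):
        self._home = home                  # owner's (lat, lon) in degrees
        self._friends = friends            # name -> (email, (lat, lon))
        self._tokens = set(valid_tokens)   # permission tokens issued by the owner
        self.outbox = []                   # stands in for actually sending e-mail

    def message_nearby_friends(self, token, text, radius_miles=100):
        """Mail text to friends within the radius; the caller learns only a status."""
        if token not in self._tokens:
            raise PermissionError("token not granted by the data owner")
        for email, location in self._friends.values():
            if miles_between(self._home, location) <= radius_miles:
                self.outbox.append((email, text))
        return "ok"


box = DataBox(
    home=(37.77, -122.42),                               # San Francisco
    friends={
        "ann": ("ann@example.org", (37.34, -121.89)),    # San Jose, well under 100 miles
        "bob": ("bob@example.org", (40.71, -74.01)),     # New York
    },
    valid_tokens={"tok-123"},
)
status = box.message_nearby_friends("tok-123", "Party on Friday!")
print(status, box.outbox)
```

The application supplies the code path and the token; the selection of nearby friends and the delivery both happen inside the box.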
