Towards a Zero User Interface backup system

I've spoken before about ZUI (Zero User Interface) and how often it's the right interface.

One important system that often has too complex a UI is backup. Because of that, backups often don't get done. This is particularly true of offsite backups, which are the only way to deal with fire and similar catastrophes.

Here's a rough design for a ZUI offsite backup. The only UI at a basic level is just installing and enabling it -- and choosing a good password (that's not quite zero UI but it's pretty limited.)

Once enabled, the backup system will query a central server to start looking for backup buddies. It will be particularly interested in buddies on your same LAN (though it will not consider them offsite.) It will also look for buddies on the same ISP or otherwise close by, network-topology wise. For potential buddies, it will introduce the two machines to each other and run tests to measure the bandwidth between them.
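
As a rough sketch of how the buddy search might order its candidates, assuming the central server hands back a list with a same-LAN flag, a same-ISP flag and a measured bandwidth figure (the `Candidate` type and its field names are invented here for illustration, not part of any existing tool):

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record returned by the central server.
    name: str
    same_lan: bool
    same_isp: bool
    mbps: float  # bandwidth measured between the two machines


def rank_buddies(candidates):
    """Order candidates: nearest on the network first, then fastest."""
    def key(c):
        # False sorts before True, so `not` puts matching candidates first.
        return (not c.same_lan, not c.same_isp, -c.mbps)
    return sorted(candidates, key=key)


def offsite_only(candidates):
    """Same-LAN buddies are fast, but they do not count as offsite."""
    return [c for c in candidates if not c.same_lan]
```

A real system would probably weigh more than these three signals (uptime and free disk, for instance), but the ordering above captures the preference described: LAN first for speed, then same ISP, with LAN buddies excluded when counting offsite copies.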

At night, the tool would wait for your machine and network to go quiet, and likewise the buddy's machines. It would then do incremental backups over the network. These would be encrypted with secure keys. Those secure keys would in turn be stored on your own machine (in the clear) and on a central server (encrypted by your password.)
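
To make the key handling concrete, here is a minimal sketch of wrapping a random backup key under the user's password before it goes to the central server. It uses only the Python standard library. The XOR mask plus HMAC tag is a stand-in I chose so the example stays self-contained; a real tool would want a vetted authenticated cipher from a crypto library. The function names, the iteration count and the blob layout are all assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for the password stretch


def _derive(password: str, salt: bytes):
    # Stretch the password into 64 bytes: 32 to mask the key, 32 for the MAC.
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=64
    )
    return derived[:32], derived[32:]


def wrap_key(backup_key: bytes, password: str) -> dict:
    """Protect a 32-byte backup key with a password for server-side storage."""
    if len(backup_key) != 32:
        raise ValueError("this sketch only wraps 32-byte keys")
    salt = os.urandom(16)  # fresh salt, so the mask is never reused
    mask, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(backup_key, mask))
    tag = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap_key(blob: dict, password: str) -> bytes:
    """Recover the backup key, rejecting a wrong password or a damaged blob."""
    mask, mac_key = _derive(password, blob["salt"])
    expected = hmac.new(
        mac_key, blob["salt"] + blob["wrapped"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong password or corrupted blob")
    return bytes(a ^ b for a, b in zip(blob["wrapped"], mask))
```

The point of the design is that the server only ever sees the salt, the wrapped key and the tag, so everything rests on the quality of the password, which is why choosing a good one is the one piece of UI that can't be removed.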

The backup would be clever. It would identify files on your system which are common around the network (i.e. files of the OS and installed software packages) and know it doesn't have to back them up directly. It just has to record their presence and the fact that they exist in many places. It only has to transfer your own created files.
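
One plausible way to spot the common files is by content hash. The sketch below assumes the network can supply a set of digests for widely held files (`common_digests` is a hypothetical input; how it gets built is left open) and splits your files into those to upload and those to record by reference only.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to recognise identical files."""
    return hashlib.sha256(data).hexdigest()


def plan_backup(files: dict, common_digests: set) -> tuple:
    """Split files into (paths to upload, {path: digest} to record only).

    `files` maps path -> content bytes. A real tool would stream from
    disk rather than hold whole files in memory.
    """
    upload, reference = [], {}
    for path, data in sorted(files.items()):
        d = digest(data)
        if d in common_digests:
            reference[path] = d  # exists in many places; just note it
        else:
            upload.append(path)  # your own file; must be transferred
    return upload, reference
```

On restore, a referenced file would be fetched from any machine that holds the same digest rather than from your buddies.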

Your backups are compressed and sent to two or more different buddies each. Regular checks are done to see if each buddy is still around. If a buddy leaves the net, the system will quickly find other buddies to store data on. Alas, some files, like video, images and music, are already compressed and won't shrink further, so with two copies twice as much storage is needed for backup as the files took, though only for your own generated files. So you do have to have a very big disk, 3 times bigger than you need, because you must store data for the buddies just as they are storing for you. But disk is getting very cheap.
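
The "3 times" figure is simple arithmetic, sketched below under the assumption that the exchange is symmetric (you store as much for others as they store for you). The function name and the `compression_ratio` parameter are mine, added for illustration.

```python
def disk_needed(own_gb: float, copies: int = 2,
                compression_ratio: float = 1.0) -> float:
    """Total disk required under a symmetric exchange.

    compression_ratio is compressed size divided by original size:
    1.0 means no gain, as with video, images and music.
    """
    stored_for_buddies = copies * own_gb * compression_ratio
    return own_gb + stored_for_buddies
```

With 100 GB of your own incompressible files and two copies, `disk_needed(100)` comes to 300 GB, which is the 3x above. If your files compressed to half their size, the same call with `compression_ratio=0.5` would give 200 GB.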

(Another alternative is RAID-5 style. Here you distribute each file across 3 or more buddies using the RAID-5 parity scheme, so that any one buddy can vanish and you can still recover the file. This means you may be able to get away with much less excess disk space. There are also redundant storage algorithms that let you tolerate the loss of 2 or even 3 of a larger pool of storers, at a much more modest cost than using double the space.)
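
Here is a minimal sketch of the parity idea, using one XOR parity chunk next to the data chunks. It is a simplification: real RAID-5 rotates parity across disks, and the function names are my own. Each chunk would go to a different buddy, and any single lost chunk can be rebuilt from the rest.

```python
def split_with_parity(data: bytes, n_data: int = 2) -> list:
    """Split data into n_data equal chunks plus one XOR parity chunk."""
    size = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(size * n_data, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]


def recover(chunks: list, original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (set to None) is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    chunks = list(chunks)
    if missing:
        size = len(next(c for c in chunks if c is not None))
        rebuilt = bytearray(size)
        # XOR of all surviving chunks equals the missing one.
        for c in chunks:
            if c is not None:
                for i, byte in enumerate(c):
                    rebuilt[i] ^= byte
        chunks[missing[0]] = bytes(rebuilt)
    # Drop the parity chunk and the padding.
    return b"".join(chunks[:-1])[:original_len]
```

With n data chunks plus one parity chunk, the storage held by buddies is (n+1)/n times the file size: 1.5x for three buddies, against 2x for two full copies.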

All this is, as noted, automatic. You don't have to do anything to make it happen, and if it's good at spotting quiet times on the system and network, you don't even notice it's happening, except a lot more of your disk is used up storing data for others.

It is the automated nature that is so important. There have been other proposals along these lines, such as MNET and some commercial network backup apps, but never an app you just install, do quick setup and then forget about until you need to restore a file. Only such an app will truly get used and work for the user.

Restore of individual files (if your system is still alive) is easy. You have the keys on file, and can pull your file from the buddies and decrypt it with the keys.

Loss of a local disk is more work, but if you have multiple computers in the household, the keys could be stored on other computers on the same LAN (alas this does require UI to approve this) and then you can go to another computer to get the keys to rebuild the lost disk. Indeed, using local computers as buddies is a good idea due to speed, but they don't provide offsite backup. It would make sense for the system, at the cost of more disk space, to do both same-LAN backup and offsite. Same-LAN for hardware failures, offsite for building-burns-down failures.

In the event of a building-burns-down failure, you would have to go to the central server, and decrypt your keys with that password. Then you can get your keys and find your buddies and restore your files. Restore would not be ZUI, because we need no motivation to do restore. It is doing regular backups we lack motivation for.

Of course, many people have huge files on disk. This is particularly true if you do things like record video with MythTV or make giant photographs, as I do. This may be too large for backup over the internet.

In this case, the right thing to do is to back up the smaller files first, and have some UI. This UI would warn the user about this, and suggest options. One option is to not back up things like recorded video. Another is to rely only on local backup if it's available. Finally, the system should offer a manual backup of the large files, where you connect a removable disk (USB disk for example) and transfer the largest files to it. It is up to you to take that offsite on a regular basis if you can.
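
A smallest-first policy could look something like the sketch below, assuming the tool has an estimate of how many bytes it can send in tonight's quiet window (`budget_bytes` is a made-up parameter; estimating it is a separate problem).

```python
def nightly_plan(files: dict, budget_bytes: int) -> tuple:
    """Pick what to send tonight, smallest files first.

    `files` maps path -> size in bytes.
    Returns (paths to send tonight, paths deferred).
    """
    send, deferred = [], []
    remaining = budget_bytes
    # Sort by size, then by path so the order is deterministic.
    for path, size in sorted(files.items(), key=lambda kv: (kv[1], kv[0])):
        if size <= remaining:
            send.append(path)
            remaining -= size
        else:
            deferred.append(path)
    return send, deferred
```

The deferred list is what the UI would show the user, with the options of skipping those files, relying on local backup, or copying them to a removable disk.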

However, while this has a UI and physical tasks to do, if you don't do it, it's not the end of the world. Indeed, your large files may get backed up, slowly, if there's enough bandwidth.

Comments

Distributed backups like DIBS have been around for a while. I do offsite backup today over the net because I have servers in multiple locations, though it could be easier than the way I do it.

My goal here is getting as close to ZUI as you can. Not only does this mean it's easy to install, and that you actually use it and don't forget to do things needed for your backup; it also means lots of other people do too, so it's easy to find partners who have good bandwidth to you.

I only meant to point out a related open source solution that could be useful when implementing what you want. It might make sense to share your ideas with the DIBS folks, and find out where they are going with their project.

Your ZUI philosophy is a step in the right direction, but it's easier said than done. Have you considered spending some time adding such a ZUI to existing open source backup tools (e.g. DIBS)? I'm sure the developers could use your help.

-Emin

We started tackling this very problem years ago. Very similar goals:

  • no backup schedules
  • automatic file selection
  • ZUI - Username, password, push start.
  • top notch encryption
  • delta block file transfer
  • unlimited versioning
  • automatically sensing internet usage / user presence
  • running in idle background priority
  • TCP / NAT traversal so users don't have to config their firewalls (ZUI means routers too!)
  • file transfer prioritization
  • automatic archive verification to ensure your backup will restore
  • Web application to support you when you're not at your machine
  • Many more features.. you get the idea.

One big difference - we send all your files to as many destinations as you choose. Our research showed people liked knowing where their data was. Also, for practical restore reasons, it's best to know which machine has all your data so you're back up and running quickly.

One of the big challenges was teaching ZUI users how to back up to another machine. Obviously you can't discover your friend Bob down the street automatically. How do you "buddy up"?

We decided to go with the "friends" list where you invite people, and then allow them to use a specific machine. Of course, this is only required if it's not another one of your machines you're backing up to.

One of the big lessons learned about this ZUI app is the challenge in communicating its power. It looks so SIMPLE. How do you charge for something that's more advanced yet looks less sophisticated?

Imagine two cars. One has tons of really sexy buttons, dials on the dashboard, etc. It looks hi-tech. It looks complicated, wow, lots of value there.

Now there is our car, no buttons anywhere. The windows go up and down automatically, the AC? Automatic. Steering wheel? It drives itself.

We hid a few buttons and controls in a dusty corner, but you get my point. People often decide based on apparent sophistication.

Just because I don't display 20.392Kb/sec on the screen doesn't mean I don't know it, or that I'm not using that info to better your user experience.

There is a screenshot on our site if you want to see it in action.

~Matthew

There is no need to convince people to buy it. The central server needs some funding but only minimal.

However, one workable way to sell it is to make the product free, and then charge a fee -- very clearly explained up front -- when a restore is needed.

You can also pre-sell the restore fee, at a discount to paying it at restore time. I.e., "You can buy the product now including the restore tools and service for $30 or pay $60 when you need to restore."

However, I was expecting this philosophy might eventually be adopted by an open source tool. For one thing, you want the comfort of knowing that you can see the source code of a tool that is going to store your most precious data on other people's machines (friends or otherwise.)

IBM Tivoli backup ("ITSM") already exists in much the form you describe. The trick is, I suppose, to introduce a similar product for cheap/free to encourage home users to do it.

Corporate users know how much backups are worth and are willing to pay for the service.

Home users may feel that their data isn't *that* important, or valuable anyway, and will be unwilling to pay money, even to recover it, if the price is too high.

It'd probably be most economical to get an upfront fee, though - otherwise such a service would go through horrendous cash burn up front.

I don't consider it ZUI if it's only easy to use because you have a paid sysadmin who installs and maintains it for you -- or you end up paying a service provider to manage it. Of course we can solve our system management problems that way. (Or should be able to.)

The key to my proposal is that it has effectively no cost because it uses existing resources at night. It does cost disk space of course but that's traded P2P. And it has close to zero administrative load even for the ordinary user without a paid sysadmin.

I am aware of many automated backup systems which require setup or paying money.

While there are reasons one might not want to go this far, it is conceivable an OS distribution could come with a backup system like this already turned on, so it is giving you offsite backup without you doing a single thing or paying a dime, unless you turn it off. Backup is one of those things that should "just happen."

Right now you wouldn't turn it on by default because people are not quite ready for the disk space cost. In addition, if they don't choose a secure password for their account (to be invisible you would use the password the OS already demands they provide) they would be taking a greater risk with their confidential data, which again should not happen by surprise. Finally, they might not have flat rate bandwidth. However, we're getting closer to a world where points 1 and 3 become unimportant, and the security issue would be the only question.
