RAID, backyard backup and the future of backup


Had my second RAID failure last week. In the end, things were OK, but the reality is that many RAID implementations are much more fragile than they should be. Write failures on a drive caused the system to hang. A hard reset caused the RAID to be marked dirty, which meant it would not boot until falsely marked clean (plus a few other hoops), leaving it with some minor filesystem damage that was repairable. Still, I believe that a proper RAID-like system should have as its maxim that the user is never worse off for having built a RAID than if they had not done so. This is not true today, both because of the fragility of these systems and because of the issues I have outlined before with deliberately replacing a disk in a RAID, where the rebuild makes no use of the still-good but aging old disk.

A few years ago I outlined a plan for disks to come as two-packs for easy, automatic RAID because disks are so cheap that everybody should be doing it. The two-pack would have two SATA ports on it, but if you only plugged in one, it would look like a single disk, and be a RAID-1 inside. If you gave it a special command, it could look like other things, including a RAID-0, or two drives, or a JBOD concatenation. If you plugged into the second port it would look like two disks, with the RAID done elsewhere.
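To make the modes concrete, here is a minimal Python sketch of what the host would see from a two-pack in each configuration. The names `Mode` and `presented_volumes` are hypothetical, invented for this illustration, and the capacity rules are the usual textbook ones (a mirror and a stripe are both limited by the smaller disk, while a concatenation is the sum of the two), not the behaviour of any real product.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical configurations for the two-pack described above."""
    RAID1 = "raid1"            # default: one mirrored volume
    RAID0 = "raid0"            # one striped volume
    JBOD = "jbod"              # one concatenated volume
    TWO_DRIVES = "two_drives"  # both disks exposed, RAID done elsewhere


def presented_volumes(mode: Mode, disk_a_gb: int, disk_b_gb: int) -> list[int]:
    """Return the sizes (in GB) of the volumes the host would see."""
    smaller = min(disk_a_gb, disk_b_gb)
    if mode is Mode.RAID1:
        return [smaller]            # mirror: capacity of the smaller disk
    if mode is Mode.RAID0:
        return [2 * smaller]        # stripe: twice the smaller disk
    if mode is Mode.JBOD:
        return [disk_a_gb + disk_b_gb]  # concatenation: sum of both
    return [disk_a_gb, disk_b_gb]   # two independent drives


for mode in Mode:
    print(mode.value, presented_volumes(mode, 500, 500))
```

With only one port plugged in, the pack would default to the `RAID1` case and look like a single disk.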

I still want this, but RAID is not enough. It doesn't save you from file deletion, or from destruction of the entire system. The obvious future trend is network backup, which is both backup and offsite. The continuing issue with network backup is that some people (most notably photographers and videographers) generate huge amounts of data. I can come back from a weekend with 16 GB of new photos, and that's a long slog over DSL with limited upstream for network backup. To work well, network backup also needs to understand all databases, since a common database file might be gigabytes in size and change every time there is a minor update to a single record. (Some block-level incrementalism can work here if the database is not directly understood.)
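As a rough illustration of the block-level idea, the following Python sketch hashes a file in fixed-size blocks and reports which blocks differ from the previous version, so only those need to go over the slow upstream link. The block size and the function names are assumptions made for this example; real backup tools use more elaborate schemes.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return a SHA-256 digest for each fixed-size block of data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks in `new` that differ from `old` or lie past its end."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


# A 1 MiB stand-in for a database file, with one record updated in place.
old_file = bytes(1024 * 1024)
new_file = bytearray(old_file)
new_file[500_000:500_008] = b"UPDATED!"

to_send = changed_blocks(old_file, bytes(new_file))
print(f"{len(to_send)} of {len(block_hashes(old_file))} blocks need uploading")
```

Fixed-size blocks only help when records are updated in place. If data is inserted in the middle of the file, every later block shifts and looks changed, which is one reason a backup tool that understands the database can do better.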

Network backup is also something that should be automatic. There are already peer-to-peer network backups that make use of the disks of friends or strangers (encrypted, of course), but it would be nice if this could "just happen" on any freshly installed computer unless you turn it off. The user must keep the key stored somewhere safe, which is not zero-UI, though if all they want is to handle file deletion and rollback they can get away without it.

Another option that might be interesting would be the outdoor NAS. Many people now like to use NAS boxes over gigabit networks. This is not as fast as SATA with a flash drive, or RAID, or even a modern spinning disk, but it's fast enough for many applications.

An interesting approach would be a NAS designed to be placed outdoors, away from the house, such as in the back corner of a yard, so that it would survive a fire or earthquake. The box would be waterproof and modestly fireproof, but ideally it would be located somewhere a fire is unlikely to reach. It could either be powered by power-over-ethernet or have its own power and use Wi-Fi (in which case it is only suitable for backup, not as a live NAS).

This semi-offsite backup would be fast and cheap (network storage tends to be much more expensive than local drives.) It would be encrypted, of course, so that nobody can steal your data. Encryption would be done in the clients, not the NAS, so even somebody who taps the outside wire would get nothing.

This semi-offsite backup could be used in combination with network backup. Large files and new files would be immediately sent to the backyard backup. The most important files could then go to network backup, or all of them, just much more slowly.
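The split between the two destinations could be sketched as a simple policy. This is only an illustration under stated assumptions: `BackupFile`, `plan_backup`, and the 100 MB cutoff for "large" are all hypothetical names and values, not part of any existing backup tool.

```python
from dataclasses import dataclass

LARGE_FILE_BYTES = 100 * 1024 * 1024  # hypothetical cutoff for a "large" file


@dataclass
class BackupFile:
    path: str
    size_bytes: int
    is_new: bool
    is_important: bool


def plan_backup(files: list[BackupFile]) -> tuple[list[str], list[str]]:
    """Split files into an immediate backyard queue and a slower network queue."""
    # Large files and new files go to the backyard box right away.
    backyard_now = [
        f.path for f in files if f.is_new or f.size_bytes >= LARGE_FILE_BYTES
    ]
    # Everything eventually goes over the network, most important files first.
    # sorted() is stable, so files of equal importance keep their original order.
    network_queue = [f.path for f in sorted(files, key=lambda f: not f.is_important)]
    return backyard_now, network_queue


files = [
    BackupFile("photos/raw_0001.dng", 30_000_000, is_new=True, is_important=False),
    BackupFile("docs/taxes.pdf", 200_000, is_new=True, is_important=True),
    BackupFile("video/old_project.mov", 4_000_000_000, is_new=False, is_important=False),
]
backyard_now, network_queue = plan_backup(files)
print("backyard:", backyard_now)
print("network: ", network_queue)
```

In this example all three files land in the backyard queue, while the network queue starts with the tax document and lets the bulky photo and video files trickle out afterwards.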

A backyard backup could also be shared by neighbours, especially over Wi-Fi, which might make it quite cost effective. Because of the encryption, nobody could access their neighbour's data.

If neighbours are going to cooperate, this can also be built by just sharing servers or NAS boxes in two or more houses. This provides decent protection and avoids having to put anything outside, but there is the risk that a single fire burns down multiple houses, depending on how they are laid out.

A backyard backup would be so fast that many would reverse what I said above, and have no need for RAID. Files would be mirrored to the backyard backup within seconds or minutes. RAID would only be needed for those who need to have systems that won't even burp in a disk failure (which is a rare need in the home) or which must not lose even a few minutes of data.

Comments

It's decades old by now, but does anything even come close to VMS's host-based volume shadowing?

I feel that it is important to keep RAID and backups as two wholly separate concepts/technologies. RAID is meant to be used in instances where you need high availability of your data, so that your system can tolerate individual disk failures but still serve data. Backups are meant to help recover from catastrophic failure/loss of your system. While you probably did not intend to, I feel your article is clouding the concepts a bit, possibly perpetuating the myth that RAID alone is a backup solution. I have seen too many people drop good money to buy a single RAID appliance to be used as their only storage point for photos and other personal media files.

They are two concepts but there is some overlap. Drive failure is a major cause of data loss for ordinary users, and RAID is a protection against drive failure. I would even venture that drive failure is the largest cause of data loss for most people. (Not the most frequent but it loses so much.)

There are other forms of data loss -- accidental deletion, system corruption, computer attack, destruction of the whole system from fire etc. RAID does not protect against those, so you must also have backups.

But the other reality is that people don't do their backups regularly -- it's hoped that network backup or other such techniques can fix this, but today that is how things are. RAID is always working. You survive drive failure with no loss of data, when it works right.

There is one negative psychology here -- when you have RAID, you may be even less reliable about doing your offsite backups. But still, drives are so cheap today that most people should have it -- or a solution like the backyard backup I describe. The backyard backup would also provide almost as good protection from drive failure. If done as a differential backup which can restore to any point in recent time, it has the other advantages. Its main flaw is it isn't totally offsite -- a truly large fire, tornado or hurricane could destroy your data. Though it could be built to withstand these things (as well as flood) without a lot of difficulty, depending on the size of the yard.
