Hard disks fail. If you prepared properly, you have a backup, or you swap out disks when they first start reporting problems. If you prepared really well, you have an offsite backup (which is getting easier and easier to do over the internet).
One way to protect yourself from disk failures is RAID, especially RAID-5. With RAID, several disks act together as one. The simplest protecting RAID, RAID-1, just has 2 disks which work in parallel, known as mirroring. Everything you write is copied to both. If one fails, you still have the other, with all your data. It’s good, but twice as expensive.
RAID-5 is cleverer. It uses 3 or more disks, and uses parity, an error correction technique, so that you can store, for example, 2 disks’ worth of data on 3 disks. So it’s only 50% more expensive. RAID-5 can be done with many more disks: for example, with 5 disks you get 4 disks’ worth of data, and it’s only 25% more expensive. However, having 5 disks is beyond most systems and has its own secret risk: if 2 of the 5 disks fail at once (and this does happen), you lose all 4 disks’ worth of data, not just 2 disks’ worth. (RAID-6, meant for really large arrays of disks, survives 2 failures but not 3.)
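To make the trick concrete, here is a minimal sketch of the XOR parity idea behind RAID-5, along with the cost arithmetic from the paragraph above. The helper names (`xor_blocks`, `raid5_overhead`) and the sample data are made up for illustration, and a real array rotates the parity across all the disks rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_overhead(n_disks):
    """Return (disks' worth of data, extra cost as a fraction) for n disks."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    data_disks = n_disks - 1
    return data_disks, 1 / data_disks


# Two data blocks plus one parity block, as in the 3-disk example.
disk_a = b"hello wo"
disk_b = b"rld!1234"
parity = xor_blocks([disk_a, disk_b])

# Pretend disk_b has failed: rebuild it from the survivor and the parity.
rebuilt_b = xor_blocks([disk_a, parity])
print(rebuilt_b == disk_b)   # True

print(raid5_overhead(3))     # (2, 0.5)
print(raid5_overhead(5))     # (4, 0.25)
```

The same XOR that produced the parity also recovers any one missing block, which is why a single failure is survivable and a double failure is not.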
Now most people who put in RAID do it for more than data protection. After all, good sysadmins are doing regular backups. They do it because with RAID, the computer doesn’t even stop when a disk fails. You connect up a new disk live to the computer (which you can do with some systems) and it is recreated from the working disks, and you never miss a beat. This is pretty important with a major server.
But RAID has value to those who are not in the 99.99% uptime community: those who are not good at doing manual backups, but who want to be protected from the inevitable disk failures. Today it is hard to set up, or expensive, or both. There are some external boxes like the “ReadyNAS” that make it reasonably easy for external disks, but they don’t have the bandwidth to be your full time disks.
RAID-5 on old IDE systems was hard, since they usually could truly talk to only 2 disks at a time. The new SATA bus is much better, as many motherboards have 4 connectors, though soon one will be required by Blu-ray drives.
What I propose is readymade RAID modules for PCs, with slightly different goals than found in the commercial RAID market. In particular, it is possible to build a module that fits in a 5.25” DVD drive bay and holds 2, and possibly 3, drive mechanisms from more standard 3.5” drives. It is necessary to use standard mechanisms to make it cheap. Holding 3 is difficult, as we would need to use slightly slimmer drives, which are not so readily available, for at least one of the drives. An alternative and useful product would have 2 3.5” drives (mounted sideways) and a slimline DVD drive on top. Such slimline DVD drives are commonly made for laptops.
Inside the module would be a single board of drive electronics controlling the 2 drives, and possibly performing RAID duties. That’s cheaper, but considered a no-no in the RAID world because the failure of the drive electronics would cause the failure of the RAID. Real RAID tries to avoid any single points of failure. However, dead drive electronics rarely cause loss of data; they just cause loss of the ability to get at the data. If you can buy and swap in replacement electronics to get back up, it’s reasonably good.
The basic unit would offer RAID-1 (or, if desired, RAID-0, which is a double-speed disk with no protection from failure). However, it would also have two SATA connectors on it, which could then connect to one or two regular disks in other bays, to do 3 or 4 disk RAID-5. Or they could connect to another dual-drive array similar to the first one, for a 4-disk RAID-5 in two CD/DVD bays. (Admittedly many cases don’t have two CD/DVD bays any more, so the former may become a more common choice.)
To be more reliable, the two drive mechanisms in any 2-pack would come from two different drive manufacturers. This is important because sometimes drives from the same batch from the same maker will have the same flaw, and could fail together. The extra drives would of course come from different sources.
Unlike most RAID designs, it would not necessarily be easy to replace one drive in the 2-pack if one failed. It might make sense to require users to replace the entire 2-pack, since this would make it cheaper to manufacture. Why?
- Usually by the time a drive fails in a RAID, drive technology has improved, and current drives are bigger/cheaper/faster/greener.
- Drives are cheap. If one has failed after a few years of service, might as well replace the stack.
- Most users will not be up to doing such a replacement, though service centers could.
The system would be designed to facilitate the double replacement by connecting the old pack (with one bad drive) and any other drives to the new 2-pack.
Ideally, the system would also allow you to connect 2 smaller drives to the 2-pack, for a special RAID-5 that consists of 2 large disks and 2 smaller disks striped to be the 3rd component. No matter what disks you have, no matter what their sizes, it should figure out the best thing to do and do it, and of course handle drives going bad and being replaced. It must use the OS to warn you and guide you through the replacement.
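As a rough illustration of the capacity arithmetic for that mixed arrangement, here is a small sketch. The function name and the sizes are hypothetical, and it assumes plain striping of the two small disks, which uses the same amount of space from each.

```python
def mixed_raid5_capacity(large_gb, small_gb):
    """Usable space (GB) of a RAID-5 made of two large disks plus two
    small disks striped together to act as the third member."""
    if len(large_gb) != 2 or len(small_gb) != 2:
        raise ValueError("this sketch expects 2 large and 2 small disks")
    # Plain striping uses the same amount from each small disk.
    striped_member = 2 * min(small_gb)
    # RAID-5 can only use as much of each member as the smallest member offers.
    member_size = min(min(large_gb), striped_member)
    # One member's worth of space goes to parity, leaving two for data.
    return 2 * member_size


print(mixed_raid5_capacity([1000, 1000], [500, 500]))  # 2000
print(mixed_raid5_capacity([1000, 1000], [400, 500]))  # 1600
```

Note that losing either small disk takes out the whole striped member, which the RAID-5 treats as the failure of a single component.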
For many users, there would be an “upgrade” process: take a standard system running on a single drive, then insert the 2-pack and plug the old drive into the 2-pack. It would then build a RAID-5 from the 3 disks, plus create additional partitions that are normal or RAID-1 if the disks are different in size.
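Here is one way the upgrade layout could be worked out, as a sketch only. The function `plan_upgrade` and its output keys are invented for this example, and it assumes the simplest policy: RAID-5 across the space all three disks share, RAID-1 across whatever the two larger disks still have in common, and a normal partition for anything left on the biggest disk.

```python
def plan_upgrade(sizes_gb):
    """Plan a layout for three disks of possibly different sizes (in GB)."""
    if len(sizes_gb) != 3:
        raise ValueError("this sketch handles exactly 3 disks")
    smallest = min(sizes_gb)
    # RAID-5 over 3 members stores 2 members' worth of data.
    raid5_usable = 2 * smallest
    # Space left on each disk after the RAID-5 slice, largest first.
    leftovers = sorted((size - smallest for size in sizes_gb), reverse=True)
    # The second-largest leftover exists on two disks, so it can be mirrored.
    mirrored = leftovers[1]
    # Whatever only the biggest disk has becomes an unprotected partition.
    plain = leftovers[0] - leftovers[1]
    return {"raid5_gb": raid5_usable, "raid1_gb": mirrored, "plain_gb": plain}


# An old 250 GB drive joined by a 2-pack of 500 GB drives.
print(plan_upgrade([250, 500, 500]))
# {'raid5_gb': 500, 'raid1_gb': 250, 'plain_gb': 0}

# Three different sizes.
print(plan_upgrade([500, 750, 1000]))
# {'raid5_gb': 1000, 'raid1_gb': 250, 'plain_gb': 250}
```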
Disks are cheap enough and data valuable enough that there’s really no excuse for building all our systems on vulnerable single drives. This should be standard. Of course it can be done with regular drives and software RAID (which is how I’ve done it) but this is beyond most users.
In fact, over time, the drive industry should move to a new form factor that makes a cheap and simple 3-pack drive which fits in existing bays, and make this the standard thing they sell. They would be tempted to build the 3-packs all from their own parts. As noted, this runs the risk of a batch being bad, though it’s still a lot better than what we have today.
Some further notes:
- People must not forget that regular backups and offsite backups are important. We still have more lost file incidents due to software bugs and accidental deletes than we do to hard drive failures, though the latter cause many more losses per incident.
- Yes, this is not as green, since 3 drives consume more power than one. Efforts to make the 2 or 3-packs more energy efficient should be part of this.
- As an alternative, the 2-pack might work with software RAID, providing hardware to help the software RAID. The extra drive or drives outside the 2-pack would still talk directly to the system via their original connection. Right now 3 gigabit SATA is more than up to this challenge.