For many years, I have been using RAID for my home storage. With RAID (and its cousins) everything is stored redundantly so that if any disk drive fails, you don’t lose your data, and in fact your system doesn’t even go down. The redundancy costs anywhere from about 25% to 50% of your disk space (but disk is cheap), and RAID often increases disk performance as well. Some years ago I wrote about how disk drives should be sold in form factors designed for easy RAID in every PC, and I still believe that.
RAID comes with a few costs. One of them is that you need to do too much sysadmin to get it working right. The nastiest cost is that there are some edge cases where RAID can cause you to lose all your data where you would not have lost it (or all of it) if you had not used RAID. That’s bad: it should never make things worse.
A few years ago I switched to one of the new filesystems which put the RAID-like functionality right into the filesystem, instead of putting that into a layer underneath. I think that’s the right thing, and in fact, fear of layer violations is generally a mistake here. I am using BTRFS. Others use ZFS or one of a few other players. BTRFS is new and so its support for RAID-5 (which only costs 25-33% of your space and is fast) is too young, so I use its RAID-1, where everything is just written twice onto two different disks. Unlike traditional RAID, BTRFS will do RAID-1 on more than 2 drives, and they don’t have to be all of equal size. That’s good, though I ran into some problems with the fairly common operation of increasing the size of my storage by replacing my smallest drive with a much larger one.
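To make the unequal-drive case concrete, here is a rough sketch of how much usable space two-copy RAID-1 can offer across a mixed set of drives. It is an idealized estimate that ignores metadata and allocation granularity, and the function name `raid1_usable` is made up for illustration. The idea is simply that every block needs a second copy on a different drive, so the largest drive can never be filled past what all the other drives combined can mirror.

```python
def raid1_usable(drive_sizes):
    """Idealized usable space when every block is stored on two different drives.

    drive_sizes: list of drive capacities, all in the same unit (e.g. GB).
    """
    if len(drive_sizes) < 2:
        return 0  # no second drive to hold the mirror copy
    total = sum(drive_sizes)
    largest = max(drive_sizes)
    # Two copies of everything means at most half the raw space is usable,
    # and the largest drive can only be mirrored by the rest of the drives.
    return min(total // 2, total - largest)


# Three drives of 1000, 2000 and 3000 GB: half the raw total is usable.
print(raid1_usable([1000, 2000, 3000]))  # 3000
# One drive much larger than the rest: part of it cannot be mirrored.
print(raid1_usable([1000, 1000, 4000]))  # 2000
```

The second case shows why swapping a small drive for a much larger one does not always buy as much space as you might expect.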
The long-term goal of such systems should be near-trivial sysadmin. The system should handle all drives and partitions thrown at it in a “just works” way. You give it any number of drives and it figures out the best thing to do, and adapts as you change. You should only need to tell it a few policies, such as how much need you have for reliability and speed and how much space you are willing to pay for it. The systems should never put you at more risk than you ask for, or more risk than you would have had with having just one drive or a set of non-redundant drives. That’s hard, but it is a worthwhile goal.
But I think we could do more, and we could do it in a way that we get better and better storage with less sysadmin.
Multiple drives, but not too many
I think most users will probably stick to 2 drives, and rarely go above 3. The reality is that 4 or more is for servers and heavy users, because each drive takes power and generates heat. Adding an SSD to the mix is always a good idea, but it is there for speed, not for redundancy.
The OS should understand what’s happening and reflect it in the filesystem
The truth is not all files need as much redundancy and speed. The OS can know a lot about that and identify:
- Files that are accessed frequently vs. ones not accessed much, or for a long time
- Files that are accessed by interactive applications which cause those applications to be IO bound (i.e., slowed by waiting for the disk).
- Files that have been backed up in particular ways, and when.
Your OS should start by storing everything redundantly (RAID 1 or 5) until such time as the disk starts getting close to full. When that happens, it should of course alert you it is time to upgrade your drives or add another. But it can also offer another option which you can explicitly ask for, namely to reduce the redundancy on files which are rarely accessed, have not been used for a while, and have been backed up.
It turns out, that’s often a lot of the files on a disk. In particular, the thing that uses up most of the disk space for the ordinary user is their collection of photos and videos. Other than the few that get regular access, there is no actual need for RAID level redundancy on these images. If their own drive is lost, there is a backup where you can get them. They aren’t needed for regular system operation.
The systems already know what files belong to the OS, and can keep them redundant, though most home users are not looking for 100% uptime, they really only want 100% data safety.
To do this right, programs need to tell the OS why they are accessing files. Your photo organizer possibly scans your photo collection regularly, but this scan doesn’t make the files system-crucial. My goal is not to have the users designate these things, though that is one option. Ideally the system should figure it out.
The system can also take the most important files, the ones that cause the system to block, and make sure they are both redundantly stored and found on SSD.
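The placement rules described above can be sketched as a small decision function. This is only an illustration of the policy, not how any real filesystem does it: the `FileInfo` fields, the `placement` function, the tier names and the 180-day idle threshold are all assumptions invented for the example. In particular, it assumes the OS already tracks interactive use separately from bulk scans such as a photo organizer indexing its library.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    is_system: bool                # belongs to the OS itself
    backed_up: bool                # a verified backup exists elsewhere
    days_since_interactive_use: float  # bulk scans are assumed not to count
    blocks_interactive_apps: bool  # apps were observed waiting on this file


def placement(f: FileInfo, reduce_redundancy: bool, idle_days: float = 180) -> str:
    """Pick a storage tier for one file.

    reduce_redundancy is True only when the disk is nearly full AND the
    user has explicitly accepted trading redundancy for space.
    """
    if f.blocks_interactive_apps:
        return "redundant+ssd"   # both safe and fast
    if f.is_system or not reduce_redundancy:
        return "redundant"       # the default: keep two copies
    if f.backed_up and f.days_since_interactive_use >= idle_days:
        return "single-copy"     # safe to demote; the backup covers a drive loss
    return "redundant"


old_photo = FileInfo("photos/2009/img_0412.jpg", False, True, 900, False)
print(placement(old_photo, reduce_redundancy=False))  # redundant
print(placement(old_photo, reduce_redundancy=True))   # single-copy
```

Note that a file without a backup never loses its second copy, no matter how old it is, which keeps the rule that the system should not add risk you did not ask for.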
Backup needs to be easy and automatic. When systems boot up, they should offer to do backup for others who are nearby and semi-nearby, and then they should trade backup space. My system should offer space to others, and make use of their space for either general backup (if in the same house/company/LAN) or offsite backup (remote but with good bandwidth). Of course, ISPs and other providers can also provide this space for money.
The key thing is this should happen with almost no setup by the user. One problem for me is that I can come back from a trip with 50 GB of new photos, and they would clog my upstream for remote backup. The system should understand what files have priority, and if the backlog gets too large, request I plug in an external USB drive to offer a backup until the backlog can be cleared. Otherwise I should not have to deal with it. Of course, the backup I offer others does not need RAID redundancy. Instead, I should be queried regularly to prove I still have the backups, and if not, the person I am backing up should seek another place.
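The "prove I still have the backups" query could work as a simple challenge and response. The sketch below is one possible approach, not a description of any existing backup tool: the function names are invented, and it assumes the owner still holds the same (encrypted) bytes locally so it can compute the expected answer. A fresh random nonce for each query stops the holder from saving an old answer and throwing the data away.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Owner side: pick a fresh random nonce for this query."""
    return secrets.token_bytes(32)


def respond(nonce: bytes, stored_blob: bytes) -> str:
    """Holder side: answer by hashing the nonce together with the stored data."""
    return hmac.new(nonce, stored_blob, hashlib.sha256).hexdigest()


def verify(nonce: bytes, local_copy: bytes, response: str) -> bool:
    """Owner side: recompute the answer from the local copy and compare."""
    expected = hmac.new(nonce, local_copy, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)


blob = b"encrypted backup bytes"
nonce = make_challenge()
print(verify(nonce, blob, respond(nonce, blob)))          # True
print(verify(nonce, blob, respond(nonce, b"corrupted")))  # False
```

Hashing the whole blob on every query is expensive for large backups; a real system would more likely challenge a random sample of blocks.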
Of course all remote backup must be encrypted by me. In fact, all disks should be encrypted, but too much desire for security can cause risk of losing all your data. Systems must understand the reduced threat model of the ordinary user and make sure keys are backed up in enough places that the chances of losing them are nil, even if it increases the chance that the NSA might get the keys. This is actually pretty hard. The typical “What was your pet’s name” pseudo security questions are not strong enough, but going stronger makes it more likely there can be key loss. Proposals such as my friendscrow can work if the system knows your social network. They have the advantage that there is zero UI to escrowing the key, and a lot of work to recover it. This is the ideal model because if there is ZUI on storing it, you are sure it will be stored. Nobody minds extra work if they have lost all the normal paths to getting their key.
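I have not spelled out here how the key would actually be split among friends. One well-known technique that fits the "zero UI to store, real work to recover" shape is Shamir's secret sharing, where any `threshold` of the shares can rebuild the key and fewer reveal nothing. The sketch below is a minimal illustration of that technique under my own assumptions (the `split` and `recover` names, the choice of prime, treating the key as an integer); it is not a vetted implementation and not a claim about how friendscrow works.

```python
import secrets

# A Mersenne prime comfortably larger than any 256-bit key.
PRIME = 2**521 - 1


def split(secret: int, threshold: int, shares: int):
    """Split secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover(points) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inverse = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat
        secret = (secret + yi * num * inverse) % PRIME
    return secret


key = secrets.randbits(256)
friend_shares = split(key, threshold=3, shares=5)
print(recover(friend_shares[:3]) == key)  # True
```

With 5 friends and a threshold of 3, two friends can lose their share (or lose touch with you) and the key is still recoverable.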