I've been fortunate enough

I've been fortunate enough not to have had any HD failures yet (knock on wood), so I don't know this: What's the typical failure mode for the newer, inexpensive drives? Is it really bit failures, randomly strewn across the device? Block failures? Complete failures?

Do we really need a RAID solution to address it? (Well, RAID does provide an idealised solution.) Or would a continual (incremental) backup running in the background (e.g. driven by a journalling file system, perhaps in the form of a log-structured file system), with "instant, on-demand restore", fit the bill? The latter, of course, entails a time lag on backup and likely a momentary delay on restore, but it also offers data compression opportunities.

Because such a system would buffer file churn, I/O performance should theoretically be levelled out over a longer interval as well, making such a solution more feasible over limited-bandwidth pipes. A bit fault could be corrected fairly quickly; a complete HD failure would take some time to recover from, but would be recoverable.
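To make the incremental idea concrete, here is a rough sketch of what I have in mind, assuming the data is cut into fixed-size blocks and only blocks whose hash has changed since the last pass are sent to the backup. The block size and all function names are made up for illustration, the "backup" is just an in-memory dict, and shrinking files are not handled.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_hashes, data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or differ
    from the hashes recorded at the previous backup pass."""
    delta = {}
    for idx, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            delta[idx] = block
    return delta


def restore(backup):
    """Reassemble the full data from a {block_index: block_bytes} backup."""
    return b"".join(backup[i] for i in sorted(backup))


# First pass: everything is new, so every block is sent.
v1 = bytes(10000)
backup = changed_blocks([], v1)

# Later pass: one byte changed, so only the block containing it is sent.
v2 = bytearray(v1)
v2[5000] = 1
v2 = bytes(v2)
delta = changed_blocks(block_hashes(v1), v2)
backup.update(delta)
```

After the second pass, `delta` holds a single block rather than the whole file, which is where the bandwidth saving over a slow pipe would come from.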

For that matter, are there any popular Error-Correcting Codes that could supplement this? E.g. keeping the source data locally, storing the ECC redundancy data off-site, and retrieving only the redundancy data to correct the local data when a local error is detected?
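As a rough illustration of the off-site redundancy idea, here is a minimal sketch using plain XOR parity across a group of equal-sized blocks, which is a much weaker stand-in for a real ECC such as Reed-Solomon. It assumes per-block checksums are kept locally to detect and locate damage, and that the parity block lives off-site and is fetched only when needed. All names are hypothetical, and `fetch_parity` stands in for whatever the remote retrieval would be.

```python
import hashlib


def xor_blocks(blocks):
    """XOR equal-length byte blocks together into one block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_redundancy(blocks):
    """Return (checksums kept locally, parity block to store off-site)."""
    checksums = [hashlib.sha256(b).digest() for b in blocks]
    return checksums, xor_blocks(blocks)


def find_bad_blocks(blocks, checksums):
    """Indices of blocks whose content no longer matches its checksum."""
    return [i for i, b in enumerate(blocks)
            if hashlib.sha256(b).digest() != checksums[i]]


def repair(blocks, checksums, fetch_parity):
    """Rebuild at most one damaged block, fetching parity only if needed."""
    bad = find_bad_blocks(blocks, checksums)
    if not bad:
        return list(blocks)  # nothing to do, no remote traffic
    if len(bad) > 1:
        raise ValueError("single XOR parity can rebuild only one block per group")
    parity = fetch_parity()  # the only off-site retrieval
    survivors = [b for i, b in enumerate(blocks) if i != bad[0]]
    rebuilt = xor_blocks(survivors + [parity])
    if hashlib.sha256(rebuilt).digest() != checksums[bad[0]]:
        raise ValueError("rebuilt block does not match its checksum")
    fixed = list(blocks)
    fixed[bad[0]] = rebuilt
    return fixed


# Example: one block is damaged locally and rebuilt from remote parity.
original = [b"aaaa", b"bbbb", b"cccc"]
checksums, remote_parity = make_redundancy(original)
damaged = [b"aaaa", b"bXbb", b"cccc"]
repaired = repair(damaged, checksums, lambda: remote_parity)
```

With one parity block per group, only a single damaged block per group can be rebuilt; surviving more simultaneous damage is what a stronger code would buy, at the cost of more redundancy data to store and fetch.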

Of course, if the nature of the typical failure (and thus the nature of the demand for data restoration) demands more immediate results, then a RAID solution would be the better approach.
