This article originally appeared in TidBITS on 2007-03-19 at 12:44 p.m.
The permanent URL for this article is:

Hard Drive Failures and Contributory Storage

by Adam C. Engst

At last month's 5th USENIX Conference on File and Storage Technologies [1], two academic papers - one from Bianca Schroeder and Garth A. Gibson [2] of Carnegie Mellon University (CMU) and the other by Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso [3] of Google - looked at the reliability of hard drives in large-scale installations. Among other conclusions, the CMU team found that real-world replacement rates were much higher than would have been expected from vendor-provided mean time to failure (MTTF) estimates, and Google's researchers concluded that there was little correlation between failure and either elevated temperature or activity levels. The papers weren't written for a lay audience and aren't easy reading, but they are worth a look if you're interested in when and why hard disk mechanisms fail.
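
For readers wondering how an MTTF figure turns into something comparable with a replacement rate, here is a minimal Python sketch of the usual back-of-the-envelope conversion. It assumes a constant failure rate over the drive's life, which is a simplification, and the 1,000,000-hour MTTF and 10,000-drive fleet are illustrative values chosen for the example rather than numbers quoted in this article. The function names are likewise made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def nominal_annual_failure_rate(mttf_hours: float) -> float:
    """Fraction of drives expected to fail per year, assuming a constant failure rate."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be a positive number of hours")
    return HOURS_PER_YEAR / mttf_hours


def expected_annual_failures(drive_count: int, mttf_hours: float) -> float:
    """Expected number of drive failures per year in a fleet of identical drives."""
    return drive_count * nominal_annual_failure_rate(mttf_hours)


if __name__ == "__main__":
    mttf = 1_000_000  # hypothetical datasheet MTTF, in hours
    fleet = 10_000    # hypothetical fleet size

    print(f"Nominal annual failure rate: {nominal_annual_failure_rate(mttf):.2%}")  # 0.88%
    print(f"Expected failures per year: {expected_annual_failures(fleet, mttf):.1f}")  # 87.6
```

A replacement rate observed in the field can then be set against the nominal figure; if a site is swapping out noticeably more drives each year than this calculation predicts, the datasheet MTTF is a poor guide for planning spares.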

Also interesting is the paper by James Cipar, Mark D. Corner, and Emery D. Berger [4] of the University of Massachusetts Amherst on the Transparent File System (TFS). The goal of TFS is to create a contributory storage system in which multiple people could contribute unused disk space to a shared pool, much as the SETI@home project [5] enables users to contribute unused CPU cycles to the shared task of analyzing radio telescope data. (And yes, there is still an active TidBITS team for SETI@home [6].) Apparently, TFS can contribute all of the unused space on a disk while imposing only a negligible performance drag on the contributor. Prototype source code [7] is available; I'll be curious to see if anyone cleans it up and ports it to MacFUSE [8] (see "MacFUSE Explodes Options for Mac File Systems [9]," 2007-01-29).