Shootout at the Disk Repair Corral
Some things are inevitable. Death, taxes, and disk crashes. One day you will try to open an important file, only to receive a dire error message. Or perhaps you’ll discover that an entire folder has vanished. Worse yet, maybe your Mac won’t even boot, thanks to some sort of disk corruption.
Fortunately, you have a full backup of all your data, so you just restore the missing data from your backup, and you’re back in business. What’s that you say? The last time you backed up was during the Reagan administration?
If an ounce of prevention is worth a pound of cure, then surely the most effective disk repair program is actually a reliable backup utility. My favorite is Dantz’s Retrospect. But whatever backup program you choose, you must use it regularly, so you have a current backup when your hard disk is called to that great clean room in the sky. (See the TidBITS article series, "Have You Backed Up Today?" for more details on setting up a good backup strategy.)
Yet just as so many of us would prefer to lose weight by taking a magic pill rather than through diet and exercise, we’d rather fix a corrupted disk with a disk utility than restore from a backup, even when a recent backup is available. It can take many hours to do a full restore from a backup, whereas a good disk utility can often fix minor disk errors in minutes.
Some of the Macintosh world’s favorite disk repair programs have recent upgrades, and here I’ll compare the Norton Disk Doctor tool in Symantec’s Norton Utilities 8.0 ($100), Alsoft’s DiskWarrior 3.0 ($80), Micromat’s Drive 10 1.1.4 ($70), SubRosaSoft’s DiskGuardian 2.2 ($70), and Apple’s Disk Utility (free). Although these programs contain a wide variety of disparate features, I concentrate on their disk repair functions in Mac OS X. I chose not to include Prosoft Engineering’s Data Rescue X, because it recovers files onto another disk and does not repair the damaged disk itself.
The user interface and ancillary features of a disk repair program are secondary, because in the event of disaster you care about only one thing: will it get my data back? So let’s concentrate on the heart of the issue: what are the most common disk errors you may experience, and which disk repair programs can save your bacon when you’re unfortunate enough to suffer disk corruption?
In my experience, most people run into three general categories of disk problems: hardware failure, bad sectors, and damaged directories. After a brief examination of how you get started with these programs, given that you can’t repair an active startup disk in Mac OS X, I look at the worst type of problem, hardware failure, after which I examine bad sectors and directory damage and compare the performance of the disk repair programs.
Booty Call — One disadvantage of Mac OS X is that a disk repair program can’t safely check the startup disk. Despite this, and the fact that Apple’s official line is that checking startup disks is not supported, Norton Disk Doctor and DiskGuardian both allow checking of the startup disk, although they warn against doing so. I consider messing about with startup disks under Mac OS X dangerous, and I advise you not to do it.
Fortunately, there is a simple solution. Restart in "single user mode" by holding down Command-S while the Mac is starting up. In single user mode, you’re dropped into a command line version of Mac OS X, without windows or a mouse pointer. Type "fsck -y" to check (and repair, if necessary) the boot disk, after which you restart the Mac by typing "reboot" (sans quotes for both commands). Disk Utility and fsck rely on the same engine, so running fsck in single user mode is exactly the same as running Disk Utility.
If you’re uncomfortable with the command line for even two commands, you have an alternative. If you boot your Mac using the Mac OS X Installer CD, you can run Disk Utility from there. At the first screen in the Installer, choose Open Disk Utility from the Installer menu.
What about the other disk repair programs? All except DiskGuardian (for now, but a new version is expected soon) come on bootable CDs. Insert the CD, turn on your Mac, and hold down the C key to force the computer to boot from the CD. Bootable CDs are essential in the event that your hard disk is so badly damaged that your computer won’t even start up.
It’s Dead, Jim — Let’s look at what can go wrong now. Hardware failure can result from the electronics on the drive’s controller board burning out, or the heads or the arm developing mechanical problems. Sometimes a problem with the lubricating grease prevents the disk from spinning or the read arm from moving, causing a problem known as "stiction."
These problems are caused by dropping the disk, by defective components, by static discharge, or even by sheer age. Usually the disk won’t even show up on the Desktop. As far as the disk utilities go, Drive 10 can detect hardware errors with a "Unit Ready" test, which is just what it sounds like. Drive 10 asks the drive "Are you ready?" and the drive replies "No." This test is mostly helpful for confirming hardware errors you probably already suspect.
Hopefully there’s no vital data locked in the dead drive. No software program can repair a disk with hardware problems. If you desperately need to retrieve the data, your only option is a data recovery company such as DriveSavers, which I’ve found to be professional, competent, and expensive. (Also see Jeff Carlson’s report in "DriveSavers to the Rescue" in TidBITS-495.)
Extended Warranties — If your drive has just suffered a hardware failure, you’re probably checking the warranty and hunting for your receipt. Don’t despair, even if your computer or external hard drive is out of warranty from the manufacturer. Many hard drives, even those installed in computers, are also covered by longer warranties provided by the OEM (original equipment manufacturer), the company that actually made the drive. If you’re willing and able to open a case and remove the drive mechanism, you can take advantage of this warranty.
Here’s the trick. Although you may have bought the computer from Apple, or the drive from VST, those companies don’t make drive mechanisms. Instead, your drive was probably made by IBM, Seagate, Maxtor, HP, Western Digital, or another hard drive manufacturer. These companies often offer an independent warranty on their drives, and it’s often longer than the warranty on your Mac or external hard drive. Drive mechanism warranties may be two years, and some run as long as five years. Better yet, if you’ve lost the receipt, the company can sometimes look up a drive’s serial number to verify that it’s still under warranty.
Both HP and IBM have replaced dead drives for me, without a receipt, simply by checking my serial numbers. Even better, IBM didn’t make 14 GB drives anymore, so they replaced my dead one with a 20 GB unit.
SMART Stuff — Some modern hard drives have a feature called SMART, which stands for Self-Monitoring, Analysis, and Reporting Technology. The technology was originally developed by IBM. SMART-compliant drives constantly perform diagnostics to monitor variables like drive temperature, spin up time, and how well the heads stay on track. By noting when these physical parameters slip out of spec, SMART can predict some types of drive failure before they happen, giving you time to back up your data and buy a new drive.
Unfortunately, most hard drives have no way to tell you about an impending disaster; you need a program to query the drive’s SMART statistics. Both DiskWarrior and DiskGuardian can check the drive at regular intervals and alert you if they find trouble.
Using SMART works well if your disk has a factory defect that’s slowly getting worse, because it gives you time to move your data to a new disk. But SMART doesn’t help you deal with the occasional bad sector or corrupted directory, much less catastrophic hardware failures. Since I don’t have a drive with impending hardware failure, I was unable to test the SMART capabilities of DiskWarrior and DiskGuardian.
Unsavory Sectors — Now it’s time to move from pure hardware problems to problems that could be either physical (hardware) or logical (software). Disks are broken up into sectors. Each sector usually holds 512 bytes of data. There are two types of problems that lead to bad sectors: hard errors and soft errors.
Hard errors are caused by physical damage to the disk surface. Dropping the disk and manufacturing defects are the most common causes of hard errors. Although there’s no way to repair hard errors, they can be "fixed" by "sparing" the bad sector. Disks maintain a small number of spare sectors in reserve; when a hard error occurs, the drive controller maps out the bad sector and uses one of the spare sectors in its place.
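The sparing idea can be sketched as a small remapping table. This is a toy model rather than any real controller’s firmware: the class name SparingDisk, its methods, and the choice to place spares just past the user-visible sectors are all assumptions made for illustration.

```python
class SparingDisk:
    """Toy model of a drive controller that remaps bad sectors to spares."""

    def __init__(self, sector_count, spare_count):
        self.sector_count = sector_count
        # Assumption: spare sectors sit just past the user-visible range.
        self.free_spares = list(range(sector_count, sector_count + spare_count))
        self.remap = {}   # logical sector number -> spare sector number
        self.media = {}   # physical sector number -> 512 bytes of data

    def physical(self, logical):
        """Translate a logical sector number, following any remapping."""
        if not 0 <= logical < self.sector_count:
            raise IndexError("sector number out of range")
        return self.remap.get(logical, logical)

    def spare(self, logical):
        """Map out a bad sector and return the spare now used in its place."""
        self.physical(logical)  # range check only
        if not self.free_spares:
            raise RuntimeError("no spare sectors left")
        self.remap[logical] = self.free_spares.pop(0)
        return self.remap[logical]

    def write(self, logical, data):
        self.media[self.physical(logical)] = data

    def read(self, logical):
        # Sectors that were never written read back as zeros in this model.
        return self.media.get(self.physical(logical), bytes(512))
```

After spare(7), reads and writes of sector 7 go to the spare, while software above the drive still sees the same sector number. The old contents are not copied, which matches the point that a hard error can be worked around but not repaired.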
How are these bad sectors spared? SCSI hard disks provide a SCSI command – "reassign" – to spare a bad sector. A low-level format also spares any bad sectors. On pre-SCSI disks, a low-level format was the only way to fix bad sectors.
Many modern disks, including many internal ATA, FireWire, and USB drives, automatically spare sectors with hard errors the next time the sector is written. That’s helpful, but if data is stored on that sector, programs may be unable to read it successfully, causing problems and making it difficult to spare. Erasing the disk with Apple’s Disk Utility spares any bad sectors if you select the "Zero all data" option.
What about soft errors? In addition to the 512 bytes of data stored in each sector, a few additional bytes hold an error correction code (ECC). When the sector is written, the drive’s controller computes and records the ECC. When the sector is later read, the controller checks the ECC to make sure the data hasn’t been corrupted. If the ECC doesn’t match the data, it’s called a soft error. The disk surface is fine, but the data on that sector has become scrambled.
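The write-then-verify cycle can be illustrated with a few lines of Python. As a loud assumption, this sketch uses a CRC-32 from Python’s zlib module as a stand-in for the drive’s ECC. A CRC can only detect corruption, whereas a real error correction code can also repair a limited number of bad bits. The function names write_sector and read_sector are hypothetical.

```python
import zlib

SECTOR_SIZE = 512


def write_sector(data):
    """Return the sector as stored: the data plus a check code computed from it."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("a sector holds exactly 512 bytes")
    return data, zlib.crc32(data)


def read_sector(stored):
    """Return the data, or raise if the check code no longer matches it."""
    data, check = stored
    if zlib.crc32(data) != check:
        # The surface may be fine; the recorded data is simply scrambled.
        raise IOError("soft error: check code does not match data")
    return data
```

Rewriting the sector recomputes the check code, which is why a soft error disappears the next time the sector is written.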
Soft errors can be caused if the disk is jarred while it’s writing or if power is lost while writing, either of which can leave a sector half written. Large magnets (such as can be found in electric motors) next to hard disks also tend to have bad effects on the data. As with hard errors, most modern disks repair soft errors automatically the next time the sector is written.
Bad Sector Detector – Norton Disk Doctor, DiskGuardian and Drive 10 (but not DiskWarrior or Disk Utility) claim to detect bad sectors using a test called either a "defective media check" or a "surface scan." Using a proprietary tool that creates soft errors on disks, I tested each program.
Although Norton Disk Doctor claims to be able to find and repair bad sectors, its defective media check didn’t detect the bad sectors on my test disk, erroneously giving it a clean bill of health.
DiskGuardian detected the bad sectors, although it took several hours to run a full check. Unfortunately, it didn’t tell me which files used the bad sectors, so I had no way of finding out which files were damaged and would thus need to be restored from backup. DiskGuardian lacks the capability to repair bad sectors.
Like DiskGuardian, Drive 10 detected the bad sectors, but didn’t identify which files were damaged. Confusingly, Drive 10’s report describing the damage claimed it could fix the bad sectors, but I couldn’t find a command to fix them. Micromat tech support confirmed the report was wrong; Drive 10 can’t fix the bad sectors it finds. It’s too bad, since Drive 10 could fix the bad sectors merely by writing zeros to them.
Although Disk Utility cannot scan for bad sectors, it can fix bad sectors on modern disks if you erase the disk with the "Zero all data" option selected.
I must rate all these products unacceptable in dealing with bad sectors. Even though two could detect bad sectors, none of them could tell you which files contain bad sectors, making it impossible to learn which files you should restore from your backup. Only Disk Utility successfully fixed the bad sectors, but at the price of erasing the entire disk.
Ripping the Yellow Pages — We’ve now looked at pure hardware failures, and bad sectors, which can be either hard errors or soft errors, and so far, our disk repair utilities don’t help much at all. Now it’s time to move on to problems that exist entirely in software, the most common of which are errors in the directory, which tracks the files and folders on the disk. In the case of directory errors, there is nothing wrong with the drive mechanism or the disk surface; instead, the directory information that’s necessary to locate your data on the disk has simply become scrambled. Often your data is intact, if it could just be located.
As an aside, people with important data sometimes use mirrored disks or RAID arrays, which faithfully duplicate each byte on the main disk to a backup disk. If the main disk suffers a hardware failure or develops a bad sector, the backup disk can save the day. However, it’s worth noting that this strategy provides absolutely no protection against directory damage. That’s because the RAID faithfully records all data to the backup disk, whether or not that data is good, which results in both the main disk and the backup disk containing corrupt data. I may sound like your mother telling you to eat your vegetables, but the best protection really is regular backups.
The most common cause of directory damage is crashing. If the computer crashes while a file is being created or saved, causing only part of the change to be written to disk, the directory will contain inconsistent information. Mac OS X crashes far less often than Mac OS 9, but directory-corrupting crashes can and do still occur. Both Mac OS 9 and Mac OS X automatically check and repair the startup disk after a crash, which reduces the incidence of disk damage dramatically.
A new feature in Mac OS X 10.3 Panther that should reduce directory errors even more is the journalled file system. You can enable it in Disk Utility, and it’s usually turned on for disks onto which you install Panther. Here’s how journalling works. Before the file system changes the directory, it leaves a note on the disk saying, "I’m going to make this change in the directory." Then the file system makes the change, and once it finishes, it clears the note. If the file system ever sees an incomplete change note on the disk during startup, it knows something bad happened and "rolls back" the directory to its previous state. You will lose your last change, but the directory won’t suffer any damage.
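The note-change-clear cycle described above can be sketched in a few lines. This follows the description in the text, not the actual on-disk journal format used by Mac OS X. The class JournaledDirectory, the crash_after_write flag, and the decision to store the old value in the note (so the change can be undone) are all assumptions for illustration.

```python
_MISSING = object()  # marks "this entry did not exist before the change"


class JournaledDirectory:
    """Toy directory that leaves a note before every change."""

    def __init__(self):
        self.entries = {}  # file name -> location on disk
        self.note = None   # pending change note, or None when idle

    def change(self, name, value, crash_after_write=False):
        # 1. Leave a note naming the entry about to change, with its old value.
        self.note = (name, self.entries.get(name, _MISSING))
        # 2. Make the change itself.
        self.entries[name] = value
        if crash_after_write:
            return  # simulate a crash: the note is never cleared
        # 3. Clear the note once the change is complete.
        self.note = None

    def recover(self):
        """Run at startup: undo any change whose note was never cleared."""
        if self.note is None:
            return False
        name, old = self.note
        if old is _MISSING:
            self.entries.pop(name, None)
        else:
            self.entries[name] = old
        self.note = None
        return True
```

If change() is interrupted, recover() restores the previous entry, so the last change is lost but the directory stays consistent.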
Other causes of damaged directories include buggy programs that write bad data to disk, buggy programs that overwrite cached data waiting to be written to disk, and even bugs in the file system itself. The first two are much less likely in Mac OS X than in Mac OS 9 because of its file privileges and memory protection, respectively. Bugs in the file system are extremely rare but have occurred at times in the past.
It’s worth noting that directory damage is not always readily apparent. A damaged disk may appear to operate perfectly, but regular use can cause minor errors to grow into serious problems. Most directory problems are easy to fix if they’re caught early but can be difficult, if not impossible, to fix later. That’s why checking and repairing startup disks automatically after a crash is so important, and why it’s essential to leave the Check Disk option turned on in Mac OS 9’s General Controls control panel (the disk check isn’t optional in Mac OS X).
To test how the disk repair utilities perform with different types of directory errors, I created an HFS+ disk image, copied an assortment of files and folders to it, and then used a low-level disk editor to damage various directory data structures. I then duplicated the damaged disk image, and let each utility try to repair its own copy. Each utility repaired an identical disk image, with identical damage.
Errors Speak Volumes — For my first test, I started with relatively simple damage in the volume bit map, which is also known as the allocation file. The volume bit map tracks which blocks on the disk contain files, and which are unused. All five utilities fixed my damaged volume bit map easily.
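One plausible reason this damage is easy to fix is that the bit map is redundant: it can be rebuilt from the file locations recorded in the catalog. The sketch below assumes that approach. It is not taken from any of the utilities tested, and the function names and the (start block, block count) tuple format are hypothetical.

```python
def rebuild_bitmap(total_blocks, extents):
    """Mark every block covered by a (start_block, block_count) extent as used."""
    bitmap = [False] * total_blocks
    for start, count in extents:
        for block in range(start, start + count):
            bitmap[block] = True
    return bitmap


def bitmap_errors(stored, extents):
    """List the block numbers where the stored bit map disagrees with the files."""
    expected = rebuild_bitmap(len(stored), extents)
    return [i for i, (s, e) in enumerate(zip(stored, expected)) if s != e]
```

A repair then amounts to replacing the stored bit map with the rebuilt one, provided the catalog itself can be trusted.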
Next up was damage to the volume header, which tracks vital information about the disk, such as the amount of used and free space, and the locations of the catalog and allocation file. The volume header is stored at the beginning of the HFS+ partition. I erased the volume header’s signature, which makes the file system assume the volume header is corrupt and refuse to use the disk. Fortunately, the file system keeps a backup copy of the volume header at the end of the disk; it’s imaginatively called the alternate volume header. All five of our utilities successfully repaired the disk, although Drive 10 and DiskGuardian couldn’t figure out the name of the damaged disk.
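The fallback to the alternate volume header can be sketched as follows. The specifics come from Apple’s published HFS Plus format description rather than from this article: the header begins 1024 bytes into the volume, starts with the two-byte signature "H+", and the alternate copy begins 1024 bytes before the end of the volume. The function name find_volume_header is hypothetical, and the sketch ignores the HFS wrapper.

```python
HEADER_OFFSET = 1024   # bytes from the start of the volume
HEADER_SIZE = 512
SIGNATURE = b"H+"


def find_volume_header(volume):
    """Return ('primary' or 'alternate', header bytes), or (None, None)."""
    primary = volume[HEADER_OFFSET:HEADER_OFFSET + HEADER_SIZE]
    if primary[:2] == SIGNATURE:
        return "primary", bytes(primary)
    # The alternate header starts 1024 bytes before the end of the volume.
    alt_offset = len(volume) - 1024
    alternate = volume[alt_offset:alt_offset + HEADER_SIZE]
    if alternate[:2] == SIGNATURE:
        return "alternate", bytes(alternate)
    return None, None
```

A repair tool that finds only the alternate header valid can copy it back over the primary, after updating any fields that have gone stale.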
Catalog Catastrophe — The catalog b-tree tracks all the files and folders on the disk. It’s a vitally important part of the directory, and many of my tests focus on it. The catalog is divided into nodes, and each node is divided into records. Most records track a file or folder on the disk, although some contain threads or indexes, which are used internally by the file system to look up files and folders.
The first node in the catalog is called the header node, which points to other key nodes. I erased the header node. Norton Disk Doctor, DiskWarrior, and Drive 10 recreated the header node properly; Disk Utility and DiskGuardian failed to fix it. Once again, Drive 10 couldn’t figure out the name of the damaged disk.
The header node also contains a map which tracks which nodes are used and which are free. I corrupted this map, but my corruption didn’t faze any of the utilities, all of which successfully fixed the header node map.
The nodes in the catalog are linked together in a precise pattern of connections. Horizontal links connect nodes on the same level, and downward links connect the levels. The file system relies on these links to look up files and folders. I damaged these links. As happened when I erased the catalog header node, Norton Disk Doctor, DiskWarrior, and Drive 10 fixed these links, but Disk Utility and DiskGuardian weren’t able to put the links back together.
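Checking and rebuilding the horizontal links might look something like this. It is a simplified sketch: the nodes are plain dictionaries with hypothetical "prev", "next", and "first_key" fields, a link value of 0 means "no neighbor", and the repair simply re-sorts the nodes by their first key. Real utilities must also rebuild the downward links and cope with nodes whose contents are damaged.

```python
def broken_links(nodes):
    """Return the numbers of nodes whose sibling links are not mirrored back."""
    bad = []
    for num, node in nodes.items():
        for direction, back in (("next", "prev"), ("prev", "next")):
            other = node[direction]
            if other != 0 and (other not in nodes or nodes[other][back] != num):
                bad.append(num)
                break
    return sorted(bad)


def relink(nodes):
    """Rebuild the sibling links by ordering nodes on their first key."""
    order = sorted(nodes, key=lambda n: nodes[n]["first_key"])
    for i, num in enumerate(order):
        nodes[num]["prev"] = order[i - 1] if i > 0 else 0
        nodes[num]["next"] = order[i + 1] if i + 1 < len(order) else 0
```

The check relies on the fact that every forward link should be matched by a backward link from the neighbor, so damage to either side shows up as a mismatch.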
File and folder records are stored in alphabetical order in the catalog. I rearranged these records, putting them in random order. All the utilities restored the alphabetical order.
Certain characters, such as a colon, are illegal in file and folder names. Normally, the operating system prevents you from typing an illegal character when saving a file or creating a folder, but it’s not inconceivable that unusual circumstances could cause one to appear. I renamed a folder with a colon by inserting the colon directly into the folder record in the catalog. With this test, the results start to become more interesting. Disk Utility and DiskGuardian didn’t detect any problem. Drive 10 noticed the illegal character, but didn’t fix it. Norton Disk Doctor and DiskWarrior both fixed it properly by replacing the colon with a legal character.
More Catalog Corruption — Each catalog node ends with a map that points back to the records in that node. I damaged the map for one of the nodes, which sounds bad, but it’s still possible to find the records by calculating the size of each record to find the next record. Disk Utility and Norton Disk Doctor realized there was a problem, but they couldn’t fix it. Drive 10 and DiskGuardian both identified and fixed the problem, but in the process lost five and six files, respectively. Partial repair isn’t always better than complete failure, since you may believe the disk was repaired successfully and only later – potentially much later – realize that some files have been lost. DiskWarrior fixed the catalog node map properly.
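Recalculating a node’s record map from the record sizes can be sketched with a deliberately simplified layout. As an assumption for illustration only, each record here begins with a one-byte length followed by that many bytes of payload. Real catalog records are more involved, and the function name rebuild_offsets is hypothetical.

```python
def rebuild_offsets(node, first_record, record_count):
    """Walk the records in order and return the offset at which each begins."""
    offsets = []
    pos = first_record
    for _ in range(record_count):
        offsets.append(pos)
        pos += 1 + node[pos]  # skip the length byte plus the payload
    return offsets
```

The walk only works if every record’s size is intact, which may help explain why some utilities lost files on this test: one bad length throws off every offset after it.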
Next, I changed a thread record to be an unknown type of record, which creates two problems. A thread record that the file system relied upon was missing, and it was confronted by a record with an illegal type. Disk Utility and DiskGuardian detected the corruption, but couldn’t fix it. DiskWarrior fixed the problem but lost some of the data in one file. Only Drive 10 and Norton Disk Doctor managed to repair my damage properly.
Note that DiskWarrior moves any files it suspects may have problems into a folder called Rescued Items. In my tests, most of these files turned out to be fine. This approach has the advantage that it’s clear which files may be damaged. But if the Rescued Items folder contains many files, checking them and putting them away can be tedious. Norton Disk Doctor can optionally put aliases to damaged files in a folder, a potentially more helpful feature. But in my tests it didn’t work. Norton Disk Doctor also lists the names of damaged files in its report.
The most important aspect of a file record is the location of the file’s data on the disk. One of the worst sorts of directory damage that you may see happens when two files try to occupy the same physical space at the same time. Different utilities refer to this problem as "overlapping extents" or "cross-linked files." In the best case, one file has entirely overwritten the other, since then one file has valid data, while the other’s data is completely gone. In the worst case, the two files somehow manage to interleave their data, which results in both being damaged beyond repair. I cross-linked two files, thus damaging the files’ catalog records, as well as the volume header and volume bit map. Disk Utility and DiskGuardian repaired the catalog records, the volume header, and the volume bit map, but they didn’t actually separate the two files. In contrast, Drive 10, DiskWarrior and Norton Disk Doctor fixed the damage and separated the files. It’s important to realize that the data in the overwritten file couldn’t be recovered, but not through any failing of these repair programs. When one file overwrites another, the unlucky file has no chance of surviving the encounter.
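Detecting overlapping extents is straightforward once every file’s extents are gathered in one place. The sketch below assumes each extent is a (start block, block count) pair and that the catalog has already been read into a dictionary. The function name find_overlaps is hypothetical, and the sketch says nothing about how any particular utility does it.

```python
def find_overlaps(files):
    """files maps a name to a list of (start_block, block_count) extents.

    Returns (name_a, name_b, first_shared_block, end_block) tuples, where
    end_block is one past the last block the two extents share.
    """
    spans = sorted(
        (start, start + count, name)
        for name, extents in files.items()
        for start, count in extents
    )
    overlaps = []
    for i, (start_a, end_a, name_a) in enumerate(spans):
        for start_b, end_b, name_b in spans[i + 1:]:
            if start_b >= end_a:
                break  # spans are sorted, so no later extent can overlap
            overlaps.append((name_a, name_b, start_b, min(end_a, end_b)))
    return overlaps
```

Detection is the easy part. Deciding which file owns the shared blocks requires inspecting the data itself, and as noted above, the overwritten file’s contents are gone either way.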
I See Fragged People — Some months ago, I wrote an article for TidBITS explaining why defragmenting disks generally isn’t worthwhile (see "Optimizing Disks Is a Waste of Time" in TidBITS-686). Although fragmentation is totally normal and acceptable, serious fragmentation requires additional directory structures, and they too can become corrupted and require repair.
Using another proprietary tool, I fragmented a disk very badly, breaking files into so many pieces that the file record in the catalog b-tree can’t track them all. The file system responds by creating new records in the extents b-tree to help track all the pieces. The extents b-tree is like the catalog b-tree, but exists solely to help track highly fragmented files. Again, severe fragmentation is not inherently a problem, but I had put the disk into a precarious, if legal, state. I then damaged two extent records so the file system couldn’t find all the pieces of two files, but I damaged each in a different way. Disk Utility, DiskGuardian, and Drive 10 all detected my damaged extent records, but weren’t able to repair the damage. DiskWarrior and Norton Disk Doctor performed better, fixing the problem but losing part of the data in one file. Considering the type of damage I inflicted, they did as well as could be expected.
HFS+ volumes are enclosed in a "wrapper," which is actually a plain old HFS volume. The reason for the wrapper is historical. Apple first released HFS+ with Mac OS 8.1. If you connected an HFS+-formatted disk to a Mac running Mac OS 8.0 or earlier – in other words, one that understood only the older HFS format – the wrapper kept the older system from deciding the HFS+ disk was damaged and offering to initialize it. For my next test, I damaged the catalog b-tree header node in the wrapper. Disk Utility and DiskGuardian didn’t notice anything wrong. Drive 10 and DiskWarrior detected the corrupt wrapper, but didn’t fix it. Norton Disk Doctor identified the damage and fixed it properly.
Disks can contain multiple partitions, which are listed in a partition map at the beginning of the disk. Since disk images don’t have partition maps, I used an external FireWire hard disk for this test, in which I damaged the partition map, making the disk driver’s partition overlap the HFS partition. Of all these disk repair utilities, only Norton Disk Doctor claims to check partition maps, and indeed it was the only one to detect the problem, although even it proved incapable of fixing the overlapping partition map. Luckily, damaged partition maps are extremely rare, which may be why none of the other utilities bother to check them.
The Grand Finale — Finally, I decided to recreate the worst damage I’ve ever seen on a Macintosh hard disk. Starting with the badly fragmented disk above, I corrupted and overwrote various parts of the catalog and extent b-trees. In some nodes I corrupted the node header (not to be confused with the header node), in some I munged the data records, and in others I zapped the record offset map. A few lucky nodes suffered all three types of damage at the hands of my disk editor. Only DiskWarrior was able to bring the disk back to a usable state, although 35 files were either lost or partly damaged. That 35 files were lost or damaged is not an indictment of DiskWarrior; the program couldn’t have done any better, considering how much vital information had been destroyed. None of the other utilities managed to repair the disk successfully.
And the Winner Is… Of my 15 tests, DiskWarrior fixed 12 successfully, Norton Disk Doctor fixed 11, Drive 10 fixed 9, DiskGuardian fixed 5, and Disk Utility fixed 4. "Fixed" includes cases where recovery may not have been perfect, but was good enough.
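The totals can be cross-checked against the individual results reported above. The table below is one reading of those results that reproduces the stated scores. As assumptions, it counts repairs that lost some files or data as fixed, and it does not count Disk Utility’s erase-the-disk remedy for bad sectors or the cross-linked repairs that left the two files joined. The test labels are shorthand invented for this sketch.

```python
from collections import Counter

ALL = {"Disk Utility", "Norton Disk Doctor", "DiskWarrior", "Drive 10", "DiskGuardian"}
TOP3 = {"Norton Disk Doctor", "DiskWarrior", "Drive 10"}

# Each test maps to the set of utilities counted as having fixed it.
results = {
    "bad sectors": set(),
    "volume bit map": ALL,
    "volume header": ALL,
    "catalog header node": TOP3,
    "header node map": ALL,
    "node links": TOP3,
    "record order": ALL,
    "illegal character": {"Norton Disk Doctor", "DiskWarrior"},
    "node record map": {"DiskWarrior", "Drive 10", "DiskGuardian"},
    "thread record": TOP3,
    "cross-linked files": TOP3,
    "extent records": {"DiskWarrior", "Norton Disk Doctor"},
    "HFS wrapper": {"Norton Disk Doctor"},
    "partition map": set(),
    "grand finale": {"DiskWarrior"},
}

tally = Counter(utility for fixed in results.values() for utility in fixed)

for utility, score in tally.most_common():
    print(f"{utility}: {score} of {len(results)}")
```

Read this way, two of the 15 tests (bad sectors and the partition map) were fixed by none of the utilities.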
So what, in my professional opinion, should you do if your disk starts acting up? First, try Apple’s free Disk Utility. It may fix only a limited set of problems, but when Disk Utility finds a problem, it’s invariably correct, and it applies fixes only when it’s absolutely certain it knows the correct fix. I’ve never seen Disk Utility accidentally make a problem worse, something the other utilities can do, even if only very occasionally.
If Disk Utility doesn’t succeed, let DiskWarrior do battle with your damaged directory. It was our overall winner, and it deserves its excellent reputation. DiskWarrior can also show you a preview of the repairs before you accept them, which lets you check that a damaged file or folder really was fixed before DiskWarrior makes the fix permanent.
If DiskWarrior fails, give Norton Disk Doctor a try, since it can address some problems that DiskWarrior misses. After that, try sacrificing chickens. Seriously, if the combination of Disk Utility, DiskWarrior, and Norton Disk Doctor can’t repair your disk, you can either restore your data from backup, or, if that’s not possible, decide if the data is sufficiently important to pay DriveSavers for recovery.
I still think the most important data protection utility you should own is a backup program. But sometimes a good disk repair program can save the day by repairing minor damage quickly so you don’t have to run through the time-consuming process of reinitializing your hard disk and restoring from backup.
[David Shayer was a senior engineer on Norton Utilities for Macintosh 3.0, 4.0, and 5.0. Before that he worked on Public Utilities, a disk repair program that won the MacUser Magazine Editor’s Choice Award, and on Sedit, a low-level disk editor.]