About a year ago, we bought an Intel-based Xserve with a pair of 80 GB SATA drives to act as our primary Web server. When the boot drive went flaky on us in October 2008, we were able to recover from the backup on the second drive and off-site backups, if a little shakily (see “TidBITS Outage Causes Editors Outrage”, 2008-10-07). But although we were able to bring the machine back online, we didn’t trust the drive that had failed. Since the Xserve has three drive bays, the obvious solution was to purchase another drive. Sounds simple, doesn’t it? Not so much.
You cannot buy a bare hard drive and insert it into an Xserve, as you can with a Mac Pro (and having just added a drive to my new Mac Pro, I can say that Apple did a stunningly nice job in making it easy to add drives, especially in comparison to the awful approach they used in the Power Mac G5). Instead, Xserves require Apple Drive Modules, which are custom carriers containing drives.
For users accustomed to buying inexpensive hard drives, Apple’s pricing on the Apple Drive Modules comes as a bit of a shock. An 80 GB SATA ADM costs $200 from Apple, and a 1 TB SATA ADM costs $450. In comparison, a bare 80 GB SATA drive can be purchased for a measly $35, and a 1 TB drive is only about $100. That would seem to point toward buying a new SATA drive and swapping it into the bad drive’s ADM. However, when I started down that path, a number of problems arose, such that I bailed on a quick solution and simply purchased a new 80 GB SATA ADM to replace the bad one.
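To put numbers on that gap, here is a minimal Python sketch that uses only the prices quoted above. The bare 1 TB price is approximate, treating 1 TB as 1,000 GB is an assumption, and the names `PRICES` and `premium` are purely illustrative.

```python
# Prices in US dollars as quoted in the article; the bare 1 TB price is approximate.
# Assumption: 1 TB is treated as 1,000 GB for the per-gigabyte figures.
PRICES = {
    "80 GB": {"adm": 200, "bare": 35, "gb": 80},
    "1 TB": {"adm": 450, "bare": 100, "gb": 1000},
}


def premium(adm: float, bare: float) -> tuple[float, float]:
    """Return (extra dollars, ADM price as a multiple of the bare price)."""
    return adm - bare, adm / bare


for size, p in PRICES.items():
    extra, multiple = premium(p["adm"], p["bare"])
    print(
        f"{size}: ADM costs ${extra:.0f} more ({multiple:.1f}x), "
        f"${p['adm'] / p['gb']:.2f}/GB vs ${p['bare'] / p['gb']:.2f}/GB bare"
    )
```

By this rough arithmetic, the 80 GB ADM costs nearly six times as much as the bare drive, and the 1 TB ADM costs four and a half times as much.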
First, I wasn’t sure whether my Xserve had SATA drives, as I thought, because System Profiler on the Xserve shows nothing on the SATA bus, instead listing all drives on the SAS bus. (SAS stands for Serial Attached SCSI, and is a high-performance data transfer technology that supports fast SCSI drives and is backward compatible with SATA drives.) After some discussion with knowledgeable folks on the MacEnterprise list and careful reading of the drive details in the SAS section of System Profiler, it became clear that both SAS and SATA drives are shown in the SAS section, with SATA drives having “ATA” as the Manufacturer, and showing “Yes” in the SATA Device line.
Second, once I knew that I had SATA drives in my ADMs, I started investigating whether there were any gotchas involved in replacing the drives. There turned out to be surprisingly little hard information about this, with some people having replaced an ADM’s drive with no trouble and others experiencing performance or reliability issues. I did find a few discussions claiming that replacing drives isn’t recommended, but none cited solid sources.
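That rule of thumb can be written down as a small Python sketch. The two field labels come from the description above. The dictionary layout, the sample entries, and the choice to treat either clue as sufficient are assumptions for illustration, not a parser for System Profiler's real output.

```python
def drive_interface(entry: dict[str, str]) -> str:
    """Classify one drive entry from the SAS section as 'SATA' or 'SAS'.

    Assumption: `entry` maps field labels to values, as if copied by hand
    from the drive details shown in System Profiler.
    """
    manufacturer = entry.get("Manufacturer", "").strip().upper()
    sata_flag = entry.get("SATA Device", "").strip().lower()
    # Either clue described in the article is treated as enough here.
    if manufacturer == "ATA" or sata_flag == "yes":
        return "SATA"
    return "SAS"


# Hypothetical entries, typed in by hand for illustration.
drives = [
    {"Manufacturer": "ATA", "SATA Device": "Yes"},
    {"Manufacturer": "ExampleVendor", "SATA Device": "No"},
]

for number, drive in enumerate(drives, start=1):
    print(f"Drive {number}: {drive_interface(drive)}")
```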
Confused, I contacted Apple to discuss why ADMs are so expensive in comparison to bare drives, exactly what an ADM does, what Apple recommends users do with failing ADMs, and whether or not replacing a drive in one is a good idea. That conversation revealed a great deal of interesting information about the ADM and shed some light on what people with flaky ADM drives should do.
Drive Selection — The most important fact to know about ADMs is that Apple doesn’t use just any drives. We’ve all benefited from the amazingly low cost of storage. But whenever manufacturers compete on price, they cut corners every way they can to reduce costs. Although drive reliability is generally good, everyone who buys bare drives regularly has a drive vendor they refuse to patronize due to bad experiences in the past. (As is often the case, these people all hate different vendors, depending on which one was having a bad run at any given time.)
Since the Xserve is designed to be in constant use – 24 hours a day, 7 days a week, for years at a time – Apple doesn’t use the least expensive drives available, since those drives are designed for more normal duty cycles in desktop computers – 8 to 10 hours per day, with variable use during that time. Instead, Apple works closely with drive manufacturers to select drives with more durable components, going so far as to pick specific head and media combinations. This is commonplace in the industry – other sources told me that drives sourced by manufacturers like Apple, Dell, and HP have generally better reliability than off-the-shelf retail drives.
Apple calls these beefier models “server-class” drives; you may also see terms like “RAID edition” used to differentiate them from low-margin retail hard drives. Apple generally considers server-class drives to be high-end SATA drives, in comparison with “enterprise-class” drives, which are the highest performance drives (Fibre Channel and high-end SCSI in the recent past, now 15,000 RPM SAS) with the highest mean-time-between-failures ratings.
So the first reason not to slap an off-the-shelf SATA drive into an ADM is that the drive may simply not be able to handle the constant use.
Custom Firmware — Another reason to avoid off-the-shelf SATA drives for ADMs in production servers is that Apple works closely with drive manufacturers to customize the firmware in drives destined for the Xserve. Details vary by drive, but the bulk of the firmware changes involve tuning the drive for performance and thermal behavior.
According to Apple, most drive firmware is, not surprisingly, tuned for optimal performance with Windows, which reportedly reads and writes relatively small data blocks. In contrast, Mac OS X works with larger blocks. Tuning the firmware’s caching algorithms to match the block sizes Mac OS X prefers can improve performance. This is a non-trivial task, since a number of different caches sit between the drive and the operating system, and tuning them all for optimal performance is an art.
The other main area where firmware tuning helps is thermal behavior. Today’s large drives use a technology called “perpendicular recording”, and I was told that these drives go into a “read-after-write” mode at certain temperatures to ensure data reliability. Having to read back every bit written reduces performance, so Apple tweaks the firmware of drives used in the Xserve’s ADMs to reduce the likelihood that the drive will go into this mode. Apple can do this because the range of the Xserve’s normal operating temperatures is known, whereas retail drives have to assume a worst-possible thermal environment. Thus, an off-the-shelf drive is much more likely to drop into read-after-write mode than a drive Apple has tuned for the Xserve.
Other industry sources confirmed that it’s common for computer manufacturers to work with drive vendors to tune drive firmware for performance, but several went further, noting that computer manufacturers put drives under consideration through serious testing, which can reveal problems in how drive firmware handles error conditions. Some firmware changes are designed to reduce the likelihood of data corruption.
It’s difficult to learn much about hard drive firmware online, since drive manufacturers guard their firmware closely to reduce the likelihood that it will be hacked. That can be counterproductive, since additional public scrutiny could reduce the likelihood of bugs like the one behind the recent debacle surrounding Seagate drives, in which a firmware bug could cause a number of Seagate drive models to become inoperable after being powered off. As if that weren’t bad enough, the fixed firmware reportedly caused additional problems for some of the affected drives.
A final fact to realize about the custom firmware in ADM drives is that the Xserve’s Server Monitor software is designed to monitor about a dozen variables reported by the drive’s firmware and report pre-failure warnings if those variables stray outside acceptable limits. Using an unsupported drive may prevent Server Monitor from being able to report on the drive’s health.
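The general idea of threshold-based pre-failure warnings can be sketched in a few lines of Python. This is not how Server Monitor is implemented. The variable names and limits below are hypothetical, since the actual variables Apple watches aren't listed here.

```python
from typing import NamedTuple


class Limit(NamedTuple):
    low: float
    high: float


# Hypothetical variables and acceptable ranges, for illustration only.
LIMITS = {
    "temperature_c": Limit(5.0, 55.0),
    "reallocated_sectors": Limit(0.0, 10.0),
}


def pre_failure_warnings(readings: dict[str, float]) -> list[str]:
    """Return a warning for each variable that is missing or out of range."""
    warnings = []
    for name, limit in LIMITS.items():
        if name not in readings:
            # An unsupported drive might simply not report the value.
            warnings.append(f"{name}: not reported by drive")
            continue
        value = readings[name]
        if not (limit.low <= value <= limit.high):
            warnings.append(f"{name}: {value} outside {limit.low}-{limit.high}")
    return warnings


print(pre_failure_warnings({"temperature_c": 61.0, "reallocated_sectors": 2.0}))
print(pre_failure_warnings({"temperature_c": 40.0}))
```

The second call shows the failure mode described above: a drive that doesn't report a monitored variable can't be judged healthy or unhealthy.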
Smart Carrier — Part of the explanation for why an ADM costs significantly more than a bare retail drive revolves around the ADM carrier itself. It’s not just a physical sled, but also includes a controller board, temperature sensor, and a pair of LEDs that report on both drive activity and drive status. The ADM’s temperature sensor integrates with the Xserve’s cooling system to increase airflow to drives that are getting too hot.
Apple also told me that the rubber grommets that hold the drive to the ADM carrier are chosen specifically to match each drive’s vibrational characteristics. Different drives use different types of rubber in an attempt to reduce vibration as much as possible. I gather this is a bit more important with the 15,000 RPM SAS drives, given their very high rotational speed.
Extensive Testing — Most electronics exhibit what’s called a “bathtub curve” of failures. That means that the likelihood of failure is rather high early in the lifetime of a drive, then drops and levels out for its useful lifespan, and then rises back up as the hardware simply wears out. From a user perspective, you want to avoid drives that will die early on. (There’s no way to avoid the eventual death of a drive, but given the speed with which data and disk capacity grow, the hope is you’ll want to replace a drive with a larger one before it fails on its own out of old age.)
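For readers who want to see the shape, here is a minimal Python sketch of a bathtub curve. It uses a common textbook construction: a falling Weibull hazard for early failures, a constant rate for random failures, and a rising Weibull hazard for wear-out. All of the parameters are invented for illustration and are not measured drive data.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate at time t (t > 0): falls if shape < 1, rises if shape > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(hours: float) -> float:
    """Failure rate per hour as the sum of three illustrative components."""
    infant_mortality = weibull_hazard(hours, shape=0.5, scale=20000.0)
    random_failures = 1e-6  # constant background rate (made-up value)
    wear_out = weibull_hazard(hours, shape=5.0, scale=60000.0)
    return infant_mortality + random_failures + wear_out


for hours in (10, 100, 1000, 20000, 50000, 70000):
    print(f"{hours:>6} h: {bathtub_hazard(hours):.2e} failures/hour")
```

With these made-up parameters, the rate starts high, bottoms out somewhere in the middle of the drive's life, and climbs again toward the end.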
To reduce the likelihood of drive infant mortality and other early-life problems, Apple subjects every drive shipped in an Xserve or ADM to 48 to 60 hours of non-stop testing. That’s usually enough to weed out drives that will fail immediately.
Apple also rejects any drives that show any hard or soft errors during the testing. Even though drives automatically map out such errors, statistically speaking, if a drive experiences any hard or soft errors during the initial burn-in testing, it’s more likely to fail. One source told me that a number of the hardware RAID chipsets will refuse to work with drives whose firmware has mapped out bad blocks. He found that drives rejected by a RAID worked fine in a Drobo, which apparently is happy to accept a drive with mapped-out blocks.
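The acceptance rule described above amounts to a zero-tolerance filter, sketched here in Python. The record fields and serial numbers are hypothetical, and the 48-hour minimum is simply the lower end of the testing window mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class BurnInResult:
    serial: str          # hypothetical identifier
    hours_tested: float
    hard_errors: int
    soft_errors: int


MIN_HOURS = 48.0  # lower end of the 48-to-60-hour test window


def passes_burn_in(result: BurnInResult) -> bool:
    """Accept only drives that completed testing with no errors of either kind."""
    return (
        result.hours_tested >= MIN_HOURS
        and result.hard_errors == 0
        and result.soft_errors == 0
    )


# Made-up results for illustration.
results = [
    BurnInResult("EXAMPLE-001", 60.0, 0, 0),
    BurnInResult("EXAMPLE-002", 60.0, 0, 1),  # a single soft error is enough to reject
]

for r in results:
    print(r.serial, "accepted" if passes_burn_in(r) else "rejected")
```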
Obviously, this sort of testing benefits both customers, who are less likely to suffer drive failures, and Apple, in reducing warranty repairs, but there’s no question that it has associated costs that Apple will pass on to the customer in the form of higher ADM prices.
The Practical Upshot — After researching this topic, I’m convinced that although replacing a dead drive in an ADM is possible – Apple explicitly does not prevent it – it’s not a good idea if the Xserve in question is a production server. If you do decide to go this route, I strongly recommend that you get a drive that’s designed for RAID or server use. Also, note that Apple makes both SAS and SATA ADMs, and drives are not interchangeable between the two. So if you have a SAS ADM, you must put a SAS drive in it.
As I thought about my initial reactions to my drive’s flakiness, I realized that the problem is that Apple is essentially selling enterprise-level hardware to Mac users accustomed to mass-market products. I’m certainly familiar with running server software, but before the Xserve was released, I used standard Macs as servers – heck, for a long time, one of our servers was a Performa 6400. There’s nothing wrong with repurposing a Mac designed for everyday use as a server, as long as you realize that it’s not designed with server tasks in mind, and could suffer from performance or reliability problems when put into that role.
In other words, the belief that replacing a drive in an ADM is a no-brainer is thinking like a Mac user, not like a system administrator managing a production server. A sysadmin would prefer to avoid cheap hardware that’s likely to cause future problems in such a situation because it’s a false economy. But since moving from a hand-me-down desktop Mac to an Xserve is an easy jump to make, Apple has essentially attracted a class of customers who don’t yet think like sysadmins when it comes to production servers. And since Apple’s focus is so strongly on the consumer market, the company doesn’t make a significant effort to educate Xserve customers about what they’re getting into.
(There is one instance where bare drives are required for ADMs: Apple’s now-discontinued Xserve RAID. It takes older ATA-based ADMs, which aren’t readily available new, forcing those who own Xserve RAIDs to replace failed drives with whatever bare drives they can find to maintain RAID integrity. This is non-trivial, since all the RAID drives must be the same size, but the investment in an Xserve RAID is high enough that owners are justifiably going to great lengths to keep them active.)
With that in mind, I went looking to see how much comparable drive modules for HP and Dell servers would cost. Ignoring the fact that it took ages to sort out what servers might be comparable to the Xserve, when I finally found rack-mounted servers with hot-swappable drive modules, the prices from HP and Dell turned out to be even higher than Apple’s. Admittedly, Apple offers only four options – 80 GB and 1 TB SATA ADMs for $200 and $450, and 73 GB and 300 GB SAS ADMs for $300 and $650 – whereas HP and Dell offer a full range of sizes and won’t even go as low as 80 GB for SATA drives. But HP’s and Dell’s prices are either comparable (for the 73 GB SAS drive) or $200 to $250 higher (for the 1 TB SATA and 300 GB SAS drives).
To sum up, there are multiple good reasons why ADMs cost more than bare retail drives of the same size; replacing the drive in one is possible but not recommended; and Apple is in no way charging an unusual premium for ADMs.