In the latest releases of both my ebooks on backups (“Take Control of Mac OS X Backups” and “Take Control of Easy Backups in Leopard”), I include sidebars titled “(Sparse) Bundles of Joy,” in which I describe Leopard’s new sparse bundle disk image format, used by Time Machine for network backups. Because this format is quite interesting, has potentially broad application, and hasn’t received much attention, I’d like to say a bit more about it here.
Managing Your Image — By way of review, a disk image is a special file that can also behave like a disk – that is, if you double-click the image, a new volume appears in the Finder; this volume can contain any number of files and folders, and you can open or copy them just as you would do with the contents of any other volume. Disk images typically have the extension .dmg and are often used to distribute software. Unlike ordinary folders, disk images can be compressed, encrypted, and/or made read-only, and can be opened on any Mac, all without the use of any third-party software. If you want to distribute a whole set of files and be sure that they remain perfectly intact on the other end, using a disk image is an excellent way to do so.
Over the years, Apple has created a variety of different formats for disk images. The sparse image (extension .sparseimage), for example, was an improvement over the .dmg format in that it could grow automatically in size as needed (up to a preset maximum). Prior to Leopard, Mac OS X used sparse images for things like local copies of your iDisk (if you have iDisk Sync turned on in the MobileMe pane of System Preferences) and FileVault (which used an encrypted sparse image). In both cases, the images could begin relatively small, rather than occupying lots of unused space on your disk even when they contained little data.
But sparse images, like .dmg images, had a problem: any change to their contents marked the entire image file as changed. If your incremental backups included a large disk image file, even the tiniest change meant the entire file had to be backed up again. For example, I used to store private documents on a 10 GB encrypted .dmg disk image. But I couldn’t back up the disk image file itself, because it changed every day and I’d rapidly run out of disk space if I kept backing it up. So instead, I had to separately back up the contents of the mounted image to an encrypted archive, which was an inconvenience.
Bundle Up — When I upgraded to Mac OS X 10.5 Leopard the first time, one thing I noticed immediately was that a copy of my local iDisk sparse disk image was sitting on my Desktop, while a new disk image, this time with the extension .sparsebundle, was stored in a subfolder of ~/Library/FileSync. Leopard had taken the liberty of converting my iDisk image to a new format – a sparse bundle – and put the old one on my Desktop as a backup, presumably in case anything had gone wrong during the conversion. So what’s with the new format and why should you care?
A sparse bundle looks and acts just like a sparse image: it can grow in size, can optionally be compressed or encrypted, and so on. What’s different is that it isn’t actually a single file, as all previous disk image formats were. It’s a bundle (also known as a package), meaning a folder that Mac OS X treats as a single file, just as it does with applications. (To verify this, you can Control-click or right-click a sparse bundle, choose Show Package Contents from the pop-up menu, and browse through its contents.) Inside that package is a folder full of bands: files of up to 8 MB each by default, as many as are needed to hold the image’s data.
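To make the band arithmetic concrete, here is a minimal Python sketch. It assumes the 8 MB band size mentioned above, and it assumes that band files are named with consecutive hexadecimal numbers starting at 0, which is what I see when I browse a bundle but which this article doesn't otherwise establish. The function names are my own, purely for illustration.

```python
BAND_SIZE = 8 * 1024 * 1024  # 8 MB per band (assumed default)


def band_count(image_bytes, band_size=BAND_SIZE):
    """Number of band files needed to hold image_bytes of data."""
    # Integer ceiling division: a partly filled band still needs its own file.
    return (image_bytes + band_size - 1) // band_size


def band_for_offset(offset, band_size=BAND_SIZE):
    """Name of the band file covering a byte offset (hex naming is an assumption)."""
    return format(offset // band_size, "x")


if __name__ == "__main__":
    # A completely full 10 GB image, and the band covering the 100 MB mark.
    print(band_count(10 * 1024**3))
    print(band_for_offset(100 * 1024**2))
```

By this reckoning, a full 10 GB image is spread across 1,280 band files, so a change confined to one spot in the image touches only a tiny fraction of them.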
What’s cool about this is that if you change something on a sparse bundle (adding or modifying a file, for instance), only the band(s) containing that data change, not the whole bundle. As a result, assuming your backup software treats the contents of bundles as individual files, you no longer have to back up a huge disk image just because a tiny file changed. Your backup software only has to copy the 8 MB band(s) containing any of that file’s data (often only one). So I converted my encrypted sparse image to an encrypted sparse bundle, and now I can include it along with all my other files in my ordinary backups.
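If you're curious how little actually changes between backups, the following Python sketch lists the bands modified since a given time and totals their size. It is only an illustration of the idea, not how any particular backup program works. It assumes the bands live in a subfolder named bands inside the package, and the names changed_bands and backup_cost are hypothetical helpers of my own.

```python
from pathlib import Path


def changed_bands(bundle, since):
    """Band files in the bundle modified after `since` (a Unix timestamp)."""
    bands_dir = Path(bundle) / "bands"  # assumed folder name inside the package
    return sorted(
        p for p in bands_dir.iterdir()
        if p.is_file() and p.stat().st_mtime > since
    )


def backup_cost(bundle, since):
    """Return (number of changed bands, total bytes an incremental backup copies)."""
    bands = changed_bands(bundle, since)
    return len(bands), sum(p.stat().st_size for p in bands)
```

Calling backup_cost with the path to a bundle and the time of your last backup gives a rough idea of what a file-by-file incremental backup would need to copy, compared with the full size of a monolithic image.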
Nuts and Bolts — You can create and modify disk image files (of whichever sort) using Disk Utility, located in /Applications/Utilities, or with the command-line tool hdiutil if you’re so inclined. For example, to create a new, encrypted sparse bundle in Disk Utility, you’d follow these steps:
- In Disk Utility, choose File > New > Blank Disk Image.
- Fill in the filename, location, volume name, and maximum size; leave the format as Mac OS Extended (Journaled).
- Choose either 128-bit or 256-bit AES encryption from the Encryption pop-up menu. Leave Partitions set as it is.
- From the Image Format pop-up menu, choose Sparse Bundle Disk Image.
- Click Create. Enter and verify a password and click OK.
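For those who prefer the command line, the same steps can be approximated with hdiutil. The Python sketch below only builds the command and hands it to hdiutil; the function names, the 10g default size, and the example filename are my own placeholders, and I'd treat it as a starting point rather than a tested recipe. It must be run on a Mac, where hdiutil should ask for the password itself.

```python
import subprocess

ENCRYPTION_CHOICES = ("AES-128", "AES-256")


def sparse_bundle_command(path, volume_name, max_size="10g", encryption="AES-128"):
    """Build an hdiutil command line for a new encrypted sparse bundle."""
    if encryption not in ENCRYPTION_CHOICES:
        raise ValueError("encryption must be one of %s" % (ENCRYPTION_CHOICES,))
    return [
        "hdiutil", "create",
        "-type", "SPARSEBUNDLE",    # Sparse Bundle Disk Image
        "-fs", "HFS+J",             # Mac OS Extended (Journaled)
        "-size", max_size,          # maximum size the image may grow to
        "-volname", volume_name,
        "-encryption", encryption,
        path,
    ]


def create_sparse_bundle(path, volume_name, **options):
    """Run hdiutil (macOS only); raises CalledProcessError if it fails."""
    subprocess.run(sparse_bundle_command(path, volume_name, **options), check=True)


# Example (hypothetical names), to be run on a Mac:
# create_sparse_bundle("Private.sparsebundle", "Private", encryption="AES-256")
```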
Although Disk Utility can also convert one format to another (using the Images > Convert command), I’ve had some trouble with this method, and I’ve generally found it more reliable to create a new image from scratch and copy the contents of the old image manually.
The Future of Sparse Bundles — As I mentioned earlier, Time Machine stores your backups in sparse bundles when you’re backing up over a network (to another Mac running Leopard, or to a Time Capsule). The Leopard version of FileVault also uses the sparse bundle format now, which may decrease its susceptibility to disk errors. (I’m still no fan of FileVault, though, because apart from the threat of losing data to file corruption, I prefer much greater control over what is, and isn’t, encrypted.) But what I find most exciting about sparse bundles is the set of problems they could solve if more developers used them.
Let’s go back to the problem of backing up huge files that change frequently. If you use Parallels Desktop or VMware Fusion to run Windows on your Mac, this is still an issue, because those programs still store all their data in monolithic disk image files. Similarly, Microsoft Entourage uses a single big database file to store all your email, contacts, and calendar information. So conventional wisdom says you should exclude files like these from Time Machine or other backup programs that run frequently, because otherwise your backups will take an excessively long time and require tons of disk space. Unfortunately, that also means you have to find some other, more cumbersome way to back up that data – or leave it unprotected.
If Parallels, VMware, and Microsoft were to adopt the sparse bundle format for their respective data storage needs, at least as an option, this problem could disappear. (This approach would work only under Leopard, however.) In fact, I know of at least one attempt to trick Entourage into using a sparse bundle, though the process is rather elaborate and geeky, and I haven’t tried it myself. Similar acrobatics could possibly be performed with virtualization programs, basically forcing them to store their existing disk images on sparse bundles, but it would be better by far if users didn’t have to jump through such hoops.
Although Entourage and virtualization programs are among the most prominent examples, undoubtedly many other applications that deal with very large files could also benefit from using sparse bundles. For all I know, perhaps developers are already hard at work bundling up their images, or perhaps technical problems I’m unaware of (beyond the requirement for Leopard) make it harder than I imagine. But for the sake of speedy and space-efficient backups, I certainly hope the sparse bundle rapidly becomes a favorite format for storing large amounts of data.