Macintosh Internet File Format Primer
Last week’s article in TidBITS-444 about the need for developers to support MacBinary III brought in a lot of email from people confused about various Macintosh file formats that appear on the Internet. What do these formats do, and how do they interact? I hope this article, written with the help of Leonard Rosenthol (the original developer of StuffIt Expander), will clear up confusion.
First off, Mac users regularly deal with three kinds of encoding formats: archiving formats (like StuffIt, Compact Pro, and Zip), binary packaging formats, and transfer encoding formats. Archiving formats bundle multiple files into a single file, compressing the originals in the process. I’ll assume that everyone understands the concept behind lossless compression – you replace repeating patterns of data within a file with a token representing those patterns, thus reducing the amount of data needed to represent the original.
Binary packaging and transfer encoding formats are more complicated, and worse, they can be combined in the same format. Binary packaging formats include MacBinary, AppleSingle, and AppleDouble. Transfer encoding formats include uuencode, Base64, and quoted-printable. Last but not least, the venerable BinHex straddles the fence, providing both binary packaging and transfer encoding. Should you wish to encode or decode files manually, shareware utilities exist for some specific formats, and Aladdin’s StuffIt Deluxe 4.5 provides translators for many as well. TidBITS readers can get a discount on StuffIt Deluxe – see the sponsorship area at the top of the issue for details.
MacBinary — MacBinary is an old format, first proposed back in 1985 by Dennis Brothers on MAUG on CompuServe. Two years later, a group of developers made additions to the format to handle HFS and other changes in the Mac OS. That format, MacBinary II, has survived to this day, although it is being replaced by MacBinary III to address the issue raised by the new file flags originally slated for Mac OS 8 and appearing now in Mac OS 8.5. Interestingly, the MacBinary II format has an option for a "secondary header" for future expansion, but it’s never been used.
As a binary packaging format, MacBinary bundles up the data fork and the resource fork, along with Finder information about the file (type, creator, etc.). In bundling the two forks of a Macintosh file together, MacBinary protects the file when it’s uploaded to a Unix machine or PC, neither of which support Macintosh two-forked files. Without MacBinary, resource forks would be lost, which could be disastrous or merely annoying, depending on the file.
Support for MacBinary is broad within applications, though users have seldom understood it because MacBinary support is generally transparent. When you upload a Macintosh file using Anarchie or Fetch, for instance, both default to uploading in MacBinary format. Part of the problem, I suspect, is that many people who upload files don’t know to change the file name to reflect MacBinary’s file extension – .bin. Some FTP applications, notably Fetch, try to add the .bin extension automatically.
Although application support has been broad, it’s seldom deep. For instance, although Anarchie and Fetch deal with MacBinary correctly and transparently, Web browsers generally pawn MacBinary files off on a helper application, such as MacBinary II+ or StuffIt Expander. Without the appropriate helper application, you can’t decode the file.
Worse, many Web servers don’t set the proper MIME type for MacBinary files, which can result in old Web browsers trying to display the MacBinary file as text within the browser window. In this case, there are two solutions, short of updating to a modern Web browser. First, click and hold on the link and choose Download Link to Disk from the pop-up menu that appears in many browsers. Once you’ve downloaded the file, drop it on StuffIt Expander. Second, convince the Web site administrator in question to set the correct MIME type for MacBinary files. They should download in binary mode, with the "bin" extension, and using the MIME type "application/x-macbinary" (type and creator fields can be set to BINA and SITx so double-clicking the file launches StuffIt Expander).
Even more confusing, it turns out that Internet Config, and thus Internet Explorer, incorrectly sets the MacBinary MIME type to "application/macbinary", which is an uncompleted proposal. And, older versions of Netscape Navigator (at least 2.x, and possibly 3.x) don’t set any MIME type for MacBinary at all. Without agreement between Web servers and browsers, there’s a good chance that something will go wrong when a user clicks a link to a MacBinary file. In theory, forcing a download to disk and then decoding manually should work, but with so many variables, there’s no telling. My recommendation: set the MIME type for MacBinary to "application/x-macbinary" wherever you see it, including Internet Config, Web servers, and older versions of Netscape Navigator. Other settings for handling MacBinary, such as the file suffix and an application to handle the files, should be the same as those mentioned above for Web servers.
Again, MacBinary is a binary packaging format, not a transfer encoding format. Thus, it requires a full 8-bit connection whenever the MacBinary file is being transferred. If someone still behind an obsolete 7-bit connection to the Internet tries to transfer a MacBinary file, it will be damaged in transit. (Some 7-bit connections still exist, but are rapidly fading into the mists of Internet past because both FTP and the Web’s HTTP use 8-bit connections.) For that extremely uncommon situation, a transfer encoding format like BinHex is also necessary.
AppleSingle — Like MacBinary, AppleSingle is an 8-bit binary packaging format. Most people have seen AppleSingle only in the attachment encoding format pop-up menu in Eudora or Emailer. Apple developed AppleSingle and AppleDouble in the days of A/UX, Apple’s original version of Unix. Since A/UX didn’t support two-forked Macintosh files, Apple needed a way to share files between A/UX and the Mac OS. MacBinary was the obvious choice, but because MacBinary stores the data fork first in the file, you couldn’t do live edits on MacBinary files within A/UX. So, Apple reversed the order in which the forks were stored in MacBinary files (so programs could append or remove data without moving the resource fork), added some never-used options for future expansion, and called it AppleSingle.
Thanks to its extensibility, AppleSingle is a better format than MacBinary (it even supports the new Finder flags appearing in Mac OS 8.5), but it suffers from one significant problem: almost nothing supports it for file transfer. Email programs can often handle AppleSingle attachments, and Fetch can both upload and download AppleSingle files. Web browsers, however, are clueless. When the file flag issue that resulted in MacBinary III first arose, the immediate thought was to evangelize AppleSingle. But, without existing code, developers were loathe to put in the work necessary to support AppleSingle and to unleash a new file extension on the Mac community. MacBinary III was the course of least resistance.
AppleSingle use is primarily restricted to occasional email messages between Macintosh users (although Apple also uses AppleSingle when storing disk images). As a cross-platform format, AppleSingle falls down because it bundles the two forks of a Macintosh file together in such a way that they can’t easily be separated – that’s what AppleDouble is for. In short, unless you’re sending an attachment to another Macintosh user, there’s no point in using AppleSingle, and even then, AppleDouble works as well.
Since AppleSingle is an 8-bit format, AppleSingle files must be protected by a transfer encoding format when sent via email, since email protocols like SMTP (unlike FTP and HTTP) aren’t guaranteed or required to be safe for 8-bit files.
AppleDouble — AppleDouble stores the two forks of a Macintosh file as separate files (bundling Finder information like type and creator in with the resource fork). AppleDouble also came about because of A/UX. You could share files between the Mac OS and A/UX using AppleSingle if programs on both sides understood AppleSingle. However, if you were sharing files with other flavors of Unix using NFS (Network Filing System), programs on those machines were unlikely to understand AppleSingle. AppleDouble’s technique of storing a Macintosh file as two separate files enabled those other Unix users to edit just the data fork without disturbing the resource fork. You won’t see files stored in AppleDouble on Internet file sites, since it’s silly to store two files for each file uploaded.
However, AppleDouble’s way of splitting Mac files proved extremely useful for email, since each part of the file could be a different MIME part within a single message. That enables the receiving email program to pick out the parts it understands. If a Mac email program receives an AppleDouble-encoded attachment, it knows to read both forks and put them back together. If a non-Macintosh email program receives an AppleDouble-encoded attachment, however, it saves only the data fork, throwing away the resource fork.
In fact, AppleDouble is the standard for sending Macintosh files in email according to the Macintosh extensions to MIME, written by Eudora author Steve Dorner. Like AppleSingle, AppleDouble doesn’t lose Mac OS 8.5’s new file flags, which makes it the default format of choice for attachments in the future. Keep in mind, however, the fact that AppleSingle and AppleDouble can both store these new file flags, doesn’t mean that older applications properly read the new flags or write them out again. So, even though these formats don’t require any changes, applications still might.
Once again, AppleDouble is an 8-bit binary packaging format, so any AppleDouble email attachments must be protected by a transfer encoding format, generally Base64.
uuencode — uuencode is one of the oldest of the Internet transfer encoding formats. Designed many years ago for transferring binary files via the text-only UUCP protocols that carried email and Usenet news, uuencode takes 8-bit data and converts it to 7-bit text. There are many uuencode implementations, but files are easily recognizable because they all start with a "begin" line that specifies the file name and the Unix permissions for the file (generally 644).
Transfer encoding formats like uuencode are essentially the reverse of compression software. They take groups of 8-bit bytes and represent them in a series of 7-bit characters. Thus, files encoded with a transfer encoding format always grow after encoding, up to 35 percent.
uuencode doesn’t know about Macintosh file forks and encodes only the data fork of a Mac file. Not all Mac files have resource forks or store necessary information in them (Microsoft Word files, for instance, have resource forks, but they don’t contain any information essential to the file), so uuencode proves an acceptable way to transfer files to Windows users. Support for uuencode is more common in Windows programs, so I fall back to uuencode when sending attachments to Windows users if AppleDouble fails (which it might if the recipient’s email program isn’t MIME-savvy).
Base64 — Like uuencode, Base64 represents 8-bit files using 7-bit characters so they can withstand being transmitted over 7-bit links or using 7-bit protocols. Base64 wasn’t developed originally for the Internet, but for PostScript Level 2 for computers to transfer binary data to printers.
Unlike uuencode, which sports numerous utilities, Base64 is almost entirely transparent to the user, being handled in the background by email programs. This transparency comes as a result of Base64 being a modern format that’s handled by modern programs. In the past, computers and programs were less capable and users tended to be more sophisticated, so it was less necessary to make transfer encoding support transparent.
Quoted-Printable — Quoted-printable, which most people know of only indirectly because of Eudora’s QP button, is a transfer encoding format designed for representing high ASCII characters (special characters with diacritical marks, for instance) in 7-bit form. The important difference between quoted-printable and the other transfer encoding formats is that quoted-printable encoded text remains mostly human-readable. Quoted-printable isn’t used for encoding files, just the text of email messages.
We’ve all seen email messages with equal sign characters at the ends of the lines – that’s an indication of a quoted-printable message not being decoded, usually because the MIME header that specifies the quoted-printable encoding was deleted. This problem crops up primarily in mailing list digests, since mailing list software removes all but the essential headers from messages before combining them into a digest. The solution to this problem is MIME digests, which maintain headers for each message within the digest, facilitating bursting the digest into a mailbox and retaining special headers that specify transfer encoding.
BinHex — I’ve saved the most confusing format for last. BinHex is a binary packaging and transfer encoding format originally developed by Tim Mann for the TRS-80 in the early 1980s and rewritten completely by Yves Lempereur for the Macintosh in 1984. Yves was also part of the team that created the MacBinary format, and he wrote the BinHex 5.0 program to support it. This caused some confusion because his BinHex 4.0 program used the BinHex format, whereas BinHex 5.0 used MacBinary.
The BinHex format solves both problems facing transmission of Mac files on the Internet. When you encode a file in BinHex, the encoder first combines the file’s forks, then converts the 8-bit result into a 7-bit file. The combination of these functions made BinHex the most popular way to deal with Mac files on the Internet because you could be pretty sure that whatever you did, the original file would survive.
I shouldn’t imply that BinHex is an entirely good thing any more. It was great back when 7-bit links were common, but in today’s world, the 30 to 35 percent by which files grow when binhexed is an unnecessary waste of space and downloading time. Also, BinHex doesn’t store Mac OS 8.5’s new file flags, which will render it less useful in the near future.
That said, BinHex won’t go away any time soon. Support for BinHex is widespread, and everyone recognizes the BinHex .hqx extension and knows how to decode BinHex files (drop them on StuffIt Expander). Because BinHex files are 7-bit text, they’re more likely to survive badly configured applications, and in the past, it was even common practice to copy them out of an email message or a browser window into a text file, save it, and decode it. If you see BinHex text in today’s software, something has probably gone wrong, and it’s likely that the file will be damaged. Another minor advantage of BinHex’s text format is that it can have human-readable descriptions above the BinHex code.
The main argument I’ve seen for why people like BinHex is that it works. That’s true, and I’d never encourage someone to switch to something that worked badly. I’ve heard of situations where BinHex decoders deal better with huge files than MacBinary decoders, and the whole MIME type debacle with MacBinary that I mentioned above has also caused trouble. At the same time, though, our goal should be to identify and eliminate the problems that affect today’s modern formats, rather than sticking, DOS-like, with what we’re used to.