This article originally appeared in TidBITS on 2009-07-26 at 2:25 p.m.
The permanent URL for this article is: http://tidbits.com/article/10433

Cause of Font Cache Bug Revealed?

by Matt Neuburg

The other day I was using TextMate [1] to run a simple Ruby script and an odd thing happened: the script suddenly started producing nonsense. There was nothing really wrong with the script itself, but TextMate appeared to have lost its mind; instead of showing me the actual string resulting from the script, like "ogopogo," it was omitting some of the letters, like "gpg." I restarted the computer and everything was fine after that. But I was left wondering what the heck had just happened.

I posted a query to the TextMate users newsgroup, and someone responded: "WebKit is used to render the HTML output window, and it has been known to behave strangely from time to time. Another possibility is that your font caches had become corrupted. Either of these problems could have been corrected by a reboot." Oh, yes, the font cache bug. I'd forgotten all about it, and I certainly had not connected it with TextMate's output. But I did know about the font cache bug. Indeed, I had referred to it implicitly, years before, in my review of Smasher (see "Insider Smashes Suitcases [2]," 2005-09-26).

The Mac OS X font cache bug is an intermittent misbehavior of fonts on Mac OS X, typically affecting any application that displays Web pages with the built-in WebKit engine (Safari, OmniWeb, TextMate, BBEdit, and CSSEdit are examples). The bug can also mar the display of PDFs, I believe. A quick Google search turns up some pages that talk about it, including this one [3], which provides some images of a corrupted Web page display, and a YouTube video [4] showing characters randomly disappearing and reappearing (much like what I was experiencing myself). Rob Griffiths mentions the bug in a recent Macworld article. And, going back further in time, John Gruber had an extensive series [5] of articles about it in 2005.

The occurrence of the font cache corruption bug on my machine has been less frequent in recent years; indeed, I'm not certain I've ever seen it on Leopard (I was using TextMate on Tiger when the bug struck me). Still, the question remains as to what actually triggers the bug.

Now it appears there's an answer. The problem seems to be caused, as one might expect, by a combination of two things: badly behaved fonts, and Apple's font caching mechanism. But in what way are the fonts badly behaved, and what's wrong with the font caching mechanism? The details come from an unexpected quarter of the Mac OS X world: the users of TeX.

TeX (pronounced "tech"), for those who don't know, is a typesetting program by the venerable Donald Knuth [6]. It's often used for the production of scientific and mathematical books and papers. There are various Mac OS X TeX implementations, and it was while I was glancing over some Web pages connected with these, reading about TeXShop [7] and MacTeX [8], that I noticed a link to a page about the font cache bug. I read the page, and my jaw dropped. Brilliant and determined detective work by some TeX power users has recently laid the blame for font cache corruption at the door of a TeX utility called pdftex, which lies at the heart of TeX implementations because it is used to pipe the TeX output directly to a PDF. If you receive and open a PDF that was created with pdftex, you run the risk of triggering the font cache bug on your machine.

Here's why (and now I am basically just quoting from the explanation by Richard Koch, the creator of TeXShop). A PDF file contains embedded copies of the fonts that it uses. Those copies consist of mathematical instructions for drawing the font's characters (that's what PDF is all about). These mathematical instructions are often expressed, in part, as PostScript subroutines for drawing partial shapes used by multiple characters, like this:

dup 372 {
    11 5 div
    6 38 5 div
    41 5 div
    0 61 5 div
    rrcurveto
    closepath
    endchar
}

Now, you may not be able to read that (how many of us are fluent in PostScript?), but it turns out that there's a bug in that subroutine. After the "endchar" line, the routine is supposed to have a "return" statement, and it doesn't. These subroutines were being incorrectly formed by the then-current version of pdftex.
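For readers who would rather see the check than read PostScript, here is a minimal Python sketch of the test described above: does a subroutine body end with "return"? The helper name missing_return and the idea of working from a plain-text listing are assumptions made for illustration; this is not how pdftex or Apple's code inspects fonts.

```python
# The subroutine listing quoted above, as plain text.
BAD_SUBR = """dup 372 {
    11 5 div
    6 38 5 div
    41 5 div
    0 61 5 div
    rrcurveto
    closepath
    endchar
}"""


def missing_return(subr_text: str) -> bool:
    """Return True if the subroutine body does not end with 'return'.

    Hypothetical helper: it only looks at the last operator in a
    whitespace-separated listing and ignores the surrounding braces.
    """
    tokens = subr_text.split()
    body = [t for t in tokens if t not in ("{", "}")]
    return not body or body[-1] != "return"


# A repaired copy, with the statement the article says should be there.
FIXED_SUBR = BAD_SUBR.replace("endchar\n}", "endchar\n    return\n}")

print(missing_return(BAD_SUBR))    # True
print(missing_return(FIXED_SUBR))  # False
```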

However, the incorrectly formed subroutines had no obvious manifestation in the resulting PDF file, because pdftex was forming them incorrectly only in the case of characters which, while part of the font, were never used in that particular PDF. For characters that were being used in that PDF, pdftex was forming the subroutines correctly. Thus, the issue could never be directly detected.

But here's the problem: When such a PDF was opened on Mac OS X, Apple's font caching mechanism came along and stored these subroutines anyway - that's why it's called font caching! - so it would know how to draw those characters of that font if it encountered them later. So if it did encounter those characters of that font later, these subroutines would be called, and since the subroutines were corrupt, the font's drawing procedures would be wrong.

So the bug was being triggered by opening a "bad PDF," but it had no effect on the "bad PDF" itself; it was only later, if other characters of the same font happened to be used anywhere in the system where the font caches were called upon (such as through Preview or a WebKit-reliant application), that the corruption would manifest itself. And you know something? Sure enough, when I saw the problem in TextMate, I had been reading a TeX-generated PDF file earlier that same day.
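The sequence of events can be mimicked with a toy model. Everything in this Python sketch is invented for illustration (the font name, the glyph tables, the rule that a malformed routine simply draws nothing); it is not Apple's font caching code, only a way to see why the "bad PDF" looks fine while a later, unrelated display goes wrong.

```python
# Toy model only: names and data layout are invented for illustration and
# do not reflect how Apple's font cache is actually implemented.

font_cache = {}  # font name -> {character: state of its drawing routine}


def open_document(font_name, embedded_glyphs, text):
    """Draw `text`, caching the font's routines the first time it is seen."""
    # The first document to use a font fills the cache with ALL of its
    # routines, including those for characters the document never shows.
    cached = font_cache.setdefault(font_name, dict(embedded_glyphs))
    # In this toy model, a malformed routine draws nothing at all.
    return "".join(ch for ch in text if cached.get(ch) == "ok")


# Hypothetical "bad PDF": it uses only g and p, and the routine for the
# unused letter o is the malformed one.
BAD_PDF_GLYPHS = {"g": "ok", "p": "ok", "o": "malformed"}
# The same font as a later, well-formed document would supply it.
GOOD_GLYPHS = {"g": "ok", "p": "ok", "o": "ok"}

print(open_document("DemoFont", BAD_PDF_GLYPHS, "gpg"))   # gpg
print(open_document("DemoFont", GOOD_GLYPHS, "ogopogo"))  # gpg
font_cache.clear()  # stands in for a restart or cache rebuild
print(open_document("DemoFont", GOOD_GLYPHS, "ogopogo"))  # ogopogo
```

The second call never consults its own, correct copy of the font, because the cache already holds an entry under that name; only clearing the cache restores the missing letters.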

What's the upshot for you, the end user? First, you may acquire, or may already have on your machine, the occasional "bad PDF" file, and if you open it, this might trigger the font cache bug, which will manifest itself as character corruption later on until you restart the computer or otherwise rebuild the font caches. You may be able to identify such files by doing a Spotlight contents search for "pdfTeX" (if you sort the results by Kind, remember that a PDF can be listed either as an "Adobe PDF Document" or as "Portable Document Format"). A more specific search, avoiding PDFs that merely mention pdftex, would be "Encoding software contains pdftex." (On accessing the "Encoding software" search criterion through the "Other" pop-up menu item, see my "Spotlight Strikes Back: In Leopard, It Works Great [9]," 2007-11-01.) You can't fix a "bad PDF," but at least you'll have some notion of which PDF files might trigger the bug.
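If you'd rather not rely on Spotlight, a rough equivalent of the "Encoding software" search can be sketched in Python. This assumes the PDF records its generating program in a plain-text /Producer entry, which is common but not guaranteed (the entry may be absent, encrypted, or encoded differently), so treat a miss as inconclusive. The function names and the sample strings are made up for the example.

```python
import re
from pathlib import Path
from typing import List

# Matches an entry such as: /Producer (pdfTeX-1.40.3)
# Assumption: the value is a simple literal string with no escaped parentheses.
PRODUCER_RE = re.compile(rb"/Producer\s*\(([^)]*)\)")


def made_by_pdftex(pdf_bytes: bytes) -> bool:
    """Return True if the first /Producer entry mentions pdfTeX."""
    match = PRODUCER_RE.search(pdf_bytes)
    return bool(match) and b"pdftex" in match.group(1).lower()


def find_pdftex_pdfs(folder: str) -> List[Path]:
    """List the PDFs under `folder` whose producer appears to be pdfTeX."""
    return sorted(
        path for path in Path(folder).rglob("*.pdf")
        if made_by_pdftex(path.read_bytes())
    )
```

Calling find_pdftex_pdfs on a folder of downloaded papers returns the files worth being wary of. Like the Spotlight search, it says only that pdftex produced the file, not that the file contains a malformed subroutine.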

Second, it's perfectly possible that there are other causes of font cache corruption besides PDF files generated with TeX, so let's not heap all the blame on the TeX users - after all, they're the ones who found the source of the problem in pdftex.

Third, newer PDF files generated with TeX are unlikely to cause the problem, because the TeX folks have also fixed the problem in pdftex.

Fourth, Apple changed the font caching mechanism in Leopard, but it looks like the problem can still occur (though it seems to me that it occurs less often). In any case, it is ultimately up to Apple to rewrite its routines to deal more robustly with bad fonts; now that the TeX power users have been able to show Apple exactly how the bug is triggered, perhaps Apple will be able to correct it.

[1]: http://macromates.com/
[2]: http://db.tidbits.com/article/8263
[3]: http://www.creativetechs.com/iq/garbled_fonts_troubleshooting_guide.html
[4]: http://www.youtube.com/watch?v=OX9kc1ZDI7U
[5]: http://daringfireball.net/2005/03/font_caches_gone_wild
[6]: http://en.wikipedia.org/wiki/Donald_Knuth
[7]: http://www.uoregon.edu/~koch/texshop/
[8]: http://www.tug.org/mactex/
[9]: http://db.tidbits.com/article/9283