HTML Crunchers Fuel Compression Obsession
Graphic designers hit a stumbling block a few years ago when the Web threatened to become The Next Big Thing. It had been acceptable to pack as much detail as possible into every row of pixels in a huge image. But designers who took on Web work discovered that images needed to be as small as possible. Compression became the holy grail of Web design.
Although this quest led to the creation of a new industry and a disproportionate number of how-to books, only recently has attention focused on optimizing the HTML files that make up every Web site. Two utilities have emerged to shave even more bytes from your Web files. Mizer, from Antimony Software, and VSE HTMLTurbo, from Voget Selbach Entertainment, can reduce the size of HTML files without harming their functionality.
Don’t Byte Me If I Strip — Image compression relies on two notions: either replace repeating values with a shorter description of those values (known as "lossless compression" and used in GIF files), or remove unnecessary information without revealing noticeable degradation (known as "lossy compression" and used in JPEG files). (For an overview of image compression, see "A Closer View of Web Graphics" in NetBITS-007.) You can’t apply lossless compression to HTML files because Web browsers aren’t designed to read, decode, and display compressed text files. That leaves lossy compression: strip out unnecessary information but leave the content and HTML tags intact.
So what’s expendable? Without passing judgment on what’s worthwhile on the Web, I can say that a typical HTML file contains unnecessary elements. Line breaks, tabs, and spaces that aren’t used in the page content are the most obvious; they consume space despite being invisible. Although HTML purists (and validation programs) may object, most Web browsers can correctly interpret pages without some elements, such as quote marks around tag attributes (like <IMG HEIGHT="50">) and tags added by some HTML editors (like <NATURALSIZEFLAG>).
You could also attack comment tags (which don’t appear in a Web browser but are used to embed notes, represented as <!-- COMMENT HERE -->). However, some Web servers add preexisting content from templates or perform an action dictated by commented commands, making this option potentially dangerous.
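To make the idea concrete, here is a minimal Python sketch of this kind of stripping. It is not how either utility described in this article works. The function name `crunch`, the decision to leave <PRE>, <SCRIPT>, and <TEXTAREA> blocks untouched, and the rule that comments beginning with `<!--#` are server commands worth keeping are all assumptions made for illustration.

```python
import re

# Assumption: a comment starting with "<!--#" is a server command and is kept.
COMMENT = re.compile(r"<!--(?!#).*?-->", re.DOTALL)

# Blocks where whitespace (or comment-wrapped script code) may matter.
PROTECTED = re.compile(
    r"(<pre\b.*?</pre>|<script\b.*?</script>|<textarea\b.*?</textarea>)",
    re.DOTALL | re.IGNORECASE,
)

# Only plain ASCII whitespace, so non-breaking spaces in the content survive.
WHITESPACE = re.compile(r"[ \t\r\n\f]+")


def crunch(html: str, strip_comments: bool = False) -> str:
    """Collapse whitespace runs outside protected blocks; optionally drop comments."""
    # Splitting on a capturing group leaves ordinary markup at even indexes
    # and the protected blocks, unchanged, at odd indexes.
    parts = PROTECTED.split(html)
    for i in range(0, len(parts), 2):
        text = parts[i]
        if strip_comments:
            text = COMMENT.sub("", text)
        parts[i] = WHITESPACE.sub(" ", text)
    return "".join(parts).strip()
```

Each run of whitespace is collapsed to a single space rather than deleted, since removing the space between two words or two inline tags can change how a page renders. Comment stripping is off by default for the reason given above.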
You could do all this by hand if you had the time, but since no one does, the aforementioned utilities can do the work for you. The stripped files look awful without the tabs, line breaks, and spaces that make the text easy to read. That’s why the creators of both Mizer and HTMLTurbo recommend that HTML compression happen just before uploading. That way, the smaller files reside on the Web server, while your editable copies remain on your hard disk. Apply necessary updates to your local files, then replace the server files with new optimized copies.
Getting Wiser with Mizer — To process a file using Mizer, drop it onto Mizer’s application icon. You end up with three files: the optimized HTML file, a backup copy of the original, and a log file reporting the amount of compression achieved. You can modify those and other options by launching the program directly and choosing Preferences from the File menu. Mizer also includes a setting called Tag Optimization that removes closing tags such as </LI>, </HEAD>, and </HTML>, even though that’s against official HTML rules.
In addition to compressing individual files, Mizer can crunch an entire folder of Web files dropped onto it, enabling you to process a local copy of your Web site in one shot.
Mizer is scriptable, so you can incorporate it as an automated step within your Web page creation process. For instance, a sample script provided with Mizer optimizes files then uploads them to your Web server using Fetch.
Blasting Text with VSE HTMLTurbo — Like Mizer, HTMLTurbo involves a drag & drop operation to optimize HTML files, but it offers more configuration options. For example, from the Preferences dialog box, you can specify that comment tags and <META> tags be stripped (you can also remove just the <META NAME="generator"> tag).
HTMLTurbo can notify you when it encounters errors in your HTML code, but its implementation is crude, popping up a dialog box that stops processing until dismissed. Fortunately, you can turn this option off.
HTMLTurbo can display a Results window that uses the number of bytes saved to estimate how much bandwidth you can save over a period of time. When you select a file and type in the approximate number of hits that page receives, HTMLTurbo reports average savings by day, month, and year. I wouldn’t classify this as hard data, but it’s interesting to see the effect of your efforts, especially if your Web hosting fees are based on actual bandwidth used.
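HTMLTurbo's exact formula isn't documented here, but an estimate of this kind presumably amounts to simple multiplication. The sketch below is an assumption, not the program's method: the function name, the 30-day month, the 365-day year, and the sample numbers are all hypothetical.

```python
def bandwidth_savings(bytes_saved_per_hit: int, hits_per_day: int) -> dict:
    """Estimate bytes of bandwidth saved per day, month, and year for one page."""
    per_day = bytes_saved_per_hit * hits_per_day
    return {
        "day": per_day,
        "month": per_day * 30,   # assumed 30-day month
        "year": per_day * 365,   # assumed 365-day year
    }


# Hypothetical page: 2,000 bytes trimmed, 500 hits a day.
estimate = bandwidth_savings(2000, 500)
for period, saved in estimate.items():
    print(f"per {period}: {saved:,} bytes")
```

With these made-up inputs the page saves 1,000,000 bytes a day, which is small for one page but adds up across a busy site.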
I threw two complete sites at the programs. The larger one, weighing in at 22,713,440 bytes (22.7 MB), was reduced to 21,589,258 bytes by Mizer (saving 1,124,182 bytes, or 4.95 percent) and to 21,488,988 bytes by HTMLTurbo (saving 1,224,452 bytes, or 5.39 percent). Note that these figures represent the entire site, graphics and all. The second site, which was much more modest, shrank from 134,236 bytes to 114,265 bytes with Mizer (14.9 percent) and to 113,445 bytes with HTMLTurbo (15.5 percent).
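The percentages follow directly from the byte counts. Here is a short sketch of the arithmetic, using the sizes quoted above; the function name is illustrative.

```python
def percent_saved(original_bytes: int, optimized_bytes: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (original_bytes - optimized_bytes) / original_bytes * 100


# Site sizes in bytes, as reported in the tests above.
results = {
    "large site, Mizer": (22_713_440, 21_589_258),
    "large site, HTMLTurbo": (22_713_440, 21_488_988),
    "small site, Mizer": (134_236, 114_265),
    "small site, HTMLTurbo": (134_236, 113_445),
}

for label, (before, after) in results.items():
    print(f"{label}: {before - after:,} bytes saved, "
          f"{percent_saved(before, after):.2f} percent")
```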
Compression Quibbles — Overall, I was pleased with the 5 to 15 percent compression I saw in my informal results. I wasn’t able to identify any page elements that broke due to the optimization, and in several cases load times seemed to improve. However, despite both programs’ enthusiastic claims, real-world speed differences are influenced by outside factors such as Internet traffic, your computer, and your method of Internet access.
In fact, the problems I found with each program had more to do with interface and behavior than with results. My largest gripe about Mizer relates to processing a folder of several HTML files. Although the program makes backup copies of the original files, they’re scattered within the original directory instead of being collected in a new folder; this meant that for my large site example, which contained 1,466 files in several nested folders, I had to separate the compressed versions from the originals manually.
HTMLTurbo introduced its own variation of this problem: it tosses every processed file into one directory, so if you compress files from different sites on your hard disk, you must sort them out afterward (hoping that none share the same name, like index.html). Another quibble with HTMLTurbo is its complete lack of information on exactly what it strips from HTML files. Some people may not want that level of detail, but I want to know what’s being done to the HTML I’ve labored over (this is also why I’m often dubious about WYSIWYG HTML editors). Mizer, though slightly less flexible, makes up for it by precisely explaining its actions in the ReadMe file.
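Both file-handling complaints come down to where the output lands. As an illustration of an alternative layout, and not a description of how either program works, the sketch below writes optimized copies into a separate folder that mirrors the original folder structure. The names `crunch_site` and `collapse_whitespace` are hypothetical, and the optimizer is a deliberately crude stand-in that ignores special cases such as <PRE> blocks.

```python
import re
from pathlib import Path

ASCII_WHITESPACE = re.compile(r"[ \t\r\n\f]+")


def collapse_whitespace(html: str) -> str:
    """Stand-in optimizer: squeeze each run of ASCII whitespace to one space."""
    return ASCII_WHITESPACE.sub(" ", html).strip()


def crunch_site(source: Path, output: Path, optimize=collapse_whitespace) -> int:
    """Write optimized copies of the HTML files under source into output.

    Relative paths are preserved, so two index.html files in different
    folders never collide, and the originals are left untouched.
    Returns the total number of bytes saved.
    """
    saved = 0
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue
        if original.suffix.lower() not in {".html", ".htm"}:
            continue
        # latin-1 maps every byte to a character, so any file can be read.
        text = original.read_text(encoding="latin-1")
        target = output / original.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(optimize(text), encoding="latin-1")
        saved += original.stat().st_size - target.stat().st_size
    return saved
```

Calling `crunch_site(Path("site"), Path("site-crunched"))` would leave `site` as the editable copy and produce `site-crunched` for upload. The output folder should sit outside the source folder so that crunched files aren't picked up on a later run.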
Please Squeeze the Cheese — For designers who want to squeeze the most out of their HTML, both utilities are well suited to the task. Mizer 1.2 is available for purchase through TidBITS sponsor Digital River for $69.95; although a demo is not available, Antimony Software guarantees a full refund within the first 30 days. VSE HTMLTurbo is available as a 1.2 MB download. The demo version is fully functional for 21 days, after which it costs $79.95 to obtain a registration code.