In TidBITS-334, we looked at the PowerPC processor family and some of the terms and technologies associated with it. If you read the article, you probably know the difference between 68K and PowerPC chips, why clock speed and clock multipliers are important, the difference between Level 1 and Level 2 caches, and the differences among the various PowerPC chips. Part 2 builds on this information and examines additional software and hardware components of the Power Macintosh.
Emulators Forever — If there’s a single thing that made the Power Macintosh successful, it’s the 68K emulator built into its system software. Conceptually, the 68K emulator sits between the PowerPC processor and executing code. If code is written for the PowerPC (such code is considered "native"), the emulator does nothing; if the code is written for 68K machines, the emulator translates it to PowerPC code (at a very low level) and passes it to the PowerPC processor. Without a 68K emulator, non-native programs wouldn’t run at all on a Power Mac.
The 68K emulator enabled Apple to move the Macintosh to a new processor architecture while retaining strong compatibility with existing programs – undeniably a good thing. At the same time, 68K emulation is also the Achilles heel of the Power Mac because the performance of 68K emulation can’t compare to that of native PowerPC code. When the Power Macs were introduced, Power Mac users often took a step backward in performance because the vast majority of Mac software was only available for 68K machines. Though some native applications appeared quickly, major tools like QuarkXPress, Microsoft Office, and FileMaker Pro took a while to become Power Mac-native.
Further, though Apple has ported many critical portions of the system software to take advantage of the PowerPC, much of the system still relies on the 68K emulator. Thus, even high-end Power Macintoshes are caught in a quagmire of 68K code, reducing their potential real-world performance even when running native applications.
If 68K code is so slow, then how long will 68K emulators be around? That’s simple: Apple has to keep a 68K emulator in the system forever.
First, the Mac OS relies heavily on the 68K emulator, and though System 8 will contain substantially more Power Mac-native code than System 7.5.3, it’s unlikely the entire operating system will ever be fully native. At a basic level, it’s not worth the effort to port everything, particularly little-used, non-performance-related portions of the system.
Second, Apple has a vested interest in making sure 68K code and applications continue to run. Almost every Power Mac user owns software written for 68K machines that will never be ported to PowerPC. A good example is Ambrosia Software’s arcade classic Maelstrom, which is largely written in 68K assembly language. Porting Maelstrom to the PowerPC would be an enormous undertaking; yet, more than two years after the introduction of the Power Macintosh, Maelstrom continues to run fine in emulation, and is actually a good test of 68K emulators.
Keeping 68K emulation in the system doesn’t mean that improvements can’t be made. Apple’s original 68K emulator was static, translating 68K instructions to PowerPC code one at a time. Emulation performance can be improved with larger Level 1 or Level 2 processor caches (emulator performance is better on the PowerPC 603e chip than the original 603 due to a larger Level 1 cache); however, it’s also possible to build a smarter emulator.
With the PCI Power Macs, Apple introduced a significantly faster dynamic recompilation (DR) emulator. The DR emulator watches the 68K code for loops and stashes the translated PowerPC code for later use, rather than translating the same 68K instructions over and over again. However, the DR emulator comes with a slightly higher price in terms of compatibility: programs that do not operate correctly on 68040 machines with their processor caches enabled may not run correctly. Also, Apple’s DR emulator only works on PCI Power Macs; the ROMs of earlier Power Macs don’t support it.
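To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is a toy model, not Apple's or anyone else's emulator: the translate function, the instruction strings, and the dictionary used as a cache are all invented for illustration, and no real PowerPC code is generated. It only counts how many translations each strategy performs when the same four-instruction loop runs 1,000 times.

```python
# Toy model only: "translation" is a stand-in function, not real
# 68K-to-PowerPC code generation. All names here are hypothetical.

def translate(instruction):
    """Pretend to translate one 68K instruction into PowerPC code."""
    return "ppc:" + instruction


def run_static(program, iterations):
    """Translate every instruction each time it executes."""
    translations = 0
    for _ in range(iterations):
        for instruction in program:
            translate(instruction)
            translations += 1
    return translations


def run_caching(program, iterations):
    """Translate each instruction once and reuse the stored result."""
    cache = {}  # address -> translated code
    translations = 0
    for _ in range(iterations):
        for address, instruction in enumerate(program):
            if address not in cache:
                cache[address] = translate(instruction)
                translations += 1
            # A real emulator would now execute cache[address].
    return translations


loop_body = ["move.l d0,d1", "add.l #1,d0", "cmp.l d2,d0", "bne loop"]
static_count = run_static(loop_body, 1000)
cached_count = run_caching(loop_body, 1000)
print(static_count, cached_count)  # prints "4000 4"
```

In this sketch, a stored translation goes stale if the 68K code at that address changes after it has been translated, so any scheme along these lines has to assume the code stays put or find a way to notice when it doesn't.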
A good alternative is Speed Emulator, part of Connectix’s Speed Doubler. (See TidBITS-292.) Speed Emulator is also a DR emulator, and though it uses more memory than Apple’s emulator, it is significantly faster and runs on any Power Mac. Speed Emulator’s additional performance is particularly obvious in some areas; for instance, it significantly speeds up the Apple Event Manager, a feature particularly appreciated by AppleScript users.
Both Apple’s and Connectix’s emulators imitate a 68LC040, which is a problem if you need to use a 68K program that specifically requires a floating point unit (FPU). In the 68K family, the FPU was originally a separate chip devoted to floating point math operations. With the 68040, Motorola built most FPU functions directly into the processor, then (in a cost-cutting move) removed them in the 68LC040. Programs requiring an FPU won’t run under emulation on a Power Mac because they correctly determine that an FPU isn’t available.
If you need to use programs requiring an FPU on a Power Macintosh, you have two choices: SoftwareFPU and PowerFPU, both from John Neil and Associates. These programs emulate a 68K FPU, allowing 68K programs that require an FPU to function. SoftwareFPU, a $10 shareware product, works fine, though it’s not PowerPC native and must pump its math calls through the 68K emulator. PowerFPU is a $20 commercial product that provides PowerPC-native FPU emulation. Since native PowerPC floating point functions are speedy, PowerFPU’s performance can be quite good.
The Magic Bus — When evaluating the performance of a computer, most users refer to the machine’s processor type and clock speed, primarily because these terms are common, occasionally comparable, and liberally used in marketing materials. However, another major factor in a computer’s overall performance is its bus, the main data path between the processor and other components.
The easiest way to explain a bus is by analogy: think of your computer as a small, one-road town. Most of your computer’s components live on the road, and the road must be used every time information has to travel between components. A traffic light controls travel, and a complex series of local laws governs who can go ahead, who has to wait, and how often people can get on or off the road. Two things control how fast traffic moves: how many lanes the road has, and how often the traffic light changes. One thing controls how efficiently traffic moves: local traffic laws.
In this analogy, the bus width is the number of lanes in the road, the bus speed is how often the traffic light changes, and the hardware architecture and operating system are the traffic laws.
- Bus width: The bus width is literally how many bits can move across the bus at the same time. Power Macs have a 64-bit bus, meaning 64 bits can travel across the bus simultaneously. Previous Macs had a 32-bit bus, and early Macs had a 16-bit bus. As you might expect, a 64-bit bus is about twice as fast as a 32-bit bus running at the same speed, since it can move twice as much material in the same amount of time. However, a 64-bit bus is also more expensive to manufacture.
- Bus speed: The clock oscillator controls bus speed, as well as processor speed. Basically, a clock oscillator is a tiny quartz crystal that vibrates a certain number of times per second. It’s like a metronome for a computer, controlling everything from disk access and screen redraws to memory access and networking, and making sure everything happens in sync.
- Hardware architecture: How traffic flows over the bus is a function of the computer’s hardware and operating system design. For example, in older computers, writing data from RAM to a disk meant every piece of information in RAM had to go across the bus to the processor, then back across the bus to the disk system, which would write the information and report back when it finished. These days, it’s more common for computers to have a "private road" between RAM and disks. There are numerous other instances of hardware and software engineering in all Macintosh models that strive to improve bus efficiency.
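The relationship between bus width and bus speed can be worked out with simple arithmetic. The sketch below assumes an idealized bus that completes one full-width transfer on every clock cycle, which real buses rarely manage, so treat the results as theoretical ceilings rather than measurements. The 40 MHz figure is an arbitrary value chosen for the example, not a specification of any particular Mac.

```python
def peak_bandwidth_mb_per_s(width_bits, bus_mhz):
    """Theoretical peak, assuming one full-width transfer per bus cycle."""
    bytes_per_transfer = width_bits / 8
    # Millions of cycles per second times bytes per cycle
    # gives millions of bytes per second.
    return bytes_per_transfer * bus_mhz


# 40 MHz is an illustrative bus speed, not taken from any real machine.
narrow = peak_bandwidth_mb_per_s(32, 40)
wide = peak_bandwidth_mb_per_s(64, 40)
print(narrow, wide, wide / narrow)  # prints "160.0 320.0 2.0"
```

Doubling either the width or the speed doubles the ceiling; how close a machine gets to that ceiling depends on the traffic laws, that is, the hardware architecture and operating system.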
Dishing Your Buses — The analogy above is a vast over-simplification – in reality, a Macintosh has a number of different buses, most of which exist in sub-systems. SCSI, Ethernet, serial ports, RAM, expansion slots (NuBus and PCI), and input devices all have separate buses, each of which has its own width and (sometimes) its own oscillator.
Bus speed is an important factor when considering upgrades. Clock chipping, a popular, inexpensive method for upgrading Quadras and first-generation Power Macs, involves replacing the computer’s clock oscillator with a faster one. Although it invalidates Apple’s warranty and not all Macs can be clock chipped successfully (success rates are around 90 percent), replacing the clock chip speeds up the computer’s processor and bus, often making for a good all-around performance improvement. For detailed information on clock-chipping, check out Marc Schrier’s clock chipping FAQ.
Many PCI Power Macs and clones have both their clock oscillators and processor chips on a removable CPU daughter card, providing a built-in upgrade path to faster clocks and processors. This design permits you to replace the processor and the clock oscillator at the same time. However, in many cases there’s still a limit to how fast the main bus can go. In Apple’s current models, the upper limit is 50 MHz; Power Computing’s PowerTowers go to 60 MHz. This doesn’t mean that daughter card upgrades won’t be worthwhile for these machines, but rather that they won’t improve the performance of every aspect of the system beyond a certain point.
Similarly, upgrade cards from vendors like Apple and DayStar for earlier Mac models (from the IIci through the Quadra series) should be evaluated not only on the basis of the promised clock speed of the PowerPC chip, but also in terms of the performance constraints imposed by other hardware. In many cases, these cards must traverse a comparatively slow, narrow bus to get data from disks, ports, other devices, and/or RAM, yielding real-world performance levels considerably lower than Power Macs with equivalent processor speeds. Though these upgrades might be adequate for running PowerPC code, they’re rarely equivalent to the performance of a used Power Mac and often cost just as much.
The Myth Of Clock Speed — In the end, what does all this mean for buying a Macintosh these days?
Be wary of hype surrounding the raw clock speed of a particular machine. Although processor speed is (of course) related to performance – and many computer vendors trumpet little more than the clock speeds of their machines – many other factors (processor type, cache, emulation, bus speed, system software, and more) contribute to a machine’s real-world performance.
As an example, Power Macs achieve their high processor speeds by using clock multipliers built into their PowerPC processors, allowing the chips to run faster than the machine’s clock oscillator. There’s no question this improves performance, but there are limits to how much bang-for-the-buck this technique will produce. There’s a real performance difference between a 120 MHz machine using a 6x clock multiplier on a 20 MHz bus and a 120 MHz machine using a 2x clock multiplier on a 60 MHz bus. Though both machines would function, the first machine will take much more time to access disks, networks, memory, and peripheral cards than the second machine. Even though they’d be roughly equivalent in terms of raw processor performance, the first machine is going to spend more of its processor cycles waiting for its hardware.
Also, pay attention to what processor a particular machine uses. In terms of raw processor power, a 120 MHz PowerPC 604 is significantly (50 to 75 percent) faster than a 120 MHz PowerPC 601 or 603e, just by the nature of the chip designs. However, in real-world terms, a machine with a PowerPC 604 might only mildly outperform a 120 MHz 603e with a fast bus, fast video, fast disks, and a good emulator.
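The two hypothetical 120 MHz machines can be compared with a little arithmetic. This sketch rests on two simplifying assumptions that are mine rather than facts about any shipping hardware: a given transfer takes a fixed number of bus cycles (10 here, an arbitrary figure), and the processor does nothing useful while it waits for that transfer to finish.

```python
CPU_MHZ = 120
TRANSFER_BUS_CYCLES = 10  # arbitrary figure chosen for illustration


def bus_speed_mhz(cpu_mhz, multiplier):
    """The bus runs at the processor speed divided by the multiplier."""
    return cpu_mhz / multiplier


def cpu_cycles_waiting(bus_cycles, multiplier):
    """Each bus cycle lasts as long as `multiplier` processor cycles."""
    return bus_cycles * multiplier


results = {}
for multiplier in (6, 2):
    bus = bus_speed_mhz(CPU_MHZ, multiplier)
    waiting = cpu_cycles_waiting(TRANSFER_BUS_CYCLES, multiplier)
    results[multiplier] = (bus, waiting)
    print(f"{multiplier}x multiplier: {bus} MHz bus, "
          f"{waiting} processor cycles spent waiting")
```

Under these assumptions, the machine with the 6x multiplier burns 60 processor cycles on a transfer that costs the 2x machine only 20, even though both processors tick at the same rate.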
If you can’t judge computers by their clock speed, what can you use? Increasingly, the only meaningful measures of real-world performance are produced by benchmark applications like Speedometer, MacBench, and Norton Utilities System Info.
I don’t feel the results of these programs can be accepted as gospel. Though tests on my Macs produced results in the right ballpark for each machine, none of these applications produced consistent results in repeated testing. Still, programs like these at least attempt to analyze more than a processor’s performance, and if results are sufficiently averaged across a wide range of configurations, they might give a reasonable idea of a machine’s real-world performance.
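If you run one of these benchmarks several times, it helps to look at the average and the spread of the results rather than any single run. The sketch below uses Python's standard statistics module. The scores are made-up placeholder numbers, not output from Speedometer, MacBench, or any other real benchmark.

```python
import statistics

# Placeholder scores invented for illustration; substitute your own runs.
scores = [98.0, 104.0, 101.0, 97.0, 100.0]

average = statistics.mean(scores)
spread = statistics.stdev(scores)  # sample standard deviation
relative_spread = spread / average * 100

print(f"average {average:.1f}, spread {spread:.2f} "
      f"({relative_spread:.1f} percent of the average)")
```

If the spread is a sizable fraction of the average, a small difference between two machines' scores probably doesn't mean much.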
For More Information — These two articles have covered a lot of territory, and I hope they dispelled some confusion about what different bits of hardware do and how you can relate their specifications to real world performance. If you’d like more information, I’d recommend the following technical sources.
For details on PowerPC processors, look at Motorola’s and IBM’s information, as well as the PowerPC FAQ:
If you’re interested in how processors are officially benchmarked (and what a SPECint95 means!), check with the source:
Finally, if you’re curious about how the PowerPC chip works in the middle of a Macintosh, I recommend this introduction from Apple’s Developer University: