26 July 1993

TidBITS#186/26-Jul-93

This week brings several corrections and clarifications of previous articles, RAM prices increasing, the pen-based PowerBook project disappearing, and the postponement of the online Congressional hearing. In the rumor department, Apple releases another hardware update and Prodigy appears on the Internet. Finally, Roy McDonald of Connectix anchors the issue with a thoughtful paper on software acceleration.

I need to issue a correction and an apology. There is no way Jesse Helms could be involved with the House of Representatives’ pilot project because he is a Senator, not a Representative. That’s the correction. The apology is this: It was inappropriate to imply that Senator Jesse Helms would not participate in an Internet project. Whatever the Senator’s politics, and whether or not I agree with them, there is room on the Internet for all opinions.

And as long as I’m apologizing, just a quick note to clarify that the blurb for TidBITS #185 inappropriately implied that Microseeds no longer supports Rival due to a problem with Rival. We know of no problems with Rival and congratulate the authors on their admirable efforts to support existing users and enhance Rival. If only all companies were as diligent in their customer service.

The Macintosh LC 520 is sold in Canada, or so <[email protected]> writes to tell us. Unlike in the U.S., where Apple currently sells the LC 520 only to the education market, normal people can buy the LC 520 in Canada. The Canadian versions of the LC 520 sport 8 MB of RAM, an 80 MB or 160 MB hard disk, and an internal CD300i. All that, and apparently at a price lower than a comparably equipped LC III.

Bite the Purple Bullet and buy yourself a bigger PowerBook drive. Along with the cases that various hard drive vendors sell, you can now buy a $99 NuBus card from ETC Peripherals that will accept your 2.5" hard drive (or a 3.5" low-power drive if you buy an optional Purple Bullet Expander). The Purple Bullet provides a selectable SCSI address switch, an LED activity light, ETC Disk Tools 4.0, and all the necessary cables.

ETC — 800/876-4ETC — 813/884-2863 — 813/888-9535 (fax)

In the unconfirmed news department, I hear that Apple has finished the Macintosh Hardware System Update 2.0, but that the update is not yet generally available. It fixes many bugs on a wide range of Mac models. Many of them seem more like operating system bugs than hardware problems, but who’s complaining? We’ll let you know more details once we have a copy ourselves. Also, I just heard that Prodigy is beta-testing an Internet gateway. We don’t know if there’s any cost to Prodigy users, but they need new software, called Mail Manager, and can sign up for Internet access at JUMP INTERNET. Prodigy users can receive Internet mail via the address format

[email protected]

where "abcd12a" is the recipient’s user ID. It’s about time Prodigy appeared on the Internet, but don’t expect a reliable gateway right away since it’s still in testing. Again, details as they arise.

Fireworks weren’t the only thing blowing up on the Fourth of July this year. In Japan, the Sumitomo epoxy plant, which made most of the epoxy used in constructing DRAM chips, blew up, and RAM prices immediately skyrocketed along with parts of the factory. The explosion wasn’t the only factor in the recent price increases. Supplies had already been barely meeting demand in the PC industry, which squeezed the smaller Macintosh market.

Although the price of SIMMs was of course affected, a less obvious effect may show up in the price of configured computers. It’s unclear if Apple will raise prices to compensate, or if Apple is even all that affected, given that the company was well-insulated from price shifts in the last price scare.

There’s nothing to do about the problem except wait it out, although I’ve heard the opinion that the price jumps happening at the moment are temporary and that prices will drop again in a few weeks or a month, though not to previous levels.

Along with all the layoffs, Apple has cut back projects deemed non-essential. Among them was the pen-based PowerBook, probably a modified Duo.

In some ways it’s a shame that such projects are dying, because even if they never lead to real products, the research often benefits Apple in other ways. However, I’m not surprised that Apple shelved the pen research for the time being, at least until they see how popular the MessagePad proves, given its pen interface.

Several years ago handwriting research and pen interfaces were all the rage, and one developer I spoke with said that to receive venture capital you essentially had to include pen computing in your business plan. However, with GO’s PenPoint remaining a niche operating system and with the demise of the much-touted Momenta pen notebook, pen computing fell out of favor even with the venture capitalists, and last I heard, including pen computing in your business plan meant almost certain rejection.

In many ways, I think it all comes down to the manner in which you wish to record and manipulate information. The concept of a writing stick has been around for hundreds of years – first because it was necessary to make an impression and later because it could leave a trail of lead or ink behind it. But does that make sense as an interface to a computer, where you can’t make an impression and where there is no permanence to a marked trail? I don’t wish to imply that the current interfaces to computers are anything special, or that there aren’t applications that lend themselves to a pen interface. The pen’s primary advantage is that people know what to do with it, not that it’s usable as a universal interface for a computer.

Nonetheless, it’s a shame that Apple dropped the project, if only because only through experimentation will Apple (or anyone) determine the interface methods that work and those that don’t. It’s becoming painfully clear that the Macintosh interface is due for an overhaul despite its still-obvious lead over Windows. But that’s another editorial.

— Information from:
[email protected]

Fresh from correcting my egregious mistake regarding Senator Helms, I’ve learned that the Online Congressional Hearing has been postponed until later in the year. The story behind the delay is interesting.

The Internet Town Hall depends on donations from many organizations, many of which are commercial entities. Given the cost of computers, software, and network links, this isn’t surprising, and in fact, it’s an example of how even competing companies can cooperate for the community good, much as people cooperate on the Internet.

Along with everything else, the Online Congressional Hearing was going to transmit audio and video over the Internet. To avoid swamping the standard links, ARPA volunteered the use of its high-speed experimental DARTNET, whose underlying facilities are operated by Sprint. The Internet Town Hall folks asked if Sprint would like to join and, in the process, provide a high-speed link to the hearing room. Sprint expressed some concerns about the ethics of donating the link to the government, even for this use alone, so the subcommittee postponed the hearing for several months.

The problem is that donations to the underlying infrastructure of the congressional committee could be construed as expenses that the government would have to reimburse. The idea is to avoid any appearance that the committee is beholden to a specific interest group. I have a feeling that things are not so squeaky clean in Washington as this may imply, but I approve of the Internet Town Hall folks making sure that the Internet is kept above any such impropriety. We hope this hearing will happen in a few months and not end up sucked into a giant black hole of government investigations.

You can still email comments to <[email protected]> to be forwarded to the Subcommittee staff. You can also ask to be added to a list that will be notified when the hearing is rescheduled.

— Information from:
Carl Malamud — [email protected]

Presented at the Sumeria Technologies & Issues Conference

Hardware gets faster every year. We’ve all come to expect it. And a huge amount of work is going on right now to ensure that next year the same thing will happen.

Software gets more features. And unfortunately, all too often, the presumption that fast hardware will take up the slack has meant that inelegant software design needlessly eats up performance advances. The irony is that software improvements are often far more dramatic in their impact than hardware improvements. Hardware is the tortoise, advancing relentlessly in tens of percents per year; software is the hare – on occasion it leaps orders of magnitude.

This article reviews what has been done in software acceleration on the Mac, highlighting how much more could be done right now. I aim to persuade you to think about Mac performance as a hybrid of hardware and software acceleration and perhaps shift your priorities a little in favor of pushing the envelope on code rather than silicon.

Decade of Macintosh Hardware Advances — Let’s start by seeing what can be done with hardware. How has Macintosh hardware improved in performance over the past 10 years?

The original 128K Mac had an effective speed of roughly 1/2 MIPS. Today’s Quadra 950 provides about 8 MIPS. Of course, the Quadra 950 is relatively expensive, so on a real $/MIPS basis, the growth is only eight-fold, equivalent to a yearly average improvement of 26 percent.
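The 26 percent figure is a compound annual growth rate. Here is a minimal sketch of the arithmetic. It assumes the eight-fold $/MIPS gain spans about nine years, from the 128K Mac (1984) to the Quadra 950. That span is our assumption, chosen because it makes the two numbers agree. The text gives only "the past 10 years."

```python
def annual_growth_rate(total_factor, years):
    """Compound yearly rate r such that (1 + r) ** years == total_factor."""
    return total_factor ** (1.0 / years) - 1.0


# Eight-fold price-performance gain; the nine-year span is an assumption.
rate = annual_growth_rate(8.0, 9)
print(f"{rate:.0%} per year")   # prints "26% per year"
```

Over a full ten years the same eight-fold gain works out to about 23 percent a year. The result is sensitive to the span you assume.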

SCSI, NuBus, and AppleTalk speeds have changed less. SCSI may be about twice as fast as it originally was. The new Cyclone NuBus standard will give a fourfold performance boost. AppleTalk is basically unchanged. And although EtherTalk offers a standard network bandwidth roughly twenty times what we had in 1984, actual throughput is only about five times better.

Typical RAM installation has grown from 128K to the current average of 6 MB, a 50-fold growth, or about 50 percent per year. Access speeds of main storage have improved only by about a factor of two (although caching has mitigated this otherwise fatal limitation).

Common hard drives seek about five times faster on average and have ten times the capacity they did when drives first shipped for the Mac Plus. The average transfer rate hasn’t improved by much more than a factor of two.

Overall, we might imagine a "Speedometer" increase of as much as a factor of 20 over the past decade (with perhaps much more than that for floating-point operations).

That’s not to say that hardware can’t make occasional big leaps, too. RISC processors will provide a roughly three times performance jump on one-third the die size, for an overall price-performance step of ten times in what will probably be a two to three year transition period. DSP can also accelerate certain processes by an order of magnitude.

But taken all together, typical jobs on a constant-priced Mac have run roughly 25 percent faster every year, solely because of technical advances in hardware and increased performance for the price. This means hardware performance doubles roughly every three years, a rate likely to continue for the foreseeable future.
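The "doubles roughly every three years" claim follows from the 25 percent rate. Here is a minimal sketch of the standard doubling-time calculation for compound growth. The helper name is ours, for illustration only.

```python
import math


def doubling_time(annual_rate):
    """Years for performance to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


years = doubling_time(0.25)
print(f"{years:.1f} years")   # prints "3.1 years"
```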

Software Advances — While hardware advances are relentless and pervasive, software improvements are often more specific in their impact. The performance results, however, can be dramatic.

For a familiar example, consider the case of ‘Find File’ running under System 6 versus System 7. For fun, we recently took a Mac Plus running System 7 and raced it against a Mac IIci running System 6. The System 7 software was running on hardware five years older than the System 6 machine. Even so, Find File went slightly faster on the Plus, because the System 7 version of Find File is roughly ten times faster.

Unfortunately, it often takes a long time for well-known software techniques to enter the commercial sector. For instance, sparse and virtual array techniques did not appear until many years after the introduction of the first spreadsheet (VisiCalc). If you wanted a 50 by 1,000 cell spreadsheet, you had to have 50,000 cells’ worth of RAM (say, 800K), even if most cells were empty.

Sparse techniques would have allowed you to use only the amount of memory taken by full cells, and virtual techniques would have let you use disk space as well, at the cost of slower calculation. But the marketing war focused on porting to new platforms and adding new features, not on saving RAM. A few engineer-years could have saved users tens of millions of dollars’ worth of RAM.
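Here is a minimal sketch of the sparse technique. It assumes a hypothetical `SparseSheet` class that keeps only filled cells in a dictionary keyed by (row, column). It is not modeled on any real spreadsheet’s internals. The figure of 16 bytes per cell simply follows from the text’s 50,000 cells in about 800K.

```python
class SparseSheet:
    """Hypothetical sparse spreadsheet: memory grows with filled cells, not grid size."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self._cells = {}                        # (row, col) -> value, filled cells only

    def set(self, row, col, value):
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("cell outside sheet")
        if value is None:
            self._cells.pop((row, col), None)   # clearing a cell frees its entry
        else:
            self._cells[(row, col)] = value

    def get(self, row, col):
        return self._cells.get((row, col))      # empty cells cost nothing and read as None

    def filled(self):
        return len(self._cells)


BYTES_PER_CELL = 16                             # implied by 50,000 cells ~ 800K
sheet = SparseSheet(1000, 50)
for r in range(1000):
    sheet.set(r, 0, r * 1.5)                    # fill only the first column
dense_bytes = sheet.rows * sheet.cols * BYTES_PER_CELL    # 800,000
sparse_bytes = sheet.filled() * BYTES_PER_CELL            # 16,000, ignoring hash-table overhead
```

A real hash table adds per-entry overhead, so the sparse figure is a lower bound. The gain still grows with the fraction of cells left empty.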

Many new technologies which seem to arrive because of hardware advances are in fact largely enabled by software breakthroughs. We did a rough analysis of the increased performance in a variety of frontier technologies over the past five years and tried to assess what fraction of speed improvements came from software as opposed to hardware. We concluded that the software components for the various technologies were:

  Voice recognition         80%
  Handwriting recognition   80%
  Dynamic 3D graphics       60%
  Compression               50%

In all cases, some hardware improvement (e.g., DSP) was necessary to make the technologies practical. But better software, particularly better software algorithms, was the most important enabling technology.

Components of Speed — Where does the speed come from? You can break the software design process into three components: algorithms, implementation, and compilation.

The widest performance differences come from algorithm selection. This may also be the area of poorest performance in the industry today: performance losses by factors of 10 and 100 are common. Why is this?

Consider the basic Order theory of algorithms. Every computer algorithm can be classed by Order. For example, an Order N algorithm takes twice as long when you run it on twice as much data; an Order N-squared algorithm takes four times as long. Lots of computational problems are easy to code as N-squared algorithms but can, with more effort, be rewritten to scale as NlogN.
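The Order arithmetic is easy to tabulate. Here is a minimal sketch that ignores constant factors, which real code never can. The `cost` and `growth` helpers are hypothetical names for illustration.

```python
import math


def cost(order, n):
    """Relative work for a few common algorithm orders, constant factors ignored."""
    if order == "N":
        return n
    if order == "N^2":
        return n * n
    if order == "NlogN":
        return n * math.log2(n)
    raise ValueError(f"unknown order: {order}")


def growth(order, n_small, n_large):
    """How many times longer the job takes when the data grows from n_small to n_large."""
    return cost(order, n_large) / cost(order, n_small)


for order in ("N", "N^2", "NlogN"):
    # doubling the data, then growing it tenfold (10 -> 100 items)
    print(order, round(growth(order, 1000, 2000), 2), round(growth(order, 10, 100), 2))
# N: 2.0, 10.0   N^2: 4.0, 100.0   NlogN: 2.2, 20.0
```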

A famous example was the introduction of the Fast Fourier Transform in the mid-1960s, an NlogN algorithm that replaced the previous N-squared algorithm.

A 1,024-point transform could thus be performed 100 times faster by this new software method (roughly 1,024 squared operations versus 1,024 times 10, since log2 of 1,024 is 10). So this advance was comparable to over 20 years of general-purpose hardware speed improvement. And it was accomplished through a software change which, once developed, had no marginal cost over the prior solution.
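The 100-times figure can be checked directly. Below is a minimal sketch contrasting a direct N-squared DFT with a textbook recursive radix-2 Cooley-Tukey FFT. Both are generic implementations written for illustration. The operation counts are the order-of-growth estimates used in the text, not exact instruction counts.

```python
import cmath
import math


def dft(x):
    """Direct DFT: each of N outputs sums N terms, so O(N^2) work."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


n = 1024
direct_ops = n * n                       # 1,048,576
fast_ops = n * int(math.log2(n))         # 10,240
speedup = direct_ops / fast_ops          # 102.4
# Years of 25-percent-a-year hardware gains needed to match that speedup
years = math.log(speedup) / math.log(1.25)
print(round(speedup, 1), round(years, 1))   # 102.4 20.7
```

Both functions return the same spectrum, up to rounding. Only the amount of work differs.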

Unfortunately, plenty of commercial software ships every day containing inefficient algorithms. Sorting records in a database is a familiar example where NlogN algorithms can be used but aren’t always. When your data grows from 10 to 100 records, pixels, or whatever, an N-squared algorithm takes 100 times longer to run, when an NlogN algorithm would take only twenty times longer.
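To see the difference on actual record sorting, here is a minimal sketch that counts key comparisons in two generic sorts. Selection sort stands in for the N-squared approach and a top-down merge sort for the NlogN one. The record fields (`id`, `zip`) are hypothetical.

```python
import random


def selection_sort(records, key):
    """O(N^2): always N(N-1)/2 comparisons, regardless of input order."""
    items = list(records)
    comparisons = 0
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            comparisons += 1
            if key(items[j]) < key(items[smallest]):
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items, comparisons


def merge_sort(records, key):
    """O(N log N): top-down merge sort that also counts key comparisons."""
    items = list(records)
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, c_left = merge_sort(items[:mid], key)
    right, c_right = merge_sort(items[mid:], key)
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if key(right[j]) < key(left[i]):
            merged.append(right[j])
            j += 1
        else:                                # ties take the left item, keeping the sort stable
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


random.seed(0)
for n in (10, 100):
    records = [{"id": i, "zip": random.randint(10000, 99999)} for i in range(n)]
    _, slow = selection_sort(records, key=lambda r: r["zip"])
    _, fast = merge_sort(records, key=lambda r: r["zip"])
    print(n, slow, fast)   # selection sort: 45, then 4,950 comparisons
```

Going from 10 to 100 records multiplies selection sort’s comparisons by 110. Merge sort stays under N times log2 N, rounded up.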

It’s easy to see why it happens. From the technical perspective, debugging and benchmarking are often done on limited data sets that don’t reveal how badly the code will bog down in real-world applications. And the real world constantly increases data set size, often at an exponential rate. Screen diagonal and pixel resolution are two common parameters that quadruple data set size when they double.

Over in marketing, they know that software is not as rigorously benchmarked for speed as hardware, because comparisons are often more difficult to apply. So feature lists and time-to-market become disproportionately important factors.

Good algorithms are not enough. Implementation counts as well. For example, suppose you need code for looking up records in a database. An efficient algorithm for this is Order logN: twice as many records means only one more step in the search.

The usual way to accomplish this is to index the records in a binary tree. Then you need log2 N index lookups to find the location. Finding a single record in a 1,000-record database requires 10 lookups.
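Here is a minimal sketch of the lookup count, assuming the index is a sorted array searched by halving. That behaves like a balanced binary tree. The keys are made up for illustration.

```python
def binary_search(sorted_keys, target):
    """Return (index, probes); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                      # one index lookup
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


keys = list(range(0, 2000, 2))           # 1,000 hypothetical record keys
worst = max(binary_search(keys, k)[1] for k in keys)
print(worst)                              # prints 10
```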

But, if each of these lookups involves a separate hard drive access, the implementation is poor, even though the algorithm is optimal. A better (and more typical) implementation would bring some or all of the directory information into RAM at the time of the first disk hit and cache it there for the next nine lookups. Whether or not you use an optimized algorithm, if the implementation is three times slower than necessary, the overall performance suffers by the same ratio.
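The difference between the two implementations is easy to count. This minimal sketch assumes a hypothetical `SimulatedDisk` that tallies accesses. It also assumes the index is small enough to pull into RAM in one read. The same binary search runs in both cases, so only the implementation changes.

```python
class SimulatedDisk:
    """Hypothetical drive holding a sorted index; counts every access."""

    def __init__(self, sorted_keys):
        self._keys = list(sorted_keys)
        self.accesses = 0

    def __len__(self):
        return len(self._keys)

    def read_key(self, i):
        self.accesses += 1               # one drive access per index entry
        return self._keys[i]

    def read_index(self):
        self.accesses += 1               # assumption: whole index comes in one read
        return list(self._keys)


def search(get_key, n, target):
    """Binary search using get_key(i) to fetch the i-th index entry."""
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = get_key(mid)
        if key == target:
            return mid
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def uncached_lookup(disk, target):
    """Poor implementation: every probe goes to the drive."""
    return search(disk.read_key, len(disk), target)


class CachedIndex:
    """Better implementation: the first disk hit pulls the index into RAM."""

    def __init__(self, disk):
        self.disk = disk
        self.ram = None

    def lookup(self, target):
        if self.ram is None:
            self.ram = self.disk.read_index()
        return search(self.ram.__getitem__, len(self.ram), target)


keys = list(range(0, 2000, 2))
slow_disk, fast_disk = SimulatedDisk(keys), SimulatedDisk(keys)
cached = CachedIndex(fast_disk)
for k in keys[:100]:
    uncached_lookup(slow_disk, k)
    cached.lookup(k)
print(slow_disk.accesses, fast_disk.accesses)   # fast_disk.accesses is 1
```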

Good implementation is often a matter of deep familiarity with the target hardware platform, a familiarity which is increasingly difficult to achieve as technology life cycles shrink ever shorter.

Also, the code we write is not the code the system runs. Between the two stands a compiler.

Within the Mac world one can find a range of commercial C compilers whose compiled code varies in performance by 30 percent or more. To do better than that, one must write in assembler, and here the variations are even greater. To put it bluntly, it’s not hard to do a lot better than MPW.

Looking beyond the Mac, we must face the fact that much more effort has gone into optimizing 80x86 compilers than 680x0 products. As Windows has gained market share, more and more cross-platform benchmarks are being published. They compare essentially identical source code, compiled for Windows and for the Mac, running on similarly powered CPUs. The Windows products tend to run faster because the compilers are, by and large, a little bit better. The most striking example I’ve seen was a recent PC Magazine benchmark of WordPerfect, where the Windows advantage was substantial. This is not because of a superior operating system, but because of the availability of a better-optimized compiler.

With the move from CISC to RISC architecture, and especially with the move to superscalar pipelines, ever more burden is placed upon the compiler. If sloppy compilers can be written for CISC machines, time-to-market pressures could produce RISC compilers whose sloppiness costs even more performance.

The trend in the software industry today runs opposite to this theme. We are all sacrificing performance in favor of time-to-market. Object-Oriented Programming (OOP) is the epitome of this trade-off. Now, there’s nothing wrong with OOP, and it’s great that we’ll all soon be writing Newton applications by dragging and dropping resources from the object pool.

But OOP is an obvious formula for inefficient code. Witness the feel of the Finder in System 6 vs. System 7. I suspect that in many applications, early products will be sketched in OOP, and later, more mature products or versions will be coded at lower levels.

Lately we’ve been thinking about starting a development house that specializes in knocking off popular OOP-based products with C or assembler-based me-too versions. We’d be second to market but we’d win the benchmark wars every time.

System Software — System software is particularly important because of its pervasive impact on performance. Well-written, native-mode system calls are critical to good performance for a wide range of software products, and can to some extent overcome limitations imposed by inefficient compilers. If most of the computer’s time is spent in highly-optimized system calls, the inefficiencies of the calling program can easily be overlooked.

On the downside, many advances in system software have undermined performance. Windowing systems and multitasking both advance overall productivity, but add overhead which slows routine operation. The user gets new functionality, but it doesn’t come for free, and it affects all applications.

Moreover, advances often improve performance in ways that are difficult to define quantitatively. Both virtual memory and RAM disk technology can significantly enhance Mac productivity, but it’s hard to benchmark their contributions. For example, Connectix end-user studies of Virtual and MAXIMA customers indicate that either product can increase total work output per session by 5-20 percent, but results vary widely according to the type of work performed and the system configuration.

An area of particular interest to Connectix is the use of advanced, dynamic disk caching techniques, utilizing all of the often "wasted" RAM on computers to avoid unnecessary disk access. The benefits of this are two-fold:

First, disk accesses are usually a hundred to a thousand times slower than RAM accesses, so tremendous speed improvements can be achieved. Preliminary benchmarks on our Velocity caching product show an overall work throughput increase of about 25 percent. That’s not bad for a low-cost software extension considering what it costs to accomplish the same boost in hardware.
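The size of the win depends on how often the cache hits. Here is a minimal worked example using the standard effective-access-time model. It is a textbook formula, not Connectix’s benchmark methodology. The hit rate h is an assumption, and the disk is taken as 1,000 times slower than RAM, the top of the range quoted above.

```latex
T_{\text{eff}} = h\,T_{\text{RAM}} + (1-h)\,T_{\text{disk}},
\qquad T_{\text{disk}} = 1000\,T_{\text{RAM}},\; h = 0.9
\;\Rightarrow\; T_{\text{eff}} = 0.9\,T_{\text{RAM}} + 0.1 \times 1000\,T_{\text{RAM}} = 100.9\,T_{\text{RAM}}
```

That is roughly a tenfold improvement over going to disk every time. It describes storage accesses only. Whole-task throughput also depends on how much time goes to computation, which is why the measured gain is much smaller.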

Second, caching has become increasingly important because of portable computing. PowerBook users will enjoy considerably longer battery life through the elimination of unneeded disk spin-ups, which typically account for 10 percent of power use in a battery-powered PowerBook session. Many PowerBook users also complain that their PowerBooks seem sluggish next to similar desktop systems. The main reason, it appears, is the random, annoying delays while the drive spins up.

The key to a successful caching strategy involves maximizing the available cache size and filling it with the data most likely to be called for next by the CPU. Velocity incorporates unique advances in both of these areas, which I look forward to discussing in the future.
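The text doesn’t describe Velocity’s actual policy, so here is a generic least-recently-used (LRU) block cache as a stand-in sketch. The class name and the `read_from_disk` callback are hypothetical. LRU is just one common way to guess which data will be called for next.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Generic least-recently-used block cache (a stand-in; not Velocity's policy)."""

    def __init__(self, capacity_blocks, read_from_disk):
        self.capacity = capacity_blocks
        self.read_from_disk = read_from_disk    # callable: block number -> bytes
        self._blocks = OrderedDict()
        self.hits = self.misses = 0

    def read(self, block):
        if block in self._blocks:
            self._blocks.move_to_end(block)     # mark as most recently used
            self.hits += 1
            return self._blocks[block]
        self.misses += 1
        data = self.read_from_disk(block)
        self._blocks[block] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)    # evict the least recently used block
        return data

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


disk_reads = []


def fake_disk(block):
    """Hypothetical drive: records each block it is asked for."""
    disk_reads.append(block)
    return bytes(512)


cache = LRUBlockCache(capacity_blocks=4, read_from_disk=fake_disk)
for block in [1, 2, 3, 1, 2, 3, 1, 2, 3, 9]:
    cache.read(block)
print(cache.hits, cache.misses, disk_reads)   # 6 4 [1, 2, 3, 9]
```

The two levers in the text map directly onto this sketch. A larger `capacity_blocks` keeps more of the working set resident. A better replacement policy than LRU would evict blocks less likely to be needed again.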

Input/Output — One of the most productive areas for software acceleration is in the I/O domain, both internal to the system, and over a network. After all, processing has three major steps – you get the information, then you process it, then you spit out the results. Two thirds I/O, one third processing.

Consider the following thought experiment: Watch a typical user for an hour. She opens files, launches applications, enters alphanumeric data, spell checks, calculates, sends email, closes windows. Now, double the processor speed. Maybe she’ll save 5 minutes out of the hour. Instead, suppose you double the I/O speeds – SCSI, ADB, AppleTalk, and NuBus. How much does she save then? Our testing indicates it’s also about five minutes, and it’s certainly within a factor of two of that either way for most sessions.

Moreover, a lot of the time saved will occur during periods when the user would be especially annoyed at delays. Most people are prepared to watch their clock spin a few seconds when calculating, but have less patience when saving or opening a document. The system just doesn’t seem to be working as hard then.

Hardware I/O speeds are generally not improving quite as fast as raw computation speeds. But a lot can be done in software here. Many I/O bottlenecks impose delays of 10 to 1 or even 100 to 1. They affect system operation only a small fraction of the time, say 10 percent, yet addressing them can still have a big impact. If you want a graphic example of this, compare benchmark data of third-party 25 versus 33 MHz accelerator boards. With a 33 percent higher clock speed, you often see benchmarks only 10 or 20 percent better, because I/O is setting the pace.
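Amdahl’s law, a standard model that the author doesn’t invoke by name, gives one way to read both claims. The CPU-bound fractions p are assumptions chosen for illustration. The phrase "10 percent of the time" is read here as 10 percent of the work being hit by the bottleneck.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of run time is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


def bottleneck_cost(op_fraction, slowdown):
    """Relative run time when op_fraction of the work runs `slowdown` times slower
    than it needs to; 1.0 means no bottleneck at all."""
    return (1.0 - op_fraction) + op_fraction * slowdown


# 25 MHz -> 33 MHz board: 1.32x clock, but only the CPU-bound share p speeds up.
clock_ratio = 33 / 25
for p in (0.4, 0.7, 1.0):
    print(p, round(amdahl_speedup(p, clock_ratio), 3))   # 1.107, 1.204, 1.32

# A bottleneck touching 10 percent of the work at 10:1 or 100:1.
for slowdown in (10, 100):
    print(slowdown, round(bottleneck_cost(0.1, slowdown), 2))   # 1.9, 10.9
```

On this reading, a CPU-bound share between roughly 0.4 and 0.7 reproduces the 10 to 20 percent benchmark gains. Removing a 10:1 bottleneck that touches a tenth of the work nearly halves run time, from 1.9 down to 1.0.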

Networks — Enormous increases in network bandwidth are becoming available because of the introduction of new technologies, particularly optical transmission. The underlying structure of network data transmission on the Mac is starting to be strained by these capabilities.

I recently spoke with a vendor who successfully developed an attractive low-cost, high-performance FDDI card with about ten times the effective speed of today’s Ethernet systems. It failed as a product, however, because network throughput was bottlenecked at both ends of the link by packet creation and decoding time. This seems like an area ripe for new software paradigms.

Video — There has been little improvement in the software that drives Mac video over the years. This reflects the fact that the Mac started with an excellent foundation, the original version of QuickDraw. Subsequent versions have improved screen draw times by about a factor of two, and big improvements in the future seem unlikely.

User/System — Finally, there is one bandwidth limitation which dominates all others in importance, one link in the I/O chain responsible for 99 percent of the wasted clock cycles in every Macintosh. This, of course, is the interface between the user and the system. Far outweighing compiler and implementation effects, and even swamping the effect of new algorithms, is how efficiently a user can communicate her wishes to the machine, and how in turn the machine can let the user understand or appreciate the results and implications of those actions. The ultimate bandwidth limitation, and the single most important way to improve the total performance of the user-system combination, is the user interface metaphor.

The Mac established its special position in the industry by virtue of its unique ability to address this one issue. Essentially, the key technology that enabled it to do so was software. But more remains to be done, and the pace of improvement in the last five years has not been particularly impressive. For all the two thousand engineer-years that went into its development, is the Mac a lot easier to use under System 7 than it was before? I don’t believe so, and I hope we’re in for some paradigm-shifting breakthroughs here. Personal computing could use such a shot in the arm today.

Conclusion — Time-to-market and feature list forces are driving software developers to work in ever higher-level programming languages and to pay less and less attention to the efficiency of the underlying code. Because hardware speed has increased over the years, they have been able to get away with this for some time.

But considering how much effort goes into pushing the speed envelope of the hardware, it seems like users would be well served if more emphasis were placed on software acceleration. In everything from mainstream applications to system software, users do care about speed and software will often be the best price-performance technology to provide it.


Administrivia

The Macintosh LC 520

Bite the Purple Bullet

In the unconfirmed news department

RAM Prices Increase

Pen-Based PowerBook Crossed Out

Online Congressional Hearing Postponed

Software Acceleration