Series: You Can Say That Again
Boss your Mac with PlainTalk or ListenDo, or dictate with iListen or IBM's ViaVoice!
Article 1 of 4 in series
by Matt Neuburg
[Note: I am indebted for technical assistance to my father, Ned Neuburg, who was on the ARPA steering committee in the 1970s; and to Erik Sea, IBM's Development Lead for ViaVoice/Mac, for answering some key queries.]
Classic science fiction, by and large, has proven both myopic and optimistic when it comes to computers. Increased brain power was an obvious prediction, but few foresaw that computers would also become small, cheap, and ubiquitous, with all the tremendous attendant sociological implications. On the other hand, by all accounts we should long ago have been talking to our computers. Where is HAL 9000? The QWERTY keyboard is a clumsy dinosaur; of course you'd eventually like your computer to read your thoughts, but in the meantime, why can't you just tell it what to do? Well, to a large extent, you can; you wouldn't want to hand over control of a mission-critical task to a voice-driven computer just yet, but your computer need no longer be as deaf as a post either.
Wreck a Nice Beach -- You've probably heard of ARPA, the advanced research wing of the U.S. Department of Defense during the Cold War; you're certainly familiar with one of its creations, the Internet. Another ARPA project was to have computers know what people were saying - called "speech recognition". (I once proposed the term "autoglossomerolysis," but somehow it didn't catch on.) In the early 1970s, ARPA threw massive amounts of funding at the problem.
The major obstacle was the acoustic model, which may be imagined as phonemic analysis. How can the computer work out whether a vowel is "ah" or "ee", whether a consonant is "p" or "t", or even where the phoneme boundaries are? Most researchers expected that computers would find the features of speech, corresponding to how the mouth produced the sounds: "this is a voiced guttural stop, that is a rounded front vowel". What the ARPA-funded research demonstrated, though, was that you could make more significant practical progress by doing something much more crude. First, characterize the raw sound by a minimal set of numbers; then, match those numbers against a template - e.g., this sound is a "p" because numerically it looks like a prerecorded "p".
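To make the "crude" template approach concrete, here is a minimal Python sketch of matching a sound's numbers against stored templates. Everything in it is an assumption for illustration: the three-number feature vectors, the labels, and the use of plain Euclidean distance are invented here and are not taken from the ARPA-era systems.

```python
import math

# Hypothetical templates: each phoneme label maps to a short feature vector.
# The numbers are invented for illustration, not measured from real speech.
TEMPLATES = {
    "p": [0.9, 0.1, 0.2],
    "t": [0.7, 0.6, 0.1],
    "ah": [0.1, 0.8, 0.9],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, templates=TEMPLATES):
    """Return the label whose template is numerically closest to `features`."""
    return min(templates, key=lambda label: distance(features, templates[label]))


print(classify([0.85, 0.15, 0.25]))  # prints "p"
```

The classifier knows nothing about how a mouth produces a "p"; the sound is a "p" only because its numbers sit nearest the stored "p" numbers.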
The trick here lies in the notion "looks like." James Baker, then a graduate student at Carnegie-Mellon University, applied a probabilistic mathematical device called a "hidden Markov model" (HMM) to the pattern-matching problem in speech recognition. The results proved so superior in that first ARPA funding round that all modern speech recognition uses HMM - a fact which is astounding for two reasons. First, HMM is fundamentally not only crude but almost certainly wrong - however our ears and brains hear and analyze speech, HMM is surely not it. Second, it's amazing that we've been doing speech recognition the same way for so long. To be sure, modern HMM is vastly more sophisticated than in those days; and one should not underestimate the importance of software optimization, a direction pioneered, again, by James Baker, who went on to found Dragon Systems. But the really important development has been in hardware. Computers are now about a thousand times faster and a thousand times larger in resources, and a thousand times smaller in size and cost, than in those early days, so they have at last begun to meet speech recognition's mathematical demands.
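For readers who have never seen a hidden Markov model at work, the following Python sketch shows the standard Viterbi calculation on a toy model: given a sequence of observed sounds, it finds the most probable sequence of hidden states. The two states, the "low" and "high" observations, and all the probabilities are invented for this example; this is not Baker's model or the one inside any shipping product.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] holds (probability of the best path ending in s, that path).
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Toy model (all numbers are assumptions chosen for illustration).
STATES = ("p", "t")
START_P = {"p": 0.5, "t": 0.5}
TRANS_P = {"p": {"p": 0.7, "t": 0.3}, "t": {"p": 0.3, "t": 0.7}}
EMIT_P = {"p": {"low": 0.8, "high": 0.2}, "t": {"low": 0.3, "high": 0.7}}

prob, path = viterbi(["low", "low", "high"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # prints ['p', 'p', 't']
```

The cost grows with the number of states and the length of the input, which gives some feel for why hardware speed mattered so much.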
In the early 1990s, Apple created its own system-level speech recognition component, PlainTalk. But PlainTalk's genius lies in its compromises: it doesn't need training for a particular user, but it does only discrete speech recognition, matching a short phrase to a finite list of predefined possibilities. The holy grail is continuous speech recognition (CSR) - basically, you talk and the computer types. And CSR is definitely here, thanks to IBM's ViaVoice Enhanced Edition.
Hail CSR -- HAL 9000 notwithstanding, the obstacles to continuous speech recognition are severe, as the history of IBM's research illustrates. They started as early as the 1950s and were among the recipients of ARPA's early funding; yet only within the last five years has IBM marketed consumer-level dictation software. Just consider: The acoustic model must find your phonemes despite the way sounds are disguised by word boundaries and sentence stress. Yet unlike discrete speech recognition, your "command" is never clearly over, so the acoustic model must also be extremely fast, to keep up with you. Plus, it isn't the only model involved: there must be a linguistic model to group your phonemes into words, matched not from some tiny list but from a possible vocabulary of tens of thousands of words.
Thus, to be at all practical, present-day continuous speech recognition requires that the acoustic model be trained for the particular speaker's voice quality and pronunciation and the characteristics of the microphone and the environment. ViaVoice handles this by having you read certain stories that it presents to you when the program first starts up. (You can repeat this procedure later to refine your model, and ViaVoice maintains multiple models so it can be used by different people, or by the same person in different surroundings.) The linguistic model, meanwhile, requires a dictionary: ViaVoice includes a default dictionary, and presumably calculates initial pronunciations based on your acoustic model; it also includes five specialty dictionaries, such as cooking or finance, of which you can turn on one at a time.
Even so, ViaVoice clearly cannot know every word you'll say or every quirk of your pronunciation, so it provides three features for expanding and refining the models:
You can add to your vocabulary directly through a dialog where you type a word and record a pronunciation for it.
You can have ViaVoice scour a text document for unknown words; it asks you which of these you're likely to use and prompts you to record pronunciations.
In the course of dictating, as you correct ViaVoice's mistakes, it learns. In particular, this happens when you select a word and dictate it again, and when you use the Correction Window, which lists alternatives to the selected problem word. Also, when you save, you are again prompted for pronunciation of unknown words.
ViaVoice also extends your vocabulary through macros and commands. Macros are expressions typed differently from their pronunciation, such as punctuation ("comma" and "period") and boilerplate like "email@example.com" (whose pronounced phrase might be "my email address"). Macros can have rules for automatically interacting with their surroundings; that's how you ensure, for example, that a period is snug against the preceding word, has a space after, and the next word is capitalized. Commands trigger actions, not typing; they are mostly built-in, and what commands are available depends upon what environment you're in.
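The idea of a macro that carries rules about its surroundings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the two rule flags, and the `render` function are hypothetical, the recognizer is assumed to deliver a multi-word macro phrase as a single token, and nothing here reflects how ViaVoice actually stores or applies its macros.

```python
# Hypothetical macro table: spoken phrase -> (typed text, attach_left, capitalize_next)
MACROS = {
    "comma": (",", True, False),
    "period": (".", True, True),
    "my email address": ("email@example.com", False, False),
}


def render(tokens, macros=MACROS):
    """Turn a list of recognized tokens into typed text, applying macro rules."""
    out = ""
    capitalize = True  # the first ordinary word starts a sentence
    for token in tokens:
        is_macro = token in macros
        text, attach_left, cap_next = macros.get(token, (token, False, False))
        if capitalize and not is_macro:
            text = text[:1].upper() + text[1:]
        if out and not attach_left:
            out += " "  # ordinary words and non-snug macros get a leading space
        out += text
        capitalize = cap_next if is_macro else False
    return out


spoken = ["hello", "comma", "world", "period", "write", "to", "my email address"]
print(render(spoken))  # prints "Hello, world. Write to email@example.com"
```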
Seven, They Are Seven -- ViaVoice's functionality is divided between seven main applications (and about a dozen minor ones). This sounds confusing, but the implementation isn't: "packages" (locked folders) conceal the various applications in the Finder, and they start up and shut down automatically as necessary. In the description that follows, I give approximate RAM footprints with virtual memory off, because ViaVoice is so much faster that way.
You initiate a session by opening SpeakPad (12 MB); this starts up Background Engine (3 MB, invisible) and VoiceCenter (3 MB).
VoiceCenter appears as a windoid floating over everything on your computer, and is the command center for ViaVoice as a whole. It contains some buttons and a pop-up menu, and is where you turn the microphone on and off, and initiate management of your macros, dictionary, and acoustic model, as well as bring up the correction window.
SpeakPad looks like a rudimentary word processor, but it accepts dictation and can obey a lot of vocal commands for cursor selection and movement, cutting and pasting, and so forth. Since you can also manage the correction window vocally, a dictation session, if you're patient, can be virtually hands-free. Furthermore, SpeakPad is scriptable, and ViaVoice has a cool feature similar to PlainTalk: you can expand its command set through AppleScripts, where a script is triggered when you say its name. I use this to increase ViaVoice's cohesion with other applications; for example, while writing parts of this review, I dictated into SpeakPad and then said "Transfer to Nisus" to trigger a custom script which copied the text from SpeakPad and pasted it into Nisus Writer.
Besides SpeakPad, you can dictate into Microsoft Word, Internet Explorer, Outlook Express, or AppleWorks. To invoke this feature, you start up the Direct Dictation application (1 MB, invisible), which invokes Dictation Manager (4 MB, invisible), as well as Background Engine and VoiceCenter if they aren't up already. Once VoiceCenter is floating over (let's say) Microsoft Word, you turn on the microphone and say "Begin direct dictation", and then you can speak to type into Word.
To set up your microphone volume level and test for background noise, you run Setup Assistant (9 MB), a single window consisting of a sequence of panels you navigate through arrow buttons. You also use Setup Assistant to analyze your documents or create your voice model, in each case with a different set of panels. User and voice model management is performed through ViaVoice Settings (6 MB), which presents a control panel-type window and lets you edit your macros or vocabulary, again through a different window in each case. Each of these programs quits automatically when you close its window.
I Come To Bury CSR... -- From installation onwards, I have found ViaVoice buggy, bizarre, or downright infuriating. On one of my computers, it wouldn't install; on the other, it would install but it crashed when I tried to create my acoustic model. So I sneakily installed it on the second computer and copied it to the first, where it runs great; there, I trained the model and copied the data back to the second. Direct Dictation also crashes on that computer (both crashes are due to the highly machine-specific way ViaVoice tries to tell your computer not to sleep during dictation); but I don't miss it, as this feature is rather dubious anyway - it's much slower than dictating into SpeakPad, and ViaVoice easily gets out of sync with what's in the document.
As you read a story to create your acoustic model, ViaVoice highlights words to show where the computer thinks you are, but sometimes it highlights the wrong word and you can't figure out what it wants from you. Preferences that you set are sometimes forgotten before you even click the OK button. Your Keyboard menu can end up set to the wrong keyboard after using Direct Dictation. Often the microphone won't come on, or ViaVoice refuses to quit. If you dictate with lots of text selected, a dialog asks if you really want to overwrite the selection; if you say yes, your dictated words appear backwards!
In SpeakPad, ViaVoice insists on controlling capitalization and spacing, and often gets them wrong. Extra spaces or other characters sometimes mysteriously appear. Saying a punctuation mark sometimes causes the preceding several words to be omitted from the typescript. Little things like double-click-and-drag to select words don't work quite right. You can't examine any of the included dictionaries, so you can't intelligently add a vocabulary item in advance: you must wait until ViaVoice errs.
ViaVoice initially involves some 80 MB of disk space, and hundreds of files whose purpose you're not told; its Temp folder then grows and grows (I'm told it gets cleaned up when it hits 250 MB). The manual is cheesy, ugly, and uninformative; the command reference sheet is inaccurate and incomplete. In short, this is a huge, rather inflexible program that takes over your computer and exhibits a poor sense of design, little understanding of Mac interface and conventions, and not much idea of the user's needs.
...And To Praise It -- And yet, unless you are utterly naive, under 12, or raised entirely on science fiction, ViaVoice in action seems nothing short of miraculous. You speak, and by golly, words appear on the screen - for the most part, the right words! Certainly the recognition engine has its limitations, but these afflict all recognition engines to date. For instance, despite its showpiece examples of correctly detected homonyms ("Write the right letter to Mr. Wright"), ViaVoice often makes mistakes that even a modicum of grammatical or syntactic knowledge would have eliminated - because it has no such knowledge: it knows some likely contexts for some words, but it doesn't know English. Also, as my father points out, the worst speech recognition problem is that when things go wrong the computer can't tell you why ("speak louder / slower," or whatever), for the simple reason that it doesn't know: the models being automatic and probabilistic, we can construct them and match against them, but cannot know how they actually work (like HAL 9000!).
For increased accuracy, some simple precautions are helpful. When you first train your acoustic model, read sufficient material, and use the same tone of voice in which you'll be dictating; I find a neutral monotone works best (like HAL 9000!). Each time you start up ViaVoice, do the audio setup; this takes only a minute. When ViaVoice errs, correct it, because that's how it learns. Finally, let ViaVoice train you: you must speak continuously but not too quickly, naturally but not sloppily, carefully but not exaggeratedly - if you force your final consonants, for example, ViaVoice will hear not a clearer consonant but an extra word. Remember, it's only a machine!
Perhaps the hardest thing for me has been learning to dictate at all. When I start talking, I usually have only the vaguest idea what I'm going to say; so I tend to choke under the pressure of improvising a constant flow of slow, clear, well-formed phrases. It's good practice, I've found, to read aloud; and one of my uses for ViaVoice has been to transcribe some old hand-written letters. However, I do often use it to compose email messages, and I did use it to draft parts of this review.
The Last Word -- Computer speech recognition is here, and although I wouldn't like to predict just how, I believe it will change everything. Perhaps certain common speech recognition homonym errors will become accepted spellings. Perhaps computer input will soon be a hybrid of mouse, keyboard, and voice. In any case, we're on the brink of a new age, and anyone who likes can step across and put a foot into it. Now - open the pod bay doors, please, HAL.
ViaVoice Enhanced requires Mac OS 9.0.4 and a Power Mac G3/300 or better; the faster the processor and the more RAM, the better - but this will improve only speed, not accuracy. It costs $130 and comes with an Andrea USB headset, but any noise-cancelling microphone will do, such as the iParrott or the Andrea PlainTalk headset that came with the previous version.
If your computer doesn't meet these requirements, you might like to try the previous version, ViaVoice Millennium. It isn't quite as good, but it works decently, requires only Mac OS 8.5.1 and at least a Power Mac G3/233, and at $75, which isn't much more than the value of the included headset, must be termed a bargain.
Article 2 of 4 in series
by Matt Neuburg
In TidBITS-544, I wrote about continuous speech recognition on the Mac using IBM's ViaVoice, which enables you to dictate sentences and have the computer type them. ViaVoice also does some discrete speech recognition, meaning you can say certain predefined commands to it, such as to select the next word, paste text, or turn off the microphone. But if you only want to give your computer spoken commands, you probably can, right now, for free - with Apple's own system-level discrete speech recognition feature, PlainTalk.
What Day Is It? -- PlainTalk's first rumblings were felt in 1990, when speech recognition labs complained of a sudden "brain drain." Apple, sparing no expense, was hiring every researcher it could find. After about a year of intensive work, Apple began demonstrating the fruits of its labors, code-named Casper, which became publicly available as PlainTalk in the AV Macs of 1993; it was then made standard in 1994 as part of System 7.1.2, with the emergence of the PowerPC-based Macs. Since then, all PowerPC Macs, and even some 68K machines, have been awaiting your spoken orders. Yet, many users are unaware of this, because speech recognition isn't present by default - you must specify it explicitly when you do a system installation. To install it, insert your Mac OS CD-ROM, launch Mac OS Install, and when you get to the Install Software screen, click the Customize button, select English Speech Recognition, and deselect everything else before continuing with the installation process.
Open Speech Help -- PlainTalk speech recognition appears as four software components. The Speech control panel must be present. The Speech Recognition extension enables any program to do speech recognition; but of itself it does nothing, so Apple also provides an interface, the Speakable Items extension, which lets you open any item in your Speakable Items folder (which is in your Apple Menu Items folder) by saying the item's name.
There is also a hardware component - the microphone. Apple designed a special microphone for speech recognition, called the PlainTalk microphone, recognizable by its longer jack and unusual shape. This almost killed speech recognition on the Mac, because people didn't know how to use the microphone (and Apple, as usual, provided no instructions), so they thought it was broken. You do not speak into the "face" of the microphone; you lay the microphone on top of your monitor with the "face" upwards, and speak into the "top" of the microphone, which faces you. Some recent machines with built-in microphones don't need this external one; but iMacs do require it despite the built-in microphone, and the situation is confusing for other machines as well - if in doubt, perform an Apple Tech Info Library search on "plaintalk and microphone" and pray for clarification. Snazzy noise-cancelling speech recognition headsets work too.
With speech recognition installed, go into the Speech control panel and set up Listening options: do you want to have to hold down a key, such as Escape, all during each command, or do you want to leave recognition on constantly, perhaps prefixing your commands by some introductory expression (such as "Computer" or "Yo!")? Next, turn on Speakable Items. A "helper" floating window appears, showing that Speakable Items is running, and you can give commands; "show me what to say" is a good first command. Depending on your choice of animated icon, you'll see various images suggesting that speech recognition is sleeping, listening, obeying, or confused.
Make This Speakable -- PlainTalk doesn't need training for your voice, but before you can say anything the system must have a complete list of everything you are allowed to say; recognition consists of finding the best match from that list. In the Speakable Items interface, the list is precisely the contents of the Speakable Items folder. Unfortunately, as the list grows, PlainTalk becomes less confident and more likely to execute a mismatch or report no match at all. You should remove from the Speakable Items folder every command you're not likely to use; and you should take advantage of an important feature, new in Mac OS 9, that lets you associate a command with a specific application, by putting it in a folder with that application's name inside the Application Speakable Items folder.
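Why a longer list hurts can be shown with a small Python sketch. It is an analogy only: it scores typed strings with the standard library's `difflib.SequenceMatcher` as a stand-in for acoustic matching, and the `recognize` function, its thresholds, and the sample commands are all assumptions made up for this example, not PlainTalk's actual method.

```python
from difflib import SequenceMatcher


def recognize(heard, commands, min_score=0.6, min_margin=0.05):
    """Return the best-matching command, or None if the match is weak or ambiguous."""
    if not commands:
        return None
    # Score every allowed command against what was heard, best first.
    scored = sorted(
        ((SequenceMatcher(None, heard, c).ratio(), c) for c in commands),
        reverse=True,
    )
    best_score, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    # Reject matches that are poor, or too close to the runner-up to trust.
    if best_score < min_score or best_score - runner_up < min_margin:
        return None
    return best


short_list = ["open browser", "what time is it", "tell me a joke"]
print(recognize("open brower", short_list))                      # prints "open browser"
print(recognize("open brower", short_list + ["open browsers"]))  # prints "None"
```

With three well-separated commands, a slightly garbled input still finds its target; add one similar-sounding command and the same input can no longer be told apart from its neighbor, which is the argument for pruning the folder.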
What sort of thing can a command be? Basically, it's anything you can open from the Finder. If the command is an alias, it opens a file or a folder, or starts up an application. If the command is a stand-alone AppleScript, it runs the script. Many such scripts are included (don't forget to look in the cleverly concealed More Speakable Items folder), and you can of course write your own, so you can do whatever AppleScript can do. A particularly cool feature in Mac OS 9 is that speech recognition is itself scriptable, so you can write an AppleScript script that provides its own list of things the user can say, responding to each in some custom manner; to learn more, download the Scripting Speech help module.
Speak in Macro -- AppleScript, however, has its limits: it can drive only programs that are scriptable. If this falls short of your needs, consider version 5 of QuicKeys, which appeared a few months ago. I've discussed QuicKeys extensively in TidBITS, and version 5's support for speech recognition is significant. QuicKeys, as you know, is a macro program, meaning that it can type, push buttons, choose menu items, and click the mouse; now, through speech recognition, a QuicKeys action can be triggered by your voice.
QuicKeys' speech interface is simple but clever. The command phrase that triggers an action is up to you: it can be the action's name, but it needn't be. Moreover, although QuicKeys is independent of Speakable Items (because they provide two different interfaces to Speech Recognition), the two can coexist, and can be turned on and off individually; the "helper" floating window is present if either is on. As with Speakable Items, you can specify an introductory expression as a prefix to command phrases; you can thus channel your command to the correct listener. For example, in the Speech control panel, I specified that Escape must be held down during a command, with no prefix; but in QuicKeys I specified that commands must be prefixed by "QuicKeys". Now "What time is it?" works, and "QuicKeys press Home" works too.
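The channeling described here amounts to dispatching on a leading word. The following Python sketch shows the logic under stated assumptions: the `route` function, the listener names, and the string-based matching are hypothetical and say nothing about how Speech Recognition actually shares the microphone between listeners.

```python
def route(utterance, listeners, default=None):
    """Return (listener, command) by matching a leading prefix word or phrase."""
    lowered = utterance.lower()
    for prefix, listener in listeners.items():
        # A prefixed listener claims the utterance and strips its prefix.
        if prefix and lowered.startswith(prefix.lower() + " "):
            return listener, utterance[len(prefix) + 1:]
    # Anything without a known prefix falls through to the default listener.
    return default, utterance


listeners = {"QuicKeys": "quickeys"}
print(route("QuicKeys press Home", listeners, default="speakable items"))
# prints ('quickeys', 'press Home')
print(route("What time is it?", listeners, default="speakable items"))
# prints ('speakable items', 'What time is it?')
```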
Turn Speakable Items Off -- Another discrete speech-recognition offering is MacSpeech's ListenDo, a Speakable Items replacement. The two are not compatible, but that's okay, because ListenDo is better; indeed, it's what Apple should have done in the first place. Speakable Items is clumsy to operate and maintain: you toggle it off and on in a control panel, view commands as items in the Finder, and edit scripts in some third place (such as Apple's Script Editor). But ListenDo provides a single centralized interface: it's an application, so recognition is on when it's running and off when it's not, and its windows let you view and organize commands and edit their scripts. Also, every item in Speakable Items is an application, so each time you perform a command, you add it to your Recent Applications list under the Apple menu, which is maddening; with ListenDo, that doesn't happen.
Furthermore, like QuicKeys, ListenDo is a macro program, with native commands for typing, pushing buttons, choosing menu items, and clicking the mouse. But ListenDo improves upon QuicKeys in two important ways. First, it's free. Second, it provides a completely dynamic interface to choosing from menus: you say a menu's name, that menu pops down and holds, you say an item in that menu, and the menu item is chosen. Where both AppleScript and ListenDo's native macro abilities fall short, you can supplement them with another scriptable macro program; for example, when I say "Close all but the front window," ListenDo tells OneClick to perform this action.
Tell Me a Joke -- With all this rich choice of options for ordering my computer about, which do I personally use on a daily basis? ListenDo is my favorite, but the real answer is none, because I find PlainTalk speech recognition technology to be flaky and undependable. It's a toss-up whether a command will be understood at all; even worse, PlainTalk has an unaccountable habit of going deaf. This happens on both my computers, so I tend to feel that the problem lies at system level, not in some extension conflict or machine-specific shortcoming (though I'd be happy to be proven wrong). And because the problem is systemic, it doesn't matter which interface I use, because they all rely on Speech Recognition, which is what isn't working. The only solution is to reinitialize PlainTalk by toggling Speakable Items, QuicKeys speech, or ListenDo off and on; and that's too much trouble.
However, if you're among the many people longing for speech recognition on the Mac, and you haven't yet tried Apple's own speech recognition technology, don't turn a deaf ear to the easy availability of PlainTalk and the improvements on it offered by QuicKeys 5 and ListenDo.
Article 3 of 4 in series
by Matt Neuburg
With the release of its much-anticipated iListen dictation software, MacSpeech, Inc. has at long last fired a real salvo in its hitherto mostly verbal rivalry with IBM's ViaVoice. Although the two programs are outwardly similar - each initially presents a series of windows where you adjust your microphone and train your voice model by reading some stories, and is then represented by a small global floating window where you turn the microphone on and off - they are marked by radically different philosophies. ViaVoice centers around its own voice-driven word processor, SpeakPad; you can dictate into a few other applications through plug-ins or scripting, but this feature is slow and unreliable. (See "Talk Is Cheap: ViaVoice Enhanced Edition" in TidBITS-544.) iListen, on the other hand, has no word processor; you just dictate into any application. This magic is accomplished through the same macro power that characterized MacSpeech's earlier ListenDo (see "Bossing Your Mac with PlainTalk" in TidBITS-545); essentially, iListen hooks into your Macintosh at a low level and acts as a ghostly typist at an invisible keyboard.
This approach has its advantages. First, iListen comes with all the macro power of ListenDo (except for ListenDo's ability to let you call out names of menus and menu items), so in addition to typing through dictation, you can tell your Mac to start up applications, close windows, click the mouse, and so forth - and these commands are triggered through iListen's internal speech recognition engine, not PlainTalk, so they work much more reliably. Second, iListen has a lighter feel than ViaVoice. SpeakPad is a clunky program, a substandard word processor whose files are huge (because the program is recording your voice so that it can respond to your corrections by improving its voice model later) and slow to save. iListen, on the other hand, basically just types; what you're actually working in is your favorite word processor, email program, outliner, or whatever - in other words, you're in some program that you actually like. So, while ViaVoice feels like a huge application that has taken over your computer, iListen feels more like a huge system extension adding dictation functionality to your computer behind the scenes.
Since iListen can't edit your document or improve its internal models on the fly, you're always essentially dictating a first draft, in the expectation of using hands and keyboard to fix mistakes. But that's not such a terrible thing; you just chatter away carelessly, and clean up later, or even at the same time, in a sort of voice-and-hands partnership. More of a problem is that there's no access to the program's internal vocabulary; ViaVoice lets you enter a word and train its pronunciation, but iListen has no such ability, so it can't learn any expressions it doesn't already know, or even adapt to your quirks of pronunciation. (For example, I have no way to let iListen know that I say "neither" as "NYE-ther.") A spelling mode in part makes up for this, but in most cases it isn't worth using; since you'll be cleaning up manually anyway, you'll probably just let iListen's mistakes stand during the first pass.
MacSpeech has promised a future free upgrade that will include a vocabulary trainer, the capability to improve the voice model by correcting errors, and the missing speakable menus macro feature. Meanwhile, MacSpeech was probably wise to release this version now; it gave them something to sell over the holiday season and show at next week's Macworld Expo. Besides, even if you think of iListen in its present state as more of a demonstration than a finished, full-featured program, it's a great demonstration, and very definitely usable.
To be sure, iListen takes up a healthy chunk of RAM (about 60 MB), and does bog the computer down a bit, plus starting it up and switching modes can be slow; and it probably isn't without bugs - I think it reconfigures my Energy Saver settings incorrectly, for example, and it seems not to work at all in Microsoft Word on my machine. But the speech recognition engine is astoundingly nimble, easily able to match my normal pace of dictation, and quite decently accurate, especially considering that so far I've only read three of the dozen or so training stories that come with it (you're urged to do all of them). And even ViaVoice isn't perfectly accurate, after all, though my copy, now trained to a fare-thee-well, does make vastly fewer errors than iListen. Thus, you may well prefer iListen despite its missing pieces, because it's so pleasant and easy, it's available in any program, and it doubles as a voice-driven macro program. You won't have a totally hands-free experience, but you can use your voice to order your computer about and to get a first draft of your words down on virtual paper, and that might be all you really need.
iListen requires Mac OS 9, a Macintosh with a PowerPC G3 or G4 processor, and 128 MB RAM. It costs $130 but is presently $100 if downloaded from MacSpeech's Web site (a 40 MB download, which takes up about 130 MB installed); there's a $30 rebate for ViaVoice users. iListen also requires a noise-cancelling microphone (not included; about $50).
Article 4 of 4 in series
by Matt Neuburg
The goal of a continuous speech recognition program is to let you dictate what your computer should type. In December 1999, when IBM shipped the first Mac version of such a program, the sound from most users wasn't dictation but a groan. ViaVoice Millennium Edition was huge, ugly, clunky, sluggish, and confusing; it mangled punctuation and capitalization; its accompanying word-processor, SpeakPad, couldn't even select text reliably. The whole affair felt like a port by folks who had never seen a Mac and couldn't write even a SimpleText clone. The auspices were not good. A few months later, though, things improved with ViaVoice Enhanced Edition; it still had plenty of rough edges, but with care and patience it was definitely usable for creating first drafts and for transcribing paper documents.
When Mac OS X shipped early in 2001, ViaVoice wouldn't run as a Classic application, so users had to reboot into Mac OS 9, or hold their tongues, while hoping for a Mac OS X-native version. There was relief when IBM previewed such a version that summer, which turned to outrage when it transpired that a hefty percentage of ViaVoice for Mac OS X's $170 price tag would be charged even to users upgrading from an earlier version.
There is good news on two fronts. First, IBM has modified its position on upgrade pricing: Enhanced Edition owners can download the Mac OS X version free, or purchase a CD-ROM version for $6 (a third version includes hard copy manuals for $20). Millennium Edition owners qualify only for a $40 rebate, though, and only until 04-May-02. Second, ViaVoice for Mac OS X, which shipped just after Christmas, turns out to be a major improvement: it looks and feels Mac OS X-native, its accuracy is astounding, and it can now type into and invoke Command-key shortcuts in any application.
Better Overall -- To be sure, some of my positive response could be merely a consequence of the aesthetic and systemic changes wrought by Mac OS X itself. Naturally my computer now sports oodles of cheap RAM and hard disk space, or I wouldn't be using Mac OS X at all; and I've become accustomed to applications that are secretly folders, processes that run secretly in the background, and files secreted all over the computer. Much of what in ViaVoice seemed offensive under Mac OS 8.6 or 9.0.4 therefore seems normal in Mac OS X.
Still, there's no doubt that ViaVoice's command window is highly Aquatic, with its odd shape, its drop-down drawer, its brushed-metal 3D look, and its liquid round buttons. SpeakPad now seems almost indistinguishable from TextEdit. The various ancillary windows are generally well-behaved and consist mostly of standard widgets. In short, there's scarcely anything not to like about the interface at all. And I don't see how one can deny that ViaVoice's recognition behavior is amazingly accurate and robust. It probably helps that dictation and commands can now be distinguished, either by enabling separate modes or by prefixing a vocative (such as "Computer") to commands; but the improvement seems to go well beyond that. I now routinely dictate paragraph after paragraph without an error. When I demonstrated ViaVoice at Macworld Expo, it performed flawlessly despite background noises that included an extensive round of laughter and applause from the audience.
ViaVoice's new ability to dictate anywhere was previously the sole province of MacSpeech's iListen, which hasn't yet shipped a Mac OS X version. ViaVoice Enhanced allowed dictation into a select few programs, but there were painful difficulties coordinating typed material with subsequent voice commands for correction and editing. Now IBM has wisely abandoned this strategy - instead, correction and editing work only in SpeakPad, and all ViaVoice can do elsewhere is type. It's true that this means if you work outside SpeakPad you can't train ViaVoice through its correction feature. But its accuracy is so good that there won't likely be many mistakes anyway. Plus, ViaVoice can now type Command-key shortcuts (global or unique to a particular application, like QuicKeys), as well as run AppleScript scripts. Taken together, those two capabilities mean you can drive most applications quite effectively. For example, in my Macworld Expo demonstration, I told ViaVoice to launch Eudora, create a new message, address it to my parents, put in a subject, tab to the body area, type the body of the letter, save the letter, and quit Eudora, all without using my hands.
Quirks Remain -- ViaVoice still has some problems that need working out. Sometimes the correction window refuses to activate, or seems to leap away beneath my hand. The microphone comes on unexpectedly, such as after SpeakPad reads text aloud. ViaVoice still does odd things with spacing next to punctuation, especially when correcting. The manual is no longer "cheesy," but it still isn't informative about technical matters such as what's installed where, or how best to incorporate AppleScript. And informed vocabulary maintenance is still impossible; for example, specialized vocabularies for such topics as computers and cuisine are provided, but with no way to learn what words they include.
Nevertheless, with this revision IBM has taken ViaVoice another generation forward, from merely acceptable to downright enjoyable. Whether you intend to dictate your memoirs into SpeakPad or give your hands an occasional break from typing into Eudora, ViaVoice deserves consideration for a place in your stable of essential Mac OS X applications.
IBM ViaVoice requires a non-UFS Mac OS X 10.1 installation, with many hundreds of free megabytes on the boot partition and all the RAM you can afford. A 300 MHz G3 or higher processor is necessary; faster is better, PowerPC G4 chips are better than PowerPC G3s, and pre-August 1998 machines and upgrade cards are not supported.