Want a larger monitor? Read on for how to expand the viewable image on most CRT-based displays. Also in this issue, Matt Neuburg weighs in with a review of IBM's ViaVoice speech recognition software. Major releases this week include Interarchy 3.8, an updated and renamed version of Anarchie, and Adobe GoLive 5.0, the latest version of the powerful Web design package. This week's poll: which futuristic technologies do you most want to see become reality?
Copyright 2000 TidBITS Electronic Publishing. All rights reserved.
Information: <email@example.com> Comments: <firstname.lastname@example.org>
This issue of TidBITS sponsored in part by:
READERS LIKE YOU! You can help support TidBITS via our voluntary
contribution program. Special thanks this week to Samson Tu,
Kevin Fong, and Richard Pharo for their generous support!
WinStar Northwest Nexus. Visit us at <http://www.nwnexus.com/>.
Internet business solutions throughout the Pacific Northwest.
Small Dog Electronics: G4 Bundle: $1999!
G4/400 MT 64/10 GB/DVD/56K (AGP) with EXTRA 128 MB RAM
& Apple Studio Display 15" LCD Digital (Graphite)
For Details: <http://www.smalldog.com/> -- 802/496-7171
Aladdin Systems: Tune in to over 3,400 live webcasts worldwide
with new Aladdin Tuner 3.0! Live streaming audio & video, plus
play your CDs and MP3s. Aladdin Tuner puts it all in one place!
Download now at: <http://www.aladdinsys.com/tuner/>
MAC WEB HOSTING: Custom solutions to your Macintosh Web hosting
needs. digital.forest offers high-speed Internet connections,
backups, earthquake hardened racks, 24-hour monitoring and
security, and toll-free tech support: <http://www.forest.net/>
GIGABIT SPEEDS for big file transfers. Turn your network into
an ULTIMATE high speed network with Farallon's Gigabit Switches
& a PCI card for 10/100/1000 Mbps connectivity. Free technical
support & 3 YR. WARRANTY! <http://www.farallon.com/tb/gigabit/>
FIND FONTS, TRY FONTS, BUY FONTS at MyFonts.com!
Now featuring a faster, more accurate search engine.
900 new ITC fonts now ready for direct purchase.
Click here to explore the font world: <http://www.myfonts.com/>
Anarchie Updated, Renamed to Interarchy 3.8 -- Stairways Software today released a significant update to their popular shareware FTP client Anarchie. In the process, Stairways decided to rename Anarchie to Interarchy and to use Interarchy as the company's new identity after failing to recover the anarchie.com domain from a cybersquatter. Interarchy 3.8 supports FTP listing, upload, download, and mirroring; HTTP listing, download, and mirroring; Whois, Finger, and DNS lookups; traceroutes; and TCP, ICMP, and UDP tests. Interarchy can also show the status of your network, watch all network traffic on your Mac, and display a list of all current connections. Finally, Interarchy now offers daemons (tiny servers) for Finger, Whois, TCP echo, UDP echo, Ident, Daytime, Time, and NTP (all turned on with the Safe Daemons menu item), along with a Telnet daemon that accepts and executes AppleScript scripts. They're all off by default. Most interesting, however, are Interarchy's skin-like "wands," which are totally customizable graphical interfaces to Interarchy's functionality. To give you an idea how a wand could be useful, I'm planning to make one that helps me troubleshoot Internet connectivity problems with buttons for ping and traceroute tests to my various servers. Overall, Interarchy 3.8 is a powerful and flexible collection of Internet tools that feels haphazard initially; it remains to be seen if Interarchy's wands will succeed at establishing order. Interarchy 3.8 is a 3.9 MB download and costs $50 shareware, but it's free to users of Anarchie 3.x (it picks up your existing serial number) and to registered users of the Stairways shareware programs Interarchy supersedes. [ACE]
Adobe GoLive 5 Ships -- Adobe has begun shipping Adobe GoLive 5, its flagship Web design package. The new version adds a number of features that better reflect the way Web sites are now designed and deployed: the Design feature enables fast site diagramming and prototyping; Dynamic Link makes it easy to tie information stored in a database to Web pages; 360Code ensures that existing HTML isn't reformatted by GoLive; and the WebDAV (Web Distributed Authoring and Versioning) implementation enables version control and asset management for design teams working on the same project. GoLive also includes Smart Objects that make it possible to drag Photoshop, Illustrator, or LiveMotion files onto a page and dynamically update them later without going through the process of repeatedly exporting Web-ready copies. GoLive 5 is available now as an electronic download for $299. Owners of PageMill or earlier versions of GoLive can upgrade for $99; a competitive upgrade of $149 is also offered to owners of Macromedia Dreamweaver or Microsoft FrontPage. The program requires a PowerPC-based Mac running Mac OS 8.6 or later, and at least (but preferably more than) 48 MB of available RAM. [JLC]
We'll Take the Fifth! -- Congratulations to TidBITS publisher Adam Engst for his fifth place ranking in the MacDirectory "Most Influential Figure in the Mac Industry" poll of nearly 200 MacDirectory readers. As in last month's MDJ Power 25 poll, Apple iCEO Steve Jobs took first place by a wide margin, but this time, Microsoft chairman Bill Gates was second, followed by Apple co-founder Steve Wozniak and the inimitable Guy Kawasaki (currently the CEO of Garage.com). [GD]
Poll Preview: (Apple) Pie in the Sky -- Matt Neuburg examines IBM's ViaVoice speech recognition software for the Macintosh below, and his review started us thinking about other futuristic aspects of system design. Attempts of varying success have been made at handwriting recognition, eliminating pesky wires between computer components, and miniaturization, and we've even seen hints of technologies like virtual reality interfaces, biometric security (like Mac OS 9's voice password), heads-up displays embedded in glasses, and even brainwave recognition (from IBVA Technologies). But the success of these proof-of-concept technologies has often been hampered not by the implementation, but by user acceptance. For this week's poll, then, help us identify the directions to explore by telling us which technologies you most want to see on current or future Macs. Register your vote on our home page, and if we've missed an important future trend in system design, tell us on TidBITS Talk at <email@example.com>. [ACE]
by Adam C. Engst and Geoff Duncan <firstname.lastname@example.org>
Last week's quiz presented several possibilities for seeing more on your Mac's desktop. The correct answer was that all of the options enable you to see more, although they work in different ways. Let's look at each of the answers: most of you probably know this information already (about two-thirds of the quiz respondents answered correctly), but it's good to pass on to friends or relatives who are less experienced with the Mac.
Adjusting screen resolution means changing the number of pixels that define the height and width of your screen. You can see more on your desktop running at a resolution of 1024 by 768 than at a resolution of 640 by 480 - although, depending on your monitor, a lower resolution may be more comfortable for reading text or other tasks. Use the Monitors control panel or the screen resolution Control Strip module to adjust your screen resolution. You can adjust screen resolution on the fly, though items on your desktop may be rearranged if you choose a smaller size (and if you change resolutions frequently, check "Tools We Use: Desktop Resetter" in TidBITS-466 for details on a utility that remembers desktop icon positions). Many novice users don't realize they can change screen resolution and end up working at a resolution that's less than ideal for the type of work they do or for their eyesight. When we set up Macs for friends or relatives, we always show them different screen resolutions and ask which they prefer.
Adding another monitor is one of the Mac's greatest unsung features: most Macs can drive two or more monitors that combine to form a single extended desktop. Multiple monitors are a great way to increase productivity: imagine researching some topic in a Web browser on one screen while writing in your word processor on another. Not all Macs support multiple monitors, but almost every model that can handle multiple video cards can handle multiple displays. Also, some Macs that physically support multiple monitors do so only in video mirroring mode, where both screens display the same image rather than combining to create a single larger desktop. We've written extensively about multiple monitors in the past (see our "Multiple Monitors!" article series for details and advice), and the topic has come up frequently in TidBITS Talk.
Virtual desktops are similar to multiple monitors in that they enable you to have a larger desktop size. Unlike multiple monitors, however, the extended desktop isn't displayed on a separate screen; instead, you can scroll or shift your primary screen(s) to bring additional desktop space into view. Some video cards enable virtual desktops through their driver software; other software-only system enhancements such as AWOL Software's Virtual Desktop and Pierre-Luc Paour's Virtual also offer virtual desktop capabilities, though neither has been updated recently (and Virtual's author doesn't recommend using it with Mac OS 9).
Adjusting Screen Geometry -- This was our "trick" answer (although deductive logic is still legal in most countries, so you could have figured out that all the answers were correct once you realized that at least two of the others were correct). By adjusting screen geometry, we mean twiddling with the horizontal and vertical size and position controls on your display to reduce or eliminate the black border that surrounds your computer's display. On many screens half an inch or more of black can be eliminated from the top, bottom and sides of the display, effectively increasing the display's physical area.
Increasing the image size on your monitor to eliminate the black band has the immediate advantage of increasing the size (but not the number) of pixels, essentially making the existing desktop image appear larger and easier on the eyes. That in turn might make the next higher resolution (which does increase the number of pixels) more palatable, and since a higher resolution enables you to see more information at once, it would have the effect of increasing productivity.
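The arithmetic behind this can be sketched in a few lines of Python. The screen dimensions and border width below are hypothetical numbers chosen only for illustration, not measurements of any particular monitor, and the function names are our own.

```python
def pixels_per_inch(h_pixels: int, image_width_in: float) -> float:
    """Horizontal pixel density for a resolution and a visible image width."""
    return h_pixels / image_width_in


def area_gain(width_in: float, height_in: float, border_in: float) -> float:
    """Fractional increase in image area when a black border of
    border_in inches is reclaimed on every side of the image."""
    old_area = width_in * height_in
    new_area = (width_in + 2 * border_in) * (height_in + 2 * border_in)
    return new_area / old_area - 1


# Hypothetical example: a 12 x 9 inch image with half an inch of
# black border on each side, running at 1024 by 768.
before = pixels_per_inch(1024, 12.0)   # density before stretching the image
after = pixels_per_inch(1024, 13.0)    # same pixel count, wider image
gain = area_gain(12.0, 9.0, 0.5)       # roughly a 20 percent larger image
```

With the same number of pixels spread over a wider image, the density drops and each pixel gets larger, which is what can make the next higher resolution comfortable to read.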
This trick works only on CRT-based monitors, not on LCD-based displays, which use all their available pixels all the time. The reason it works is that the electron gun that paints pixels on the screen can be adjusted to light up the phosphors in the otherwise dark band around the edges of the screen. Now, if this is such a great tip, you might ask why the black band exists at all. The answer lies in the downside to increasing the size of the screen image - the necessary twiddling involved in increasing the size, repositioning the image, and eliminating distortion (where the edges aren't parallel, or where they bend inward or outward) is likely to deform the image slightly. Will you notice? Perhaps, but unless you're a graphics user who cares about the precision of an image's dimensions, you're unlikely to care. Personally, I feel the benefit of the increased image size well outweighs the disadvantage of the slightly distorted pixel dimensions.
Making the changes takes a few minutes of trial and error, potentially preceded by some consultation with your monitor's manual. (For those with iMacs, the controls are all software-based and accessible through the Monitors control panel.) The important controls are generally labeled (with abbreviations being common) Horizontal Size and Vertical Size (or sometimes Zoom), along with Horizontal Position and Vertical Position. First, increase the horizontal size control to fill as much of your screen as you can. However, the image often isn't exactly centered to begin with, so you generally need to fiddle with the horizontal position as well. Then repeat the process with the vertical size and position controls. After you've changed the size and position of the screen image, look at the edges of the screen. If they're concave or convex, or not appropriately parallel, use the geometry controls (which can have a variety of names) to straighten the edges, rotate the entire image, and help make the various edges parallel.
Your monitor should remember the new settings, although I've seen them drift over time, so if you ever notice something that's not quite right, tweak the screen geometry controls to bring it back to just the way you like it. And I encourage you to take this information - both the instructions on eliminating the unused black band and the bits about screen resolution, multiple monitors, and virtual desktops - and pass it on to less experienced users who can benefit from either a slightly larger screen image or the increased resolution it makes palatable.
by Matt Neuburg <email@example.com>
[Note: I am indebted for technical assistance to my father, Ned Neuburg, who was on the ARPA steering committee in the 1970s; and to Erik Sea, IBM's Development Lead for ViaVoice/Mac, for answering some key queries.]
Classic science fiction, by and large, has proven both myopic and optimistic when it comes to computers. Increased brain power was an obvious prediction, but few foresaw that computers would also become small, cheap, and ubiquitous, with all the tremendous attendant sociological implications. On the other hand, by all accounts we should long ago have been talking to our computers. Where is HAL 9000? The QWERTY keyboard is a clumsy dinosaur; of course you'd eventually like your computer to read your thoughts, but in the meantime, why can't you just tell it what to do? Well, to a large extent, you can; you wouldn't want to hand over control of a mission-critical task to a voice-driven computer just yet, but your computer need no longer be as deaf as a post either.
Wreck a Nice Beach -- You've probably heard of ARPA, the advanced research wing of the U.S. Department of Defense during the Cold War; you're certainly familiar with one of its creations, the Internet. Another ARPA project was to have computers know what people were saying - called "speech recognition". (I once proposed the term "autoglossomerolysis," but somehow it didn't catch on.) In the early 1970s, ARPA threw massive amounts of funding at the problem.
The major obstacle was the acoustic model, which may be imagined as phonemic analysis. How can the computer work out whether a vowel is "ah" or "ee", whether a consonant is "p" or "t", or even where the phoneme boundaries are? Most researchers expected that computers would find the features of speech, corresponding to how the mouth produced the sounds: "this is a voiced guttural stop, that is a rounded front vowel". What the ARPA-funded research demonstrated, though, was that you could make more significant practical progress by doing something much more crude. First, characterize the raw sound by a minimal set of numbers; then, match those numbers against a template - e.g., this sound is a "p" because numerically it looks like a prerecorded "p".
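To make the "crude" approach concrete, here is a minimal Python sketch of template matching. It assumes each sound has already been reduced to three numbers; the feature values, the template table, and the use of plain Euclidean distance are all illustrative assumptions, not a description of what the ARPA-era systems or ViaVoice actually compute.

```python
import math

# Hypothetical prerecorded templates: one short feature vector per phoneme.
TEMPLATES = {
    "p": (0.9, 0.1, 0.2),
    "t": (0.2, 0.8, 0.3),
    "ah": (0.1, 0.2, 0.9),
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, templates=TEMPLATES):
    """Return the label of the template that is numerically closest."""
    return min(templates, key=lambda label: distance(features, templates[label]))


# A sound whose numbers sit near the stored "p" is labeled "p".
label = classify((0.85, 0.15, 0.25))
```

Nothing here knows how a mouth produces a "p"; the sound is a "p" only because its numbers look like the stored "p".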
The trick here lies in the notion "looks like." James Baker, then a graduate student at Carnegie-Mellon University, applied to speech recognition pattern-matching a probabilistic mathematical device called a "hidden Markov model" (HMM). The results proved so superior in that first ARPA funding round that all modern speech recognition uses HMM - a fact which is astounding for two reasons. First, HMM is fundamentally not only crude but almost certainly wrong - however our ears and brains hear and analyze speech, HMM is surely not it. Second, it's amazing that we've been doing speech recognition the same way for so long. To be sure, modern HMM is vastly more sophisticated than in those days; and one should not underestimate the importance of software optimization, a direction pioneered, again, by James Baker, who went on to found Dragon Systems. But the really important development has been in hardware. Computers are now about a thousand times faster and a thousand times larger in resources, and a thousand times smaller in size and cost, than in those early days, so they have at last begun to meet speech recognition's mathematical demands.
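For readers curious what a hidden Markov model looks like in practice, the following Python sketch runs the standard Viterbi algorithm, which finds the most probable sequence of hidden states (here, phonemes) behind a sequence of observations. The two-state model and all of its probabilities are invented for illustration; a real recognizer uses far larger models, and nothing here is taken from ViaVoice or Dragon's software.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty observation list."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that path came from)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model (all numbers hypothetical): two phonemes, two kinds of sound.
states = ["p", "ah"]
start_p = {"p": 0.6, "ah": 0.4}
trans_p = {"p": {"p": 0.3, "ah": 0.7}, "ah": {"p": 0.4, "ah": 0.6}}
emit_p = {
    "p": {"burst": 0.8, "voiced": 0.2},
    "ah": {"burst": 0.1, "voiced": 0.9},
}

path = viterbi(["burst", "voiced", "voiced"], states, start_p, trans_p, emit_p)
```

The point of the probabilistic approach is that the model never decides what any single sound "is" in isolation; it picks the sequence that is most likely overall.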
In the early 1990s, Apple created its own system-level speech recognition component, PlainTalk. But PlainTalk's genius lies in its compromises: it doesn't need training for a particular user, but it does only discrete speech recognition, matching a short phrase to a finite list of predefined possibilities. The holy grail is continuous speech recognition (CSR) - basically, you talk and the computer types. And CSR is definitely here, thanks to IBM's ViaVoice Enhanced Edition.
Hail CSR -- HAL 9000 notwithstanding, the obstacles to continuous speech recognition are severe, as the history of IBM's research illustrates. They started as early as the 1950s and were among the recipients of ARPA's early funding; yet only within the last five years has IBM marketed consumer-level dictation software. Just consider: The acoustic model must find your phonemes despite the way sounds are disguised by word boundaries and sentence stress. Yet unlike discrete speech recognition, your "command" is never clearly over, so the acoustic model must also be extremely fast, to keep up with you. Plus, it isn't the only model involved: there must be a linguistic model to group your phonemes into words, matched not from some tiny list but from a possible vocabulary of tens of thousands of words.
Thus, to be at all practical, present-day continuous speech recognition requires that the acoustic model be trained for the particular speaker's voice quality and pronunciation and the characteristics of the microphone and the environment. ViaVoice handles this by having you read certain stories that it presents to you when the program first starts up. (You can repeat this procedure later to refine your model, and ViaVoice maintains multiple models so it can be used by different people, or by the same person in different surroundings.) The linguistic model, meanwhile, requires a dictionary: ViaVoice includes a default dictionary, and presumably calculates initial pronunciations based on your acoustic model; it also includes five specialty dictionaries, such as cooking or finance, of which you can turn on one at a time.
Even so, ViaVoice clearly cannot know every word you'll say or every quirk of your pronunciation, so it provides three features for expanding and refining the models:
You can add to your vocabulary directly through a dialog where you type a word and record a pronunciation for it.
You can have ViaVoice scour a text document for unknown words; it asks you which of these you're likely to use and prompts you to record pronunciations.
In the course of dictating, as you correct ViaVoice's mistakes, it learns. In particular, this happens when you select a word and dictate it again, and when you use the Correction Window, which lists alternatives to the selected problem word. Also, when you save, you are again prompted for pronunciation of unknown words.
ViaVoice also extends your vocabulary through macros and commands. Macros are expressions typed differently from their pronunciation, such as punctuation ("comma" and "period") and boilerplate like "firstname.lastname@example.org" (whose pronounced phrase might be "my email address"). Macros can have rules for automatically interacting with their surroundings; that's how you ensure, for example, that a period is snug against the preceding word, has a space after, and the next word is capitalized. Commands trigger actions, not typing; they are mostly built-in, and what commands are available depends upon what environment you're in.
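The period rule described above can be sketched in Python as follows. This is only a guess at the general idea, assuming single-word punctuation macros and a simple "capitalize after a sentence end" rule; the names and the rule table are hypothetical and do not reflect how ViaVoice stores or applies its macros.

```python
# Hypothetical macro table: spoken word -> typed text.
PUNCTUATION = {"comma": ",", "period": "."}
SENTENCE_END = {"."}


def render(spoken_words):
    """Turn a list of spoken words into typed text, applying spacing
    and capitalization rules around punctuation macros."""
    out = ""
    capitalize_next = True  # the first word of the text starts a sentence
    for word in spoken_words:
        mark = PUNCTUATION.get(word)
        if mark is not None:
            out += mark  # snug against the preceding word, no space before
            if mark in SENTENCE_END:
                capitalize_next = True
            continue
        if capitalize_next:
            word = word[:1].upper() + word[1:]
            capitalize_next = False
        out += (" " if out else "") + word  # a space precedes every later word
    return out


text = render("hello comma world period this works period".split())
```

Because the space is added before each ordinary word rather than after each mark, the period ends up snug against the preceding word and is followed by a space only when more text arrives.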
Seven, They Are Seven -- ViaVoice's functionality is divided among seven main applications (and about a dozen minor ones). This sounds confusing, but the implementation isn't: "packages" (locked folders) conceal the various applications in the Finder, and they start up and shut down automatically as necessary. In the description that follows, I give approximate RAM footprints with virtual memory off, because ViaVoice is so much faster that way.
You initiate a session by opening SpeakPad (12 MB); this starts up Background Engine (3 MB, invisible) and VoiceCenter (3 MB).
VoiceCenter appears as a windoid floating over everything on your computer, and is the command center for ViaVoice as a whole. It contains some buttons and a pop-up menu, and is where you turn the microphone on and off, and initiate management of your macros, dictionary, and acoustic model, as well as bring up the correction window.
SpeakPad looks like a rudimentary word processor, but it accepts dictation and can obey a lot of vocal commands for cursor selection and movement, cutting and pasting, and so forth. Since you can also manage the correction window vocally, a dictation session, if you're patient, can be virtually hands-free. Furthermore, SpeakPad is scriptable, and ViaVoice has a cool feature similar to PlainTalk: you can expand its command set through AppleScripts, where a script is triggered when you say its name. I use this to increase ViaVoice's cohesion with other applications; for example, while writing parts of this review, I dictated into SpeakPad and then said "Transfer to Nisus" to trigger a custom script which copied the text from SpeakPad and pasted it into Nisus Writer.
Besides SpeakPad, you can dictate into Microsoft Word, Internet Explorer, Outlook Express, or AppleWorks. To invoke this feature, you start up the Direct Dictation application (1 MB, invisible), which invokes Dictation Manager (4 MB, invisible), as well as Background Engine and VoiceCenter if they aren't up already. Once VoiceCenter is floating over (let's say) Microsoft Word, you turn on the microphone and say "Begin direct dictation", and then you can speak to type into Word.
To set up your microphone volume level and test for background noise, you run Setup Assistant (9 MB), a single window consisting of a sequence of panels you navigate through arrow buttons. You also use Setup Assistant to analyze your documents or create your voice model, in each case with a different set of panels. User and voice model management is performed through ViaVoice Settings (6 MB), which presents a control panel-type window and lets you edit your macros or vocabulary, again through a different window in each case. Each of these programs quits automatically when you close its window.
I Come To Bury CSR... -- From installation onwards, I have found ViaVoice buggy, bizarre, or downright infuriating. On one of my computers, it wouldn't install; on the other, it would install but it crashed when I tried to create my acoustic model. So I sneakily installed it on the second computer and copied it to the first, where it runs great; there, I trained the model and copied the data back to the second. Direct Dictation also crashes on that computer (both crashes are due to the highly machine-specific way ViaVoice tries to tell your computer not to sleep during dictation); but I don't miss it, as this feature is rather dubious anyway - it's much slower than dictating into SpeakPad, and ViaVoice easily gets out of sync with what's in the document.
As you read a story to create your acoustic model, ViaVoice highlights words to show where the computer thinks you are, but sometimes it highlights the wrong word and you can't figure out what it wants from you. Preferences that you set are sometimes forgotten before you even click the OK button. Your Keyboard menu can end up set to the wrong keyboard after using Direct Dictation. Often the microphone won't come on, or ViaVoice refuses to quit. If you dictate with lots of text selected, a dialog asks if you really want to overwrite the selection; if you say yes, your dictated words appear backwards!
In SpeakPad, ViaVoice insists on controlling capitalization and spacing, and often gets them wrong. Extra spaces or other characters sometimes mysteriously appear. Saying a punctuation mark sometimes causes the preceding several words to be omitted from the typescript. Little things like double-click-and-drag to select words don't work quite right. You can't examine any of the included dictionaries, so you can't intelligently add a vocabulary item in advance: you must wait until ViaVoice errs.
ViaVoice initially involves some 80 MB of disk space, and hundreds of files whose purpose you're not told; its Temp folder then grows and grows (I'm told it gets cleaned up when it hits 250 MB). The manual is cheesy, ugly, and uninformative; the command reference sheet is inaccurate and incomplete. In short, this is a huge, rather inflexible program that takes over your computer and exhibits a poor sense of design, little understanding of Mac interface and conventions, and not much idea of the user's needs.
...And To Praise It -- And yet, unless you are utterly naive, under 12, or raised entirely on science fiction, ViaVoice in action seems nothing short of miraculous. You speak, and by golly, words appear on the screen - for the most part, the right words! Certainly the recognition engine has its limitations, but these afflict all recognition engines to date. For instance, despite its showpiece examples of correctly detected homonyms ("Write the right letter to Mr. Wright"), ViaVoice often makes mistakes that even a modicum of grammatical or syntactic knowledge would have eliminated - because it has no such knowledge: it knows some likely contexts for some words, but it doesn't know English. Also, as my father points out, the worst speech recognition problem is that when things go wrong the computer can't tell you why ("speak louder / slower," or whatever), for the simple reason that it doesn't know: the models being automatic and probabilistic, we can construct them and match against them, but cannot know how they actually work (like HAL 9000!).
For increased accuracy, some simple precautions are helpful. When you first train your acoustic model, read sufficient material, and use the same tone of voice in which you'll be dictating; I find a neutral monotone works best (like HAL 9000!). Each time you start up ViaVoice, do the audio setup; this takes only a minute. When ViaVoice errs, correct it, because that's how it learns. Finally, let ViaVoice train you: you must speak continuously but not too quickly, naturally but not sloppily, carefully but not exaggeratedly - if you force your final consonants, for example, ViaVoice will hear not a clearer consonant but an extra word. Remember, it's only a machine!
Perhaps the hardest thing for me has been learning to dictate at all. When I start talking, I usually have only the vaguest idea what I'm going to say; so I tend to choke under the pressure of improvising a constant flow of slow, clear, well-formed phrases. It's good practice, I've found, to read aloud; and one of my uses for ViaVoice has been to transcribe some old hand-written letters. However, I do often use it to compose email messages, and I did use it to draft parts of this review.
The Last Word -- Computer speech recognition is here, and although I wouldn't like to predict just how, I believe it will change everything. Perhaps certain common speech recognition homonym errors will become accepted spellings. Perhaps computer input will soon be a hybrid of mouse, keyboard, and voice. In any case, we're on the brink of a new age, and anyone who likes can step across and put a foot into it. Now - open the pod bay doors, please, HAL.
ViaVoice Enhanced requires Mac OS 9.0.4 and a Power Mac G3/300 or better; the faster the processor and the more RAM, the better - but this will improve only speed, not accuracy. It costs $130 and comes with an Andrea USB headset, but any noise-cancelling microphone will do, such as the iParrott or the Andrea PlainTalk headset that came with the previous version.
If your computer doesn't meet these requirements, you might like to try the previous version, ViaVoice Millennium. It isn't quite as good, but it works decently, requires only Mac OS 8.5.1 and at least a Power Mac G3/233, and at $75, which isn't much more than the value of the included headset, must be termed a bargain.
Non-profit, non-commercial publications and Web sites may reprint or link to articles if full credit is given. Others please contact us. We do not guarantee accuracy of articles. Caveat lector. Publication, product, and company names may be registered trademarks of their companies. TidBITS ISSN 1090-7017.