Recent discussions on USENET have mentioned a new version of MacInTalk that’s supposedly in the works at Apple. Last year MacInTalk sparked some heated debates when Apple announced that the aging speech-synthesis software would no longer be supported and could not be counted on to work with future system software or hardware releases. The new version sounds like a step ahead of the old software, which, while certainly handy, was hardly impressive as speech synthesis goes.
At the Apple Australian University Consortium Conference last month, Caroline Henton of Apple’s Advanced Technology Group (ATG) talked about the new software and gave a demonstration, which was reportedly very impressive. The new MacInTalk will run on all Macs and is purely software-based. It will support multiple voices, though the version that was demonstrated included only an American English female voice.
Technically, the software uses concatenative synthesis, which presumably means that the component sounds of natural speech are assembled, or concatenated, in the right order to generate understandable, natural-sounding utterances. This differs from two other forms of speech synthesis: formant synthesis, which generates utterances based on the characteristic sound waves of spoken sounds and combinations of sounds; and articulatory synthesis. I can’t really even guess about the latter, despite a linguistic background, except to offer the speculation of Matthew T. Russotto, who suggested that articulatory synthesis might attempt to imitate the sound properties of the human throat and mouth.
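To make the idea of concatenation a bit more concrete, here is a toy sketch in Python. It is only an illustration of the general technique as described above, not a description of how the new MacInTalk works. Everything in it is an assumption made for the example: the sample rate, the unit names, and the use of short sine tones as stand-ins for recorded fragments of speech. The one detail worth noticing is the short crossfade at each seam, which is a common way to keep joined sounds from clicking.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def make_unit(freq_hz, duration_s=0.05):
    """Return a stand-in for a recorded speech unit: a short sine tone."""
    n = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical unit inventory. A real system would store fragments of
# recorded human speech here, not synthetic tones.
UNITS = {
    "h": make_unit(300),
    "eh": make_unit(500),
    "l": make_unit(350),
    "ow": make_unit(450),
}

def concatenate(unit_names, overlap=40):
    """Join stored units in order, crossfading `overlap` samples at each seam."""
    out = []
    for name in unit_names:
        unit = UNITS[name]
        if not out:
            out.extend(unit)
            continue
        k = min(overlap, len(out), len(unit))
        # Linear crossfade: fade the tail of the output into the head of the unit.
        for i in range(k):
            w = (i + 1) / (k + 1)
            out[-k + i] = out[-k + i] * (1 - w) + unit[i] * w
        out.extend(unit[k:])
    return out

# Assemble a rough stand-in for "hello" from four units.
samples = concatenate(["h", "eh", "l", "ow"])
```

The result is simply a list of sample values between -1.0 and 1.0, which could be scaled and written out as audio, for instance with Python's standard wave module. The hard parts of a real system, such as choosing which units to record and matching pitch and timing across seams, are left out entirely.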
The good news is that, whatever the technology behind it, the new MacInTalk is intended to sound as natural as possible. This contrasts with a common approach that sacrifices naturalness for the sake of intelligibility. With well-planned utterances, pure intelligibility is a little less of a concern, because the human listener can "fill in" bits and pieces of missing sound when what’s being said sounds natural enough. In face-to-face communication, this filling-in is accomplished partly through unconscious lip-reading; hints such as context and previous conversations supply the rest, especially on the telephone or in other situations where lip-reading isn’t feasible.
MacInTalk has certainly made a difference for the Macintosh; it has not only allowed games to speak but has also allowed sightless Mac users to "hear" what’s on the screen through software such as OutSpoken. A new version that sounds like natural speech and works on new Macs will be welcome.