Piercing the Babel: Online Translation for the Masses

The Internet is global in reach, but most Web sites are in English. This is changing as other countries adopt the Internet with the same fervor as English-speaking nations, but it will take time to catch up. Unfortunately for some, fortunately for others, this means that you must understand English to get the most out of the Web. Or do you?

As businesses, particularly U.S. businesses, start working on a global scale, they're confronted with the daunting task of translating and localizing products and documentation. One of the biggest handicaps for Americans is their lack of familiarity with foreign languages: most Americans study a foreign language for only two or three years in school, and they seldom need to use it afterward, since the U.S. tends to be quite insular with regard to international issues and communication.

Too few businesses are making the necessary efforts to communicate globally. It is certainly much easier for English-speaking businesses to work in English than to translate texts into the many languages of their customers, but it's worth recalling the old adage that you buy in your own language but sell in your customer's.

(I should note that TidBITS is an exception to this generality, since teams of dedicated volunteers have long translated TidBITS into a variety of languages, currently including Dutch, French, German, Japanese, Russian, and Spanish. Additional volunteers and translations are always welcome.)


Parlez-vous English? -- Many programs and Web sites now enable you to translate documents and Web pages from a foreign language into English, from English into another language, or between other language pairs. One well-known service, AltaVista's Babel Fish, offers 12 language pairs, and others offer even more.


Yet what are these translations really worth? Can they replace human translators? Can they be used for professional purposes? These services claim to "translate anything," but the real problem lies in the definition of the word "translate."

I work as a freelance translator, so you may think that my goal here is to denigrate these machine translation services to steer you toward paying for human translation instead. But no, it is not that simple. I will try to be as balanced and realistic as possible, so you can truly understand both the process of translation and the value of translation Web sites and software.

Machine translation has ranked high among the holy grails of computing since the field's early days in the 1950s, and predictions of efficient and accurate machine translation programs have been commonplace ever since. Yet in spite of increased computing power, new natural language processing algorithms, and decades of experience, it's clear that we're still far from attaining this goal. (A recent feature article in Wired examines this question extensively.)

Why is this the case? Translation is a complex process that goes far beyond merely replacing one word with another. In short, translation is about changing texts from one mindset to another. The best translation programs (not the ones sold to consumers, but those used by international organizations such as the European Union) have huge databases and complex algorithms that work by examining phrases before words to ensure accurate collocations (words that go together, such as "jumbo shrimp" as opposed to "big shrimp"). These programs work best on very limited vocabularies, and, in some cases, they can be quite effective. I have seen the output from the European Union's translation software, and although it is quite good, it requires both a limited, controlled vocabulary and a translator to edit the output.
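To make the difference between swapping words and examining phrases first a little more concrete, here is a minimal Python sketch. It is a toy, not a model of how the European Union's or any other real system works: the two dictionaries are hypothetical and hand-built, the phrase lookup is a simple greedy longest-match, and the example pair is the English and French proverbs discussed later in this article.

```python
"""Toy contrast between word-by-word substitution and phrase-first lookup.

Both dictionaries are hypothetical and hand-built for illustration only;
real systems rely on far larger databases and more complex algorithms.
"""

# Hypothetical single-word dictionary (English -> French).
WORDS = {
    "count": "compter",
    "your": "vos",
    "chickens": "poulets",
    "before": "avant",
    "they": "ils",
    "hatch": "éclosent",
}

# Hypothetical phrase table: a multi-word English key maps to a French
# equivalent chosen for its meaning, not for word-for-word correspondence.
PHRASES = {
    ("count", "your", "chickens", "before", "they", "hatch"):
        "vendre la peau de l'ours avant de l'avoir tué",
}

MAX_PHRASE_LEN = max(len(key) for key in PHRASES)


def translate_word_by_word(sentence: str) -> str:
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(WORDS.get(tok, tok) for tok in sentence.lower().split())


def translate_phrase_first(sentence: str) -> str:
    """Greedy longest match: try multi-word phrases first, then single words."""
    tokens = sentence.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try phrase lengths from the longest possible down to two words.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase starts here: fall back to the single-word dictionary.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    text = "Count your chickens before they hatch"
    print(translate_word_by_word(text))
    print(translate_phrase_first(text))
```

The word-by-word version produces "compter vos poulets avant ils éclosent," which keeps the chickens and loses the proverb; the phrase-first version returns the French proverb, but only because someone put that exact phrase in the table. Any wording the table doesn't anticipate falls straight back to word-for-word substitution, which is one reason such systems work best with limited, controlled vocabularies.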

Although it is relatively easy to parse a sentence and find "standard" structures, computer programmers have tried desperately to account for the many exceptions in language, and for the multiple meanings of words that native speakers resolve instantaneously. Unfortunately, no language (except perhaps an artificial language such as Esperanto) can be easily described, and every language's vocabulary is full of tricky words. Human languages lack the structural consistency of computer languages, or even of markup languages like XML, and so many influences come into play during a language's evolution that the complexity becomes insurmountable.

In addition, the results depend greatly on the type of text you want to translate. Take three sentences of increasing difficulty:

  1. Apple introduced its new Power Mac G4 minitower computers, complete with built-in gigabit Ethernet.

    In this first sentence, you have a self-contained thought, in a relatively simple structure:

    [subject/noun] [verb] [object/noun phrase]

    While the noun phrase serving as the object is perhaps a bit difficult to parse, it's not impossible. What is important is that all the information is included in the sentence.

  2. The group giving away the free tanks only stays alive because it is staffed by volunteers, who are lined up at the edge of the street with bullhorns, trying to draw customers' attention to this incredible situation.
    (Neal Stephenson, "In the Beginning was the Command Line")

    Sentence 2, however, is much more complex (I won't bother to try and map out its structure). It is longer, it contains several clauses, and - making it especially difficult to translate - it contains intertextual elements that refer back to previous sentences. What are the tanks? Who are the volunteers? Why do they have bullhorns?

    Translation software can only translate the words it knows, one sentence at a time; it cannot look beyond the sentence to the wider context. Now consider an even worse example.

  3. The Deliverator belongs to an elite order, a hallowed category. He's got esprit up to here.
    (Neal Stephenson, Snow Crash)

    This third sentence raises the bar even higher. Here's an invented word, "Deliverator," and what is a "hallowed category?" Plus, what could this expression "He's got esprit up to here" possibly mean? Translation software usually just botches vernacular language. Slang and creative language are beyond the purview of such programs, and they spit out a mess of unrelated words when translating this type of text.

The net result is that translation software can work well with simple, technical texts. Even AltaVista's Babel Fish points out in its FAQ that "Machine translation produces reasonable results in many cases. But you should not rely on it." It also stresses that it allows you to "grasp the general intent of the original, not to produce a polished translation."

Back and Forth -- Many people have written about translation Web sites, and most have tested them by translating a text and then translating the result back into the original language to compare it with the original. This technique produces highly humorous results, but it's not really useful.

Let me give you an example of why this is so: when you translate from one language to another, you start by translating the words, but if a cultural concept is more important than the words used to express it, you need to change the words. For instance, look at proverbs, since they are more culturally charged than most other texts. In English, I should avoid counting my chickens before they hatch; in French, I shouldn't sell the bear's skin before killing it (vendre la peau de l'ours avant de l'avoir tué). This is an extreme example, but you can see the effect cultural baggage has on a translation. So back-translating proverbs like these may lead to a good guffaw, but it gives you no real sense of the translation program's value.

The Brighter Side -- This is not to imply that machine translation is worthless; nothing could be further from the truth. It's perfect when you find a potentially useful Web page in a language you don't understand. Or what if you receive a message in another language and want to find out whether it was really meant for you or is just foreign spam? No one would hire a professional translator for such informal needs, and these are ideal uses for Web-based translation services.

But many companies and public institutions are going well beyond these minimal needs. An initial machine translation pass can save a great deal of time if the terminology database used is specific enough to the application, and if the original text is well-written. Most importantly, though, these organizations realize that for machine translated documents to convey the information contained in the original as completely as possible, they must be post-edited by human translators.

As a result, the machine translation industry is thriving, as major corporations invest in large-scale machine translation solutions. R&D expenditures are rising fast, and the number of companies and research centers working on the subject is impressive. Linguists, long considered only slightly more employable than poets, can now pursue interesting career tracks.

The Other Side -- Unfortunately, while high-quality machine translation can work well in informal situations and for very specific uses, the advertising for machine translation software and sites is misleading many people into believing that machine translation can replace a human translator. These consumer-oriented programs can perform a find-and-replace for certain words and phrases, and they can spit out a text with some similarity to the original. But that's not really translation; it's a glorified find-and-replace feature, and in many ways these consumer-oriented programs are obscuring the fact that good translation is an incredibly complex task.

Machine translation cannot provide a well-written text, nor can it truly provide a translation that takes into account the cultural aspects of a text. Machine translation also cannot do anything of value with literary texts, where metaphor and style are essential. Only human translators can do these things, despite what many people are being led to believe.

So while the translation profession's importance is growing by leaps and bounds in this expanding global economy, the profession is simultaneously facing increasing questions about its value. Personally, I have no fear of losing work, as more and more customers become aware that not only do they need translations, but they need high-quality translations by sensitive and experienced translators. But for many translators, particularly in certain languages and specialties, these programs are having a negative effect, at least in my opinion; for another point of view, see this article in Translation Journal.


This trend is by no means limited to translators. Many professionals who work primarily with words, including technical writers, editors, indexers, and librarians, are fighting to keep their professions from being demeaned by the periodic promises that increased CPU power will enable a computer to work with words as flexibly and fluently as an experienced and highly trained person. It's just not true, and believing that it is impoverishes us by undervaluing language itself, perhaps the highest achievement of our species.

[Kirk McElhearn is a freelance translator and technical writer living in a village in the French Alps.]


