Once the novelty of asking Siri, Apple’s “Intelligent Personal Assistant,” to open the pod bay doors, or to beam one up, has worn off and you’ve learned what magic phrasings to use, Siri starts to show herself — or, for users in some countries, himself — off as a rather intelligent assistant indeed.
But since Siri is billed as being personal, as well as intelligent, Apple has granted Siri the capability to listen, and to speak, in a number of languages — the original iPhone 4S with iOS 5 spoke English, French, and German, with Japanese, Korean, Spanish, Italian, and both Mandarin and Cantonese Chinese showing up in Siri’s repertoire at various dates afterwards.
But let’s be honest here: it’s almost entirely meaningless to speak of “English” as a single linguistic construct. Just ask any British speaker of the language what we make of Americans talking of patting someone’s fanny. Realising that there is an entire world of variety within English (George Bernard Shaw only slightly exaggerated when he had Professor Higgins claim, in “Pygmalion,” to be able to locate an accent in London to within two streets; certainly he was right that, in England at least, we notice minute regional variations, and we care), Apple originally programmed Siri with the capability to work in three different dialects: American, Australian, and the tautological British English. Canadian English followed, almost as an afterthought, but sounds the same as the American English voice.
The British English version is one of only three versions of Siri that speak with a man’s voice, the other two being French and Swiss French; infer what you will as to why Apple chose to flip genders like this. Siri’s British accent, incidentally, would appear to be a generic southern-English variety, with the “a” sound in “ask” and “answer” lengthened.
It is worth noting that iOS offers no way of selecting different input and output voices — if you want to speak to Siri in American English, for example, then Siri will reply to you in the same voice, no matter how much you might prefer an Australian female voice or a British male accent.
(Apple has strongly resisted referring to Siri’s gender, but in the real world, where Siri speaks with a clearly female or male voice, it is nearly impossible for people not to anthropomorphize Siri as female or male. I’m not going to fight it.)
Other languages, too, have been modified and regionalised — German, for example, has been divided into German (Germany), which sounds rather like Hochdeutsch, and German (Switzerland), or Schweizerdeutsch. Why these two dialects have been favoured over, say, Austrian German is as arbitrary a decision as the omission, currently, of Irish or New Zealand English. Similarly, Spanish is available in Mexican, Spanish, and U.S. varieties; French is separated into variants for France, Canada, and Switzerland; and Mandarin into flavours labelled “China” and “Taiwan.”
So how much do these fine-grained divisions between regional dialects of English actually matter? I come from Salford, in the north of England, near Manchester — for “Downton Abbey” fans reading this, think “below stairs” and you’ll have a decent handle on my accent. Alternatively, just listen to any of the recordings I’ve made of my TidBITS articles over the last year. Totally and lucidly comprehensible, I trust you’ll agree, but does Siri? Specifically, does American Siri? How about Australian Siri?
There are plenty of differences between American and British usages of English. I did, when an innocent college student in Pennsylvania in my 20s, make the mistake of asking the young lady next to me in class for a rubber when I made a mistake, and — and I swear that this is true — I did, just once, suggest that I should come and knock another young lady up on the way to class one morning. (That latter young lady is now my wife, incidentally, so perhaps it wasn’t such a gaffe.)
But while it is endlessly entertaining to trot out these classic misunderstandings, the reality is that genuine problems and substantial differences are rare enough that, well, they make for great stories. In daily life, it’s the pronunciation of words that really makes a difference. The obligatory “lieutenant” aside (we all know it’s correctly pronounced leftenant, so there’s no point in flogging that one any further here, and, since I don’t actually know any lieutenants, I rarely have the need to ask Siri to call one for me), there are differences enough in accents across the Atlantic that a one-size-fits-all Siri might well struggle.
A test, then, was in order. The simple sentence “Schedule some water and butter for quarter to four on the third of February” is quite the minefield. “Schedule,” correctly pronounced “shejule,” is more commonly pronounced “sked-jule” in the United States. “Butter” and “water” both end in an “r” which is emphasized as “arrr” in American English, whereas Brits barely pronounce that “r” at all. The “t” phoneme in these words is also different: the British “waw-teh” becomes the American “wodderr.” In my experience of living among Americans, “quarter to four” is an unusual way of giving a time; “three forty-five” would be more common. And, again, there are the “r” and “t” modifications, even if an American were to use the British style. Finally, the second month is pronounced as something resembling “Febry” in Britain, while Americans tend to enunciate each syllable distinctly, as in “Febroo-airy.” A somewhat contrived sentence, then, but one that contains commands (“schedule,” however you pronounce it), dates, and times, which makes it a decent challenge for an allegedly intelligent and linguistically aware personal assistant.
The testing subjects would be me — the control, obviously, speaking the Queen’s English — my wife, Deborah, who grew up in the United States and speaks with a standard midwestern American accent (I love her anyway) and our teenage daughter, who lived in Florida for the first twelve years of her life, has spent the last three in New Zealand, and, to my endless chagrin, has yet to shed her American accent.
I went first. British Siri heard and understood correctly first time. I was impressed. I tried a second time, and, again, communication took place quite nicely. By way of a test, I tried pronouncing “schedule” the American way; he still understood. I switched to Siri’s Australian voice, and started to encounter a few difficulties — “water and butter” was routinely understood as “water bottle.” This, I suppose, is fair enough, given that Australian English has a tendency to turn a “t” found between two vowels, such as the middle consonant sound of “water” or “butter,” into a “d,” as Americans are wont to do.
Then I tried American Siri, and matters became quite bizarre — so much so that I had to take screenshots to be sure I would be able to reproduce Siri’s attempts to understand me. “Sejal Sewalt Rubalcava for 3:45 on 3 February” was her first guess; next came “Sejal someone Trembleton for 3:45 on 3 February.” “Show George Washington for 3:45 on 3 February” was also nonsense, but less so than “Children’s Wincherm both of frequentatives Solecita February.” At this point, Siri had quite clearly given up and was tossing out random words she’d heard other people saying. Certainly, what she was hearing bore little or no resemblance to anything I was saying.
Next came Deborah, with her American voice. She set her iPhone’s Siri to be American, and issued the same instruction — no problem. But when her Siri crossed the Atlantic and became British, suddenly he thought she was saying “Schedule some minor in Barrowford kind of fire and the third of anyway.” Australian Siri did much better — “Schedule someone and butter for 3:45 on 3 February.” Our American-raised daughter was, similarly, understood first time by American Siri, but British Siri heard the same nonsense about Barrowford and a minor, and Australian Siri thought she had heard “Schedule someone Bonofiglio define Vodafone.”
The first word in this sequence, “schedule,” was clearly a problem. It’s a sufficiently high-frequency word within the Siri context that it’s imperative that Siri be able to understand both standard pronunciations, and yet, certainly within American English, it can only handle the American pronunciation, while the British and Australian settings seem able to handle both.
While Apple doesn’t give much away about how Siri parses sentences, it seems fairly clear that the linguistic analysis that goes on when an iPhone sends an audio sample to Apple’s servers for processing revolves around a primarily functional, rather than strictly grammatical, system. In our test sentence, the first word encountered is “Schedule,” which flags up to Siri that we want to add a calendar item; what comes after that word will then include the event, and will likely include some combination of date, time and location. “Tell” and “ask” both trigger text messages, while “call” and synonyms involve phone calls, and in each case will be followed by either a name to be found in Contacts or a phone number.
The problem, then, seems to be that if that first word — the trigger word, so to speak — is misunderstood, then the entire function of the command is likely to be misunderstood. But when I returned to safer, more familiar ground — a time, say, or a date, which is more constrained by certain formats — then Siri tended to fare better.
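For the technically curious, here is a minimal sketch in Python of the kind of trigger-word dispatch I am guessing at. To be clear, this is not Apple’s code: the `TRIGGERS` table and the `classify` function are names I have made up purely for illustration, and the real system is surely far more sophisticated. It does show why one misheard first word can sink the whole command.

```python
# Hypothetical trigger-word table: an illustration of the guess above,
# not Apple's actual vocabulary or design.
TRIGGERS = {
    "schedule": "calendar",
    "tell": "message",
    "ask": "message",
    "call": "phone",
}


def classify(transcript):
    """Return (intent, remainder), deciding the intent from the first word only."""
    words = transcript.strip().split()
    if not words:
        return ("unknown", "")
    # Normalise the first word: lower-case it and drop trailing punctuation.
    first = words[0].lower().strip(".,!?")
    intent = TRIGGERS.get(first, "unknown")
    remainder = " ".join(words[1:])
    return (intent, remainder)


# A correctly heard trigger word routes the rest of the sentence to the calendar.
print(classify("Schedule some water and butter for 3:45 on 3 February"))

# A misheard trigger word leaves the command with nowhere to go,
# even though the time and date came through intact.
print(classify("Sejal Sewalt Rubalcava for 3:45 on 3 February"))
```

In this toy version, the second transcript comes back with an intent of “unknown” despite the perfectly good “3:45 on 3 February” sitting in the remainder, which is much what I saw from American Siri.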
I was interested to note that, even though all three of us used the format “quarter to four” in our utterances, whenever Siri understood this, it rendered the time onscreen as 3:45, which leads us to another interesting finding.
When Apple first taught Siri Japanese, I decided I had to try it out. I told my iPhone 妻に電話して下さい (Tsuma ni denwa shite kudasai — “call my wife”). Deborah is in my Contacts list, of course, and she’s listed as my “wife” — in English, not in Japanese. But when I told Siri, in Japanese, to call tsuma, not “my wife,” Siri called Deborah. There is, clearly, some rather deft background processing going on here — the meaning of a word or phrase, if we can speak of such when dealing with computers and programming, works on a level at which a flag such as “wife” can be triggered by a number of different tokens — even in different languages.
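One way to picture this, and I stress that it is only my guess at the general shape rather than anything Apple has documented, is as a two-step lookup: spoken tokens in any language map onto a language-neutral concept, and the concept is then matched against the labels in Contacts. The names `RELATION_TOKENS`, `CONTACTS`, and `resolve_contact` below are invented for this sketch.

```python
# Hypothetical token-to-concept table: several surface forms, in more than
# one language, all point at the same language-neutral flag.
RELATION_TOKENS = {
    "wife": "wife",
    "my wife": "wife",
    "妻": "wife",
    "tsuma": "wife",
}

# Hypothetical Contacts data: the relationship label is stored in English,
# as it is on my own iPhone.
CONTACTS = {"wife": "Deborah"}


def resolve_contact(token):
    """Map a spoken token to a contact name, or None if nothing matches."""
    concept = RELATION_TOKENS.get(token.strip().lower())
    if concept is None:
        return None
    return CONTACTS.get(concept)


print(resolve_contact("妻"))       # the Japanese token finds the English-labelled contact
print(resolve_contact("My wife"))  # so does the English phrase
print(resolve_contact("husband"))  # no such concept in this toy table
```

The point of the design is that the contact record never needs to know which language the request arrived in; only the first table does.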
Siri’s language processing would appear to operate independently of its input and output language preferences, and so I do find myself wondering why we have to keep Siri’s input and output in lockstep. I clearly have much more success when Siri listens to me in British English, but I don’t much care for the British English voice (and I wonder why it’s the only male English voice). Given that output and responses are, seemingly, abstractions of a lower-level processing system, why not allow me to set my iPhone’s voice to, say, Australian? (Alas, there’s no New Zealand voice yet; our iPhones come set to British by default.)
Apart from personal preference — I might prefer a female voice, if nothing else — all synthesized voices are going to sound slightly off, and will annoy us in exactly the way our children do when they consistently mispronounce some word they’ve read but not heard, with the added frustration of not being able to correct the synthesized voice. I’ve noticed, however, that when one hears a mispronunciation in a foreign accent, it’s far less off-putting, and sometimes even charming. So if Siri let me set British English as my input language and Australian English as my output language, I could much more easily forgive her pronunciation missteps. Indeed, why not enable any of the myriad voices that have long inhabited a Mac?
Of course, this might lead to some iPhones sounding like Zarvox, so maybe it’s not the best idea.