For years, speakers of languages that use alphabets or symbols other than those found in the set of Roman characters used in English and Western European languages have been rather put out. They can’t use their own characters to represent full domain names. While certain alphabets, like Cyrillic, are supported through a strange mapping system known as punycode for parts of a domain name, the top-level domain (TLD), like .com, still has to be entered in English. That’s about to change.
The global Internet naming and numbering authority, ICANN (Internet Corporation for Assigned Names and Numbers), launched a test on October 15th that would enable a complete domain name to be entered using characters found outside of the Roman alphabet. The test domains are all called example.test in their native languages; the .test TLD is reserved for just such purposes. ICANN has put up a wiki page at each of 11 test languages’ addresses for people to experiment with.
Punycode converts alphabetic letters and symbols that are not found in the Roman alphabet into an obscure sequence starting with xn--. The test domain in Cyrillic, for instance, renders as xn--80akhbyknj4f in punycode. (This mapping created the potential for spoofing domain names that the folks at the Shmoo Group uncovered back in 2005; see “Don’t Trust Your Eyes or URLs,” 2005-02-14.)
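For the curious, here is a minimal sketch of that round trip in Python, using the "idna" codec from the standard library. The helper names to_ascii and to_unicode are made up for illustration, and the Cyrillic word "пример" (Russian for "example") is a sample input of my choosing rather than something taken from ICANN's test pages. Note too that Python's codec follows the original 2003 version of the IDNA rules, so treat this as a demonstration of the idea and not as a description of how any particular browser or registry does it.

```python
def to_ascii(domain: str) -> str:
    """Encode a domain for DNS; each non-ASCII label gains the xn-- prefix."""
    # The "idna" codec splits on dots and punycode-encodes each label as needed.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Turn xn-- labels back into the characters a reader would expect."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # The punycode label quoted in the article.
    label = "xn--80akhbyknj4f"
    print(to_unicode(label))

    # Illustrative input (an assumption, not from ICANN's pages).
    print(to_ascii("пример"))

    # Plain Roman-alphabet names pass through untouched.
    print(to_ascii("example.test"))
```

Only the labels that contain characters outside the Roman alphabet are rewritten, which is why a name can mix a punycode label with a plain one like .com.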
Putting together this test almost provoked an international crisis. The original plan was to take the word hippopotamus in each tested language and insert the digits 1 and 8 in the middle to make it nonsensical. This was derailed when, one news service reports (although I can find no confirming documents at ICANN), an Israeli registrar found that the word ICANN suggested for that river-dwelling creature in Hebrew was actually an expletive. The less politically sensitive example.test was chosen instead.
There are still issues to be resolved, such as whether the equivalents of .com in every language or spelling would all map to the U.S.-controlled .com domain. But this is the first real step towards eliminating the assumption of an English-speaking, Roman-character component to every domain name.