I don’t know how many of you have had the opportunity to view some of the extended ASCII characters on PC-clones, but they are pretty funny. You find little smiley-faces, all the suits in a deck of cards, and lots of other fun characters. Unfortunately, these cute characters are just about useless, and they aren’t that easy to use anyway. The Mac, on the other hand, has a decent extended character set with useful typographical symbols like *, (c), and [tm], and a whole slew of accented letters. Even so, if you work in a foreign language, the Mac doesn’t even begin to cut it. Although the designers of the Mac made better choices than the designers of the PC, both machines (and indeed almost all machines) use an 8-bit character set, which means that you can have only 256 characters. Initially, I suspect that the computer designers thought that 256 characters would be more than enough, but these designers also thought 128K was a good amount of standard memory and 640K was more than anyone would ever want. Worse, each machine fills the upper 128 slots differently, so if you want to transfer files to a PC from a Mac, you have to be careful to use only the first 128 characters, the ASCII character set proper, because those are the only ones that will be the same. I won’t even mention EBCDIC because it irritates me too much.
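
To make the 8-bit problem concrete, here is a minimal Python sketch that decodes the same byte values under two different 8-bit character sets. It assumes that Python's standard "mac_roman" and "cp437" codecs are reasonable stand-ins for the Mac and PC character sets discussed above; the helper name decode_both is purely illustrative.

```python
# Sketch: one byte, two meanings. Assumes Python's "mac_roman" and "cp437"
# codecs approximate the Mac and PC 8-bit character sets.

def decode_both(byte_value):
    """Return (mac_char, pc_char) for a single byte value 0..255."""
    raw = bytes([byte_value])
    return raw.decode("mac_roman"), raw.decode("cp437")

# Byte values that decode to the same character on both machines.
shared = [b for b in range(256) if decode_both(b)[0] == decode_both(b)[1]]

# The first 128 values (plain ASCII) agree on both sides.
ascii_is_safe = all(b in shared for b in range(128))

# Above 127 the two sets diverge: 0xA9 is the copyright sign on the Mac
# but a different symbol entirely on the PC.
mac_char, pc_char = decode_both(0xA9)

if __name__ == "__main__":
    print("ASCII range identical:", ascii_is_safe)
    print("0xA9 on Mac:", repr(mac_char), "on PC:", repr(pc_char))
```

A file that sticks to byte values below 128 survives the trip between machines; anything above that depends on which table the receiving machine uses.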

Well, the companies responsible for these difficulties have banded together in what seems to be an unprecedented level of cooperation to form Unicode Inc. The Who’s Who membership list includes Microsoft, IBM, Aldus, NeXT, Apple, GO, Sun, Metaphor, Lotus, Novell, and (for a little academic interest) the Research Libraries Group. Unicode is working on a new standard for character representation based on a 16-bit character set, which will define more than 27,000 characters out of a possible 65,536. Of course all applications will have to be rewritten to support Unicode (although Apple’s Script Manager should make that process easier for Mac developers once it supports Unicode), but once all applications support Unicode the ASCII barrier will fall.
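
The arithmetic behind the 16-bit figure, and what a 16-bit character looks like in practice, can be sketched in a few lines of Python. One assumption to flag: the sketch uses Python's "utf-16-be" codec as a stand-in for a fixed 16-bit representation, which is a later encoding form and not something the draft proposal itself is confirmed to describe. The sample word is just an illustration.

```python
# Sketch: 16 bits give 65,536 possible code values, and a Greek word
# fits with two bytes per letter. "utf-16-be" is an assumed stand-in
# for a 16-bit character representation.

CODE_SPACE_8_BIT = 2 ** 8      # 256 characters
CODE_SPACE_16_BIT = 2 ** 16    # 65,536 characters

# The Greek word "logos", written with escapes so the exact code
# points are unambiguous.
word = "\u03bb\u03cc\u03b3\u03bf\u03c2"

encoded = word.encode("utf-16-be")

# Split the byte string into 16-bit units, shown as hexadecimal.
units = [encoded[i:i + 2].hex() for i in range(0, len(encoded), 2)]

if __name__ == "__main__":
    print(CODE_SPACE_8_BIT, CODE_SPACE_16_BIT)
    print(len(word), "letters,", len(encoded), "bytes")
    print(units)
```

Every letter occupies exactly one 16-bit unit, so no transliteration into ASCII look-alikes is needed.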

The best part of Unicode is that because all the major computer companies are involved, it has a very good chance of being implemented correctly on all major platforms, including future versions of the Mac, OS/2, and Windows. There is a competing proposal, ISO 10646, the relative merits of which are being bandied about on a BITNET discussion list called HUMANIST (and in other places, no doubt). A couple of the arguments center on floating accents, which Unicode supports and ISO 10646 doesn’t, and on whether ISO’s method of variable-length characters (8, 16, 24, or 32 bits each) makes any sense at all. I don’t totally understand the issues on either side, since I work with only one language that can’t be represented in ASCII, but I get the gist of it all. My non-ASCII language is ancient Greek, and let me tell you, it’s an incredible pain to transcribe Greek letters into a reasonable ASCII facsimile, working primarily on sound and visual similarities. Ugh.
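
For readers unsure what a floating accent is, the following Python sketch shows the idea: an accent stored as its own character that attaches to the letter before it. It relies on Python's standard unicodedata module, which reflects the present-day Unicode standard and so is an assumption with respect to the draft discussed here.

```python
# Sketch: a "floating" (combining) accent versus a precomposed letter.
# Uses the modern Unicode data shipped with Python, not the 1990s draft.
import unicodedata

# Plain "e" followed by a separate combining acute accent: two characters.
base_plus_accent = "e\u0301"

# The single precomposed character "e with acute".
precomposed = "\u00e9"

# A nonzero combining class marks U+0301 as an accent that attaches
# to the preceding letter.
accent_class = unicodedata.combining("\u0301")

# Normalization converts between the two spellings of the same letter.
composed = unicodedata.normalize("NFC", base_plus_accent)
decomposed = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print(len(base_plus_accent), len(precomposed))
    print(composed == precomposed, decomposed == base_plus_accent)
```

The appeal of floating accents is that any accent can be placed on any letter without reserving a separate code for every combination; the cost is that the same visible letter can be stored in more than one way.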

The languages that will benefit the most from Unicode (or from which we will benefit the most, it’s not clear) are those written in East Asian ideographs, which have few visual analogues and whose pronunciations are often too subtle for coding in ASCII, unlike Greek, which can be made to sound enough like English to be comprehensible. I didn’t see the Unicode guides to Chinese, Japanese, or Korean, because they are too large to send. I do have a copy of the draft proposal, and I’m extremely impressed by Unicode’s completeness. Included are a number of mathematical operators, geometric shapes, currency symbols, a full set of punctuation marks, basic dingbats, and a bunch of scripts I’d never heard of, such as Gurmukhi, Devanagari, Oriya, and Bopomofo.

