At 16 Jul 2010 18:33:42 +1000, Giles Lean <giles.lean%pobox.com@localhost> wrote:
Subject: Re: wide characters and i18n
>
> 2. the idea that the use of Unicode is sufficient excuse to
>    provide any of the functionality of locales

This is why the problem is usually broken down into two different
sections, with only occasional overlap in ideal scenarios!  :-)

    Internationalisation (I18N)
    and
    Localisation (L10N)

The first time I learned that these two were better kept separate than
together, I learned a whole lot of new things; many light bulbs came on
bright and bells rang clear for me.

There's also multilingualisation (M17N), which in a sense is a better
term than localisation, since it implies performing localisation for
every target locale all at once, but it seems that term only gets used
in some domains, so perhaps it's best to stick to L10N.

Indeed Plan 9 did not address localisation at all (and sadly the paper
doesn't use that more formal term either) -- it was, after all,
initially built in America for Americans, by Americans.  ;-)

Indeed the paper actually states in many places that Plan 9 (at the
time) did not even begin to address the issue of localisation.  One
might say they even punted on I18N, but as others have pointed out, the
paper already mentions these caveats.  As the paper concludes, it "at
least [has] the capacity to be international."

BTW, I think Plan 9's insistence that everything "textual" inside the
system always be in Unicode in UTF-8 all the time is one of its key
features.  That means _everything_ coming into the system has to be
converted before it can be used usefully by any application, or indeed
before it can have any meaning whatsoever.  This solves some of the
niggles you worried about.
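
To make the "convert everything on input" policy concrete, here is a minimal sketch in Python.  It is not taken from Plan 9 or from the paper; the function name to_utf8 and the Latin-1 sample input are hypothetical, chosen only for illustration.  The sketch assumes the source encoding of the incoming bytes is known at the boundary, which is exactly the part a real system has to get right.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes in a known legacy encoding to UTF-8 at the boundary.

    Hypothetical helper for illustration only.  Input that is not valid
    in the claimed encoding raises UnicodeDecodeError instead of being
    passed through, so nothing inside the system sees non-UTF-8 text.
    """
    text = raw.decode(source_encoding)  # legacy bytes -> Unicode code points
    return text.encode("utf-8")         # code points -> UTF-8 bytes


if __name__ == "__main__":
    latin1_input = b"caf\xe9"           # "café" encoded as ISO 8859-1
    print(to_utf8(latin1_input, "latin-1"))  # b'caf\xc3\xa9'
```

Once the conversion has happened at the edge, every tool further in can assume a single encoding and never needs to ask which character set it was handed.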

The combination of Plan 9's universal use of Unicode in UTF-8, and its
policy of requiring everything to be converted to Unicode in UTF-8
either on input, on import, or at least before it can be used, makes for
a firm foundation upon which one can _begin_ the next task of
localisation.

This is where IEEE POSIX / UNIX(tm) _should_ go, IMNSHO.  Get rid of all
the old non-UTF-8 crap for different character sets.  Ideally get rid of
the ANSI/ISO "wide char" crap too -- for the reasons given in the Plan 9
paper (though maybe choose 32 bits for Runes?).  Then, and only then,
begin thinking about how to do locales better.

(Yes, I know where to find Plan 9 and how to run it!  :-))

(BTW, it would be good to have a recording or transcript of Pike's talk
from when he presented the "Hello World" paper at Usenix '93.  It really
helped set the context and, I think, gave more advice than the paper
alone, though the paper really stands up well, and indeed tries to teach
us many lessons which we still have not even come close to learning
yet.)

> Which still leaves open the problem of locales and issues of
> multi-lingual documents and applications where a single
> Unicode glyph really should be represented differently
> depending upon what language it is being used for, but I did
> say at the start of this too-lengthy message that the issues
> get ugly. :-)

-- 
Greg A. Woods
Planix, Inc.

<woods%planix.com@localhost>   +1 250 762-7675   http://www.planix.com/