Anders Jackson <[EMAIL PROTECTED]> writes:

> > For the input part, the complexity hits whatever component it is that
> > converts unicode or utf8 to a local charset like latin1 (and given the
> > current level of support for utf8 in tools like emacs and TeX, I don't
> > think eightbit charsets will be abandoned very soon).
>
> But do you need to get in ALL Unicode characters from the console in
> every locale? I think Unicode for internal representation is a Good
> Thing.

The complexity I'm thinking of comes from the fact that several latin-1
characters have *several* valid and equivalent representations in
unicode, and a unicode to latin-1 converter has to treat them *all*
correctly. That can be dealt with, but it's more complex than just
converting scancodes to characters in the user's character set.

And I feel *strongly* that doing unicode without getting normalization
and equivalence issues right is a very, very bad thing, worse than the
current chaos of various 8-bit character sets.

Incompatibilities because one program or system uses iso-8859-1 and
another uses iso-8859-5 are well known and easy to understand, and one
can often tell programs what character set to use.

Incompatibilities because two programs, both using unicode, require
different normalization (and thus have broken unicode conformance),
e.g. one program insisting that my last name is spelled with a
"LATIN SMALL LETTER O WITH DIAERESIS" and another insisting on an "o"
followed by "COMBINING DIAERESIS", are harder both to understand and to
work around. You will never see a program with options to configure
what particular normalization it should use for "ö" and other
characters with several equivalent unicode representations.

> But using X for telling how to get characters from keyboard scancodes
> to Unicode is compatible with using Unicode internally.

Huh? I don't understand you. My point is that it is easier to convert X
keysyms to the user's choice of local character set (be that latin1 or
utf8 or whatever) than to convert from unicode, because, as far as I'm
aware, X keysyms have a simple one-to-one mapping between characters
and integers, without any of those equivalence rules which you have to
understand and implement in order to deal with unicode properly.

> Add composing later.

Doing unicode sans composing characters may be a start, but it is *not*
really "unicode".
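
To make the equivalence problem concrete, here is a minimal Python
sketch. The function names to_latin1 and naive_to_latin1 are made up
for this illustration; only unicodedata.normalize and str.encode come
from the Python standard library. It shows the two spellings of "ö"
mentioned above, and that a converter which skips normalization handles
only one of them.

```python
import unicodedata

# Two canonically equivalent spellings of "ö".
PRECOMPOSED = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
DECOMPOSED = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS


def naive_to_latin1(text):
    """Hypothetical converter: maps code points one by one, no normalization."""
    return text.encode("latin-1")


def to_latin1(text):
    """Hypothetical converter: composes to NFC first, then maps to latin-1."""
    return unicodedata.normalize("NFC", text).encode("latin-1")


if __name__ == "__main__":
    # The strings differ as code point sequences...
    print(PRECOMPOSED == DECOMPOSED)
    # ...but the normalizing converter maps both to the same latin-1 byte.
    print(to_latin1(PRECOMPOSED), to_latin1(DECOMPOSED))
    # The naive converter rejects the decomposed form, because U+0308
    # has no latin-1 code of its own.
    try:
        naive_to_latin1(DECOMPOSED)
    except UnicodeEncodeError as err:
        print("naive conversion failed:", err.reason)
```

This covers only canonical composition of a single character; a real
converter would also have to decide what to do with sequences that have
no latin-1 equivalent at all.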

/Niels

_______________________________________________
Bug-hurd mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-hurd