On Mon, Feb 06, 2006 at 03:01:41PM +0100, ???ek Kry?tof wrote:
> I just second this. Only IMO the UCS2 (fixed two bytes per character) would
> be much more appropriate to a modern UNICODE system. The variable length (2
> to 3 bytes) UTF-8 encoding can marginally save some space (depending on
> language) but introduces nasty overhead to character handling - even the most
> trivial string functions have to check for character boundaries (e.g. even
> detecting the string length itself is not a trivial operation in UTF-8 !!! or
> having a fixed length buffer you can never tell in advance how many
> characters will fit into it - it depends on the language again).
>
> Windows used to have mulitbyte characters in the past (Win95,98) but luckily
> managed to get rid of this with Windows NT and higher and now both the kernel
> and userspace is UCS2. Why should Linux again enter the blind alley of
> Windows 95?
>
> Cheers
> Krystof

Have you looked at Unicode lately? It isn't a sixteen-bit code anymore. (Was it ever?) It doesn't fit in two bytes. If you chop it to two, you miss the vast majority of traditional Chinese characters, as well as (I believe) character sets such as Tolkien's Elvish.

-- hendrik
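
To make the two-byte limit concrete, here is a minimal Python sketch, not anything from the original discussion. The helper names encoded_sizes and fits_in_ucs2 are made up for illustration, and U+20000 is picked only as one example of a Han character outside the 16-bit range. It shows how many bytes a single character needs in UTF-8 and UTF-16, and whether a fixed two-byte unit can hold it at all.

```python
def encoded_sizes(ch):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-be" is used so that no byte-order mark is added to the count.
    return ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


def fits_in_ucs2(ch):
    """True if the code point fits in a single 16-bit unit (hypothetical helper)."""
    return ord(ch) <= 0xFFFF


if __name__ == "__main__":
    # Example characters: ASCII, a common Han character, a Han character beyond 16 bits.
    for ch in ("A", "\u4e2d", "\U00020000"):
        cp, utf8_len, utf16_len = encoded_sizes(ch)
        print(f"U+{cp:04X}: utf-8={utf8_len} bytes, utf-16={utf16_len} bytes, "
              f"fits in 16 bits: {fits_in_ucs2(ch)}")
```

The last character needs four bytes in both encodings, so a "fixed two bytes per character" scheme either drops it or stops being fixed-width.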