On Fri, Feb 11, 2011 at 08:16:54PM -0200, Henrique de Moraes Holschuh wrote:
> On Fri, 11 Feb 2011, Lars Wirzenius wrote:
> > However, I'm curious: is there a lot of software that is broken with
> > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > much in recent times.
>
> 2. Anything that cannot deal with Supplementary planes.
>
> This includes the use of UCS-2 instead of UTF-16, as it cannot represent
> the Supplementary planes. python 3 when not compiled to use UCS-4 memory
> hog mode is an example, I am told.
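A quick Python sketch makes both points concrete: UCS-2 has no way to hold a supplementary-plane character (UTF-16 needs a surrogate pair for it), and the encodings differ in size and in whether they contain zero bytes. The helper names are hypothetical, U+1D11E MUSICAL SYMBOL G CLEF is just an arbitrary supplementary-plane example, and UCS-4 is measured with Python's "utf-32-le" codec.

```python
# Illustrative sketch (hypothetical helpers): encoded sizes, the UCS-2 limit,
# and UTF-16 surrogate pairs for supplementary-plane code points.

def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of `text` in UTF-8, UTF-16 and UCS-4, without a BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "ucs-4": len(text.encode("utf-32-le")),  # UCS-4 measured as UTF-32
    }


def fits_in_ucs2(text: str) -> bool:
    """UCS-2 is a fixed 16-bit code, so only BMP code points (<= U+FFFF) fit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_surrogates(ch: str) -> tuple[int, int]:
    """Split one supplementary-plane code point into a UTF-16 surrogate pair."""
    if ord(ch) <= 0xFFFF:
        raise ValueError("BMP characters need no surrogate pair")
    offset = ord(ch) - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


if __name__ == "__main__":
    clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
    print(fits_in_ucs2(clef))                            # False
    print([hex(u) for u in utf16_surrogates(clef)])      # ['0xd834', '0xdd1e']
    print(encoded_sizes("hello"))  # {'utf-8': 5, 'utf-16': 10, 'ucs-4': 20}
    print(encoded_sizes("漢字"))    # {'utf-8': 6, 'utf-16': 4, 'ucs-4': 8}
    # UTF-16 text is full of NUL bytes, which breaks NUL-terminated C strings;
    # UTF-8 only produces a zero byte for U+0000 itself.
    print(b"\x00" in "abc".encode("utf-16-le"))          # True
    print(b"\x00" in "abc".encode("utf-8"))              # False
```

Plain ASCII doubles in UTF-16 and quadruples in UCS-4, while unmarked CJK text is the case where UTF-16 comes out smaller than UTF-8.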
Using UCS-2 is hardly better than using ISO-8859-1 or any other ancient
charset. Both UTF-16 and UCS-4 can be memory hogs; that's why you should
pick UTF-8 for regular use. Except for some rare cases (CJK text with no
formatting or markup), it uses less memory, and it can be passed as-is to
POSIX file functions, since it never produces a zero byte for anything
other than U+0000 itself.

Picking a random subset of Unicode is like storing the day of the year in
a one-byte variable because that way you support 70% of uses and conserve
memory...

-- 
1KB // Microsoft corollary to Hanlon's razor:
    //  Never attribute to stupidity what can be
    //  adequately explained by malice.

Archive: http://lists.debian.org/20110212020220.ga26...@angband.pl