On Wed, 12 May 2010 18:29:58 +0100 Robert Pearce <r...@bdt-home.demon.co.uk> wrote:
> Because Linux is natively UTF-8 and therefore handles UTF-8 strings.
> MinGW sits on top of Windows, which is UCS-16 - that is, any unicode
> string must use wide characters throughout, so any "normal" string has
> to be translated. The default behaviour of Windows is to assume that
> such traditional strings are CP1252 or some such, and therefore make
> the wrong translation to UCS when presented with UTF-8.

This thread has been full of incorrect information, so please read
these before discussing anything:

http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
http://en.wikipedia.org/wiki/UTF-16
http://en.wikipedia.org/wiki/UTF-8

Windows has been using UTF-16 since win2k (it was using UCS-2 before
that). UTF-16 is compatible with UCS-2 (that is, it can hold any Unicode
code point that can be expressed via UCS-2), but the reverse is not
true. UTF-8 is not compatible with anything except ASCII (IIRC).

Any Unicode code point can be expressed in UTF-8 and in UTF-16, but not
in UCS-2.

Environments which use UTF-8 (variable-width encoding, 1, 2, 3 or 4
bytes per code point):
GTK+, most of the web, XML (by default), text-mode Linux (usually).

Environments which use UTF-16 (variable-width encoding, 2 or 4 bytes
per code point):
Newer Java, .NET, Qt, Windows (>= 2k).

Environments which use UCS-2 (fixed-width encoding, 2 bytes per code
point):
Older Java, Windows (< 2k).

There's also UTF-32 (same as UCS-4), a fixed-width 4-byte encoding, but
I'm not sure if anyone uses it.

Personally, I like UTF-8 most, since every ASCII file can be read as
UTF-8, and every UTF-8 string can be stored in a plain std::string (or
a 0-terminated char*).

Cheers,
Alexander
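
To make the byte counts listed above concrete, here is a minimal C++ sketch that walks a UTF-8 std::string and reports how many bytes the same text would need in UTF-8, UTF-16 and UTF-32, and whether it would fit in UCS-2 at all. The names utf8_sequence_length, EncodingSizes and measure are made up for this illustration and are not part of gtkmm, glibmm or any other library. The sketch assumes the input is already well-formed UTF-8 and does no validation of continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Returns 0 for a continuation byte or a byte that cannot start a sequence.
// (Hypothetical helper; it does not reject overlong or out-of-range forms.)
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Storage needed for one piece of text in each encoding.
struct EncodingSizes {
    std::size_t code_points;
    std::size_t utf8_bytes;
    std::size_t utf16_bytes;
    std::size_t utf32_bytes;
    bool fits_ucs2;  // false as soon as a code point above U+FFFF appears
};

// Assumption: `s` holds well-formed UTF-8. Scanning stops at the first
// byte that cannot start a sequence or at a truncated sequence.
EncodingSizes measure(const std::string& s) {
    EncodingSizes r{0, s.size(), 0, 0, true};
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n =
            utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (n == 0 || i + n > s.size()) break;
        ++r.code_points;
        r.utf32_bytes += 4;  // fixed width
        if (n == 4) {
            // 4-byte UTF-8 means a code point above U+FFFF: UTF-16 needs
            // a surrogate pair, and UCS-2 cannot represent it.
            r.utf16_bytes += 4;
            r.fits_ucs2 = false;
        } else {
            r.utf16_bytes += 2;
        }
        i += n;
    }
    return r;
}
```

For example, measure("A\xC3\xA9\xE2\x82\xAC\xF0\x9D\x84\x9E") looks at four code points (U+0041, U+00E9, U+20AC, U+1D11E) that take 1, 2, 3 and 4 bytes in UTF-8. The last one lies outside the range UCS-2 can express, so fits_ucs2 comes back false. The string literal is written with byte escapes so that the result does not depend on the source file's encoding.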