On Mon, 31 Jan 2011 at 14:15:40 +0200, alex bodnaru wrote:
> glong read2, written2;
> GError *error = NULL;
> const wchar_t *ws = L"\x0428\x04d9\x043d\x0431\x04d9";
> gchar *conv2 = g_utf16_to_utf8(ws, -1, &read2, &written2, &gerror);
> gunichar *ws2 = g_utf16_to_ucs4(ws, -1, &read2, &written2, &gerror);
> /*ws2 should be: {0x0428,0x04d9,0x043d,0x0431,0x04d9}
> but it's not.*/

(That won't compile as written: "error" and "gerror" are not the same name.)

You seem to be assuming that wchar_t* is always a UTF-16 string. This is not the case: wchar_t is typically 16 bits on Windows but 32 bits on Unix. In particular, the platform ABI used on Debian has 32-bit wchar_t. (A wchar_t* also doesn't have to be Unicode.)

When compiled with -fshort-wchar, code similar to that works (but will probably be incompatible with other platform libraries).

For best results, use gunichar * (or a pointer to another 32-bit type) for UCS-4, gunichar2 * (or a pointer to another 16-bit type) for UTF-16 or UCS-2, gchar * (or a pointer to another 8-bit type) for UTF-8 or legacy encodings like ISO-8859-*, and only use wchar_t (whose size and encoding are unspecified) if you must interact with platform APIs that use it.

You can convert between standard encodings (UTF-16, UCS-4, etc.) and the unspecified encoding used by wchar_t* by passing "WCHAR_T" to g_iconv_open() or g_convert().

    S
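
To illustrate the advice about fixed-width types, here is a minimal sketch in plain C. It keeps the UTF-16 text from the report in a uint16_t array (16 bits on every platform, unlike wchar_t) and decodes it to UCS-4. The names sample_utf16 and utf16_to_ucs4_sketch are hypothetical and only stand in for the GLib call; with GLib you would declare the array as gunichar2 and pass it to g_utf16_to_ucs4() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16 code units stored in a type that is 16 bits on every platform,
 * unlike wchar_t. Same text as in the report; 0 terminates the string. */
const uint16_t sample_utf16[] = {
    0x0428, 0x04d9, 0x043d, 0x0431, 0x04d9, 0
};

/* Hypothetical helper, not a GLib function: decode a 0-terminated UTF-16
 * string into UCS-4. Writes at most out_cap code points to out and returns
 * the number written, or (size_t)-1 on an unpaired surrogate or if out is
 * too small. */
size_t
utf16_to_ucs4_sketch(const uint16_t *in, uint32_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*in != 0) {
        uint32_t c = *in++;

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            uint32_t lo = *in;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            in++;
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return (size_t)-1;
        }

        if (n == out_cap)
            return (size_t)-1;
        out[n++] = c;
    }
    return n;
}
```

Because the input is an array of 16-bit units rather than a wchar_t literal, the in-memory layout is the same whether wchar_t is 16 or 32 bits, so the decoder sees the five code units that the original code expected.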