Hi, I would like to deny the creation of an Unicode string containing characters outside the range [U+0000; U+10FFFF]. The check is already present in some places (e.g. the builtin chr() function), but not everywhere. The last important function is PyUnicode_FromWideChar, function used to decode text from the OS.
The problem is that test_locale fails on Solaris with such checks. I would like to know how to handle Solaris issues. One possible solution is to not handle issues, and just raise exceptions and skip the failing tests on Solaris ;-) Another solution is to modify locale.strxfrm() on all platforms to return a list of int, instead of a str. The type of the result is not really important, we just have to be able to compare two results (equal, greater, lesser or equal, etc.). Another solution? -- The two Solaris issues: - in the hu_HU locale, localeconv() returns U+30000020 for the thousands separator - locale.strxfrm() calls wcsxfrm() which returns characters in the range [0x1000000; 0x1FFFFFF] For localeconv(), it is the b'\xA0' byte string decoded from an encoding looking like ISO-8859-?? (b'\xA0' is not decodable from UTF-8). It looks like a bug in the decoder. It also looks like OpenIndiana doesn't use ISO-8859 locale anymore, only UTF-8 locales (which is much better!). I'm unable to reproduce the issue on my OpenIndiana VM. For wcsxfrm(), I'm not sure of the range. Example of a result: {0x1010163, 0x1010101, 0x1010103, 0x1010101, 0x1010103, 0x1010101, 0x1010101}. It looks like wcsxfrm() uses the result of strxfrm() by grouping bytes 3 by 3 and add 0x1000000 to each group. Example of strxfrm() output for the same input: {0x01, 0x01, 0x63, 0x01, 0x01, 0x01, ...}. See http://bugs.python.org/issue13441 for more information. Victor _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com