On Jan 29 10:21, Eric Blake wrote:
> On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
> >> So, using UTF-16 surrogate encodings for characters outside the basic
> >> plane violates POSIX, but it's the best we can do for those characters.
> >
> > Right, and we discussed this already on this list.  Or the developer
> > list, I don't remember.  Maybe we should have stuck to the base plane
> > and only used UCS-2 to be more POSIX compatible.
>
> The burden is on the application, not on cygwin.  If the application
> wants POSIX behavior, then they obey __STDC_ISO_10646__ and use ONLY
> characters from the basic plane (no surrogates), at which point their
> use of wchar_t fits the POSIX definition (one wchar_t per character).
> The moment they pass a surrogate, they are no longer honoring the
> restriction documented by __STDC_ISO_10646__ so they are no longer under
> the rules of POSIX, and then cygwin can do whatever it wants (and in
Erm... hang on.  __STDC_ISO_10646__ and the POSIX requirement are two
different beasts.  I still think that __STDC_ISO_10646__ does not
restrict a 2 byte wchar_t to UCS-2.  Per the definition, UTF-16 is a
valid coded representation of characters from ISO/IEC 10646.

So, to say it in your words, the moment applications pass a surrogate,
they are no longer under the rules of POSIX, but they still honor the
restriction documented by __STDC_ISO_10646__.

However, *usually* an application shouldn't really notice that a
surrogate has been used, at least as long as it only manipulates
entire strings.

> this case, QoI demands that we honor surrogates to the best of our
> ability for full UTF-16 support, and you can have multi-wchar_t
> characters just as you already have multi-byte UTF-8 char characters).
> In other words, cygwin IS being POSIX-compliant by advertising only the
> Unicode 4.0 character set in the __STDC_ISO_10646__, while still
> supporting Unicode 5.2 (should we upgrade to Unicode 6.0?) as an
> extension when you no longer care about POSIX.
>
> > However, the POSIX definition doesn't contradict what I said about the
> > definition of __STDC_ISO_10646__ as far as I'm concerned.
>
> Yep - I think we're in violent agreement :)

Hmm, I'm not quite sure, see above.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
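
To make the surrogate point concrete, here is a rough sketch of why a
character outside the basic plane takes two 16-bit units, and why code
that only handles whole strings usually doesn't care.  It uses uint16_t
rather than wchar_t so that it behaves the same on platforms with a
4 byte wchar_t.  The helper names utf16_encode and utf16_count_chars are
made up for this illustration and are not part of Cygwin or newlib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 units to out and returns how many were written. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Only valid scalar values: up to U+10FFFF, not a surrogate itself. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        /* Basic plane: one unit per character (the UCS-2 case). */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Outside the basic plane: split the 20 remaining bits into a
       high surrogate (top 10 bits) and a low surrogate (bottom 10). */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    return 2;
}

/* Hypothetical helper: count characters (code points) in n UTF-16
   units.  Assumes well-formed input; a low surrogate is treated as
   the second half of a pair and therefore not counted. */
static size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0xDC00 || s[i] > 0xDFFF)
            count++;
    }
    return count;
}
```

Copying or concatenating such a buffer as a whole keeps the pair intact;
only code that indexes or truncates at arbitrary unit positions can
split a surrogate pair and so has to know about them.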