Rich Felker wrote:

> $ printf %d\\n \'À
> -61
> (expected 192)
>
> This should be 192 regardless of locale on any system where wchar_t
> values are ISO-10646/Unicode. Bash is incorrectly reading the first
> byte of the UTF-8 which happens to be -61 when interpreted as signed
> char; on a Latin-1 based locale it will probably give -63 instead.
>
> Both POSIX and common sense are clear that the numeric values
> resulting from 'c should be the wchar_t value of c and not the value
> of the first byte of the multibyte character; from the SUSv3 printf(1)
> documentation:
>
>     Note that in a locale with multi-byte characters, the value of a
>     character is intended to be the value of the equivalent of the
>     wchar_t representation of the character as described in the
>     System Interfaces volume of IEEE Std 1003.1-2001.
>
> Language lawyers could argue that on 'single-byte' locales perhaps the
> byte value should be used; however, strictly speaking a single-byte
> locale is simply a special case of a multi-byte one, and sanity should
> win in any case.
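For reference, the two readings of 'c discussed above can be compared with a small Python sketch. This is only an illustration of the arithmetic, not bash's actual code; the helper names first_byte_as_signed_char and wide_char_value are made up for this example, and it assumes wchar_t values are ISO 10646 code points.

```python
def first_byte_as_signed_char(ch: str, encoding: str) -> int:
    """Return the first encoded byte of ch, reinterpreted as a signed 8-bit value."""
    first = ch.encode(encoding)[0]  # unsigned byte value, 0..255
    # Two's-complement reinterpretation, as a signed char would see it.
    return first - 256 if first >= 128 else first


def wide_char_value(ch: str) -> int:
    """Return the code point of ch (the wchar_t value, assuming ISO 10646)."""
    return ord(ch)


if __name__ == "__main__":
    ch = "\u00c0"  # À, LATIN CAPITAL LETTER A WITH GRAVE
    print(first_byte_as_signed_char(ch, "utf-8"))  # -61, the reported output
    print(wide_char_value(ch))                     # 192, the expected output
```

The UTF-8 encoding of U+00C0 is the two bytes 0xC3 0x80, so reading only the first byte as a signed char gives 195 - 256 = -61, while the code point itself is 192.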
You're correct that the bash printf should understand multibyte
characters in a multibyte locale, but not that returning a multibyte
character when a user hasn't asked for one by setting the locale is
more "sane."

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
Live Strong. No day but today.
Chet Ramey, ITS, CWRU [EMAIL PROTECTED] http://cnswww.cns.cwru.edu/~chet/