builtin printf behaves incorrectly with "c and 'c character-value arguments

Rich Felker Thu, 01 Nov 2007 04:13:14 -0800

$ printf %d\\n \'À
-61
(expected 192)

This should be 192 regardless of locale on any system where wchar_t
values are ISO-10646/Unicode. Bash is incorrectly reading the first
byte of the UTF-8 encoding (0xC3), which is -61 when interpreted as a
signed char; in a Latin-1 based locale it will probably give -64
(0xC0) instead.
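
To make the arithmetic concrete, here is a small sketch (not code from
bash; as_signed_byte is a name made up for illustration) of what
happens when a byte of 128 or more is read through a signed 8-bit
type. It spells out the two's-complement wraparound explicitly rather
than relying on a cast:

```c
#include <stdio.h>

/* Hypothetical helper, not taken from bash: the value a byte takes
   when it is interpreted as a signed 8-bit (two's-complement) char. */
int as_signed_byte(unsigned char b)
{
    /* Bytes 0..127 are unchanged; 128..255 wrap around to -128..-1. */
    return b >= 128 ? (int)b - 256 : (int)b;
}
```

With this, as_signed_byte(0xC3) is 195 - 256 = -61 (the first byte of
the UTF-8 encoding of À), and as_signed_byte(0xC0) is 192 - 256 = -64
(the single Latin-1 byte for À).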


Both POSIX and common sense are clear that the numeric values
resulting from 'c should be the wchar_t value of c and not the value
of the first byte of the multibyte character; from the SUSv3 printf(1)
documentation:

     Note that in a locale with multi-byte characters, the value of a
     character is intended to be the value of the equivalent of the
     wchar_t representation of the character as described in the
     System Interfaces volume of IEEE Std 1003.1-2001.

Language lawyers could argue that on 'single-byte' locales perhaps the
byte value should be used; however, strictly speaking a single-byte
locale is simply a special case of a multi-byte one, and sanity should
win in any case.

Fixing the issue should be easy; asciicode() in builtins/printf.def
simply needs to be changed to decode the character with mbrtowc rather
than reading the byte (and perhaps also should be renamed...).
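
A minimal sketch of the kind of change meant above, assuming nothing
about bash's actual internals: char_value is a hypothetical stand-in
for asciicode(), and the fallback to the unsigned byte on a decoding
error is my own choice, not something POSIX specifies.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for asciicode(): return the wchar_t value of
   the first character of s, decoded according to the current locale. */
long char_value(const char *s)
{
    mbstate_t st;
    wchar_t wc;
    size_t n;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */

    /* Pass the terminating NUL too, so an empty string decodes to 0. */
    n = mbrtowc(&wc, s, strlen(s) + 1, &st);

    /* Assumption: on an invalid or incomplete sequence, fall back to
       the first byte taken as unsigned, so the result is never negative. */
    if (n == (size_t)-1 || n == (size_t)-2)
        return (unsigned char)s[0];

    return (long)wc;
}
```

The caller has to run setlocale(LC_CTYPE, "") first so that mbrtowc
follows the user's locale. In a UTF-8 locale, on a system where
wchar_t values are ISO-10646, char_value("À") should then be 192.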

Rich
