On 5/11/17 8:56 AM, Eduardo Bustamante wrote:
> The C with acute accent character: https://en.wikipedia.org/wiki/%C4%86
>
> - Upper case
> dualbus@debian:~$ printf '\U0106\n'
> Ć
>
> - Lower case
> dualbus@debian:~$ printf '\U0107\n'
> ć
>
> Now, in bash, if you type in ć, then run readline `upcase-word' on it,
> instead of ending up with the UTF-8 multibyte string for U+0106 (0xC4
> 0x86), you end up with 0x07 0x87.
>
> The parameter expansion doesn't seem to have that problem so I think
> it's a bug in readline:

Thanks for the report. This is a bug in readline.

> For some reason, rl_change_case thinks `c` is ASCII:
>
> (gdb) call isascii((unsigned char)c)
> $8 = 1

Because when you cast it to unsigned char, it masks all but the least
significant 8 bits, which results in a valid ASCII character.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/