On 5/11/17 8:56 AM, Eduardo Bustamante wrote:
> The C with acute accent character: https://en.wikipedia.org/wiki/%C4%86
> 
> - Upper case
> dualbus@debian:~$ printf '\U0106\n'
> Ć
> 
> - Lower case
> dualbus@debian:~$ printf '\U0107\n'
> ć
> 
> Now, in bash, if you type in ć, then run readline `upcase-word' on it,
> instead of ending up with the UTF-8 multibyte string for U+0106 (0xC4
> 0x86), you end up with 0x07 0x87.
> 
> The parameter expansion doesn't seem to have that problem so I think
> it's a bug in readline:

Thanks for the report. This is a bug in readline.

> For some reason, rl_change_case thinks `c` is ASCII:
> 
> (gdb) call isascii((unsigned char)c)
> $8 = 1

Because casting it to unsigned char discards all but the least
significant 8 bits, and what remains is a valid ASCII character.
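
To make the truncation concrete, here is a minimal sketch (not the readline
source). It assumes `c` holds the wide character U+0107, so the cast leaves
only the low byte 0x07. The helper names `looks_ascii_after_cast` and
`is_ascii_full_width` are hypothetical, and the second is just one possible
way to test the whole value, not necessarily the fix readline adopted.

```c
#define _XOPEN_SOURCE 700   /* exposes isascii() in <ctype.h> */
#include <ctype.h>
#include <wchar.h>

/* The check as quoted from the gdb session: the cast keeps only the
   low 8 bits of c, so 0x0107 is examined as 0x07. */
static int looks_ascii_after_cast(wchar_t c)
{
    return isascii((unsigned char)c) != 0;
}

/* Hypothetical alternative: compare the full value against the
   7-bit ASCII range, with no narrowing cast. */
static int is_ascii_full_width(wchar_t c)
{
    return (unsigned long)c <= 0x7FUL;
}
```

Called with 0x0107, the first helper reports the character as ASCII and the
second does not; for a plain `c` (0x63) both agree.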


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
