On 6/24/17 1:41 PM, Eduardo A. Bustamante López wrote:
> I was looking through this old thread:
> http://seclists.org/oss-sec/2014/q3/851
>
> It looks like the issue reported in there is still there:
>
> dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK bash
> �\
> dualbus@debian:~$ LANG=en_US.UTF8 printf 'echo \u4e57\n' |LANG=en_US.UTF8 bash
> 乗

This shows that if it's a valid character in the current locale, bash will
convert it and read it back. `printf' takes the Unicode code point (in this
case, one whose UTF-8 encoding is three bytes) and runs it through iconv to
try to convert it to a valid multibyte character in the current locale.
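
To see the conversion outside of bash, here is a rough sketch using the
standalone iconv utility. It is only an approximation of what `printf' does
internally, and it assumes the system iconv knows the GBK charset. The
variable names are made up for illustration.

```shell
# UTF-8 encoding of U+4E57, written as octal escapes so any POSIX printf works.
utf8_hex=$(printf '\344\271\227' | od -An -tx1 | tr -d ' \n')

# The same character converted to GBK (assumes iconv supports GBK).
gbk_hex=$(printf '\344\271\227' | iconv -f UTF-8 -t GBK | od -An -tx1 | tr -d ' \n')

echo "UTF-8: $utf8_hex"
echo "GBK:   $gbk_hex"
```

With a GBK-capable iconv this should print e4b997 for UTF-8 and 815c for
GBK, matching the od output quoted below. Note that the second GBK byte,
0x5c, has the same value as an ASCII backslash.
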
> dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
> 81 5c
>
> It looks like it doesn't detect that \x81\x5c is a single character, and
> instead treats the multibyte character as separate characters.
It's apparently not a single character in that locale.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU [email protected] http://cnswww.cns.cwru.edu/~chet/