It seems a strange inconsistency, though: double-quoted strings (and, as far as I have seen, pretty much all other Bash syntax) recognize 0x81 0x5C as a two-byte character rather than treating 0x5C as a backslash within the quoting syntax, but $'...' strings unconditionally treat 0x5C as a backslash. Is there any reason a disparity like that would be desirable?
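
To make the byte-level side of this concrete, here is a minimal sketch. It only exercises $'...' with escapes, which should not depend on the locale; it does not reproduce the double-quote behaviour, since that needs a GBK locale to be installed. `hexbytes` is a hypothetical helper of mine, not anything provided by Bash, and I am assuming `od` and `tr` are available.

```bash
#!/usr/bin/env bash
# hexbytes is a hypothetical helper (not part of Bash): it prints the
# bytes of its first argument as one lowercase hex string.
hexbytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# \xHH escapes inside $'...' each produce one raw byte.
hexbytes $'\x81\x5c'    # prints 815c

# A doubled backslash inside $'...' yields the single byte 0x5c.
hexbytes $'\x81\\'      # prints 815c

# Inside $'...', a backslash followed by "n" is the newline escape (0x0a).
hexbytes $'\x81\n'      # prints 810a
```

The first two calls produce the same two bytes that GBK uses for U+4E57; the third shows 0x5c being consumed as the start of an escape sequence.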

----- Original Message -----
From: chet.ra...@case.edu
To: "George" <tetsu...@scope-eye.net>, "Eduardo_A._Bustamante_López" <dual...@gmail.com>, <bug-bash@gnu.org>
Cc: <chet.ra...@case.edu>
Sent: Mon, 26 Jun 2017 11:04:42 -0400
Subject: Re: Fwd: Non-upstream patches for bash (2014)

On 6/25/17 11:08 PM, George wrote:
> On Sun, 2017-06-25 at 12:23 -0400, Chet Ramey wrote:
>> On 6/24/17 1:41 PM, Eduardo A. Bustamante López wrote:
>>
>>> dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
>>>  81 5c
>>>
>>> It looks like it doesn't detect that \x81\x5c is a single character, and
>>> instead treats the multibyte character as separate characters.
>>
>>
>> It's apparently not a single character in that locale.
>>
>
> Yes it is!
>
> https://en.wikipedia.org/wiki/GBK
> \x81 \x5C is a two-byte character from level GBK/3.

OK. The terminal emulator I'm using simply doesn't render the glyph.

> But unless I've misunderstood something, it seems to be behaving correctly
> already. At least, with the exception of within $'..' quotes.

It is behaving correctly. $'...' works using bytes. You can get it to
expand a byte sequence to a multibyte character using \u or \x, but it works
on bytes and always has, just like in C. Since 0x5c introduces an escape
sequence, that's how it's treated.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/