Re: \c escape within $'...' can produce mangled UTF-8

Dmitry Groshev Sun, 15 Aug 2010 03:02:26 -0700

On 15/08/2010, Dennis Williamson <dennistwilliam...@gmail.com> wrote:
> It only consumes two bytes on my system (or one if it's followed by
> another escape or a closing quote).


You are wrong. Try "echo $'\x{123456}AB'" and look at the result.
Or read the source code: lib/sh/strtrans.c
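
A terminal shows rendered glyphs, so it can help to look at the raw bytes
instead. The following is only a sketch: dump_bytes is a made-up helper name,
not part of Bash, and it assumes an od that accepts -An -tx1. No output is
claimed for the $'\x{...}' case, since that is exactly what is being disputed.

```shell
# dump_bytes is a hypothetical helper: print the bytes of its argument
# as space-separated hex values.
dump_bytes() {
  # Word splitting on od's output is intentional: it collapses the padding.
  set -- $(printf '%s' "$1" | od -An -tx1)
  echo "$*"
}

dump_bytes 'AB'                # 41 42
dump_bytes $'\x{123456}AB'     # compare with the line above
```

If the escape consumed only two hex digits, the remaining digits and the
closing brace would show up as literal bytes in the second dump.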

> "Backslash-escaped characters" refers to the "c" in "\c" not the
> characters that follow it.

Given that the documentation doesn't say anything like that anywhere, and
given that _every other escape_ operates on characters (accepting only
ASCII chars, and leaving multibyte ones alone), inventing an
exception specifically for "\c" would look quite contrived.

> It's the responsibility of your code to put an ASCII character after
> the \c.

My code is fine, thank you. ;-) I never had any use for "\c", given
that there is "\x".
Instead, I found this weirdness in the Bash source code while writing my
own function for interpreting (some of) shell syntax.

> There's no way for Bash to guess that the 0xD0 is part of a
> Unicode character or the byte that it is.

Everything between 0x80 and 0xFF is part of a (possibly invalid)
multibyte sequence in UTF-8. Read up on the UTF-8 encoding, and don't
make wrong guesses again.
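
To make the byte ranges concrete, here is a rough sketch that classifies a
single byte value by its role in UTF-8 and shows what a byte-wise "\c" would
do to a lead byte. Both function names are invented for illustration, and the
"& 0x1F" masking is an assumption about how a byte-oriented "\c" behaves, not
a quote from the Bash source.

```shell
# utf8_byte_class is a hypothetical helper: classify a byte value (0..255)
# by the role it can play in a UTF-8 stream.
utf8_byte_class() {
  b=$(( $1 ))
  if   [ "$b" -lt "$((0x80))" ]; then echo ascii
  elif [ "$b" -lt "$((0xC0))" ]; then echo continuation
  elif [ "$b" -lt "$((0xC2))" ]; then echo invalid   # 0xC0, 0xC1: overlong leads
  elif [ "$b" -lt "$((0xE0))" ]; then echo lead2
  elif [ "$b" -lt "$((0xF0))" ]; then echo lead3
  elif [ "$b" -lt "$((0xF5))" ]; then echo lead4
  else                                echo invalid   # 0xF5..0xFF never occur
  fi
}

# ctrl_mask is a hypothetical helper: the control code a byte-wise \c
# would yield for one byte (assumption: byte & 0x1F).
ctrl_mask() {
  printf '0x%02X\n' "$(( $1 & 0x1F ))"
}

# U+0414 is encoded in UTF-8 as the two bytes 0xD0 0x94.
utf8_byte_class 0xD0   # lead2
utf8_byte_class 0x94   # continuation
ctrl_mask 0xD0         # 0x10
```

Under that assumption, "\c" followed by U+0414 turns 0xD0 into 0x10 and leaves
0x94 behind as a continuation byte with no lead byte, which is not valid UTF-8.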

-- 
-= With best regards, Dmitry Groshev =-
