On 8/14/10 3:01 PM, Dmitry Groshev wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: i686
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu'
> -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/local/share/locale'
> -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include
> -I./lib -g -O2
> uname output: Linux wjlair 2.6.24.5-smp #1 SMP Fri Aug 14 19:13:09 MSD
> 2009 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ AuthenticAMD
> GNU/Linux
> Machine Type: i686-pc-linux-gnu
>
> Bash Version: 4.1
> Patch Level: 0
> Release Status: release
>
> Description:
> In UTF-8 locale (such as ru_RU.UTF-8), a \c escape within
> $'...' results in an invalid UTF-8 string if followed by an UTF-8
> character: ansicstr() in lib/sh/strtrans.c consumes and converts the
> character's first byte, leaving the rest of UTF-8 sequence as it were.
I'm not sure why you think this is a bug. The \c escape is described as
converting to a control character; control characters are always a single
byte; the conversion to a control character therefore consumes one byte.
It's not the business of $'...' conversion to ensure that the result is a
valid multibyte character string.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
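
For readers following the thread, the byte-level behaviour under discussion can be modelled with a small sketch. This is not bash's ansicstr(); it is a hypothetical Python model that expands only \cX and consumes exactly one byte for X, as the report describes. The helper names (to_ctrl, expand_ctrl_escapes, is_valid_utf8) are made up for illustration, and the mapping used (fold ASCII lower case to upper case, keep the low five bits, treat \c? as DEL) is the usual ASCII control-character convention, assumed here rather than copied from lib/sh/strtrans.c.

```python
def to_ctrl(byte: int) -> int:
    """Map one byte to a control character (assumed ASCII convention)."""
    if byte == ord("?"):
        return 0x7F  # assumption: \c? maps to DEL
    # Fold ASCII lower case to upper case so \ca and \cA agree.
    if ord("a") <= byte <= ord("z"):
        byte -= 0x20
    return byte & 0x1F  # keep the low five bits


def expand_ctrl_escapes(raw: bytes) -> bytes:
    """Expand only \\cX escapes, consuming exactly ONE byte for X."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i:i + 2] == b"\\c" and i + 2 < len(raw):
            out.append(to_ctrl(raw[i + 2]))
            i += 3  # backslash, 'c', and a single following byte
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# \cA: a single ASCII byte follows the escape.
ascii_case = expand_ctrl_escapes(b"\\cA")

# \c followed by U+0439 (Cyrillic short i), which is D0 B9 in UTF-8.
multibyte_case = expand_ctrl_escapes("\\c\u0439".encode("utf-8"))

print(ascii_case.hex(), is_valid_utf8(ascii_case))          # 01 True
print(multibyte_case.hex(), is_valid_utf8(multibyte_case))  # 10b9 False
```

In this model the lead byte D0 becomes the control byte 10, and the continuation byte B9 is left on its own, so the result does not decode as UTF-8. Both positions in the thread agree on this outcome; they differ only on whether it counts as a bug.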