On 2/25/2019 11:32 AM, Chet Ramey wrote:
> On 2/25/19 11:17 AM, Olga Ustuzhanina wrote:
>
>   
>
> This is an invalid multibyte character. The \xc2 is the valid first byte
> of a multibyte character, but the next byte read makes the sequence
> invalid. The read builtin resynchronizes on the following byte. There's
> currently no facility to push back the invalid parts of a multibyte
> character. There might be a way to do it if the read is buffered inside
> bash, but the `-d' option makes it unbuffered.
>   
----
    Note: this is with bash 4.4.12. Is the behavior supposed to be
different in 5.0?

If I change the previous example to use the default IFS and the
default newline delimiter, with the function otherwise the same,
and then print the same string using LFs instead of NULs:

ntc() { while read -r input; do printf '%s;' "$input" ; done ; }
printf $'\xc2\n\n\n\n'|ntc|hexdump -C
00000000  c2 3b 3b 3b 3b                                    |.;;;;|
00000005

In this case, decoding \xc2 doesn't swallow the following
character.
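
For a side-by-side comparison, here is a minimal sketch of both
variants. The names ntc_nl, ntc_nul and hexbytes are made up for
illustration, and ntc_nul only assumes that the earlier function
used read -d ''. Both use printf '%s;' so the data is never
interpreted as a format string, and the input is produced by letting
printf interpret the escapes itself, so the NUL bytes really reach
the pipe. od stands in for hexdump purely for portability.

```bash
# Hypothetical helpers, not the functions from the earlier message.
# Newline-delimited records (read's default delimiter).
ntc_nl()  { while IFS= read -r input; do printf '%s;' "$input"; done; }
# NUL-delimited records (assumes the earlier example used -d '').
ntc_nul() { while IFS= read -r -d '' input; do printf '%s;' "$input"; done; }

# Dump whatever arrives on stdin as hex bytes.
hexbytes() { od -An -tx1; }

# Same payload, once newline-delimited and once NUL-delimited.
printf '\xc2\n\n\n\n' | ntc_nl  | hexbytes
printf '\xc2\0\0\0\0' | ntc_nul | hexbytes
```

The second pipeline is the interesting one; its result may vary with
the bash version and the locale, so no expected output is given here.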

But in 4.4.12, using IFS='':

ntc() { while IFS='' read -r input; do printf '%s;' "$input" ; done ; }

gives no output, regardless of whether the first character decodes
correctly.  That is,
printf $'\xc2\xa9\x00\x00\x00\x00'|ntc|hd
and
printf $'\xc2\00\00\00\00'|ntc|hexdump -C

both result in no output.  Is that what happens on 5.x?
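
One thing that may be affecting both tests, independent of the bash
version: $'...' is expanded by the shell before printf runs, and a
bash string cannot hold a NUL, so everything from the first \x00
(or \00) onward is dropped. The data then arrives with no terminator
at all, and read returns non-zero at EOF without the loop body ever
running. The sketch below illustrates both effects. count_bytes,
ntc_strict and ntc_lenient are names invented here, and this is a
guess at the cause, not a confirmed diagnosis.

```bash
# Hypothetical helper: count the bytes that actually reach the pipe.
count_bytes() { wc -c | tr -d ' '; }

# The shell expands $'...' first; the string ends at the first NUL.
printf $'\xc2\xa9\x00\x00\x00\x00' | count_bytes   # prints 2

# A single-quoted format lets printf emit the NULs itself.
printf '\xc2\xa9\0\0\0\0' | count_bytes            # prints 6

# read returns non-zero at EOF when no delimiter was seen, so the
# loop body is skipped for an unterminated final record.
ntc_strict()  { while IFS='' read -r input; do printf '%s;' "$input"; done; }

# Variant that still processes a final record lacking a delimiter.
ntc_lenient() { while IFS='' read -r input || [ -n "$input" ]; do printf '%s;' "$input"; done; }

printf 'abc' | ntc_strict    # prints nothing
printf 'abc' | ntc_lenient   # prints abc;
```

If that is what is happening, the "no output" result would say more
about how the test input is generated than about multibyte decoding.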