On 2/25/2019 11:32 AM, Chet Ramey wrote:
> On 2/25/19 11:17 AM, Olga Ustuzhanina wrote:
> >
>
> This is an invalid multibyte character. The \xc2 is the valid first byte
> of a multibyte character, but the next byte read makes the sequence
> invalid. The read builtin resynchronizes on the following byte. There's
> currently no facility to push back the invalid parts of a multibyte
> character. There might be a way to do it if the read is buffered inside
> bash, but the `-d' option makes it unbuffered.
>
----
Note: this is in bash 4.4.12 -- is there supposed to be a behavior
difference in 5.0?

If I change the previous example to use the default IFS as a delimiter
(otherwise the same function as before) and print the same string using
LFs instead of NULs:

    ntc() { while read -r input; do printf "$input;" ; done ; }
    printf $'\xc2\n\n\n\n'|ntc|hexdump -C
    00000000  c2 3b 3b 3b 3b                                    |.;;;;|
    00000005

In this case, the decode of \xc2 doesn't swallow the following character.

But in 4.4.12, using IFS='':

    ntc() { while IFS='' read -r input; do printf "$input;" ; done ; }

gives no output, regardless of whether the first character is decoded
correctly or not. I.e.

    printf $'\xc2\xa9\x00\x00\x00\x00'|ntc|hd

and

    printf $'\xc2\00\00\00\00'|ntc|hexdump -C

both result in no output. Is that what happens on 5.x?
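
When reproducing the NUL cases, it may be worth first confirming that the NUL bytes actually reach the pipe: a bash $'...' string ends at its first NUL, so the argument handed to printf can be shorter than it looks. The sketch below is one way to check. It puts the escapes in printf's format string so that printf itself generates the NULs. The helper names emit_c2_nuls and count_bytes are hypothetical, not from the thread.

```shell
# Hypothetical helper (not from the original post): emit 0xC2 followed by
# four NUL bytes. The octal escapes are in printf's format string, so
# printf itself generates the NULs rather than receiving them in an argument.
emit_c2_nuls() {
    printf '\302\0\0\0\0'
}

# Hypothetical helper: count the bytes arriving on stdin, stripping the
# padding spaces that some wc implementations add.
count_bytes() {
    wc -c | tr -d ' '
}

# Five bytes should reach the pipe: c2 plus four NULs.
emit_c2_nuls | count_bytes

# Dump the raw bytes to confirm what a reader such as ntc would see.
emit_c2_nuls | od -An -tx1
```

Piping emit_c2_nuls into ntc in place of the printf $'...' command would then exercise read against input that is known to contain the NULs.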