On 2/25/19 5:42 PM, Olga Ustuzhanina wrote:
> On Mon, 25 Feb 2019 12:59:38 -0800
> L A Walsh <b...@tlinx.org> wrote:
>
>> In this case, the decode of \xc2 doesn't swallow the following
>> character.
>
> I want to clarify that \xc2 (and other characters in the range
> mentioned above) can only swallow a \0. Other characters are
> unaffected.

The other characters wouldn't be treated as a delimiter either.

The \0 is `swallowed' because it's the C string terminator. The \0 gets
added to the input string, but it's not treated as a delimiter, since
it's part of the invalid multibyte sequence. Then the next character is
read, that \0 is treated as a delimiter, and the input string is
assigned to the variable, including the \0. That gets treated as a
normal C string terminator, since variable values can't contain NULs.

(This is why read discards \0 unless it's a delimiter. It would
terminate the value assigned to the variable.)

Bash-4.4 returned different results because it didn't attempt to
validate reading multibyte characters at all unless it was reading a
fixed number of characters.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
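
A minimal sketch of the two read behaviors described above, assuming
bash (the -d option to read is a bash extension). The sample inputs and
the variable names joined, line, first, and word are made up for
illustration and do not come from the original thread. It covers only
the plain ASCII case: a NUL that is not the delimiter is discarded, and
a NUL that is the delimiter ends the read. It does not try to reproduce
the \xc2 case, whose result depends on the bash version and locale.

```shell
#!/usr/bin/env bash
# Illustrative only: inputs and variable names are hypothetical.

# Newline is the delimiter here, so the embedded NUL is discarded
# and the bytes on either side of it end up adjacent in the value.
joined=$(printf 'a\0b\n' | { read -r line; printf '%s' "$line"; })

# With -d '', NUL is the delimiter, so read stops at the first NUL
# and the value holds only the bytes before it.
first=$(printf 'one\0two\0' | { read -r -d '' word; printf '%s' "$word"; })

printf '%s\n' "$joined" "$first"
```

Run under bash, this should print "ab" and then "one". In neither case
does a NUL reach the variable, which is consistent with the point that
variable values can't contain NULs.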