>>>>> On Sun, 15 Nov 2015, Ulrich Mueller wrote:

> Description:
>       In an UTF-8 locale like en_US.UTF-8, the case-modifying
>       parameter expansions sometimes return invalid UTF-8 encodings.
>       This seems to happen when the UTF-8 byte sequences that are
>       encoding upper and lower case have different lengths.

Even more interesting effects happen if the string contains a
character whose UTF-8 encoding gets *longer* after case conversion,
because then the terminating null byte will be overwritten.

For example, U+0250 "LATIN SMALL LETTER TURNED A" is represented by
a two byte sequence in UTF-8, while its uppercase equivalent U+2C6F
needs three bytes:

   $ LC_ALL=en_US.UTF-8
   $ x=$'aaaaa\xc9\x90'
   $ y=${x^^}
   $ echo -n "$y" | od -t x1
   0000000 41 41 41 41 41 e2 90 af 6f 6d 65 2f 75 6c 6d
   0000017

y contains some trailing garbage (could be a part of $HOME or $PWD).
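
The byte counts for the two characters can be double-checked without
going through the case-modifying expansion at all. The following is
only a sketch: byte_len is a helper name made up for this example, and
the octal escapes are the standard UTF-8 encodings of U+0250 (c9 90)
and U+2C6F (e2 b1 af), written in octal so that any POSIX printf
accepts them.

```shell
# byte_len is a hypothetical helper for this sketch, not part of bash.
# It treats its argument as a printf format string containing octal
# escapes, emits the raw bytes, and counts them.
byte_len() {
    printf "$1" | wc -c | tr -d ' '
}

lower=$(byte_len '\311\220')        # U+0250, bytes c9 90
upper=$(byte_len '\342\261\257')    # U+2C6F, bytes e2 b1 af

# The uppercase form is one byte longer, so a buffer sized for the
# original string is one byte too short after conversion.
echo "lower=$lower upper=$upper growth=$((upper - lower))"
```

This prints "lower=2 upper=3 growth=1", i.e. exactly the one extra
byte that would land on the terminating null in the example above.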