On Wed, Jan 8, 2025, at 1:25 PM, Jeff Ketchum wrote:
> I ran into a strange bug using newer versions of bash, I haven't isolated
> it to a specific release.  It looks like 5.0 introduced the problem.
> In using unicode group separator character U+241D,
> https://www.compart.com/en/unicode/U+241D, 0x241D
> I set the IFS to this unicode, and have U+241E and U+241F characters in the
> data.
> When assigning to an array, and using for var in "${array[@]}"...
> it ends up splitting the data at unexpected locations.
>
> I don't get this behaviour when the array isn't quoted
>
> [...]
>
> I wrote a script that will easily reproduce this:

Here's a version that I think is more legible:

    $ cat /tmp/foo.bash
    LC_ALL=en_US.UTF-8

    gs=$'\u241D' rs=$'\u241E' us=$'\u241F'
    data="a${gs}b${rs}c${us}d"
    IFS=$gs

    # Original variable
    printf '"$data" - %q\n' "$data"
    printf ' $data - %q\n' $data
    echo

    # Positional parameters
    set -- $data
    printf '"$@" - %q\n' "$@"
    printf ' $@ - %q\n' $@
    echo

    # Multi-element array
    arr1=($data)
    declare -p arr1
    printf '"${arr1[@]}" - %q\n' "${arr1[@]}"
    printf ' ${arr1[@]} - %q\n' ${arr1[@]}
    echo

    # Single-element array
    arr2=("$data")
    declare -p arr2
    printf '"${arr2[@]}" - %q\n' "${arr2[@]}"
    printf ' ${arr2[@]} - %q\n' ${arr2[@]}

    $ ~/build/bash-5.3-testing/bash /tmp/foo.bash
    "$data" - a␝b␞c␟d
     $data - a
     $data - b␞c␟d

    "$@" - a
    "$@" - b␞c␟d
     $@ - a
     $@ - b␞c␟d

    declare -a arr1=([0]="a" [1]="b␞c␟d")
    "${arr1[@]}" - a
    "${arr1[@]}" - $'b\342'
    "${arr1[@]}" - $'\236c\342'
    "${arr1[@]}" - $'\237d'
     ${arr1[@]} - a
     ${arr1[@]} - b␞c␟d

    declare -a arr2=([0]="a␝b␞c␟d")
    "${arr2[@]}" - $'a\342'
    "${arr2[@]}" - ''
    "${arr2[@]}" - $'b\342'
    "${arr2[@]}" - $'\236c\342'
    "${arr2[@]}" - $'\237d'
     ${arr2[@]} - a
     ${arr2[@]} - b␞c␟d

It's interesting that "$@" works fine, while "${arr[@]}" doesn't.

-- 
vq
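
For reading the %q output above, a byte-level view of the three separators may help. The following is only a sketch, not part of the original report: octal_bytes is a hypothetical helper, and the separators are written as explicit UTF-8 bytes so the result does not depend on how the locale handles $'\uXXXX'. It assumes a POSIX od, tr, and sed are available.

```shell
#!/usr/bin/env bash
# Sketch only: octal_bytes is a hypothetical helper, not from the original post.

# Explicit UTF-8 encodings of the separators used in the repro script.
gs=$'\342\220\235'   # U+241D
rs=$'\342\220\236'   # U+241E
us=$'\342\220\237'   # U+241F

# Print the bytes of "$1" as space-separated three-digit octal values.
octal_bytes() {
    printf '%s' "$1" \
        | od -An -v -t o1 \
        | tr '\n' ' ' \
        | tr -s ' ' \
        | sed 's/^ //; s/ $//'
}

printf 'gs: %s\n' "$(octal_bytes "$gs")"
printf 'rs: %s\n' "$(octal_bytes "$rs")"
printf 'us: %s\n' "$(octal_bytes "$us")"
# gs: 342 220 235
# rs: 342 220 236
# us: 342 220 237
```

All three characters share the leading bytes \342\220 and differ only in the last byte. The fragments in the quoted expansions ($'b\342', $'\236c\342', $'\237d') look consistent with the data being cut at individual bytes rather than at whole characters, though this sketch only shows the encodings and does not establish what bash is doing internally.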