On 11/17/19 4:25 AM, Chris Carlen wrote:
Bash Version: 5.0
Patch Level: 0
Release Status: release
Description:
UTF-8 multibyte char string split into bytes rather than characters.
Repeat-By:
#!/bin/bash
shopt -s extglob
LC_ALL="en_US.UTF-8"
# E.g., normal/expected behavior:
# Create a string:
A=abc
# Replace left virtual empty strings with spaces, putting separated
# chars into positional parameters, then print them quoted:
set -- ${A//?()/ }
echo "${@@Q}" #-> 'a' 'b' 'c'
# E.g., abnormal behavior:
# write 'REVERSED PILCROW SIGN' (U+204B) to B, then repeat as above:
printf -v B '\u204B'
set -- ${B//?()/ }
echo "${@@Q}" #-> $'\342' $'\201' $'\213'
Yes, this is a problem. The null match requires advancing through the
string by one character, instead of one byte. I'll fix it.
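
To illustrate the difference between advancing by one byte and by one
character, here is a rough sketch. It is not the change made to bash itself.
The helper name utf8_chars and the result array chars are made up for this
example, and it assumes the input is valid UTF-8. It walks the string byte by
byte under LC_ALL=C and starts a new character only at bytes that are not
UTF-8 continuation bytes (10xxxxxx):

```bash
#!/bin/bash
# Hypothetical helper (not part of bash): split a UTF-8 string into
# characters, leaving the result in the global array "chars".
utf8_chars() {
    local LC_ALL=C                 # index the string byte by byte in here
    local s=$1 byte i
    local cont=$'[\200-\277]'      # UTF-8 continuation bytes, 10xxxxxx
    chars=()
    for (( i = 0; i < ${#s}; i++ )); do
        byte=${s:i:1}
        if (( ${#chars[@]} > 0 )) && [[ $byte == $cont ]]; then
            # Continuation byte: it belongs to the character already started.
            chars[${#chars[@]}-1]+=$byte
        else
            # ASCII or lead byte: a new character starts at this offset.
            chars+=("$byte")
        fi
    done
}

B=$'\342\201\213'                  # U+204B written as raw UTF-8 bytes

utf8_chars "abc"; echo "${#chars[@]}"   # prints 3
utf8_chars "$B";  echo "${#chars[@]}"   # prints 1
```

Stepping one byte at a time would stop at three offsets inside B, which is
where the three one-byte fragments in the report come from. Stepping one
character at a time stops only once.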
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU c...@case.edu http://tiswww.cwru.edu/~chet/