On 11/17/19 4:25 AM, Chris Carlen wrote:

Bash Version: 5.0
Patch Level: 0
Release Status: release

Description:
   When the pattern matches the null string, ${var//pattern/repl} splits a
   UTF-8 multibyte string into bytes rather than characters.

Repeat-By:

#!/bin/bash

shopt -s extglob
LC_ALL="en_US.UTF-8"

# E.g., normal/expected behavior:

# Create a string:
A=abc

# Replace each null (empty) match with a space, let word splitting put the
# separated chars into the positional parameters, then print them quoted:
set -- ${A//?()/ }
echo "${@@Q}"       #-> 'a' 'b' 'c'

# E.g., abnormal behavior:

# write 'REVERSE PILCROW SIGN' to B, then repeat as above:
printf -v B '\u204B'
set -- ${B//?()/ }
echo "${@@Q}"       #-> $'\342' $'\201' $'\213'
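Until a fix is available, one possible workaround is to split the string with substring expansion. This is a sketch and was not proposed in this thread. In a locale whose character set is UTF-8, substring expansion indexes by character rather than by byte. The sketch assumes a locale named C.UTF-8 is installed; en_US.UTF-8 from the report should work the same way if it is present. The array name `chars` is hypothetical.

```bash
#!/bin/bash
# Workaround sketch: split a string into characters without the
# ${var//?()/ } null-match trick. Assumes the C.UTF-8 locale is installed.
LC_ALL=C.UTF-8

printf -v B 'a\u204Bc'   # 'a', U+204B REVERSE PILCROW SIGN, 'c'

chars=()                 # hypothetical name: one array element per character
for (( i = 0; i < ${#B}; i++ )); do
    chars+=("${B:i:1}")  # substring expansion counts characters in a UTF-8 locale
done

set -- "${chars[@]}"
printf '[%s]\n' "$@"     # one bracketed character per line
```

Quoting "${chars[@]}" also avoids relying on word splitting and pathname expansion, which the unquoted ${A//?()/ } form depends on.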

Yes, this is a problem. After a null match, the code has to advance through
the string by one character instead of one byte. I'll fix it.
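Stepping one byte at a time through U+204B produces the three pieces in the report. You can see this by comparing the length bash reports for the same string under a UTF-8 locale and under the C locale, where each byte counts as one character. This is a minimal sketch and again assumes the C.UTF-8 locale is installed. The variable names are illustrative.

```bash
#!/bin/bash
# Compare the character length and the byte length of U+204B.
# Assumes the C.UTF-8 locale is installed.
LC_ALL=C.UTF-8
printf -v B '\u204B'    # stored as the three UTF-8 bytes \342 \201 \213
nchars=${#B}            # length in characters under a UTF-8 locale

LC_ALL=C
nbytes=${#B}            # in the C locale every byte is one character

echo "characters: $nchars, bytes: $nbytes"
```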

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
