Hi The encoding of six bytes long utf-8 sequences is buggy. Today unicode requires at most 4 bytes long utf-8 sequences but if we handle 5 and 6 too then let's do it the right way.
The attached patch was created using a fresh master clone (d894cfd104086ddf68c286e67a5fb2e02eb43b7b). I'm writing a tool (pxargs, an xargs variant) that can accept input strings in shell-quoted format and used the bash manual and source code as references. I haven't compiled the latest bash sources to check the bug but it is likely to affect ANSI-C quoted strings like $'\U7fffffff'. Best Regards, Istvan Pasztor
six_bytes_long_utf8_sequences_d894cfd104086ddf68c286e67a5fb2e02eb43b7b.patch
Description: Binary data