Re: Fwd: odd behavior of length(), match() and field splitting with multi-byte characters

2024-08-20 Thread Brian Inglis via Cygwin
There do seem to be anomalies in Cygwin handling of SMP characters, perhaps due to conversion to or misinterpretation as UTF-16/UCS-2 surrogates?
🔍 U+01f50d f0 9f 94 8d d83d dd0d
🔎 U+01f50e f0 9f 94 8e d83d dd0e
$ wc -lwcmL <<< 🔎
1 0 3 5 0
$ wc -lwcmL <<< 🔍
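The UTF-8 bytes and UTF-16 surrogate pairs quoted above can be rederived with plain shell arithmetic. The following is a minimal sketch; the function name encode_smp is hypothetical and does not come from the thread, and it assumes its argument is a code point in the range U+10000 to U+10FFFF. It only checks the encodings and does not reproduce or diagnose the Cygwin behavior itself.

```shell
#!/bin/sh
# encode_smp (hypothetical helper): print the four UTF-8 bytes followed by
# the UTF-16 high and low surrogates for one supplementary-plane code point.
# Assumes the argument lies in U+10000..U+10FFFF; no range checking is done.
encode_smp() {
    cp=$(($1))
    v=$((cp - 0x10000))          # 20-bit value that the surrogate pair carries
    printf '%02x %02x %02x %02x %04x %04x\n' \
        $((0xF0 | (cp >> 18))) \
        $((0x80 | ((cp >> 12) & 0x3F))) \
        $((0x80 | ((cp >> 6) & 0x3F))) \
        $((0x80 | (cp & 0x3F))) \
        $((0xD800 + (v >> 10))) \
        $((0xDC00 + (v & 0x3FF)))
}

encode_smp 0x1F50D   # prints: f0 9f 94 8d d83d dd0d
encode_smp 0x1F50E   # prints: f0 9f 94 8e d83d dd0e
```

Each character is one code point, four UTF-8 bytes and two UTF-16 code units, so a tool that counts UTF-16 code units as characters would report two for the character alone, and three once the trailing newline of the here-string is included.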

Re: Fwd: odd behavior of length(), match() and field splitting with multi-byte characters

2024-08-20 Thread Ed Morton via Cygwin
Is there any more information I can provide for someone to be able to look into this bug?
    Ed.
On 7/6/2024 7:26 AM, Ed Morton wrote:
I posted the below bug report to the GNU awk bugs mailing list, https://lists.gnu.org/archive/html/bug-gawk/2024-07/msg0.html, the feedback there is that