[Redirecting to bug-gnulib. This is a question about 'mcel' from gnulib.]

Collin Funk wrote in
<https://lists.gnu.org/archive/html/coreutils/2025-08/msg00066.html>:
> I noticed that mcel does not see the following characters as equal in a
> UTF-8 locale:
>
>    è (U+0065 + U+0300)
>    è (U+00E8)
>
> This is because mcel_isbasic (U+0065) sees an ASCII character and does
> not normalize it using the following U+0300.
>
> Is this intentional or not?

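Yes, it is intentional. The reason is that Unicode text that is exchanged
between programs is supposed to be in Normalization Form C (NFC) [1]. In
NFC, "è" is the single precomposed character U+00E8, so mcel does not need
to normalize its input.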

For many years, one exception to this rule was file names on macOS HFS+
file systems, which are stored in NFD. This caused lots of trouble with
non-ASCII file names on macOS. Fortunately, Apple has phased out HFS+.

See also [2].

Bruno

[1] https://www.unicode.org/faq/normalization.html#2
[2] https://www.w3.org/TR/charmod-norm/
