Paul Eggert <egg...@cs.ucla.edu> wrote:

> arn...@skeeve.com wrote:
> > The only FIXMEs I see are both in the _LIBC part of the code, and
> > there's only two: one in regexec.c and one in regcomp.c.
>
> In that case I guess there isn't a problem.
>
> I am a little concerned that unibyte locales use bytes whereas multibyte
> locales use characters for range expressions. As I understand it, this means
> Turkish range expressions are interpreted differently depending on whether the
> locale uses UTF-8 or ISO/IEC 8859-9. Is that really what Turkish-speakers want?
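A minimal Python sketch of the discrepancy described above. It assumes, as a simplification, that a unibyte ISO/IEC 8859-9 locale orders a range by byte value and a UTF-8 locale orders it by Unicode code point; it ignores collation and is not a model of what regexec.c actually does. The helper names (`in_range_bytes`, `in_range_codepoints`, `matches`) are hypothetical.

```python
# Illustration only: compare which Turkish lowercase letters fall inside a
# bracket range such as [ç-ü] under two ASSUMED orderings.

TURKISH_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"


def in_range_bytes(ch, lo, hi, encoding="iso8859_9"):
    """Unibyte view: compare single-byte values in ISO/IEC 8859-9."""
    value = ch.encode(encoding)[0]
    return lo.encode(encoding)[0] <= value <= hi.encode(encoding)[0]


def in_range_codepoints(ch, lo, hi):
    """Multibyte view: compare Unicode code points."""
    return ord(lo) <= ord(ch) <= ord(hi)


def matches(lo, hi, pred):
    """Return the Turkish letters that pred places inside [lo-hi]."""
    return [c for c in TURKISH_LOWER if pred(c, lo, hi)]


if __name__ == "__main__":
    lo, hi = "ç", "ü"
    print("ISO-8859-9 bytes:   ", "".join(matches(lo, hi, in_range_bytes)))
    print("Unicode code points:", "".join(matches(lo, hi, in_range_codepoints)))
```

Run as a script, it prints `çğöü` for the byte interpretation and `çöü` for the code-point one: ğ is byte 0xF0 in ISO/IEC 8859-9 (inside 0xE7 to 0xFC) but U+011F in Unicode (outside U+00E7 to U+00FC).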

It's a sad fact of life that users have to be aware of their character set and
locale and understand the consequences of what they choose to use (or what their
OS chose for them at installation). This is just another aspect of that.

> That being said, it doesn't matter all that much nowadays now that UTF-8 has
> taken over, so it's probably not worth much of our time to worry about this
> discrepancy. For what it's worth,
> https://w3techs.com/technologies/details/en-iso885909/all/all says that only
> 0.06% of websites still use ISO/IEC 8859-9, down from 0.09% a year ago (and down
> from 0.7% in 2010, so this is a factor-of-10 decline in 8 years).

I totally agree that it's not worth worrying about. It's too small a tail to be
wagging such a big dog.

Thanks,

Arnold