> From: arn...@skeeve.com
> Date: Sun, 06 Nov 2022 11:25:27 -0700
> Cc: bug-texinfo@gnu.org, arn...@skeeve.com
>
> Eli Zaretskii <e...@gnu.org> wrote:
>
> > > and similarly capable C libraries
> >
> > Are there such libraries in existence, when locale data is considered?
> > Which ones?
>
> macOS, and Solaris, to name two. I think AIX as well.

That's not what I know. I think glibc is quite unique.

> Obviously texindex, and gawk underneath it, can't do more than what
> the underlying C library and installed locales enable. But on systems
> where they can (which is not just GLIBC), it should be possible to do
> more than they currently do now.

You'll just trade one set of bug reports for another, that's all.
There's no way to ensure on Posix systems that an arbitrary locale is
installed. (Ironically, Windows is in much better shape here.) So
the problems will remain, and their manifestations will be as hard to
understand as now, they will just be different, and will probably
involve quite a bit of mojibake.

If we want to solve this properly, we need to decode the text into the
internal UTF-8 encoding, process it in UTF-8, and then encode it back
when writing the index. Which probably means we either should add
such capabilities to Gawk, or do it with some tool other than Gawk. I
think the latter is more practical, unfortunately, since Gawk doesn't
really have i18n capabilities, it can only use a single locale, and
that locale must be externally installed.

Since texi2any just went through the same process, I think Perl is
probably a good candidate to replace Gawk as an implementation
language for texindex. Another possibility is Python.
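
To make the decode/process/encode idea concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions, not how
texindex works or would have to work: the names sort_key and sort_index
are hypothetical, the input is assumed to be one index entry per line,
and the sort key (strip combining marks, then casefold) is a crude
stand-in for real collation, not the Unicode Collation Algorithm.
Python's str type plays the role of the internal Unicode form here, so
no installed system locale is consulted.

```python
import unicodedata


def sort_key(entry: str):
    """Rough locale-independent key (assumption: good enough for a demo)."""
    # Decompose accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", entry)
    # Drop the combining marks so that "É" sorts next to "E".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Casefold for case-insensitive order; the raw entry breaks ties.
    return (base.casefold(), entry)


def sort_index(raw: bytes, encoding: str) -> bytes:
    """Decode, sort in Unicode, and encode back in the original encoding."""
    text = raw.decode(encoding)        # decode into the internal form
    entries = text.splitlines()        # hypothetical: one entry per line
    entries.sort(key=sort_key)         # process without any system locale
    return ("\n".join(entries) + "\n").encode(encoding)  # encode back


if __name__ == "__main__":
    raw = "zebra\nÉcole\napple\n".encode("latin-1")
    print(sort_index(raw, "latin-1").decode("latin-1"), end="")
```

The point of the sketch is that the document's declared encoding, and
not the user's locale, decides how bytes are interpreted, so the result
does not depend on which locales happen to be installed.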