Hi. Thanks for the report. As written, texindex is indeed suitable only for English; when I wrote it ~ 9 years ago, nobody said anything about support for other languages.
I think this can be remedied, although there may be issues with awk
versions besides gawk, as most don't support Unicode or other
multibyte character sets.

Arnold

Werner LEMBERG <w...@gnu.org> wrote:

>
> [texindex (GNU texinfo) 6.8dev]
> [GNU Awk 4.2.1, API: 2.0]
> [openSUSE Leap 15.4]
>
>
> There are two bugs with texindex, making it basically unusable for
> everything except English as the main document language.  For the
> report below, here is an input file.
>
> ```
> \input texinfo.tex
>
> @documentencoding UTF-8
> @documentlanguage ca
>
> @findex a
> @findex à
> @findex u
> @findex ù
>
> @printindex fn
>
> @bye
> ```
>
> * The first, really severe bug is that the resulting output is
>   completely broken if `texindex` is called with `LANG=C`.  Saying
>
>   ```
>   LANG=C texi2pdf sort-ca.texi
>   ```
>
>   creates the following `.fns` output
>
>   ```
>   \initial {0xc3}
>   \entry{\code {à}}{1}
>   \entry{\code {ù}}{1}
>   \initial {A}
>   \entry{\code {a}}{1}
>   \initial {U}
>   \entry{\code {u}}{1}
>   ```
>
>   As can be seen, the `\initial` line contains a single byte (where
>   '0xc3' is a real byte), which surprisingly doesn't make pdftex abort,
>   but both xetex and luatex stop with errors.  I have to use a UTF-8
>   locale like `en_US.utf8` to get decent output.
>
>   I consider it very bad that `texindex` is locale-dependent.  IMHO
>   the proper solution is to make `texinfo.tex` emit a document
>   encoding statement to the (unsorted) index file that in turn gets
>   acknowledged by `texindex`.
>
> * While `texindex` is sensitive to the locale regarding the input
>   encoding, it isn't for collation: any `LANG` or `LC_COLLATE` setting
>   gets ignored.  Similarly, it ignores the `@documentlanguage`
>   instruction to derive a sorting order.  For example, the Catalan
>   order for the above example should be 'aàuù', however, in the output
>   it is sorted as `àùau`.
>
>   The proper fix would be to make `texinfo.tex` emit a document
>   language statement to the (unsorted) index file that in turn gets
>   acknowledged by `texindex`.
>
>
>     Werner
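
To make the two points in the report concrete, here is a minimal Python
sketch. It is not texindex's awk code, only an illustration of the
mechanism. It assumes the index entries are UTF-8. The helper names
`initial_bytewise`, `initial_charwise`, and `sort_key` are made up for
this example, and the accent-stripping key is only a rough stand-in for
real Catalan collation rules.

```python
import unicodedata


def initial_bytewise(entry: str) -> bytes:
    """First byte of the UTF-8 encoding, as a byte-oriented tool sees it."""
    return entry.encode("utf-8")[:1]


def initial_charwise(entry: str) -> str:
    """First character once the input has been decoded as UTF-8."""
    return entry[:1]


def sort_key(entry: str):
    """Hypothetical key: compare base letters first, accents as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), decomposed)


# The four entries from the report: à, ù, a, u (escapes avoid any
# dependence on how this source file itself is encoded).
entries = ["\u00e0", "\u00f9", "a", "u"]

# Both accented letters start with the same lead byte 0xc3, which is why
# a byte-oriented run groups them under one broken \initial.
print([initial_bytewise(e) for e in entries])  # [b'\xc3', b'\xc3', b'a', b'u']

# With an accent-aware key, the order matches the expected 'aàuù'.
print(sorted(entries, key=sort_key))  # ['a', 'à', 'u', 'ù']
```

A real fix would presumably take the encoding and the language from
the index file, as the report suggests, rather than hard-coding them as
this sketch does.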