Hello,

I implemented index sorting in C, with an XS interface, in texi2any. When Unicode collation is wanted, based on my understanding of Eli's suggestions, a collation locale is set to "en_US.utf-8" by calling newlocale (LC_COLLATE_MASK, "en_US.utf-8", 0), and then strxfrm_l is used (which should give the same ordering as using strcoll_l). This code is used when conversion is done in C/with XS, which is enabled by setting the environment variable TEXINFO_XS_CONVERT=1, for now only for HTML, and only if the TEST customization variable is not set.
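
To make the setup above concrete, here is a minimal sketch of the newlocale/strxfrm_l combination. It is not the actual texi2any code: the helper names open_collation_locale and make_sort_key are hypothetical, and it assumes a glibc-like system where newlocale is declared in <locale.h> and strxfrm_l in <string.h> (some other systems need <xlocale.h>).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: create a locale object that only sets the
   collation category, for example with NAME "en_US.utf-8".
   Returns (locale_t) 0 if the locale is not available. */
static locale_t
open_collation_locale (const char *name)
{
  return newlocale (LC_COLLATE_MASK, name, (locale_t) 0);
}

/* Hypothetical helper: return a malloc'ed sort key for S in locale LOC.
   Comparing two such keys with strcmp should order them the same way
   strcoll_l would order the original strings.
   Returns NULL if memory allocation fails. */
static char *
make_sort_key (const char *s, locale_t loc)
{
  size_t len;
  char *key;

  assert (s != NULL);

  /* First call with a zero size to learn the length of the key. */
  len = strxfrm_l (NULL, s, 0, loc);

  key = malloc (len + 1);
  if (key != NULL)
    strxfrm_l (key, s, len + 1, loc);
  return key;
}
```

The keys would be computed once per index entry and then compared with strcmp, for instance in a qsort callback, which avoids calling strcoll_l for every comparison. The locale object should be released with freelocale when sorting is done, and a caller has to decide what to do when open_collation_locale returns 0 because the locale is not installed.
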
On my Debian GNU/Linux system, the result is good except for the treatment of spaces. Spaces (and non-alphanumeric characters, but that is not really an issue) are ignored when sorting. This follows the Unicode collation standard, but leads to awkward sorting for indices: for example, 'H r' is sorted after 'Ha'.

In Perl, it is possible to customize the Unicode::Collate collation, and we use 'variable' => 'Non-Ignorable'. Here is the corresponding comment in the code:

  # The 'Non-Ignorable' for variable collation elements means that they are
  # treated as normal characters. This allows to have spaces and punctuation
  # marks sort before letters.
  # http://www.unicode.org/reports/tr10/#Variable_Weighting

If somebody knows how to get the same result in C, please tell me.

Also, I have no idea how portable this setup is, but I guess testers and time will tell.

--
Pat