On Thu, Oct 12, 2023 at 11:39:14AM +0200, Patrice Dumas wrote:
> Hello,
>
> There is a translation to C of texi2any code going on, for the future,
> after the next release, mainly for the conversion to HTML in a first
> step.
>
> One thing I could not find easily in C is something to replace the
> Unicode::Collate perl module for index entries sorting using 'smart'
> rules for sorting, that could be either found in Gnulib, included easily
> in the Texinfo distribution or would be, in general, installed. Unless
> I missed something, there is no such facility in libunistring, it seems
> to be in libICU, but I do not know how easy it could be
> integrated/shipped with Texinfo and I do not think that it is installed
> in the general case.

It's all in the future, but what I am slightly concerned about is
duplicating existing system facilities in Texinfo. One example is
avoiding wcwidth, our use of which depends on setting a UTF-8 locale
and using the wchar_t type. Is every program that uses wcwidth
supposed to supply its own implementation instead, and isn't this
wasteful?

https://www.gnu.org/software/gnulib/manual/html_node/Characters.html
may be informative on the drawbacks of wchar_t.

I have seen implementations of wcwidth and they do not look very
large, so it would not waste much space for every program to
reimplement it using Unicode code points instead, but in principle it
should still be provided by a standard system library.

Doing collation properly is more complicated than wcwidth, I believe,
and relies on large tables of code points. It seems that the code is
already there in the C libraries but is only available through setting
the locale. One option is that we require systems to have a UTF-8
locale installed to get correct output. (We'd have to find some other
solution for MS-Windows.)

I don't know if libunistring aspires to become a standard system
library for handling UTF-8 data, but if we use it for other UTF-8
processing it would make sense to use it for collation too. I suggest
writing to Bruno Haible to ask if he has plans to include collation
functionality in libunistring in the future.

I am currently reading through "Unicode Technical Standard #10" and
although I don't understand a lot of it yet, it seems feasible that we
could implement it in C.
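
To make the locale-based option a bit more concrete, here is a rough
sketch of sorting index entries with the C library's own collation
(setlocale, strcoll and qsort are standard C). The helper name
sort_entries_in_locale and its interface are made up for illustration
and are not existing Texinfo code. Which locale names are installed is
system-dependent, and this does nothing for the MS-Windows case.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two strings with the current LC_COLLATE rules. */
static int
compare_entries (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Hypothetical helper (not part of Texinfo): sort the N strings in ENTRIES
   using the collation rules of LOCALE_NAME.  Returns 0 on success, or -1
   if memory ran out or the locale could not be set; ENTRIES is left
   untouched in that case.  */
int
sort_entries_in_locale (char **entries, size_t n, const char *locale_name)
{
  char *saved = NULL;
  const char *current;

  assert (entries != NULL && locale_name != NULL);

  /* Copy the name of the current collation locale so that it can be
     restored; the string returned by setlocale may be overwritten by
     later calls.  */
  current = setlocale (LC_COLLATE, NULL);
  if (current)
    {
      saved = malloc (strlen (current) + 1);
      if (!saved)
        return -1;
      strcpy (saved, current);
    }

  if (!setlocale (LC_COLLATE, locale_name))
    {
      /* Requested locale is not available on this system.  */
      free (saved);
      return -1;
    }

  qsort (entries, n, sizeof (char *), compare_entries);

  /* Put the previous collation locale back.  */
  if (saved)
    {
      setlocale (LC_COLLATE, saved);
      free (saved);
    }
  return 0;
}
```

A caller could try a UTF-8 locale name such as "en_US.UTF-8" first and,
if the function returns -1, fall back to the "C" locale, in which
strcoll orders strings the same way as strcmp. For long indices it
would probably be worth precomputing keys with strxfrm instead of
calling strcoll in every comparison, but I have left that out here.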