On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote:
> Here is my updated thinking on the possibilities
>
> 1) lexicographic sorting on unicode strings (corresponds to
>    USE_UNICODE_COLLATION=0 currently)
> 2) unicode default sorting obtained by Unicode::Collate in Perl and
>    strxfrm_l in C with "en_US.utf-8", the current default ("en_US.utf-8"
>    could be different on different platforms, a list instead of only one
>    possibility if "en_US.utf-8" is not always available...)
> 3) sorting based on @documentlanguage using, in perl
>    Unicode::Collate::Locale with locale @documentlanguage and in C
>    strxfrm_l with "@documentlanguage.utf-8" (at least on GNU/Linux,
>    the locale name setup for strxfrm_l could be different on other platforms).
> 4) sorting based on a customization variable, such as COLLATION_LANGUAGE.
>    it would be the same as the previous one, with @documentlanguage
>    replaced by COLLATION_LANGUAGE.
> 5) sorting based on the user locale, using strxfrm in C and
>    "use locale" and regular sorting on unicode (internal perl encoded) strings
>    in Perl.
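For reference, a minimal C sketch of the C side of 1), 2) and 5), assuming a POSIX.1-2008 libc that provides newlocale() and strxfrm_l(). The candidate locale list and the helper names (open_collation_locale, collation_key, sort_by_collation, compare_codepoints) are hypothetical, not taken from the texi2any sources. The keys follow whatever collation data the platform ships for the chosen locale, so they are not guaranteed to match Unicode::Collate results.

```c
/* Sketch only: needs POSIX.1-2008 newlocale() and strxfrm_l(). */
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical candidate list for 2): the spelling of the locale name
   varies between platforms.  "C" always exists and collates by bytes,
   which for UTF-8 strings amounts to the code point order of 1). */
static const char *const collation_candidates[] = {
  "en_US.UTF-8", "en_US.utf8", "en_US.utf-8", "C.UTF-8", "C", NULL
};

/* Return the first candidate that newlocale() accepts, or (locale_t) 0. */
static locale_t
open_collation_locale (const char *const *candidates)
{
  for (size_t i = 0; candidates[i] != NULL; i++)
    {
      locale_t loc = newlocale (LC_COLLATE_MASK, candidates[i], (locale_t) 0);
      if (loc != (locale_t) 0)
        return loc;
    }
  return (locale_t) 0;
}

/* Return a malloc'ed sort key for S in LOC; keys are compared with strcmp. */
static char *
collation_key (const char *s, locale_t loc)
{
  size_t len = strxfrm_l (NULL, s, 0, loc);  /* length without the NUL */
  char *key = malloc (len + 1);
  if (key != NULL)
    strxfrm_l (key, s, len + 1, loc);
  return key;
}

struct sort_entry { const char *text; char *key; };

static int
compare_entries (const void *a, const void *b)
{
  const struct sort_entry *ea = a, *eb = b;
  int r = strcmp (ea->key, eb->key);
  /* Break ties on the raw bytes so the order is total and deterministic. */
  return r != 0 ? r : strcmp (ea->text, eb->text);
}

/* Sort the UTF-8 STRINGS in place following LOC.  Returns 0 on success,
   -1 on allocation failure (STRINGS is then left unchanged). */
static int
sort_by_collation (const char **strings, size_t n, locale_t loc)
{
  if (n == 0)
    return 0;
  struct sort_entry *entries = malloc (n * sizeof *entries);
  if (entries == NULL)
    return -1;
  int status = 0;
  size_t i;
  for (i = 0; i < n; i++)
    {
      entries[i].text = strings[i];
      entries[i].key = collation_key (strings[i], loc);
      if (entries[i].key == NULL)
        {
          status = -1;
          break;
        }
    }
  if (status == 0)
    {
      qsort (entries, n, sizeof *entries, compare_entries);
      for (size_t j = 0; j < n; j++)
        strings[j] = entries[j].text;
    }
  for (size_t j = 0; j < i; j++)
    free (entries[j].key);
  free (entries);
  return status;
}

/* 1): strcmp compares bytes as unsigned char, and for valid UTF-8 the
   byte order equals the code point order.  For qsort on const char *[]. */
static int
compare_codepoints (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}
```

For 3) and 4), the candidate list would instead be built from @documentlanguage or COLLATION_LANGUAGE (for instance "fr_FR.UTF-8"). For 5), passing "" as the locale name to newlocale() takes LC_COLLATE from the user's environment, as plain strxfrm() does after setlocale (LC_ALL, "").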
I forgot about one possibility. Until Non-ignorable Weighting is available in C, it could make sense to offer another option for C: calling Perl code to obtain 2). That gives 6): in C, use the Perl sorting corresponding to 2). It could be named 'perldefault'. Using the Perl sorting corresponding to 2) from C is already implemented, and is currently used if TEST=1.

-- 
Pat