On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote:
> Thanks. This is very confusing to me, then, as it is not told that way
> in perllocale, especially the section:
> https://perldoc.perl.org/perllocale#Category-LC_COLLATE%3A-Collation%3A-Text-Comparisons-and-Sorting
> There is more information in the end of the page that may correspond
> better to the perlop information. Not important at all anyway
> since we agree that using the user locale is not a good idea in any case.

Yes, it appears to say the opposite:

    Perl uses the platform's C library collation functions "strcoll()"
    and "strxfrm()". That means you get whatever they give. On some
    platforms, these functions work well on UTF-8 locales, giving a
    reasonable default collation for the code points that are important
    in that locale. (And if they aren't working well, the problem may
    only be that the locale definition is deficient, so can be fixed by
    using a better definition file. Unicode's definitions (see "Freely
    available locale definitions") provide reasonable UTF-8 locale
    collation definitions.) Starting in Perl v5.26, Perl's use of these
    functions has been made more seamless. This may be sufficient for
    your needs. For more control, and to make sure strings containing
    any code point (not just the ones important in the locale) collate
    properly, the Unicode::Collate module is suggested.

So COLLATE_LOCALE (if we go with that naming) could potentially be
implemented in Perl as well, if we are able to temporarily switch the
locale. Speed could be an issue, though. (Although the documentation
says the result of strxfrm() is cached, so maybe not.)

I guess that the other documentation is either out of date, or they were
mandating Unicode::Collate as more portable than relying on the
platform's C library.
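
To make the "temporarily switch the locale" idea concrete, here is a rough
sketch at the C library level, which is what Perl ends up calling anyway.
It is only an illustration under assumptions: the function name
sort_with_collate_locale and the struct keyed are made up for this example
and are not taken from the Texinfo sources or from Perl. It switches
LC_COLLATE, computes each strxfrm() key once (so the sort itself only does
strcmp() on cached keys), and then restores the previous locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: a string plus its cached strxfrm() key. */
struct keyed {
    const char *text;
    char *key;
};

/* Compare two entries by their precomputed collation keys. */
static int cmp_keyed(const void *a, const void *b)
{
    const struct keyed *ka = a;
    const struct keyed *kb = b;
    return strcmp(ka->key, kb->key);
}

/* Sort strs[0..n-1] in place using the collation rules of locale_name.
   Returns 0 on success, -1 if the locale cannot be set or memory runs
   out.  LC_COLLATE is restored once the locale has been switched. */
int sort_with_collate_locale(const char **strs, size_t n,
                             const char *locale_name)
{
    int status = -1;
    size_t i;
    struct keyed *items = NULL;
    char *saved;
    const char *current = setlocale(LC_COLLATE, NULL);

    if (current == NULL)
        return -1;
    /* Copy the name: later setlocale() calls may overwrite the buffer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        free(saved);
        return -1;              /* locale not available on this system */
    }

    items = calloc(n ? n : 1, sizeof *items);
    if (items == NULL)
        goto done;
    for (i = 0; i < n; i++) {
        /* First call only asks for the key length, second one fills it. */
        size_t len = strxfrm(NULL, strs[i], 0);
        items[i].text = strs[i];
        items[i].key = malloc(len + 1);
        if (items[i].key == NULL)
            goto done;
        strxfrm(items[i].key, strs[i], len + 1);
    }

    qsort(items, n, sizeof *items, cmp_keyed);
    for (i = 0; i < n; i++)
        strs[i] = items[i].text;
    status = 0;

done:
    if (items != NULL) {
        for (i = 0; i < n; i++)
            free(items[i].key);
        free(items);
    }
    setlocale(LC_COLLATE, saved);   /* switch back to the previous locale */
    free(saved);
    return status;
}
```

A caller would pass a locale name such as "en_US.UTF-8" and fall back to
some other ordering when the function returns -1, since which locales are
installed depends on the system. Whether the same approach is fast enough
from Perl would have to be measured.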