> Other than that I do not have much other idea than disabling it, for
> instance if documentlanguage is en. The result with Unicode::Collate is
> better for accented letters, but not so useful in english. There could
> even be a customization variable to use Unicode::Collate even in
> english.

Another possibility is to use getSortKey:

  "$sortKey = $Collator->getSortKey($string)"
      -- see 4.3 Form Sort Key, UTS #10.

      Returns a sort key.

      You compare the sort keys using a binary comparison and get the
      result of the comparison of the strings using UCA.

         $Collator->getSortKey($a) cmp $Collator->getSortKey($b)

            is equivalent to

         $Collator->cmp($a, $b)

From perlperf man page:

  Using a subroutine as part of your sort is a powerful way to get
  exactly what you want, but will usually be slower than the built-in
  alphabetic "cmp" and numeric "<=>" sort operators. It is possible to
  make multiple passes over your data, building indices to make the
  upcoming sort more efficient, and to use what is known as the "OM"
  (Orcish Maneuver) to cache the sort keys in advance. The cache
  lookup, while a good idea, can itself be a source of slowdown by
  enforcing a double pass over the data - once to setup the cache, and
  once to sort the data. Using "pack()" to extract the required sort
  key into a consistent string can be an efficient way to build a
  single string to compare, instead of using multiple sort keys, which
  makes it possible to use the standard, written in "c" and fast, perl
  "sort()" function on the output, and is the basis of the "GRT"
  (Guttman Rossler Transform). Some string combinations can slow the
  "GRT" down, by just being too plain complex for its own good.

We could try caching sort keys and see if it is fast enough. If so, we
could still use Unicode::Collate without any setting for this.
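
For illustration, here is a minimal sketch of the caching idea. It assumes a plain list of index entry strings; the names @entries and %key_for are made up for this example and are not taken from the texi2any code. Each sort key is computed once with getSortKey, and the sort itself then uses only the built-in binary "cmp":

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

# Default collator (plain UCA, no locale tailoring).
my $collator = Unicode::Collate->new();

# Hypothetical index entries, for illustration only.
my @entries = ('zebra', 'Émile', 'apple', 'eclair', 'Zoo', 'éclair');

# Build the cache: one getSortKey call per entry instead of two
# collation calls per comparison.
my %key_for = map { $_ => $collator->getSortKey($_) } @entries;

# Sort on the cached keys with the built-in binary comparison.
my @sorted = sort { $key_for{$a} cmp $key_for{$b} } @entries;

binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @sorted;
```

Whether this is fast enough would still have to be measured on a real index, for example by timing it against a sort that calls $collator->cmp($a, $b) directly.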