On Fri, 2025-07-25 at 13:05 +0300, JānisE wrote: > I seem to not be able to get PostgreSQL to sort rows by a string column > respecting the diacritics. > > I read [1] that it's possible to define a custom collation having collation > strength "ks" > set to "level2", which would mean that it's accent-sensitive. > > However, when I try to actually sort using that collation, the order seem to > be accent-insensitive. > > For example: > > CREATE TABLE test (string text); > INSERT INTO test VALUES ('bar'), ('bat'), ('bär'); > CREATE COLLATION "und1" (provider = icu, deterministic = false, locale = > 'und-u-ks-level1'); > CREATE COLLATION "und2" (provider = icu, deterministic = false, locale = > 'und-u-ks-level2'); > CREATE COLLATION "und3" (provider = icu, deterministic = false, locale = > 'und-u-ks-level3'); > SELECT * FROM test ORDER BY string collate "und1"; > SELECT * FROM test ORDER BY string collate "und2"; > SELECT * FROM test ORDER BY string collate "und3"; > > All three collations give me the same order: bar < bär < bat, although an > accent-sensitive > order would be bar < bat < bär > > This does lose "bär", meaning that those strength levels do have some kind of > an effect on "DISTINCT": > SELECT DISTINCT string COLLATE "und1" FROM test; > > But it's not working on "ORDER BY". > > Do I misunderstand the collation capabilities? Is there a way to actually get > an accent-sensitive order?
Yes, I thing you misunderstand what "accent sensitive" means. It means that 'bar' <> 'bär'. Natural language collations compare strings on different levels: - 'bar' and 'bär' are identical on the first level (base character) - 'bar' and 'bär' are different on the second level (accent) - there are two more levels, the third being case Strings are ordered by the first level first, then by the second, and so on. I recommend reading Peter's excellent blog: http://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings So you end up with 'bar' < 'bär' < 'bat', because the first two compare equal on level 1. What you are looking for is a collation where accents are a first-level difference. The only way to do that with ICU collations, as far as I know, is to add explicit rules, like in this example: https://stackoverflow.com/a/77288282/6464308 > Also, is there a way to see what options are there for the default built-in > collations? > I don't see, for example, the used "ks" level in the "pg_collation" table > data. You can see that in the "colllocale" column. The name of the ICU locale determines its capabilities. Yours, Laurenz Albe