On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <[email protected]> wrote:
>
> Tomas Vondra <[email protected]> writes:
> > On 3/26/26 00:40, Tom Lane wrote:
> >> I believe what's happening there is that in cs_CZ locale,
> >> "V" doesn't follow simple ASCII sort ordering.
>
> > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > other way around. V is not special in any way.
>
> Ah, sorry, I should have researched a bit instead of relying on
> fading memory.  The quirk I was thinking of is that in cs_CZ,
> "ch" sorts after "h":
>
> u8=# select 'h' < 'ch'::text collate "en_US";
>  ?column?
> ----------
>  f
> (1 row)
>
> u8=# select 'h' < 'ch'::text collate "cs_CZ";
>  ?column?
> ----------
>  t
> (1 row)
>
> Regular hex encoding isn't bitten by that because it doesn't
> use 'h' in the text form ... but this base32hex thingie does.
>
> However, your point is also correct:
>
> u8=# select '0' < 'C'::text ;
>  ?column?
> ----------
>  t
> (1 row)
>
> u8=# select '0' < 'C'::text collate "cs_CZ";
>  ?column?
> ----------
>  f
> (1 row)
>
> and that breaks "text ordering matches numeric ordering"
> for both traditional hex and base32hex.  So maybe this
> is not as big a deal as I first thought.  We need a fix
> for the new test though.  Probably adding COLLATE "C"
> would be enough.

Thank you for the report and the analysis.

I've reproduced the issue with the "cs_CZ" collation, and adding COLLATE
"C" to the query resolves it. It also seems like a good idea to add a
note to the documentation, as users might run into the same issue. For
example:
For the encoded text to sort in the same order as the original binary
data, sort it using the C collation (e.g., with COLLATE "C").
Natural-language collations may order characters differently (for
example, sorting letters before digits) and break this ordering.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

