On Thu, Mar 26, 2026 at 10:59 AM Andrey Borodin <[email protected]> wrote:
>
> > On 26 Mar 2026, at 22:30, Masahiko Sawada <[email protected]> wrote:
> >
> > Feedback is very welcome.
>
> The patch is fine from my POV.
>
> Please consider these small improvements to the patch. Basically, we
> reference to formula stated by RFC where possible.
> 0001 is intact.

Thank you for the suggestion. It looks good to me. I've merged these
patches and am going to push barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From 35d321a9e3216052c917b4d1a61b93ecb1414e42 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 26 Mar 2026 10:17:23 -0700
Subject: [PATCH v2] doc: Clarify collation requirements for base32hex sortability.

While fixing the base32hex UUID sortability test in commit
89210037a0a, it turned out that the expected lexicographical order is
only maintained under the C collation (or an equivalent byte-wise
collation). Natural language collations may employ different rules,
breaking the sortability.

This commit updates the documentation to explicitly state that
base32hex is "byte-wise sortable", ensuring users do not fall into the
trap of using natural language collations when querying their encoded
data.

Co-Authored-by: Masahiko Sawada <[email protected]>
Co-Authored-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/cad21aoawx1d6basguqxm0mzpxpwb07kgaoaaahjnhhenbdy...@mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..dc6b7e57ea7 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -778,18 +778,26 @@
         <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
         RFC 4648 Section 7</ulink>. It uses the extended hex alphabet
         (<literal>0</literal>-<literal>9</literal> and
-        <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
-        sort order of the encoded data. The <function>encode</function> function
+        <literal>A</literal>-<literal>V</literal>) which preserves the sort order of
+        the encoded data when compared byte-wise. The <function>encode</function> function
         produces output padded with <literal>'='</literal>, while
         <function>decode</function> accepts both padded and unpadded
         input. Decoding is case-insensitive and ignores whitespace characters.
        </para>
        <para>
-        This format is useful for encoding UUIDs in a compact, sortable format:
+        This format is useful for encoding UUIDs in a compact, byte-wise sortable format:
         <literal>rtrim(encode(uuid_value::bytea, 'base32hex'), '=')</literal>
         produces a 26-character string compared to the standard 36-character
         UUID representation.
        </para>
+       <note>
+        <para>
+         To maintain the lexicographical sort order of the encoded data,
+         ensure that the text is sorted using the C collation
+         (e.g., using <literal>COLLATE "C"</literal>). Natural language
+         collations may sort characters differently and break the ordering.
+        </para>
+       </note>
       </listitem>
      </varlistentry>
 
-- 
2.53.0
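
As a rough illustration of the byte-wise sortability claim in the patch, here is a minimal Python sketch. It is not part of the patch and is not PostgreSQL's implementation: encode_base32hex is a hypothetical hand-written encoder for the RFC 4648 Section 7 alphabet, and the random UUIDs are made-up test data. Python compares ASCII strings by code point, which I assume behaves like COLLATE "C" for this alphabet. The sketch says nothing about how any particular natural language collation would order the same strings.

```python
import uuid

# RFC 4648 Section 7 "extended hex" alphabet: 0-9 then A-V.
# The characters are in ascending ASCII order, which is what makes
# a byte-wise comparison of the encoded text follow the raw bytes.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def encode_base32hex(raw: bytes) -> str:
    """Encode bytes as unpadded base32hex, 5 bits per output character.

    Hypothetical helper for illustration; roughly what
    rtrim(encode(value, 'base32hex'), '=') is described as producing.
    """
    bit_count = len(raw) * 8
    pad_bits = (-bit_count) % 5  # zero bits appended to fill the last group
    value = int.from_bytes(raw, "big") << pad_bits
    char_count = (bit_count + pad_bits) // 5
    return "".join(
        ALPHABET[(value >> (5 * i)) & 0x1F]
        for i in reversed(range(char_count))
    )


# Random 16-byte values standing in for UUIDs (illustrative data only).
raw_values = [uuid.uuid4().bytes for _ in range(1000)]

# Sort the raw bytes and then encode, versus encode and then sort the text.
encoded_after_sort = [encode_base32hex(v) for v in sorted(raw_values)]
sorted_after_encode = sorted(encode_base32hex(v) for v in raw_values)

# Under code-point (byte-wise) comparison both orders should agree.
order_preserved = encoded_after_sort == sorted_after_encode
encoded_length = len(encode_base32hex(raw_values[0]))

print("order preserved:", order_preserved)
print("encoded length:", encoded_length)
```

The comparison only holds here because all inputs have the same length (16 bytes) and the strings are compared by code point; that is the property the new <note> asks users to keep by sorting with the C collation.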
