Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

Chao Li Mon, 23 Mar 2026 19:25:35 -0700

> On Mar 24, 2026, at 09:17, Masahiko Sawada <[email protected]> wrote:
> 
> On Fri, Mar 20, 2026 at 7:24 AM Aleksander Alekseev
> <[email protected]> wrote:
>> 
>> Hi,
>> 
>>>> Also, a small nitpick is that we can use uint32 instead of uint64 for
>>>> 'bits_buffer'. I've attached the updated patch as well as the
>>>> difference from the previous version.
>>> 
>>> Then I suggest using uint32 for the bits_buffer variable in
>>> base32hex_encode() too. Also we should use 1U instead of 1ULL with
>>> uint32.
>> 
>> CI is not happy with the new test:
>> 
>> ```
>> SELECT decode('あ', 'base32hex'); -- error
>> -ERROR:  invalid symbol "あ" found while decoding base32hex sequence
>> +ERROR:  invalid symbol "ã" found while decoding base32hex sequence
>> ```
>> 
>> Although it passes locally. My best guess is that something is off
>> with the database encoding on CI and that we shouldn't use this test.
>> We have a similar test which uses ASCII symbols only.
> 
> Good catch. Yes, we should not use this test depending on the database
> encoding and it seems we can omit this test in the first place.
> 
> The patch looks basically good to me. I've made some changes to the
> regression test part as I want to have round-trip tests. I've merged
> the tests checking the sortability to the existing tests and added
> round-trip tests. With this change, we can test round-trip tests and
> sortability tests with random UUID value in every test run while
> minimizing the test time. Feedback is very welcome.
> 
> 
> Regards,
> 
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v11-0001-Add-base32hex-support-to-encode-and-decode-funct.patch>

V11 overall looks good to me. A couple of comments wrt the tests.

1
```
+               else
+                       ereport(ERROR,
+                                       
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                        errmsg("invalid symbol \"%.*s\" found 
while decoding base32hex sequence",
+                                                       pg_mblen_range(s - 1, 
srcend), s - 1)));
```

I noticed the use of pg_mblen_range(). Since base32hex itself is single-ASCII 
only, my first thought was that we might not need a multibyte-aware function 
here. But then I realized that a multibyte string could still be passed to 
decode(), and pg_mblen_range() helps report the invalid multibyte character 
correctly in the error message.

I tested it like this:
```
evantest=# SELECT decode('你哈', 'base32hex');
ERROR:  invalid symbol "你" found while decoding base32hex sequence
```

The error message looks good to me. So maybe it would be worth adding a test 
case for decode() with multibyte characters.

2
```
+       accepts both padded and unpadded input. Decoding is case-insensitive 
and ignores
+       whitespace characters.
```

The doc says whitespace characters are silently ignored, so I tested this:
```
evantest=# SELECT decode('a a==', 'base32hex');
 decode
--------
 \x52
(1 row)

evantest=# SELECT decode(' a a==', 'base32hex');
 decode
--------
 \x52
(1 row)

evantest=# SELECT decode(' a a== ', 'base32hex');
 decode
--------
 \x52
(1 row)
```

It looks like leading, trailing, and embedded whitespace are all ignored. But I 
don’t see a test case covering this behavior, so maybe it would be good to add 
one.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

Reply via email to