I have a concern with base32hex_decode(): it only checks where the first '='
appears, but it does not validate the length of the final group or the amount
of padding that group requires. As a result, some invalid inputs are accepted
silently.

For example:

postgres=# SET bytea_output = hex;
SET
postgres=# SELECT '0' AS input, decode('0', 'base32hex');
 input | decode
-------+--------
 0     | \x
(1 row)

postgres=# SELECT '000' AS input , decode('000', 'base32hex');
 input | decode
-------+--------
 000   | \x00
(1 row)

postgres=# SELECT '24=' as input , decode('24=', 'base32hex');
 input | decode
-------+--------
 24=   | \x11
(1 row)

These look correct, but if we verify the same inputs with Python:
% python3 - <<'PY'
import base64

tests = [
    "24",
    "24======",
    "0",
    "000",
    "24=",
]

for s in tests:
    try:
        out = base64.b32hexdecode(s, casefold=True)
        print(f"{s!r} -> OK {out.hex()}")
    except Exception as e:
        print(f"{s!r} -> ERROR: {e}")
PY

The outputs are:
'24' -> ERROR: Incorrect padding
'24======' -> OK 11
'0' -> ERROR: Incorrect padding
'000' -> ERROR: Incorrect padding
'24=' -> ERROR: Incorrect padding
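Python's stricter behavior follows RFC 4648: encoded output comes in groups of
8 characters, the final group may only carry 2, 4, 5, or 7 data characters, and
'=' padding must bring it up to a multiple of 8. A minimal sketch of such a
check (the function name and lookup table here are mine, not PostgreSQL's):

```python
import re

# RFC 4648 base32hex: the final group holds 2, 4, 5, or 7 data characters,
# padded with 6, 4, 3, or 1 '=' respectively. Tail lengths of 1, 3, or 6
# data characters are never produced by a conforming encoder.
VALID_TAIL = {0: 0, 2: 6, 4: 4, 5: 3, 7: 1}

def is_valid_base32hex(s: str) -> bool:
    if len(s) % 8 != 0:          # total length must be a multiple of 8
        return False
    body = s.rstrip("=")
    pad = len(s) - len(body)
    if "=" in body:              # '=' may only appear as trailing padding
        return False
    if not re.fullmatch(r"[0-9A-Va-v]*", body):
        return False             # base32hex alphabet is 0-9, A-V
    return VALID_TAIL.get(len(body) % 8) == pad
```

With this rule, '24======' is accepted while '0', '000', '24', and '24=' are
all rejected, matching the base64.b32hexdecode() results above.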

I might be missing some context here, so I wanted to ask: is this behavior
intentional, or would it make sense to enforce stricter validation for
base32hex input?

Best regards,

Chengxi Sun
