I have a concern with base32hex_decode(): it only checks where the first '='
appears, but it does not validate the length of the final group or the
required amount of padding. Because of that, some invalid inputs are
accepted silently.
For example:
postgres=# SET bytea_output = hex;
SET
postgres=# SELECT '0' AS input, decode('0', 'base32hex');
input | decode
-------+--------
0 | \x
(1 row)
postgres=# SELECT '000' AS input , decode('000', 'base32hex');
input | decode
-------+--------
000 | \x00
(1 row)
postgres=# SELECT '24=' as input , decode('24=', 'base32hex');
input | decode
-------+--------
24= | \x11
(1 row)
These look correct, but if we verify them with Python:
% python3 - <<'PY'
import base64

tests = [
    "24",
    "24======",
    "0",
    "000",
    "24=",
]
for s in tests:
    try:
        out = base64.b32hexdecode(s, casefold=True)
        print(f"{s!r} -> OK {out.hex()}")
    except Exception as e:
        print(f"{s!r} -> ERROR: {e}")
PY
The outputs are:
'24' -> ERROR: Incorrect padding
'24======' -> OK 11
'0' -> ERROR: Incorrect padding
'000' -> ERROR: Incorrect padding
'24=' -> ERROR: Incorrect padding
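For reference, the rule Python is enforcing here comes from RFC 4648: the
encoded length must be a multiple of 8, and the final group may contain
only 2, 4, 5, or 7 data characters before the '=' padding. A minimal
sketch of such a strict check (is_valid_base32hex is a hypothetical helper,
not an existing function) could look like this:

```python
import re

# Final groups with 1, 3, or 6 data characters cannot be produced by
# encoding whole bytes, so only these remainders are legal (RFC 4648).
VALID_FINAL_DATA_CHARS = {0, 2, 4, 5, 7}

def is_valid_base32hex(s: str) -> bool:
    # Total length must be a whole number of 8-character groups.
    if len(s) % 8 != 0:
        return False
    # Data characters (0-9, A-V) must all come before any '=' padding.
    m = re.fullmatch(r'([0-9A-Va-v]*)(=*)', s)
    if m is None:
        return False
    data, pad = m.groups()
    # The final group's data-character count must be legal, and the
    # padding must fill the group exactly.
    return (len(data) % 8 in VALID_FINAL_DATA_CHARS
            and len(pad) == (-len(data)) % 8)
```

With this check, '24======' passes while '0', '000', '24', and '24=' are
all rejected, matching the Python results above.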
I might be missing some context here, so I wanted to ask: is this behavior
intentional, or would it make sense to enforce stricter validation for
base32hex input?
Best regards,
Chengxi Sun