On Wed, 25 Mar 2026 at 21:40, David Rowley <[email protected]> wrote:
>
> On Thu, 26 Mar 2026 at 03:46, Matthias van de Meent
> <[email protected]> wrote:
> > Attached is v2, which adds the new function to the docs, in addition
> > to rebasing the patch onto master.
>
> In [1] I was looking into reported bugs with the datum_image code not
> behaving correctly. It turns out this is due to hash_numeric() using
> the wrong PG_RETURN_* macro, resulting in it not correctly
> sign-extending negative 32-bit ints.

Thank you for refering me to this.

> I did some analysis to see if
> there were any user-visible changes from fixing that in the back
> branches. I didn't find any.  What you're adding here would add one. I
> think we might need to become more strict about using the correct
> return macro before we expose this stuff, else it's going to be a
> rather scary process to fix bugs when it could affect the
> datum_image_equal result.

I'm not sure we can reliably enforce that sign extension is always
going to be consistent for the same type; specifically looking at
unsigned integers as example. PG's tuple deserialization would see 4
byte alignment and sign-extend it as if it were a signed integer, but
returned values would prefer non- sign-extended values for more
efficient casts from larger unsigned types.

> The thing about Memoize slowly finding all the bugs in the
> datum_image_* code is that we're free to fix these bugs, as the use
> case (the Memoize hash table) only lasts as long as the query.  If we
> expose all of this to users, then they could persist things in tables.
> 6911f8037 fixed another problem that resulted in the datum_image code
> not doing the right thing.

I'm happy to mark this function as STABLE for now (to prevent its
inclusion in permanent storage), and/or to adjust the code to adjust
for subnormal inputs (values with incorrect/inconsistent/unexpected
sign extensions).

> I also proposed a hack in [2] to fix the sign-extension problem. I had
> hoped that might be temporary, but if we were to expose this function
> at the SQL level, then I doubt we'd be brave enough to remove it.

pg_datum_image_equal is supposed to only care about the bytes of the
datum that contain the actual value, so I can surely update it to do
that. I'll be happy to only apply that part to the code inside
pg_datum_image_equal; that should remove the part that exposes the
issue directly to users.

> This would work correctly if I pushed the patch in [2]. But as of yet,
> I'm not sure about it. I at least think we should delay doing this
> until we've come up with a fix that we're confident in.

I'm happy with any of the options: Having your patch at [2] get
committed, me applying [2]'s changes in pg_datum_image_equal, or
marking the function STABLE (and so, disallowing using its output in
permanent storage).

Note that false negatives are unlikely to be a problem, whilst false
positives might have issues. Sign extension issues here would create
false negatives, so likely wouldn't be a huge problem.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

PS. isn't this an issue with OIDs getting read from disk vs copied
around in memory, too? Oids are unsigned, whilst the deserialization
path assumes signed, so I'd expect that to have different outputs when
the from-disk OID you're checking has its not-sign bit 31 set?

> [2] 
> https://postgr.es/m/CAApHDvreF-UiqBaHtRTQWQ6z1X9snstJW%2Bdfb2DU5GOb-uPEBg%40mail.gmail.com


Reply via email to