On Thu, Jul 10, 2025 at 5:48 AM Dominique Devienne <ddevie...@gmail.com> wrote:
> We store scientific information in PostgreSQL, and some of that is > bytea and large, thus we must "chunk it" both for performance, and not > be limited to 1GB (we do exceed that, in rare occasions). > > Recently I added md5/sha1 hashing support for such values (for various > reasons, to track corruptions in our ETLs, now fixed, but also in the > future for custom smart sync and such), and was surprised to discover > there are no aggregate versions of those functions, neither the > built-in md5 one (now with bytea overload), nor for pgcrypto. Did I > miss something? > > Any chance this might be added in the future? > > Seems so logical to me, that these hashing functions were available > are aggregates, I can't be the first one to think of that, can it? > > Thanks, --DD > > PS: The holly-grail IMHO, would be built-in support for hashing, with > intelligent lazy-compute and persistence correctly reset on changes. > Probably opt-in. > > PPS: Built-in Support for xxh64, or at least in official pgcrypto, > would also be nice. With aggregate! > > PPPS: I remember Oracle LOBs supporting a built-in implicit mime-type > attribute on them, which was used as the content-type with served over > the built-in WebDAV extension for the DB. I wish PostgreSQL had > something like that too. > > PPPPS: the lo extension is not viable for us, FWIW. All the above > should be opt-in on bytea columns IMHO. > Does it have to match the md5 of the 'unchunked' variant exactly? If not, maybe you can rig a custom aggregate that would just "hash amend" the chunks using the n-1 chunk has as salt, this would be fast and easy, at the cost of not matching the unchunked variant. I had to do something very similar with compression, I had a need to be able to compress bytea values with lz4 at SQL level due to limited support for extensions in the cloud. it works great...if a bit slow, and requires the database to handle the decompression. merlin