Hi Ashutosh,

On Tue, Mar 24, 2026 at 11:24 PM Ashutosh Bapat
<[email protected]> wrote:
> I know we already have a couple of hand-aggregation functions but I am
> hesitant to add more of these. Question is where do we stop? For
> example, the current function is useless if someone wants to find the
> parts of a relation which are hot since it doesn't include page
> numbers. Do we write another function for the same? Or we add page
> numbers to this function and then there's hardly any aggregation
> happening. What if somebody wanted to perform an aggregation more
> complex than just count() like average number of buffers per relation
> or distribution of relation buffers in the cache, do they write
> separate functions?

I think the problem this solves, which is a very common question I
hear from end users, is "how much of this table/index is in cache?"
and "was our query slow because the cache contents changed?".

It can't provide a perfect answer to every question about what's in
the cache (e.g. it won't tell you which part of a table is cached),
but it's in line with the other statistics we already provide in
pg_stat_user_tables etc., which are all aggregate counts, not further
breakdowns.

It's also a reasonable compromise for providing something usable that
can be shown on dashboards; in practice, I've been collecting this
information from small production systems using the existing methods
for the last ~1.5 years.
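For context, by "existing methods" I mean aggregating the raw
pg_buffercache output in SQL, along the lines of the example in the
pg_buffercache documentation:

```sql
-- Top 10 relations by number of cached buffers in the current database
-- (adapted from the pg_buffercache documentation)
SELECT n.nspname, c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
  AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace
GROUP BY n.nspname, c.relname
ORDER BY 3 DESC
LIMIT 10;
```

This works, but it materializes one row per buffer before the
aggregation happens, which is where the overhead comes from.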

> Another problem is the maintenance cost these functions bring. For
> example, with the resizable shared buffer project we have another
> function to stress test.

Can you expand on how your testing would be impacted? I hear you on
not adding unnecessary functions, but the basic paradigm of how this
iterates over buffers is very similar to the other functions in
pg_buffercache; it's just shifting the aggregation to a different
level.

> Looking at the function, I see it uses a hash table to aggregate the
> data. To some extent it's duplicating the functionality we already
> have - aggregates using hashing. Are we going to duplicate
> functionality everywhere we require aggregation on top of a system
> function? These functions will then be missing any optimizations we do
> to hash aggregation in future. Can we instead investigate the reason
> the aggregation on top of pg_buffercache output requires so much more
> time than doing it in the function and fix that as much as we can? I
> know some slowness will come from tuplestore APIs, tuple formation and
> deformation but I won't expect it to be 10 times slower.

I don't think this is fixable outside the function, and I'd be
surprised if you could get comparable performance, unless you had an
extreme case with fewer than ~100 buffers per relation. There are
just too many layers involved that would carry the full set of buffer
entries rather than the grouped version. I'm happy to be convinced
otherwise, but I won't be the one pushing forward that effort myself.

Thanks,
Lukas

-- 
Lukas Fittl
