Re: pg_buffercache: Add per-relation summary stats

Ashutosh Bapat Wed, 25 Mar 2026 21:21:29 -0700

On Wed, Mar 25, 2026 at 10:19 PM Masahiko Sawada <[email protected]> wrote:
>
> On Tue, Mar 24, 2026 at 11:47 PM Lukas Fittl <[email protected]> wrote:
> >
> > Hi Ashutosh,
> >
> > On Tue, Mar 24, 2026 at 11:24 PM Ashutosh Bapat
> > <[email protected]> wrote:
> > > I know we already have a couple of hand-aggregation functions but I am
> > > hesitant to add more of these. Question is where do we stop? For
> > > example, the current function is useless if someone wants to find the
> > > parts of a relation which are hot since it doesn't include page
> > > numbers. Do we write another function for the same? Or we add page
> > > numbers to this function and then there's hardly any aggregation
> > > happening. What if somebody wanted to perform an aggregation more
> > > complex than just count() like average number of buffers per relation
> > > or distribution of relation buffers in the cache, do they write
> > > separate functions?
> >
> > I think the problem this solves for, which is a very common question I
> > hear from end users, is "how much of this table/index is in cache" and
> > "was our query slow because the cache contents changed?".
> >
> > It can't provide a perfect answer to all questions regarding what's in
> > the cache (i.e. it won't tell you which part of the table is cached),
> > but its in line with other statistics we do already provide in
> > pg_stat_user_tables etc., which are all aggregate counts, not further
> > breakdowns.
> >
> > Its also a reasonable compromise on providing something usable that
> > can be shown on dashboards, as I've seen in collecting this
> > information using the existing methods from small production systems
> > in practice over the last ~1.5 years.
>
> Regarding the proposed statistics, I find them reasonably useful for
> many users. I'm not sure we need to draw a strict line on what belongs
> in the module. If a proposed function does exactly what most
> pg_buffercache users want or are already writing themselves, that is
> good enough motivation to include it.
>
> I think pg_visibility is a good precedent here. In that module, we
> have both pg_visibility_map() and pg_visibility_map_summary(), even
> though we can retrieve the exact same results as the latter by simply
> using the former:
>
> select sum(all_visible::int), sum(all_frozen::int) from
> pg_visibility_map('test') ;
>


A summary may still be ok, but this proposal is going a bit farther,
it's grouping by one subset which should really be done by GROUP BY in
SQL. And I do

I am afraid that at some point, we will start finding all of these to
be a maintenance burden. At that point, removing them will become a
real pain for the backward compatibility reason. For example
1. The proposed function is going to add one more test to an already
huge testing exercise for shared buffers resizing.
2. If we change the way to manage buffer cache e.g. use a tree based
cache instead of hash + array cache, each of the functions which
traverses the buffer cache array is going to add work - adjusting it
to the new data structure - and make a hard project even harder. In
this case we have other ways to get the summary, so the code level
scan of buffer cache is entirely avoidable.

If I am the only one opposing it, and there are more senior
contributors in favour of adding this function, we can accept it.

-- 
Best Wishes,
Ashutosh Bapat

Re: pg_buffercache: Add per-relation summary stats

Reply via email to