On Wed, Mar 25, 2026 at 12:40 AM Masahiko Sawada <[email protected]> wrote:
>
> Hi Lukas,
>
> On Sat, Feb 28, 2026 at 3:59 PM Lukas Fittl <[email protected]> wrote:
> >
> > Hi,
> >
> > See attached a patch that implements a new function,
> > pg_buffercache_relation_stats(), which returns per-relfilenode
> > statistics on the number of buffers, how many are dirtied/pinned, and
> > their avg usage count.
>
> Thank you for the proposal!
>
> Paul A Jungwirth, Khoa Nguyen, and I reviewed this patch through the
> Patch Review Workshop, and I'd like to share our comments.
>
> >
> > This can be used in monitoring scripts to know which relations are
> > kept in shared buffers, to understand performance issues better that
> > occur due to relations getting evicted from the cache. In our own
> > monitoring tool (pganalyze) we've offered a functionality like this
> > based on the existing pg_buffercache() function for a bit over a year
> > now [0], and people have found this very valuable - but it doesn't
> > work for larger database servers.
> >
> > Specifically, performing a query that gets this information can be
> > prohibitively expensive when using large shared_buffers, and even on
> > the default 128MB shared buffers there is a measurable difference:
> >
> > postgres=# WITH pg_buffercache_relation_stats AS (
> > SELECT relfilenode, reltablespace, reldatabase, relforknumber,
> >                                                 COUNT(*) AS buffers,
> > COUNT(*) FILTER (WHERE isdirty) AS buffers_dirty,
> > COUNT(*) FILTER (WHERE pinning_backends > 0) AS buffers_pinned,
> > AVG(usagecount) AS usagecount_avg
> > FROM pg_buffercache
> > WHERE reldatabase IS NOT NULL
> > GROUP BY 1, 2, 3, 4
> >
> >  )
> > SELECT * FROM pg_buffercache_relation_stats WHERE relfilenode = 2659;
> >
> >  relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> > buffers_dirty | buffers_pinned |   usagecount_avg
> > -------------+---------------+-------------+---------------+---------+---------------+----------------+--------------------
> >         2659 |          1663 |           5 |             0 |       8 |
> >             0 |              0 | 5.0000000000000000
> >         2659 |          1663 |           1 |             0 |       7 |
> >             0 |              0 | 5.0000000000000000
> >         2659 |          1663 |      229553 |             0 |       7 |
> >             0 |              0 | 5.0000000000000000
> > (3 rows)
> >
> > Time: 20.991 ms
> >
> > postgres=# SELECT * FROM pg_buffercache_relation_stats() WHERE
> > relfilenode = 2659;
> >  relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> > buffers_dirty | buffers_pinned | usagecount_avg
> > -------------+---------------+-------------+---------------+---------+---------------+----------------+----------------
> >         2659 |          1663 |           1 |             0 |       7 |
> >             0 |              0 |              5
> >         2659 |          1663 |      229553 |             0 |       7 |
> >             0 |              0 |              5
> >         2659 |          1663 |           5 |             0 |       8 |
> >             0 |              0 |              5
> > (3 rows)
> >
> > Time: 2.912 ms
> >
> > With the new function this gets done before putting the data in the
> > tuplestore used for the set-returning function.
>
> Overall, we find that the proposed feature is useful. The proposed way
> is much cheaper, especially when the number of per-relation stats is
> not large.
>
> Here are review comments on the v1 patch:
>
> ---
> -   pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql
> +   pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql \
> +   pg_buffercache--1.7--1.8.sql
>
> Since commit 4b203d499c6 bumped the version from 1.6 to 1.7 last
> November, we think we don't need to bump the version again for this new
> feature.
>
> ---
> +/*
> + * Hash key for pg_buffercache_relation_stats — groups by relation identity.
> + */
> +typedef struct
> +{
> +   RelFileNumber relfilenumber;
> +   Oid         reltablespace;
> +   Oid         reldatabase;
> +   ForkNumber  forknum;
> +} BufferRelStatsKey;
> +
> +/*
> + * Hash entry for pg_buffercache_relation_stats — accumulates per-relation
> + * buffer statistics.
> + */
> +typedef struct
> +{
> +   BufferRelStatsKey key;      /* must be first */
> +   int32       buffers;
> +   int32       buffers_dirty;
> +   int32       buffers_pinned;
> +   int64       usagecount_total;
> +} BufferRelStatsEntry;
>
> Can we move these typedefs above function prototypes as other typedefs
> are defined there?
>
> ---
> +   relstats_hash = hash_create("pg_buffercache relation stats",
> +                               128,
> +                               &hash_ctl,
> +                               HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
>
> It might be worth considering simplehash.h for even better performance.
>
> ---
> +   while ((entry = (BufferRelStatsEntry *) hash_seq_search(&hash_seq)) != 
> NULL)
> +   {
> +       if (entry->buffers == 0)
> +           continue;
> +
>
> We might want to put CHECK_FOR_INTERRUPTS() here too as the number of
> entries can be as many as NBuffers in principle.
>
> ---
> We've discussed there might be room for improvement in the function
> name. For example, pg_buffercache_relations instead of
> pg_buffercache_relation_stats might be a good name, since everything
> in this module
> is stats.  if we drop "_stats" then "relation" should be plural, to
> match other functions in the module ("pages", "os_pages",
> "numa_pages", "usage_counts").

I know we already have a couple of hand-aggregation functions but I am
hesitant to add more of these. Question is where do we stop? For
example, the current function is useless if someone wants to find the
parts of a relation which are hot since it doesn't include page
numbers. Do we write another function for the same? Or we add page
numbers to this function and then there's hardly any aggregation
happening. What if somebody wanted to perform an aggregation more
complex than just count() like average number of buffers per relation
or distribution of relation buffers in the cache, do they write
separate functions?

Another problem is the maintenance cost these functions bring. For
example, with the resizable shared buffer project we have another
function to stress test.

Looking at the function, I see it uses a hash table to aggregate the
data. To some extent it's duplicating the functionality we already
have - aggregates using hashing. Are we going to duplicate
functionality everywhere we require aggregation on top of a system
function? These functions will then be missing any optimizations we do
to hash aggregation in future. Can we instead investigate the reason
the aggregation on top of pg_buffercache output requires so much more
time than doing it in the function and fix that as much as we can? I
know some slowness will come from tuplestore APIs, tuple formation and
deformation but I won't expect it to be 10 times slower.

-- 
Best Wishes,
Ashutosh Bapat


Reply via email to