Hi Lukas,

On Sat, Feb 28, 2026 at 3:59 PM Lukas Fittl <[email protected]> wrote:
>
> Hi,
>
> See attached a patch that implements a new function,
> pg_buffercache_relation_stats(), which returns per-relfilenode
> statistics on the number of buffers, how many are dirtied/pinned, and
> their avg usage count.

Thank you for the proposal!

Paul A Jungwirth, Khoa Nguyen, and I reviewed this patch through the
Patch Review Workshop, and I'd like to share our comments.

>
> This can be used in monitoring scripts to know which relations are
> kept in shared buffers, to understand performance issues better that
> occur due to relations getting evicted from the cache. In our own
> monitoring tool (pganalyze) we've offered a functionality like this
> based on the existing pg_buffercache() function for a bit over a year
> now [0], and people have found this very valuable - but it doesn't
> work for larger database servers.
>
> Specifically, performing a query that gets this information can be
> prohibitively expensive when using large shared_buffers, and even on
> the default 128MB shared buffers there is a measurable difference:
>
> postgres=# WITH pg_buffercache_relation_stats AS (
> SELECT relfilenode, reltablespace, reldatabase, relforknumber,
>                                                 COUNT(*) AS buffers,
> COUNT(*) FILTER (WHERE isdirty) AS buffers_dirty,
> COUNT(*) FILTER (WHERE pinning_backends > 0) AS buffers_pinned,
> AVG(usagecount) AS usagecount_avg
> FROM pg_buffercache
> WHERE reldatabase IS NOT NULL
> GROUP BY 1, 2, 3, 4
>
>  )
> SELECT * FROM pg_buffercache_relation_stats WHERE relfilenode = 2659;
>
>  relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> buffers_dirty | buffers_pinned |   usagecount_avg
> -------------+---------------+-------------+---------------+---------+---------------+----------------+--------------------
>         2659 |          1663 |           5 |             0 |       8 |
>             0 |              0 | 5.0000000000000000
>         2659 |          1663 |           1 |             0 |       7 |
>             0 |              0 | 5.0000000000000000
>         2659 |          1663 |      229553 |             0 |       7 |
>             0 |              0 | 5.0000000000000000
> (3 rows)
>
> Time: 20.991 ms
>
> postgres=# SELECT * FROM pg_buffercache_relation_stats() WHERE
> relfilenode = 2659;
>  relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> buffers_dirty | buffers_pinned | usagecount_avg
> -------------+---------------+-------------+---------------+---------+---------------+----------------+----------------
>         2659 |          1663 |           1 |             0 |       7 |
>             0 |              0 |              5
>         2659 |          1663 |      229553 |             0 |       7 |
>             0 |              0 |              5
>         2659 |          1663 |           5 |             0 |       8 |
>             0 |              0 |              5
> (3 rows)
>
> Time: 2.912 ms
>
> With the new function this gets done before putting the data in the
> tuplestore used for the set-returning function.

Overall, we find that the proposed feature is useful. The proposed way
is much cheaper, especially when the number of per-relation stats is
not large.

Here are review comments on the v1 patch:

---
-   pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql
+   pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql \
+   pg_buffercache--1.7--1.8.sql

Since commit 4b203d499c6 bumped the version from 1.6 to 1.7 last
November, we think we don't need to bump the version again for this new
feature.

---
+/*
+ * Hash key for pg_buffercache_relation_stats — groups by relation identity.
+ */
+typedef struct
+{
+   RelFileNumber relfilenumber;
+   Oid         reltablespace;
+   Oid         reldatabase;
+   ForkNumber  forknum;
+} BufferRelStatsKey;
+
+/*
+ * Hash entry for pg_buffercache_relation_stats — accumulates per-relation
+ * buffer statistics.
+ */
+typedef struct
+{
+   BufferRelStatsKey key;      /* must be first */
+   int32       buffers;
+   int32       buffers_dirty;
+   int32       buffers_pinned;
+   int64       usagecount_total;
+} BufferRelStatsEntry;

Can we move these typedefs above function prototypes as other typedefs
are defined there?

---
+   relstats_hash = hash_create("pg_buffercache relation stats",
+                               128,
+                               &hash_ctl,
+                               HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

It might be worth considering simplehash.h for even better performance.

---
+   while ((entry = (BufferRelStatsEntry *) hash_seq_search(&hash_seq)) != NULL)
+   {
+       if (entry->buffers == 0)
+           continue;
+

We might want to put CHECK_FOR_INTERRUPTS() here too as the number of
entries can be as many as NBuffers in principle.

---
We've discussed there might be room for improvement in the function
name. For example, pg_buffercache_relations instead of
pg_buffercache_relation_stats might be a good name, since everything
in this module
is stats.  if we drop "_stats" then "relation" should be plural, to
match other functions in the module ("pages", "os_pages",
"numa_pages", "usage_counts").

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Reply via email to