Hi Lukas,
On Sat, Feb 28, 2026 at 3:59 PM Lukas Fittl <[email protected]> wrote:
>
> Hi,
>
> See attached a patch that implements a new function,
> pg_buffercache_relation_stats(), which returns per-relfilenode
> statistics on the number of buffers, how many are dirtied/pinned, and
> their avg usage count.
Thank you for the proposal!
Paul A Jungwirth, Khoa Nguyen, and I reviewed this patch through the
Patch Review Workshop, and I'd like to share our comments.
>
> This can be used in monitoring scripts to know which relations are
> kept in shared buffers, to understand performance issues better that
> occur due to relations getting evicted from the cache. In our own
> monitoring tool (pganalyze) we've offered a functionality like this
> based on the existing pg_buffercache() function for a bit over a year
> now [0], and people have found this very valuable - but it doesn't
> work for larger database servers.
>
> Specifically, performing a query that gets this information can be
> prohibitively expensive when using large shared_buffers, and even on
> the default 128MB shared buffers there is a measurable difference:
>
> postgres=# WITH pg_buffercache_relation_stats AS (
> SELECT relfilenode, reltablespace, reldatabase, relforknumber,
> COUNT(*) AS buffers,
> COUNT(*) FILTER (WHERE isdirty) AS buffers_dirty,
> COUNT(*) FILTER (WHERE pinning_backends > 0) AS buffers_pinned,
> AVG(usagecount) AS usagecount_avg
> FROM pg_buffercache
> WHERE reldatabase IS NOT NULL
> GROUP BY 1, 2, 3, 4
>
> )
> SELECT * FROM pg_buffercache_relation_stats WHERE relfilenode = 2659;
>
> relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> buffers_dirty | buffers_pinned | usagecount_avg
> -------------+---------------+-------------+---------------+---------+---------------+----------------+--------------------
> 2659 | 1663 | 5 | 0 | 8 |
> 0 | 0 | 5.0000000000000000
> 2659 | 1663 | 1 | 0 | 7 |
> 0 | 0 | 5.0000000000000000
> 2659 | 1663 | 229553 | 0 | 7 |
> 0 | 0 | 5.0000000000000000
> (3 rows)
>
> Time: 20.991 ms
>
> postgres=# SELECT * FROM pg_buffercache_relation_stats() WHERE
> relfilenode = 2659;
> relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> buffers_dirty | buffers_pinned | usagecount_avg
> -------------+---------------+-------------+---------------+---------+---------------+----------------+----------------
> 2659 | 1663 | 1 | 0 | 7 |
> 0 | 0 | 5
> 2659 | 1663 | 229553 | 0 | 7 |
> 0 | 0 | 5
> 2659 | 1663 | 5 | 0 | 8 |
> 0 | 0 | 5
> (3 rows)
>
> Time: 2.912 ms
>
> With the new function this gets done before putting the data in the
> tuplestore used for the set-returning function.
Overall, we find that the proposed feature is useful. The proposed way
is much cheaper, especially when the number of per-relation stats is
not large.
Here are review comments on the v1 patch:
---
- pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql
+ pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql \
+ pg_buffercache--1.7--1.8.sql
Since commit 4b203d499c6 bumped the version from 1.6 to 1.7 last
November, we think we don't need to bump the version again for this new
feature.
---
+/*
+ * Hash key for pg_buffercache_relation_stats — groups by relation identity.
+ */
+typedef struct
+{
+ RelFileNumber relfilenumber;
+ Oid reltablespace;
+ Oid reldatabase;
+ ForkNumber forknum;
+} BufferRelStatsKey;
+
+/*
+ * Hash entry for pg_buffercache_relation_stats — accumulates per-relation
+ * buffer statistics.
+ */
+typedef struct
+{
+ BufferRelStatsKey key; /* must be first */
+ int32 buffers;
+ int32 buffers_dirty;
+ int32 buffers_pinned;
+ int64 usagecount_total;
+} BufferRelStatsEntry;
Can we move these typedefs above function prototypes as other typedefs
are defined there?
---
+ relstats_hash = hash_create("pg_buffercache relation stats",
+ 128,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
It might be worth considering simplehash.h for even better performance.
---
+ while ((entry = (BufferRelStatsEntry *) hash_seq_search(&hash_seq)) != NULL)
+ {
+ if (entry->buffers == 0)
+ continue;
+
We might want to put CHECK_FOR_INTERRUPTS() here too as the number of
entries can be as many as NBuffers in principle.
---
We've discussed there might be room for improvement in the function
name. For example, pg_buffercache_relations instead of
pg_buffercache_relation_stats might be a good name, since everything
in this module
is stats. if we drop "_stats" then "relation" should be plural, to
match other functions in the module ("pages", "os_pages",
"numa_pages", "usage_counts").
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com