Hi Hackers, I'd like to propose a new system view, pg_stat_wal_records, that exposes per-resource-manager, per-record-type WAL generation counts.
*Sample Output:*
postgres=# SELECT * FROM pg_stat_wal_records ORDER BY count DESC LIMIT 10;
resource_manager | record_type | count | stats_reset
------------------+----------------+--------+-------------------------------
Heap | INSERT | 500000 | 2026-03-26 22:15:00.12345+00
Transaction | COMMIT | 500000 |
Btree | INSERT_LEAF | 53821 |
Heap | HOT_UPDATE | 12744 |
XLOG | FPI | 8923 |
*The Gap:*
Postgres already has pg_stat_wal for aggregate WAL volume (bytes, full-page
images, buffers), and pg_walinspect (superuser access required) for
post-hoc forensic analysis of individual WAL records. But there is no
lightweight observability tool that answers, in real time, which record
types are responsible for the WAL. Additionally, pg_walinspect reads
on-disk WAL files, which is too expensive for monitoring systems to poll
frequently; this view is cheap to poll.
*Use cases:*
WAL volume investigation: see which record types dominate WAL generation in
real time without touching disk.
Monitoring integration: Prometheus/Grafana can poll the view to track WAL
composition over time and alert on anomalies.
Replication tuning: identify whether WAL volume is dominated by data
changes, index maintenance, FPIs, or vacuum activity to guide tuning.
Extension debugging: custom WAL resource managers get visibility
automatically.
*Key design decisions*
*Counting mechanism:*
The counting mechanism is a single backend-local array increment in
XLogInsert():
pgstat_pending_wal_records[rmid][(info >> 4) & 0x0F]++;
This indexes into a uint64[256][16] array (32 KB per backend) using the
rmgr ID and the 4-bit record-type subfield of the WAL info byte. Counters
are flushed to shared memory via the standard pgstat infrastructure.
A per-backend pending array is used instead of direct shared-memory
writes: the counter is incremented in backend-local memory and folded into
shared memory by the existing pgstat flush cycle, so the hot path adds no
shared-memory contention (see the perf results below).
*Fixed 256×16 matrix:* All 256 possible rmgr IDs × 16 possible record
types. This accommodates core resource managers and any custom WAL
resource managers from extensions without configuration. The 32 KB
per-backend cost is modest.
*Human-readable names:* The SRF calls each resource manager's rm_identify
callback to translate the info byte into a readable record type name (for
example INSERT, COMMIT, VACUUM, HOT_UPDATE).
*Reset:* Added via pg_stat_reset_shared('wal_records'), consistent with
the existing pattern for wal, bgwriter, archiver, etc.
*Output:* The view skips zero-count entries, keeping output clean.
*Performance overhead*
Benchmarked with pgbench (scale 50, 16 clients, 16 threads, 30s,
synchronous_commit=off) on 64 vCPU machine with data and WAL on NVMe:
 Configuration  | Avg TPS
----------------+---------
 With patch     |  42,266
 Without patch  |  42,053
The difference (~0.5%, with the patched build marginally faster) is within
measurement noise. The increment hits a backend-local, L1-hot array and is
dwarfed by XLogInsert's existing CRC, locking, and memcpy work.
A draft patch is attached; please share your thoughts.
Thanks,
Satya
v1-0001-pg-stat-wal-records.patch
Description: Binary data
