This is the patch.

shawn wang <[email protected]> wrote on Monday, March 23, 2026 at 21:19:

> Hi hackers,
>
> == Motivation ==
>
> We operate a fleet of PostgreSQL instances with logical replication. On
> several occasions, we have experienced production incidents where logical
> decoding spill files (pg_replslot/<slot>/xid-*.spill) grew uncontrollably —
> consuming tens of gigabytes and eventually filling up the data disk. This
> caused the entire instance to go read-only, impacting not just replication
> but all write workloads.
>
> The typical scenario is a large transaction (e.g. bulk data load or a
> long-running DDL) combined with a subscriber that is either slow or
> temporarily disconnected. The reorder buffer exceeds
> logical_decoding_work_mem and starts spilling, but there is no upper bound
> on how much can be spilled. The only backstop today is the OS returning
> ENOSPC, at which point the damage is already done.
>
> We looked for existing protections:
>
>    - max_slot_wal_keep_size: limits WAL retention, but does not affect
>    spill files at all.
>    - logical_decoding_work_mem: controls *when* spilling starts, but not
>    *how much* can be spilled.
>    - There is no existing GUC, patch, or commitfest entry that addresses
>    a disk quota for spill files.
>
>
> The "Report reorder buffer size" patch (CF #6053, by Ashutosh Bapat)
> improves observability of reorder buffer state, which is complementary —
> but observability alone cannot prevent disk-full incidents.
>
> == Proposed solution ==
>
> The attached patch adds a new GUC:
> logical_decoding_spill_limit (integer, unit kB, default 0)
>
> When set to a positive value, it limits the total size of on-disk spill
> files per replication slot. Key design points:
>
>    1. Tracking: We add two new fields:
>       - ReorderBuffer.spillBytesOnDisk — the current total on-disk spill
>       size for this slot (unlike spillBytes, which is a cumulative
>       statistics counter, this is a live gauge).
>       - ReorderBufferTXN.serialized_size — the per-transaction on-disk
>       size, so the global counter can be accurately decremented during
>       cleanup.
>    2. Increment: In ReorderBufferSerializeChange(), after a successful
>    write(), both counters are incremented by the size written.
>    3. Decrement: In ReorderBufferRestoreCleanup(), when spill files are
>    unlinked, the global counter is decremented by the transaction's
>    serialized_size.
>    4. Enforcement: In ReorderBufferCheckMemoryLimit(), before calling
>    ReorderBufferSerializeTXN(), we check whether spillBytesOnDisk +
>    txn->size would exceed the limit, and if so raise an ERROR. This check
>    applies only to the spill-to-disk path, not to the streaming path
>    (which involves no disk I/O).
>    5. Behavior on limit exceeded: An ERROR is raised with
>    ERRCODE_CONFIGURATION_LIMIT_EXCEEDED. The walsender exits, but the slot's
>    restart_lsn and confirmed_flush are preserved. The subscriber can reconnect
>    after the DBA:
>       1. increases logical_decoding_spill_limit, or
>       2. increases logical_decoding_work_mem (to reduce spilling), or
>       3. switches to a streaming-capable output plugin (which avoids
>       spilling entirely).
>    6. Default 0 means unlimited — fully backward compatible.
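>
> As a usage sketch (the value below is purely illustrative, not a
> recommendation), a DBA could cap each slot's spill files in
> postgresql.conf:

```
# postgresql.conf -- illustrative setting, assuming the patch is applied
logical_decoding_spill_limit = 10GB   # 0 (default) = unlimited
```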
>
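> The accounting in points 1-4 can be sketched as follows. This is a
> minimal, self-contained illustration, not the patch itself: the struct
> and field names (spillBytesOnDisk, serialized_size) follow the email,
> but the helper functions and everything else are hypothetical.

```c
/*
 * Self-contained sketch of the spill accounting described above.
 * Illustrative only -- not the actual patch code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ReorderBufferTXN
{
    uint64_t size;            /* in-memory size of queued changes */
    uint64_t serialized_size; /* per-txn bytes already spilled to disk */
} ReorderBufferTXN;

typedef struct ReorderBuffer
{
    uint64_t spillBytesOnDisk; /* live gauge: total on-disk spill for slot */
    uint64_t spill_limit;      /* logical_decoding_spill_limit, 0 = unlimited */
} ReorderBuffer;

/* Enforcement check, done before serializing a transaction to disk. */
static bool
spill_would_exceed_limit(const ReorderBuffer *rb, const ReorderBufferTXN *txn)
{
    if (rb->spill_limit == 0)   /* default 0: unlimited */
        return false;
    return rb->spillBytesOnDisk + txn->size > rb->spill_limit;
}

/* Increment path: after a successful write() of serialized changes. */
static void
account_spill_write(ReorderBuffer *rb, ReorderBufferTXN *txn, uint64_t nbytes)
{
    txn->serialized_size += nbytes;
    rb->spillBytesOnDisk += nbytes;
}

/* Decrement path: when a transaction's spill files are unlinked. */
static void
account_spill_cleanup(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    assert(rb->spillBytesOnDisk >= txn->serialized_size);
    rb->spillBytesOnDisk -= txn->serialized_size;
    txn->serialized_size = 0;
}
```

> Note that the check runs before serialization, so a transaction that
> would cross the cap errors out rather than being partially spilled.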
> == Why per-slot, not global? ==
>
> Each ReorderBuffer instance lives in a single walsender process and
> corresponds to exactly one replication slot. A per-slot limit is:
>
>    - Lock-free (no shared memory coordination needed)
>    - Simple to reason about (each slot has its own budget)
>    - Sufficient to protect against disk-full (the DBA sets the limit
>    based on available disk / number of slots)
>
> A global (cross-slot) limit could be layered on top later if needed, but
> would require shared-memory counters with spinlock/atomic protection.
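>
> To make that last point concrete, a hypothetical global counter could be
> maintained with C11 atomics along these lines (illustrative only; nothing
> like this is in the attached patch, and all names here are invented):

```c
/*
 * Hypothetical cross-slot spill counter using C11 atomics.
 * Illustrative sketch only -- not part of the proposed patch.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t global_spill_bytes; /* zero-initialized */

/*
 * Try to reserve nbytes of spill budget against a global limit.
 * Returns false if the reservation would exceed the cap.
 */
static bool
reserve_global_spill(uint64_t nbytes, uint64_t limit)
{
    uint64_t cur = atomic_load(&global_spill_bytes);

    while (cur + nbytes <= limit)
    {
        if (atomic_compare_exchange_weak(&global_spill_bytes, &cur,
                                         cur + nbytes))
            return true;
        /* CAS failed: cur now holds the current value; retry */
    }
    return false;
}

/* Release budget when spill files are removed. */
static void
release_global_spill(uint64_t nbytes)
{
    atomic_fetch_sub(&global_spill_bytes, nbytes);
}
```

> In PostgreSQL proper this counter would live in shared memory rather
> than in a process-local static, which is the coordination cost the
> per-slot design avoids.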
>
> == Performance impact ==
>
>    - Hot path (in-memory change queuing): zero overhead.
>    - Spill path: one integer comparison before serialization, one integer
>    addition after write() — negligible compared to the I/O cost.
>    - Cleanup path: one integer subtraction after unlink() — negligible.
>
>
> Looking forward to feedback.
> Thanks,
> Shawn.
>

Attachment: 0001-Add-logical_decoding_spill_limit-GUC-to-cap-spill-file-limit.patch
Description: Binary data
