On Tue, Mar 24, 2026 at 2:42 PM Bharath Rupireddy
<[email protected]> wrote:
>
> Hi,
>
> On Mon, Mar 23, 2026 at 4:36 PM Masahiko Sawada <[email protected]> wrote:
> >
> > I've studied the discussion on this thread and the patch. I understand
> > the purpose of this feature and agree that it's useful especially in
> > cases where orphaned (physical or logical) replication slots prevent
> > the xmin from advancing and inactive_since based slot invalidation
> > might not fit.
> >
> > And +1 for treating both the slot's xmin and catalog_xmin similarly
> > with the single GUC.
>
> Thanks for reviewing the patch.
>
> > > >> I made the following design choice: try invalidating only once per
> > > >> vacuum cycle, not per table. While this keeps the cost of checking
> > > >> (incl. the XidGenLock contention) for invalidation to a minimum when
> > > >> there are a large number of tables and replication slots, it can be
> > > >> less effective when individual tables/indexes are large. Invalidating
> > > >> during checkpoints can help to some extent with the large table/index
> > > >> cases. But I'm open to thoughts on this.
> > > >
> > > > It may not solve the intent when the vacuum cycle is longer, which one
> > > > can expect on a large database particularly when there is heavy bloat.
> > >
> > > This design choice boils down to the following: a database instance
> > > having either 1/ a large number of small tables or 2/ large tables.
> > > From my experience, I have seen both cases but mostly case 2 (others
> > > can correct me). In this context, having an XID age based slot
> > > invalidation check once per relation makes sense. However, I'm open to
> > > more thoughts here.
> >
> > ISTM that checking the XID-based slot invalidation per table would be
> > more bullet-proof and cover many cases. How about checking the
> > XID-based slot invalidation opportunity only when the OldestXmin is
> > older than the new GUC? For example, we can do this check in
> > heap_vacuum_rel() based on the VacuumCutoffs returned by
> > vacuum_get_cutoffs(). If we invalidate at least one slot for its XID,
> > we can re-compute the OldestXmin.
>
> Agreed. Here's the patch that moves the XID-age based slot
> invalidation check to vacuum_get_cutoffs. This has some nice
> advantages: 1/ It makes the check once per table (to help with large
> tables). 2/ It makes the check less costly since we rely on already
> computed OldestXmin and nextXID values. 3/ It avoids the checkpointer
> to do XID-age based slot invalidation which keeps the usage of this
> GUC simple with no additional costs to the checkpointer - just the
> vacuum (both vacuum command and autovacuum) does the invalidation when
> needed.
>
> I moved the new tests to the existing TAP test file
> t/019_replslot_limit.pl alongside other invalidation tests.
>
> I added detailed comments around InvalidateXIDAgedReplicationSlots and
> slightly modified the docs.
>
> Please find the v3 patch for further review.
Thank you for updating the patch. I think the patch is reasonably
simple and can avoid unnecessary overheads well due to XID-based
checks. Here are some comments:
+ /*
+ * Try to invalidate XID-aged replication slots that may interfere with
+ * vacuum's ability to freeze and remove dead tuples. Since OldestXmin
+ * already covers the slot xmin/catalog_xmin values, pass it as a
+ * preliminary check to avoid additional iteration over all the slots.
+ *
+ * If at least one slot was invalidated, recompute OldestXmin so that this
+ * vacuum benefits from the advanced horizon immediately.
+ */
+ if (InvalidateXIDAgedReplicationSlots(cutoffs->OldestXmin, nextXID))
+ {
+ cutoffs->OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
+ }
vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
REPACK. I'm not sure that users would expect the slot invalidation
also in these commands. I think it's better to leave
vacuum_get_cutoff() a pure cutoff computation function and we can try
to invalidate slots in heap_vacuum_rel(). It requires additional
ReadNextTransactionId() but we can live with it, or we can make
vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).
---
+ /* ensure it's a "normal" XID, else TransactionIdPrecedes misbehaves */
+ /* this can cause the limit to go backwards by 3, but that's OK */
+ if (!TransactionIdIsNormal(cutoffXID))
+ cutoffXID = FirstNormalTransactionId;
+
+ if (TransactionIdPrecedes(oldestXmin, cutoffXID))
+ {
+ invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
+ 0,
+ InvalidOid,
+ InvalidTransactionId,
+ nextXID);
+ }
I think it's better to check the procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin before iterating over each
slot. Otherwise, we would end up checking every slot even when a long
running transaction holds the oldestxmin back.
---
+ if (cutoffXID < FirstNormalTransactionId)
+ cutoffXID -= FirstNormalTransactionId;
and
+ if (!TransactionIdIsNormal(cutoffXID))
+ cutoffXID = FirstNormalTransactionId;
These codes have the same comment but are doing a slightly different
thing. I guess the latter is missing '-'?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com