Hi,

Orphaned prepared transactions cause escalating harm the longer they
persist:


   1. *Lock retention* :  All locks (row-level, table-level, advisory)
   acquired during the transaction are held indefinitely, blocking concurrent
   DML and DDL
   2. *VACUUM blockage :* The prepared transaction's XID becomes the oldest
   running transaction, preventing VACUUM from freezing tuples or reclaiming
   dead rows across the entire cluster, leading to table and index bloat
   3. *Transaction ID wraparound risk *: In extreme cases, the frozen XID
   horizon cannot advance, eventually threatening XID wraparound shutdown
   4. *Resource consumption *: Shared memory slots
   (max_prepared_transactions) remain occupied; the WAL records for the
   prepared state persist

Today, the only remediation is manual intervention: a DBA must discover the
orphan (via pg_prepared_xacts), determine it's truly abandoned, and issue
ROLLBACK PREPARED. PostgreSQL already has timeout-based safety nets for
other "stuck" session states such as, idle_in_transaction_session_timeout,
idle_session_timeout, statement_timeout, but no equivalent for prepared
transactions. This patch fills that gap.


*How it works ?*

CleanupOrphanedPreparedTransactions():

  Phase 1 — Collect candidates (under TwoPhaseStateLock, shared mode):
    for each GlobalTransactionData (gxact) in TwoPhaseState:
      if gxact->valid AND
         TimestampDifferenceExceeds(gxact->prepared_at, now, timeout):
        save gxact->gid to candidate list

  Phase 2 — Roll back each candidate (lock released):
    for each saved GID:
      lock = LockGXactForCleanup(gid)
      if lock succeeded:
        FinishPreparedTransaction(gid, isCommit=false)
        log: "rolling back orphaned prepared transaction %s"


*Safety Properties*

   1. Timeout = 0 (default): Feature is completely disabled, no behavior
   change from default PostgreSQL
   2. No false positives on active transactions: The check uses
   prepared_at, which is set once at PREPARE TRANSACTION time. A transaction
   that is actively being committed/rolled back by a client will either
   complete before the timeout or be skipped by LockGXactForCleanup (which
   returns NULL if the gxact is already locked by another backend)
   3. Crash-safe: If the checkpointer crashes during cleanup, the prepared
   transaction's WAL state is unchanged, it remains prepared and will be
   cleaned up after recovery
   4. Idempotent: If the GID was already resolved between Phase 1 and Phase
   2, LockGXactForCleanup returns NULL and the cleanup is silently skipped


-- 
Regards,
Nikhil Chawla
Twitter <https://twitter.com/chawlanikhil24> | LinkedIn
<http://linkedin.com/in/chawlanikhil24>

Attachment: 0001-Add-prepared_orphaned_transaction_timeout-GUC.patch
Description: Binary data

Reply via email to