Re: Adding REPACK [concurrently]

Srinath Reddy Sadipiralla Tue, 24 Mar 2026 19:57:34 -0700

Hallo Alvaro,

On Wed, Mar 25, 2026 at 4:02 AM Alvaro Herrera <[email protected]>
wrote:


>
> Many thanks for the review.  I have applied fixes for these, so here's
> v44.
>

Thanks for the patches.

- 0004 is Antonin's bugfix from the crash reported by Srinath.
>

I think it's "0004 is Srinath's bugfix from the crash reported by Srinath."
;-)
after i provided the analysis and fix for the crash[1], Antonin tried to
reproduce
this crash using isolation tester , for this he even proposed changes to
isolation tester (so cool ... btw i reviewed it) [2] .

i have done another round of stress testing on V43 , this time with more
tests,
as i did previously [3] did concurrency test - went well,

crash test:
after crash i observed that repack worker files are cleaned during server
restart
using RemovePgTempFiles but the transient table relation files remains
not cleaned up, maybe we can do cleanup for this as well during server
restart,
I will think about this more.

physical replication test where I did REPACK (concurrently) on primary and
checked if data is intact using the 4 verifications I did here [3] on
replica - went well.

Then as suggested by Alvaro off-list I checked the lock upgrade behavior
during the table swap phase. I observed that if another transaction holds a
conflicting lock on the table when the swap is attempted, it can lead to
“transient table” data loss during a manual or timeout abort.
when a REPACK (concurrent) waits for a conflicting lock to be released and
eventually hits a
lock_timeout (or is cancelled via ctrl+c), the transaction aborts. During
this abort,
the cleanup process triggers smgrDoPendingDeletes. This results in the
removal
of all transient table relfiles and decoder worker files created during the
process.
This effectively wipes out the work done by the transient table creation
before
the swap could successfully complete, this happens because during transient
table creation we add the table to the PendingRelDelete list.


rebuild_relation→make_new_heap->heap_create_with_catalog→heap_create→table_relation_set_new_filelocator→RelationCreateStorage
/*
* Add the relation to the list of stuff to delete at abort, if we are
* asked to do so.
*/
if (register_delete)
{
PendingRelDelete *pending;
pending = (PendingRelDelete *)
MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
pending->rlocator = rlocator;
pending->procNumber = procNumber;
pending->atCommit = false; /* delete if abort */
pending->nestLevel = GetCurrentTransactionNestLevel();
pending->next = pendingDeletes;
pendingDeletes = pending;
}
and the base/5/pgsql_tmp/ files also gets unlinked during the decoding
worker cleanup,
I think this cleanup of transient table relfiles and decoder files makes
sense because
we don’t have any resume operation in which we can re-use the transient
table’s files,
please correct me if I am not getting your point here.

test scenario:

session 1:
postgres=# repack (concurrently) stress_victim;
had a breakpoint rebuild_relation_finish_concurrent->
LockRelationOid(old_table_oid, AccessExclusiveLock); just before getting
the exclusive lock.
with lock_timeout = 5s

session 2:
postgres=# BEGIN;
SELECT * FROM stress_victim LIMIT 1;
-- left it open
BEGIN
 id  | balance |
           payload
-----+---------+---------------------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
--------------
 170 |      65 | d12f400c4d0d3c49818f88597e16cf29
d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c498
18f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d1
2f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818
f88597e16cf29
(1 row)
-- this gets us a conflicting lock (AccessShareLock) on the same table,
REPACK (concurrently) is running on.

session 1:
release the breakpoint and now the backend waits for the conflicting lock
to be released.
in between if lock_timeout occurs then transaction aborts.
postgres=# repack (concurrently) stress_victim;
ERROR:  canceling statement due to lock timeout
CONTEXT:  waiting for AccessExclusiveLock on relation 16637 of database 5

Now we can see the transient table relfiles and decoder worker files
getting cleaned up.

[1] -
https://www.postgresql.org/message-id/CAFC%2Bb6qk3-DQTi43QMqvVLP%2BsudPV4vsLQm5iHfcCeObrNaVyA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/flat/4703.1774250534%40localhost
[3] -
https://www.postgresql.org/message-id/CAFC%2Bb6o2yzA80YmfEhmMO9puN8qvGRvr-15BBLn3UmJxPfpr2w%40mail.gmail.com

-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

Re: Adding REPACK [concurrently]

Reply via email to