Fix memory ordering in WAIT FOR LSN wakeup mechanism

WAIT FOR LSN uses a Dekker-style handshake: the waker stores an LSN
position then reads minWaitedLSN; the waiter stores its target into
minWaitedLSN then reads the position.  Without a barrier between each
side's store and load, a CPU may satisfy the load before the store
becomes globally visible, causing either side to miss a concurrent
update.  The result is a missed wakeup: the waiter sleeps indefinitely
until the next unrelated event.

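The handshake described above can be sketched with plain C11 atomics.
This is only an illustration under stated assumptions, not the code in
xlogwait.c: the variables position and min_waited and the functions
waker_advance() and waiter_must_sleep() are hypothetical stand-ins, and
a sequentially consistent fence plays the role of the full barrier that
each side needs between its store and its load.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the shared state; not PostgreSQL's variables. */
static _Atomic uint64_t position = 0;             /* LSN reached so far */
static _Atomic uint64_t min_waited = UINT64_MAX;  /* smallest LSN awaited */

/*
 * Waker side: publish the new position, issue a full barrier, then read
 * min_waited.  Returns true if some waiter may need a wakeup.
 */
static bool
waker_advance(uint64_t new_pos)
{
    atomic_store_explicit(&position, new_pos, memory_order_relaxed);
    /* Full barrier: the store above may not be reordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&min_waited, memory_order_relaxed) <= new_pos;
}

/*
 * Waiter side: publish the target, issue a full barrier, then re-read the
 * position.  Returns true if the waiter still has to sleep.
 */
static bool
waiter_must_sleep(uint64_t target)
{
    atomic_store_explicit(&min_waited, target, memory_order_relaxed);
    /* Full barrier: the store above may not be reordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&position, memory_order_relaxed) < target;
}
```

With both fences in place, at least one side observes the other's
store: either the waker sees the waiter's target and sends a wakeup, or
the waiter sees the new position and does not sleep.  Removing either
fence permits the outcome in which both loads return stale values,
which is the missed wakeup.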
Fix by embedding the required barriers into the atomic operations on
minWaitedLSN:

- In updateMinWaitedLSN(), use pg_atomic_write_membarrier_u64() so the
  waiter's preceding heap update is visible before the new
  minWaitedLSN value is published.

- In WaitLSNWakeup(), use pg_atomic_read_membarrier_u64() in the
  fast-path check so the waker's preceding position store is globally
  visible before minWaitedLSN is read.

The waiter side is also covered by the barrier semantics already
present in GetCurrentLSNForWaitType(): GetWalRcvWriteRecPtr() uses an
explicit read barrier (from patch 0001), while the remaining getters
acquire a spinlock, which implies the same ordering.

Also call ResetLatch() unconditionally after WaitLatch(), following the
standard latch loop pattern.  WaitLatch() does not guarantee that all
simultaneously true wake conditions are reported in one return, so a
timeout can race with SetLatch().  If we skip ResetLatch() on a timeout
return, the code performs further asynchronous-state checks before
consuming the latch, violating the latch API's required wait/reset
pattern.  That can leave the latch set across loop exit and cause a
later unrelated WaitLatch() in the same backend to return immediately.

Reported-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a80a593ab63696a0ad0e5c10b9e1b99aaa98032e

Modified Files
--------------
src/backend/access/transam/xlogwait.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
