Fix memory ordering in WAIT FOR LSN wakeup mechanism

WAIT FOR LSN uses a Dekker-style handshake: the waker stores an LSN
position then reads minWaitedLSN; the waiter stores its target into
minWaitedLSN then reads the position.  Without a barrier between each
side's store and load, a CPU may satisfy the load before the store
becomes globally visible, causing either side to miss a concurrent
update.  The result is a missed wakeup: the waiter keeps sleeping
until some later, unrelated event happens to wake it.

Fix by embedding the required barriers into the atomic operations on
minWaitedLSN:

- In updateMinWaitedLSN(), use pg_atomic_write_membarrier_u64() so the
  waiter's preceding heap update is visible before the new minWaitedLSN
  value is published.

- In WaitLSNWakeup(), use pg_atomic_read_membarrier_u64() in the
  fast-path check so the waker's preceding position store is globally
  visible before minWaitedLSN is read.

The waiter side is also covered by the barrier semantics already present
in GetCurrentLSNForWaitType(): GetWalRcvWriteRecPtr() uses an explicit
read barrier (from patch 0001), while the remaining getters acquire a
spinlock, which implies the same ordering.
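
As an illustration of the store-then-load handshake described above, here
is a minimal sketch in portable C11 atomics rather than PostgreSQL's
pg_atomic_* API.  The variables position and min_waited and the functions
waker_advance() and waiter_must_sleep() are hypothetical stand-ins, not
the code in xlogwait.c, and the sketch uses explicit fences where the
commit embeds the barriers in the atomic operations on minWaitedLSN.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the shared state; not the real xlogwait.c layout. */
static _Atomic uint64_t position = 0;            /* LSN published by the waker */
static _Atomic uint64_t min_waited = UINT64_MAX; /* smallest LSN any waiter wants */

/*
 * Waker side: publish the new position, then check whether any waiter
 * is waiting for an LSN at or below it.  Returns true if a wakeup is due.
 */
static bool
waker_advance(uint64_t new_pos)
{
    atomic_store_explicit(&position, new_pos, memory_order_relaxed);

    /* Full barrier: the store above must not be reordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&min_waited, memory_order_relaxed) <= new_pos;
}

/*
 * Waiter side: publish the target, then recheck the position.  Returns
 * true if the target has not been reached yet and the waiter must sleep.
 */
static bool
waiter_must_sleep(uint64_t target)
{
    atomic_store_explicit(&min_waited, target, memory_order_relaxed);

    /* Matching full barrier on the waiter side. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&position, memory_order_relaxed) < target;
}
```

With both fences in place, at least one side observes the other's store:
either the waiter sees the new position and does not sleep, or the waker
sees the waiter's target and issues the wakeup.  Removing either fence
reopens the window in which both loads return stale values.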

Also call ResetLatch() unconditionally after WaitLatch(), following the
standard latch loop pattern.  WaitLatch() does not guarantee that all
simultaneously true wake conditions are reported in one return, so a
timeout can race with SetLatch().  If we skip ResetLatch() on a timeout
return, the code performs further asynchronous-state checks before
consuming the latch, violating the latch API's required wait/reset
pattern.  That can leave the latch set across loop exit and cause a
later unrelated WaitLatch() in the same backend to return immediately.
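
The effect of skipping the reset can be shown with a toy model.  This is
only a sketch: ToyLatch, toy_wait_latch() and wait_once() are
hypothetical and merely mimic the behaviour described above, where a
wait can report a timeout even though the latch was set concurrently.
They are not PostgreSQL's latch implementation.

```c
#include <stdbool.h>

/* Toy latch: a hypothetical stand-in, not PostgreSQL's Latch. */
typedef struct ToyLatch
{
    bool is_set;
} ToyLatch;

#define TOY_WL_LATCH_SET 1
#define TOY_WL_TIMEOUT   2

static void toy_set_latch(ToyLatch *l)   { l->is_set = true; }
static void toy_reset_latch(ToyLatch *l) { l->is_set = false; }

/*
 * Reports only ONE event per call.  `timed_out` simulates the timer
 * expiring; in that case the timeout is reported even if the latch
 * happens to be set as well.
 */
static int
toy_wait_latch(const ToyLatch *l, bool timed_out)
{
    if (timed_out)
        return TOY_WL_TIMEOUT;
    return l->is_set ? TOY_WL_LATCH_SET : 0;
}

/*
 * One loop iteration.  Returns true if the latch is still set afterwards,
 * i.e. if it would leak into the next, unrelated wait.
 */
static bool
wait_once(ToyLatch *l, bool timed_out, bool reset_unconditionally)
{
    int rc = toy_wait_latch(l, timed_out);

    /* Conditional reset skips the reset when only the timeout was reported. */
    if (reset_unconditionally || (rc & TOY_WL_LATCH_SET))
        toy_reset_latch(l);

    return l->is_set;
}
```

In this model, calling toy_set_latch() and then wait_once() with
timed_out = true leaves the latch set when the reset is conditional, and
cleared when the reset is unconditional.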

Reported-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a80a593ab63696a0ad0e5c10b9e1b99aaa98032e

Modified Files
--------------
src/backend/access/transam/xlogwait.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
