On Tue, Mar 24, 2026 at 04:09:37PM -0400, Andres Freund wrote: > The test is extremely unstable on windows. On CI 10/16 runs since the test in > failed due to it, afaict.
I am not surprised by that. Windows is good in catching race conditions. > I don't see how a test with a timeout setting that's anywhere remotely close > to 10ms could be expected to be stable. Well, the low value of deadlock_timeout is not the problem. The shorter the better to make the test go faster with more deadlock checks. What the buildfarm is telling is that we do not have the time to process the deadlock_timeout request, and that we would need to pay the cost of longer sleep if coded this way. This is going to cost in runtime on fast machines where the CheckDeadLock() call would happen quickly. And on slow machines, we don't have the guarantee that the sleep would be long enough to process the deadlock request. This test would be better if rewritten as a TAP test, I guess, with a NOTICE injection point before the CheckDeadLock() call in ProcSleep() to make sure that the second session processes the deadlock timeout request while it is waiting on its lock to be acquired. One trickier part is that we only care about the deadlock_timeout in s2, because we want to measure the wait it has waited until the lock could be acquired, meaning that we should make s1 use a large deadlock_timeout to avoid interferences with a global injpoint. I don't have the credits to test that in the CI for this month, unfortunately, and this creates noise in the CI for the work of other folks in this release cycle, so I am going to remove this test for now. > Also, anything that requires short sleeps (like pg_sleep(0.05);) is extremely > likely to be a long time test stability hazard. It's a huge "test smell" to > me, to the point that I think every single sleep in a test needs a comment > explaining why that one use of sleep is correct, and that comment better be > signed in blood. Yep, agreed. -- Michael
signature.asc
Description: PGP signature
