On Thu, Aug 06, 2020 at 12:48:10AM +0200, Michał Mirosław wrote:
> On Thu, Aug 06, 2020 at 12:29:36AM +0300, Peter Pentchev wrote:
> > On Wed, Aug 05, 2020 at 10:52:31PM +0200, Michał Mirosław wrote:
> [...]
> > > Using print-debugging, I see that it stops at wait_for_child line just
> > > after printing the version. It seems that something is reaping the child
> > > before the main thread has a chance to wait for it.
> >
> > OK, so the only thing that comes to my mind now is that you may be
> > hitting a crazy, crazy race between register_child() and child_reaper(),
> > and I say "a crazy, crazy race", because the test has to (apparently
> > reproducibly) receive the CHLD signal exactly between the check and
> > the creation in register_child()'s first "$children{...} //= ...cv"
> > statement.
>
> Well, there is nothing that prevents SIGCHLD arriving between fork() and
> register_child(). You could test this with more confidence (though not
> 100%-reliably) by putting 'exit 1' just at the start of ($pid == 0) branch.

Nah, the problem is not just "between fork() and register_child()".
The signal really must arrive at a very specific moment in time, because
the //= operations for setting $children{$pid}{cv} try to make sure that
a new value is not set (that is, a new condition variable is not created)
if there already is such an element in the hash. If SIGCHLD arrives
before register_child() gets to its //=, child_reaper() creates
the condition variable and register_child() reuses it; if it arrives
after, it is the other way around. So the race is indeed between
the //= in register_child() and the //= in child_reaper() - that is,
child_reaper() must be invoked (SIGCHLD must arrive) *during*
the execution of the //= in register_child().

Unless I'm missing something, which is not at all out of the question :)

> > Can you apply the following patch and show me the output of running
> > the test?
>
> Sure, but I got no patch. :-)

Oof. Not my day, is it... Here it is... I hope.

G'luck,
Peter

-- 
Peter Pentchev  r...@ringlet.net r...@debian.org p...@storpool.com
PGP key:        http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C  F115 C354 651E EFB0 2527 DF13

commit 859acd0603a5bc74620df4949e1450805b7ba151
Author: Peter Pentchev <p...@storpool.com>
Date:   Thu Aug 6 00:26:32 2020 +0300

    Diagnostic output for the runtime test's child reaper.

diff --git a/debian/tests/runtime b/debian/tests/runtime
index f594d9a..81cef23 100755
--- a/debian/tests/runtime
+++ b/debian/tests/runtime
@@ -55,19 +55,25 @@ sub unregister_child_reaper()
 
 sub child_reaper()
 {
+	say 'RDBG child_reaper() invoked';
 	while (1) {
 		my $pid = waitpid -1, WNOHANG;
 		my $status = $?;
+		say "RDBG - pid $pid status $status";
 
 		if (!defined $pid) {
 			die "Could not waitpid() in a SIGCHLD handler: $!\n";
 		} elsif ($pid == 0 || $pid == -1) {
+			say 'RDBG - done';
 			last;
 		} else {
+			say 'RDBG - '.(exists $children{$pid} ? '' : 'not ').'found in the children hash';
 			$children{$pid}{cv} //= AnyEvent->condvar;
+			say 'RDBG - cv '.$children{$pid}{cv}.': '.($children{$pid}{cv}->ready ? '' : 'not ').'ready';
 			$children{$pid}{cv}->send($status);
 		}
 	}
+	say 'RDBG - out of the child_reaper() loop';
 }
 
 sub register_child($ $)
@@ -76,6 +82,7 @@ sub register_child($ $)
 
 	# Weird, but we want it to be at least reasonably atomic-like
 	$children{$pid}{cv} //= AnyEvent->condvar;
+	say "register_child: pid $pid cv ".$children{$pid}{cv};
 
 	my $ch = $children{$pid};
 	$ch->{pid} = $pid;
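
For illustration only, here is a minimal, signal-free sketch of the window discussed in the message above. It is not part of the patch and not the test's actual code: new_cv() is a hypothetical stand-in for AnyEvent->condvar, fake_reaper() and fake_register() are simplified stand-ins for child_reaper() and register_child(), and the $interrupt hook is an assumption used to imitate SIGCHLD being handled while the right-hand side of the //= is still being evaluated.

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

my %children;
my $created = 0;
my $interrupt;    # hypothetical hook: imitates SIGCHLD arriving mid-statement

# Hypothetical stand-in for AnyEvent->condvar: a plain hash with an id.
sub new_cv {
    my $cv = { id => ++$created };
    if (my $hook = $interrupt) {
        undef $interrupt;    # fire once only
        $hook->();
    }
    return $cv;
}

# Simplified reaper: create the cv if missing, then mark it as signalled.
sub fake_reaper {
    my ($pid) = @_;
    $children{$pid}{cv} //= new_cv();
    $children{$pid}{cv}{sent} = 1;    # stands in for ->send($status)
    return;
}

# Simplified registration: create the cv if missing, then return it.
sub fake_register {
    my ($pid) = @_;
    $children{$pid}{cv} //= new_cv();
    return $children{$pid}{cv};
}

# Case 1: the reaper finishes before registration, so the cv is reused.
fake_reaper(100);
my $cv1 = fake_register(100);
say "pid 100: cv $cv1->{id}, sent: ", ($cv1->{sent} ? 'yes' : 'no');

# Case 2: the reaper runs while register's right-hand side is being evaluated,
# so register's assignment overwrites the cv that the reaper signalled.
$interrupt = sub { fake_reaper(200) };
my $cv2 = fake_register(200);
say "pid 200: cv $cv2->{id}, sent: ", ($cv2->{sent} ? 'yes' : 'no');
```

In this sketch the first case ends up with one shared, signalled cv, while the second leaves the registering side holding a cv that was never signalled, which would match a wait_for_child that never returns. Whether the real test actually hits this window is exactly what the RDBG output from the patch should show.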