On Thu, Aug 06, 2020 at 12:48:10AM +0200, Michał Mirosław wrote:
> On Thu, Aug 06, 2020 at 12:29:36AM +0300, Peter Pentchev wrote:
> > On Wed, Aug 05, 2020 at 10:52:31PM +0200, Michał Mirosław wrote:
> [...]
> > > Using print-debugging, I see that it stops at wait_for_child line just
> > > after printing the version. It seems that something is reaping the child
> > > before the main thread has a chance to wait for it.
> > 
> > OK, so the only thing that comes to my mind now is that you may be
> > hitting a crazy, crazy race between register_child() and child_reaper(),
> > and I say "a crazy, crazy race", because the test has to (apparently
> > reproducibly) receive the CHLD signal exactly between the check and
> > the creation in register_child()'s first "$children{...} //= ...cv"
> > statement.
> 
> Well, there is nothing that prevents SIGCHLD arriving between fork() and
> register_child(). You could test this with more confidence (though not
> 100%-reliably) by putting 'exit 1' just at the start of ($pid == 0) branch.

Nah, the problem is not just "between fork() and register_child()".
It really must arrive at a very specific moment in time, because
the //= operations for setting $children{$pid}{cv} try to make sure that
a new value is not set (that is, a new condition variable is not
created) if there already is such an element in the hash. So the race
is indeed between the //= in register_child() and the //= in
child_reaper() - that is, child_reaper() must be invoked (SIGCHLD must
arrive) *during* the execution of the //= in register_child().

Unless I'm missing something, which is not at all out of the question :)
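
To illustrate the reasoning above, here is a minimal sketch of why the
ordering of the two //= statements should not matter as long as each one
runs to completion. It is not the real test code: FakeCondvar is a
hypothetical stand-in for AnyEvent->condvar, child_reaper_for() is a
simplified, signal-free version of child_reaper(), and the PIDs and status
values are made up.

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

# Hypothetical stand-in for AnyEvent->condvar: it only records one value.
{
    package FakeCondvar;
    sub new   { return bless { ready => 0, value => undef }, shift }
    sub send  { my ($self, $value) = @_; @{$self}{qw(ready value)} = (1, $value); return }
    sub ready { return $_[0]{ready} }
    sub recv  { return $_[0]{value} }
}

my %children;

# Simplified register_child(): create the cv only if it is not there yet.
sub register_child {
    my ($pid) = @_;
    $children{$pid}{cv} //= FakeCondvar->new;
    return $children{$pid}{cv};
}

# Simplified child_reaper() body for one already-reaped pid (assumption:
# called directly here instead of from a real SIGCHLD handler).
sub child_reaper_for {
    my ($pid, $status) = @_;
    $children{$pid}{cv} //= FakeCondvar->new;
    $children{$pid}{cv}->send($status);
    return $children{$pid}{cv};
}

# Ordering 1: the parent registers the child first, then it is reaped.
my $cv_a     = register_child(1001);
my $reaped_a = child_reaper_for(1001, 256);

# Ordering 2: the child is reaped before the parent registers it.
my $reaped_b = child_reaper_for(1002, 0);
my $cv_b     = register_child(1002);

# In both orderings the two code paths end up with the same cv object.
say 'ordering 1: same cv: ', ($cv_a == $reaped_a ? 'yes' : 'no');
say 'ordering 2: same cv: ', ($cv_b == $reaped_b ? 'yes' : 'no');
```

In this sketch both orderings share a single cv, so the status sent by the
reaper is visible to whoever waits on it later. It does not model the
interesting case, a handler running in the middle of the //= itself.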

> > Can you apply the following patch and show me the output of running
> > the test?
> 
> Sure, but I got no patch. :-)

Oof. Not my day, is it... Here it is... I hope.

G'luck,
Peter

-- 
Peter Pentchev  r...@ringlet.net r...@debian.org p...@storpool.com
PGP key:        http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115  C354 651E EFB0 2527 DF13
commit 859acd0603a5bc74620df4949e1450805b7ba151
Author: Peter Pentchev <p...@storpool.com>
Date:   Thu Aug 6 00:26:32 2020 +0300

    Diagnostic output for the runtime test's child reaper.

diff --git a/debian/tests/runtime b/debian/tests/runtime
index f594d9a..81cef23 100755
--- a/debian/tests/runtime
+++ b/debian/tests/runtime
@@ -55,19 +55,25 @@ sub unregister_child_reaper()
 
 sub child_reaper()
 {
+       say 'RDBG child_reaper() invoked';
        while (1) {
                my $pid = waitpid -1, WNOHANG; 
                my $status = $?;
+               say "RDBG - pid $pid status $status";
 
                if (!defined $pid) {
                        die "Could not waitpid() in a SIGCHLD handler: $!\n";
                } elsif ($pid == 0 || $pid == -1) {
+                       say 'RDBG   - done';
                        last;
                } else {
+                       say 'RDBG   - '.(exists $children{$pid} ? '' : 'not ').'found in the children hash';
                        $children{$pid}{cv} //= AnyEvent->condvar;
+                       say 'RDBG   - cv '.$children{$pid}{cv}.': '.($children{$pid}{cv}->ready ? '' : 'not ').'ready';
                        $children{$pid}{cv}->send($status);
                }
        }
+       say 'RDBG - out of the child_reaper() loop';
 }
 
 sub register_child($ $)
@@ -76,6 +82,7 @@ sub register_child($ $)
 
        # Weird, but we want it to be at least reasonably atomic-like
        $children{$pid}{cv} //= AnyEvent->condvar;
+       say "register_child: pid $pid cv ".$children{$pid}{cv};
 
        my $ch = $children{$pid};
        $ch->{pid} = $pid;
