On Sat, Oct 22, 2016 at 08:15:46PM +0200, gregor herrmann wrote:
> On Fri, 21 Oct 2016 23:13:53 +0300, Niko Tyni wrote:
>
> > This package occasionally fails its autopkgtest checks on ci.debian.net.
> >
> > https://ci.debian.net/packages/libs/libserver-starter-perl/unstable/amd64/
I've been looking at this for half a day, and it's annoyingly hard to reproduce. Running t/01-starter.t in a loop, I've seen it deadlock a dozen times or so altogether. When it happens, strace shows the child blocked in accept() while its parent waits for it to exit. Adding instrumentation mostly makes the problem go away.

It does seem that the parent successfully sends TERM to the child, but the child never executes its $SIG{TERM} handler. I haven't been able to figure out why. Perhaps the handler gets interrupted by another signal; my first thought was SIGPIPE, but adding a handler for that didn't show anything.

Given that it fails somewhat regularly on both ci.debian.net and tests.reproducible-builds.org, a faster machine might improve the chances of reproducing it. Just getting the log of 'strace -f -olog prove -l t/01-starter.t' when it locks up would help tremendously, but I ran it like that for two hours or so without a single lockup.

OTOH, reading https://rt.cpan.org/Public/Bug/Display.html?id=73711, I get the impression that the test suite is riddled with races that are worked around by sprinkling sleep() calls in the test code.

Even though it feels like giving up, I suggest either disabling the test suite or somehow guarding it with a timeout and making failures non-fatal. Perhaps we should instead devise something very simple for a single basic test?

-- 
Niko Tyni   nt...@debian.org