@Jonas: Did you confirm it is SIGHUP? I remember that we were not sure whether the signal kind was SIGHUP or SIGINT.
- Raphael Am Mi., 5. Dez. 2018 um 10:25 Uhr schrieb Pavel Labath via lldb-dev <lldb-dev@lists.llvm.org>: > > On 05/12/2018 03:49, Jonas Devlieghere via lldb-dev wrote: > > Hi everyone, > > > > Since we switched to lit as the test driver we've been seeing it getting > > killed as the result of a SIGHUP signal. The problem doesn't reproduce on > > every machine and there seems to be a correlation between number of > > occurrences and thread count. > > > > Davide and Raphael spent some time narrowing down what particular test is > > causing this and it seems that TestChangeProcessGroup.py is always > > involved. However it never reproduces when running just this test. I was > > able to reproduce pretty consistently with the following filter: > > > > ./bin/llvm-lit ../llvm/tools/lldb/lit/Suite/ --filter="process" > > > > Bisecting the test itself didn't help much, the problem reproduces as soon > > as we attach to the inferior. > > > > At this point it is still not clear who is sending the SIGHUP and why it's > > reaching the lit test driver. Fred suggested that it might have something > > to do with process groups (which would be an interesting coincidence given > > the previously mentioned test) and he suggested having the test run in > > different process groups. Indeed, adding a call to os.setpgrp() in lit's > > executeCommand and having a different process group per test prevent us > > from seeing this. Regardless of this issue I think it's reasonable to have > > tests run in their process group, so if nobody objects I propose adding > > this to lit in llvm. > > > > Still, I'd like to understand where the signal is coming from and fix the > > root cause in addition to the symptom. Maybe someone here has an idea of > > what might be going on? > > > > Thanks, > > Jonas > > > > PS > > > > 1. There's two places where we send a SIGHUP ourself, with that code > > removed we still receive the signal, which suggests that it might be coming > > from Python or the OS. > > 2. If you're able to reproduce you'll see that adding an early return > > before the attach in TestChangeProcessGroup.py hides/prevents the problem. > > Moving the return down one line and it pops up again. > > _______________________________________________ > > lldb-dev mailing list > > lldb-dev@lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev > > > > Hi Jonas, > > Sounds like you have found an interesting issue to debug. I've tried > running the command you mention locally, and I didn't see any failures > in 100 runs. > > There doesn't seem to be anything in the TestChangeProcessGroup which > sends a signal, though I can imagine that the act of changing a process > group mid-debug could be enough to confuse someone to send it. However, > I am having trouble reconciling this with your PS #2, because if > attaching is sufficient to trigger this (i.e., no group changing takes > place), then this test is not much different than any other test where > we spawn an inferior and then attach to it. > > I am aware of one other instance where we send a spurious signal, though > it's SIGINT in this case > <https://github.com/llvm-mirror/lldb/blob/master/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp#L3645>. > The issue there is that we don't check whether the debug server has > exited before we send SIGINT to it (which it normally does on its own at > the end of debug session). So if the debug server does exit and its pid > gets recycled before we get a chance to send this signal, we can end up > killing a random process. > > Now this may seem unrelated to your issue, but SIGHUP is sent > automatically as a result of a process losing its controlling tty. So, > if that SIGINT ends up killing the process holding the master end of a > pty, this could result in some SIGHUPs being sent too. Unfortunately, > this doesn't fully stack up either, because the process holding the > master pty is probably a long-lived one, so its pid is unlikely to match > one of the transient debugserver pids. Nevertheless, it could be worth > just commenting out that line and seeing what happens. > > For debugging, maybe you could try installing a SIGHUP handler into the > lit process, which would dump the received siginfo_t structure. Decoding > that may provide some insight into who is sending that signal (si_pid) > and why (si_code). > > As for adding process group support into lit, I think that having each > test run (*not* each executed command) in it's own group is reasonable. > However, be aware that this changes the behaviour of how all signals (in > particular the SIGINT you get when typing ^C) get delivered. AFAIK, lit > doesn't have any special code for cleaning up the spawned processes and > relies on the fact that ^C will send a SIGINT to the entire "foreground > process group" and terminate stuff. If you start creating a bunch of > process groups, you may need to add more explicit termination logic too. > > cheers, > pl > _______________________________________________ > lldb-dev mailing list > lldb-dev@lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev _______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev