Re: [Lldb-commits] [PATCH] D13124: test runner: switch to pure-Python timeout mechanism

Todd Fiala via lldb-commits Thu, 24 Sep 2015 08:36:54 -0700

tfiala added a comment.

In http://reviews.llvm.org/D13124#252564, @labath wrote:


> I don't want to stand in the way of progress (and I do think that getting rid 
> of the timeout dependency is progress), but this implementation regresses in 
> a couple of features compared to using timeout:
>
> - timeout tries (with moderate success) to cleanup children spawned by the 
> main process. your implementation will (afaik) kill only the main process. 
> This is especially important for build bots, since leaking processes will 
> starve the bot's resources after a while (and we have had problems with this 
> on our darwin build bot).


Fair enough.  Either yesterday or the day before, I put up a change that 
ensures the dotest inferiors are in a separate process group.  I could (on 
non-Windows) send that process group a SIGQUIT.  The reason I prefer not to use 
SIGQUIT is that it is catchable, and thus can be ignored.  Once something can 
be ignored, the runner needs to be able to handle the timed out child really 
not ending with a SIGQUIT.  That means either more logic around timing out on 
the "wait for death", or possibly not reaping.  In the case of a test runner, 
I'd probably put the focus on making sure the test runner makes it through 
ahead of getting the coredump from a process that times out.

That said, my initial implementation of this did do the following:

- Start with a SIGQUIT (via subprocess.Process.terminate(), so does something 
reasonable on Windows).  Wait for 5 seconds to see if it wraps up (by way of 
waiting for the communicate() thread to die).
- If still not dead, send a SIGKILL (via subprocess.Process.kill()).

I thought the added complication wasn't worth it, but it's not that 
complicated.  And would not regress behavior on the Linux side.  The only diff 
would be, on non-Windows, send a kill to the process group (which has already 
been created for the inferior dotest), and just use the terminate() call on 
Windows.

I can give that a whirl.  It's a bit more code but is not going to be too bad.

> - we intentionally had timeout terminate the child processes with SIGQUIT, 
> because this produces core files of the terminated processes. I have found 
> this very useful when diagnosing the sources of hanging tests. I'd like to 
> have the possibility to do this, even if it will not be enabled by default.

> 

>   Do you think that the benefits of this change outweigh these issues? If so, 
> and they don't cause our bots to break, then we can put it in (and i'll 
> probably implement the code dumping feature when I find some time), but I 
> though you should be aware of these downsides.





http://reviews.llvm.org/D13124



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Re: [Lldb-commits] [PATCH] D13124: test runner: switch to pure-Python timeout mechanism

Reply via email to