Address an execution race in `close_wait_program' by using `catch' when
killing the pending force-kills issued there in the recovery of a stuck
test case.  The force-kill sequence may have completed before the
command to kill the sequence had a chance to run; with `catch' no error
is thrown in that case and a testsuite run does not get interrupted
early like:

PASS: gcc.c-torture/execute/postmod-1.c -O0 (test for excess errors)
Executing on remote-localhost: .../gcc/testsuite/gcc/postmod-1.exe (timeout = 15)
spawn [open ...]
WARNING: program timed out
ERROR: tcl error sourcing .../gcc/testsuite/gcc.c-torture/execute/execute.exp.
ERROR: child process exited abnormally
    while executing
"exec sh -c "exec > /dev/null 2>&1 && kill -9 $exec_pid""
    (procedure "close_wait_program" line 57)
    invoked from within
"close_wait_program $spawn_id $pid wres"
    (procedure "local_exec" line 104)
[...]
"uplevel #0 source .../gcc/testsuite/gcc.c-torture/execute/execute.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
testcase .../gcc/testsuite/gcc.c-torture/execute/execute.exp completed in 196 seconds

		=== gcc Summary ===

# of expected passes		1

-- therefore not letting `execute.exp' continue (here with the GCC `c'
testsuite invoked with `execute.exp=postmod-1.c' for 8 compilation and 8
execution tests).

The completion of the force-kill sequence would have to happen in the
window between the return of the `wait' command, which would at worst
happen as a result of the final `kill -9' command in the sequence, and
the `kill -9 $exec_pid' command issued here.  The `sleep 5' command
issued at the end of the force-kill sequence makes the likelihood of
such a scenario low, but this might still happen with a loaded host
system and there is no drawback from using `catch' here, so let's do it.

	* lib/remote.exp (close_wait_program): Use `catch' in killing
	pending force-kills.

Signed-off-by: Maciej W. Rozycki <ma...@wdc.com>
---
Hi,

I have only observed it in a debug scenario, where an artificial delay
was inserted before the `wait' command referred to in the change
description, while tracking down a testsuite hang with a stuck test
case.  As noted, however, the use of `catch' here is otherwise harmless,
and while the likelihood of the scenario where the race triggers might
be epsilon, it is not nil.  Therefore, please apply.
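
For illustration only, the failing condition can be approximated outside
DejaGnu with a minimal shell sketch.  It assumes a POSIX `sh'; `sleep 0'
is a hypothetical stand-in for the force-kill sequence, not what
`close_wait_program' actually spawns.  Once the background job has
exited and been reaped, the same `kill -9' command shape reports a
failure, and a non-zero exit status is what makes TCL `exec' raise
"child process exited abnormally" unless it is wrapped in `catch'.

```shell
#!/bin/sh
# Hypothetical stand-in for the force-kill sequence: a background job
# that finishes immediately.
sleep 0 &
exec_pid=$!

# Reap the job, so that $exec_pid no longer refers to a live child.
wait "$exec_pid"

# Issue the same command shape as `close_wait_program' does and record
# its exit status instead of letting it abort the script.
if sh -c "exec > /dev/null 2>&1 && kill -9 $exec_pid"; then
    status=0
else
    status=$?
fi

echo "kill exit status: $status"
```

With the target gone `kill' is expected to fail, so `status' should be
non-zero here; the proposed change has TCL tolerate exactly this outcome.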
FAOD this has been formatted for `git am' use.

  Maciej
---
 lib/remote.exp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

dejagnu-remote-close-wait-kill-catch.diff
Index: dejagnu/lib/remote.exp
===================================================================
--- dejagnu.orig/lib/remote.exp
+++ dejagnu/lib/remote.exp
@@ -113,7 +113,10 @@ proc close_wait_program { program_id pid
 	    # We reaped the process, so cancel the pending force-kills, as
 	    # otherwise if the PID is reused for some other unrelated
 	    # process, we'd kill the wrong process.
-	    exec sh -c "exec > /dev/null 2>&1 && kill -9 $exec_pid"
+	    #
+	    # Use `catch' in case the force-kills have completed, so as not
+	    # to cause TCL to choke if `kill' returns a failure.
+	    catch "exec sh -c \"exec > /dev/null 2>&1 && kill -9 $exec_pid\""
 	}
 
     return $res