Hi Pedro, On 31 March 2016 at 16:20, Pedro Alves <pal...@redhat.com> wrote: > On 03/30/2016 04:03 PM, Yvan Roux wrote: >> Hi, >> >> https://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html >> >> this patch introduced a new failure related to GDB testing, but this >> time in GCC guality part of the testsuite. When >> gcc/testsuite/gcc.dg/guality/example.c is executed we have this set of >> processes started >> >> PID PPID command >> 100 99 ./example.exe >> 101 99 cat >> 102 100 sh -c gdb -nx -nw --quiet > /dev/null 2>&1 ./example.exe >> 103 102 gdb -nx -nw --quiet ./example.exe >> >> The issue is that when close_wait_program is called pid="100 101" and >> when the "wait" returns none of these 4 processes is actually killed, >> and the killing of the "kill pipeline" made these pids untouched and >> the validation hangs forever. > > The questions I'm not seeing answered are: > > - How did this work before?
We just discovered this issue in a new target we added to our validation (native armv8l), and on this target this guality test is in timeout (which triggers the call to close_and_wait routine). > - What exactly in the patch causes the regression? The issue is not related to pid being used as a number instead of a list (some cleanup are needed, but you're right this is here for a very long time and works I just discovered during my debugging session that it can be a list). The regression is due to this part: + # Reap it. + set res [catch "wait -i $program_id" wres] + if {$exec_pid != -1} { + # We reaped the process, so cancel the pending force-kills, as + # otherwise if the PID is reused for some other unrelated + # process, we'd kill the wrong process. + exec sh -c "exec > /dev/null 2>&1 && kill -9 $exec_pid" + } In our case the "wait -i" command ends without any effect on the running processes and then it kills the command which was about to kill them, and the validation is stuck. > AFAICS, before my patch we had: > > - if { $pid > 0 } { > - verbose "doing kill, pid is $pid" > - # This is very, very nasty. SH, instead of EXPECT, is used > - # to run this in the background since, on older CYGWINs, a > - # strange file I/O error occures. > - set pgid "-[join $pid { -}]" > > while my patch changed it to: > > + if { $pid > 0 } { > + # Tcl has no kill primitive, so we have to execute an external > + # command in order to kill the process. > + verbose "doing kill, pid is $pid" > + # Prepend "-" to generate the "process group ID" needed by > + # kill. > + set pgid "-$pid" > > I probably did that change just assuming pid is a number, given > the pid > 0 check. > > So if you replace: > > + set pgid "-$pid" > > with: > > - set pgid "-[join $pid { -}]" > > I imagine it would get things back to how they were for you? > > Thanks, > Pedro Alves > _______________________________________________ DejaGnu mailing list DejaGnu@gnu.org https://lists.gnu.org/mailman/listinfo/dejagnu