OK, I found one fix - you probably won't like it, but it fixes the problem, and all the test cases run in 'make test' still work - especially 'run_jobs' and 'run_lastpipe'. The patch simply removes the error from the FIND_CHILD macro in jobs.c, so it now reads:

jobs.c @ line 2377:

#define FIND_CHILD(pid,child) \
  do \
    { \
      child = find_pipeline (pid, 0, (int *)NULL); \
      if (child == 0) \
        { \
          give_terminal_to (shell_pgrp, 0); \
          UNLOCK_CHILD (oset); \
          restore_sigint_handler (); \
          return (termination_state = 0); \
        } \
    } \
  while (0)

So I've removed the manufactured "No record of process X" error altogether, and yet all test cases still pass - it seems it was unnecessary.

Will a fix for this issue appear in a 'bash43-43+' patch file or in the forthcoming bash-4.4 release? I think it should, as it is rather unfair of bash to make its internal bookkeeping errors appear as if they could be user programming errors.

Regards,
Jason

On 05/06/2016, Jason Vas Dias <jason.vas.d...@gmail.com> wrote:
> The strace log shows that process 8277 is the
> bash subshell that runs f(), forks to create 8278,
> which forks to execve "some_executable";
> process 8277 then reads the 8278 output on a pipe,
> so it knows when the pipe is closed, and the strace
> log shows 8277 first does a wait4, which returns 8278,
> but does NOT get a SIGCHLD event for 8278.
> Maybe this is the problem? Since the wait has
> already succeeded, no SIGCHLD will be generated.
> 8277 evidently is not aware that its 8278 child has exited,
> and goes on to issue a further wait4 for it which
> returns -1 with errno==ECHILD and then emits the
> message:
>   "xxx.sh: line 46: No record of process 8278" .
> Line 46 in my script consists of
>   function f()
> .
> This seems buggy to me - I'll try developing a
> patch to fix it and will post back here.
>
> Regards, Jason
>
> On 05/06/2016, Jason Vas Dias <jason.vas.d...@gmail.com> wrote:
>> With a build of bash 4.3 patchlevel 42
>> on a Linux x86_64 system I am getting
>> warning messages on stderr when
>> running a certain script like:
>>   "wait_for: no record of process 8278" .
>> Running bash with the script as
>> input under strace shows that process 8277
>> does a successful wait4(-1,...) which DOES
>> return pid 8278. So why is bash complaining
>> it has no record of it?
>> Is bash getting its book-keeping wrong here?
>> The script is not using any background
>> jobs with '&' or using the 'wait' built-in.
>> It is simply doing something like:
>> <quote><pre>
>> shopt -s lastpipe;
>> set -o pipefail;
>> function f()
>> { some_executable "$@" | {
>>     while read line; do { ... ; } done;
>>   }
>>   return 0;
>> }
>> ...
>> f $args | { while read result; do
>>     ...; done ; }
>> </pre></quote>
>>
>> So I'd expect the initial bash process to run
>> a subshell bash to invoke the f() function,
>> which runs a command child that execve-s
>> "some_executable", parsing its output and writing
>> to the subshell bash on a pipe, which writes to the
>> parent bash on a pipe, which parses it & does whatever.
>> Without the lastpipe option, this would be the
>> other way round - the parent would run f, and
>> its output would be parsed in the subshell
>> running the f output parsing loop.
>> All this seems to work OK, but why the warning
>> message about "no record of process X"?
>> Or is this message indicating something has
>> gone seriously wrong?
>> Thanks in advance for any replies,
>> Regards,
>> Jason
>>
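
For reference, the lastpipe behaviour described in the quoted message can be checked in isolation. This is only a minimal sketch, not the original script: it assumes a non-interactive bash with job control off (lastpipe has no effect otherwise), and the variable names `without` and `with` are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: does the last element of a pipeline run in the current shell?
# Assumption: bash >= 4.2, run as a script (not an interactive session).

set +m    # make sure job control is off; lastpipe is ignored when it is on

# Without lastpipe every pipeline element runs in a subshell, so the
# assignment made by `read` is lost once the pipeline finishes.
shopt -u lastpipe
without=unset
printf 'a\n' | read -r without

# With lastpipe the last element runs in the current shell, so the
# assignment made by `read` is still visible afterwards.
shopt -s lastpipe
with=unset
printf 'a\n' | read -r with

echo "without lastpipe: $without"
echo "with lastpipe: $with"
```

With lastpipe set, the reading loop is the part that stays in the invoking shell, and everything to its left (here printf, in the script f) is run in a child.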