Re: `wait -n` returns 127 when it shouldn't
On Wed, 17 May 2023 at 03:35, Aleksey Covacevice <aleksey.covacev...@gmail.com> wrote:

> Description:
> `wait -n` sometimes returns with status code 127 even though there are
> unwaited-for children.
>
> Repeat-By:
> The following script does finish after a while:
>
> waitjobs() {
>     local status=0
>     while true; do
>         local code=0; wait -n || code=$?

I moved "local code" out of the loop and the problem went away (or at least became far less likely). I suspect putting "local" in a loop is doing something strange.

>         ((code == 127)) && break
>         ((!code)) || status=$code
>     done
>     return $status
> }
>
> # Eventually finishes:
> while true; do (
>     true &
>     false &
>     waitjobs
> ) && break; done

I'm testing with Bash 5.1.4p47

-Martin
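For reference, a minimal sketch of the change described in this message: the reporter's `waitjobs` with `local code` declared once, before the loop. The single run at the end stands in for the reporter's outer `while true` loop and is an assumption made for illustration. Given the race discussed in this thread, the result may be 0 on some runs even though 1 is what the logic implies.

```shell
#!/usr/bin/env bash
# Variant of the reporter's waitjobs with "local code" hoisted out of the loop.
waitjobs() {
    local status=0 code
    while true; do
        code=0
        wait -n || code=$?
        # 127 means bash believes there are no unwaited-for children left.
        ((code == 127)) && break
        # Remember the most recent non-zero exit status.
        ((!code)) || status=$code
    done
    return "$status"
}

# One iteration of the reproducer, run in a subshell as in the report.
(
    true &
    false &
    waitjobs
)
echo "waitjobs returned $?"
```

Wrapping the last block in the original `while true; do ( ... ) && break; done` loop gives the full reproducer again.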
Re: `wait -n` returns 127 when it shouldn't
On 5/16/23 8:35 PM, Aleksey Covacevice wrote:

> waitjobs() {
>     local status=0
>     while true; do
>         local code=0; wait -n || code=$?
>         ((code == 127)) && break
>         ((!code)) || status=$code
>     done
>     return $status
> }
>
> # Eventually finishes:
> while true; do (
>     true &
>     false &
>     waitjobs
> ) && break; done

This boils down to the following:

    true &
    false &
    wait -n

There is no guarantee that `wait -n' will report the status of `true'; the shell may acquire the status of `false' first. It's not a bug.
Re: `wait -n` returns 127 when it shouldn't
On 5/17/23 3:27 PM, Martin D Kealey wrote:

> On Wed, 17 May 2023 at 20:20, Oğuz İsmail Uysal wrote:
>
>> On 5/16/23 8:35 PM, Aleksey Covacevice wrote:
>>> [original code elided as it's been mangled by line-wrapping]
>>
>> This boils down to the following
>>
>>     true &
>>     false &
>>     wait -n
>
> With respect, I disagree with that statement of equivalence. The only way
> for the loop to terminate is when `wait` returns 127, after both children
> have been reaped. By that point the non-zero exit status of "false" will
> have been noted, and it is then used as the return value of the function.

Must have misread then, thanks.
Re: `wait -n` returns 127 when it shouldn't
    Date:        Wed, 17 May 2023 17:23:21 +1000
    From:        Martin D Kealey
    Message-ID:

  | I suspect putting "local" in a loop is doing something strange.

"local" is an executable statement, not a declaration (shell really has none of the latter): every time it is executed it creates a new local variable (which remains until the function exits; there are no local scope rules in shell either). That should make no difference to this code though, and the difference you report likely hints at the source of the problem.

The code is written weirdly, however. This sequence

    code=0; wait -n || code=$?

could just be

    wait -n; code=$?

(the "local" that might be there makes no difference, or shouldn't, to the execution semantics).

Getting status==127 out of the waitjobs function should be impossible, as it starts out being 0 and is only changed to $code if $code!=127, so if that ever happens, there looks to be a bug somewhere.

oguzismailuy...@gmail.com said:
  | There is no guarantee that `wait -n' will report the status of `true', the
  | shell may acquire the status of `false' first.

That should be irrelevant. waitjobs() has a loop that explicitly waits for wait -n to return 127 (which it does not return to the caller, or should not), which should mean that there are no children remaining.

Further, as long as the wait -n call in waitjobs actually reaps the exit from false, it should always return with status==1 (the exit status from false). Since false & true should both always be running in the bg when waitjobs is called, the exit status from false should always (fairly quickly, since it doesn't run for very long) be obtained, causing code==1 and hence status==1 (after which status will never be altered again, as it isn't touched if code==0 or code==127, which should be the only other 2 returns from wait -n).
I modified the script to get rid of the (()) usage and replaced it with the similar [ ] code, which made no difference at all when executed under bash: it still ends the outer loop, reasonably quickly. But then I could run the script using the NetBSD shell, where it (seems to) run forever (ie: it is still running, but forever hasn't been reached yet).

I think there is a bug, probably some race condition in bash with the jobs table, causing the "false" job to get missed sometimes when running this code. That allows status to remain 0, the outer loop to break, and the script to terminate.

Most likely the use of "local" in the loop, which caused the difference that Martin noticed, alters the timing somewhat and so affects the race results.

kre
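On the point about "local" made earlier in this message, a small sketch may help show that a variable declared with `local` inside a loop body is scoped to the whole function, not to the loop. The function name `demo` and the variable values are made up for illustration.

```shell
#!/usr/bin/env bash
demo() {
    local i
    for i in 1 2 3; do
        # "local" is an ordinary command here; it runs on every iteration.
        local code=$i
    done
    # The loop has ended, but "code" stays in scope until demo returns.
    echo "inside demo after loop: code=${code}"
}

code=global
demo
# The caller's variable is untouched by the function's local.
echo "after demo: code=${code}"
```

Under bash this should print `code=3` from inside the function and `code=global` afterwards, which is consistent with the statement that the placement of "local" should not change what the reproducer computes.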
Re: `wait -n` returns 127 when it shouldn't
On Wed, May 17, 2023 at 12:21 PM Oğuz İsmail Uysal <oguzismailuy...@gmail.com> wrote:

> This boils down to the following
>
>     true &
>     false &
>     wait -n
>
> There is no guarantee that `wait -n' will report the status of `true',
> the shell may acquire the status of `false' first. It's not a bug

OK for the randomness of the result, yet $? should be 0 or 1, never 127, which is what the OP asked about? Did I miss something?
Re: `wait -n` returns 127 when it shouldn't
On 5/16/23 1:35 PM, Aleksey Covacevice wrote:

> Bash Version: 5.1
> Patch Level: 16
> Release Status: release
>
> Description:
> `wait -n` sometimes returns with status code 127 even though there are
> unwaited-for children.

There are not. That's why `wait -n' returns 127.

> Repeat-By:
> The following script does finish after a while:
>
> waitjobs() {
>     local status=0
>     while true; do
>         local code=0; wait -n || code=$?
>         ((code == 127)) && break
>         ((!code)) || status=$code
>     done
>     return $status
> }
>
> # Eventually finishes:
> while true; do (
>     true &
>     false &
>     waitjobs
> ) && break; done

It's possible for the shell to reap both background jobs before `wait -n' is called. The underlying function returns < 0 when there aren't any unwaited-for jobs, which the wait builtin translates to 127.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
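The 127 case described above can be seen in isolation with a short sketch. The helper name `status_with_no_children` is hypothetical; it calls `wait -n` in a fresh subshell that has started no background jobs, so there is nothing left to wait for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what "wait -n" returns when the shell
# has no unwaited-for children at all.
status_with_no_children() {
    # The subshell has started no jobs of its own, so there is nothing to reap.
    ( wait -n; echo "$?" )
}

echo "wait -n with no children returned $(status_with_no_children)"
```

This only shows the documented "no unwaited-for children" status. It does not reproduce the race in the report, where the question is whether the shell has already collected a job's status before `wait -n` runs.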
Re: Adding support of the z/OS platform
On 5/16/23 12:10 PM, Igor Todorovski wrote:

> Hi there, I’m looking for advice on the best way to submit a patch to
> enable support of z/OS.
>
> We have a few patches here which I will be cleaning up for the next few
> days:
> https://github.com/ZOSOpenTools/bashport/tree/main/patches

You can just send me a note when these are cleaned up, as long as I can get to the link then.

I looked at them, and I'm wondering why you patched configure instead of configure.ac and aclocal.m4. Do you not have autoconf? If someone happens to run autoconf in that directory, it will overwrite your changes.

Some of these patches (e.g., the one to sig.c) seem to indicate bugs in z/OS (sigaction doesn't modify its second argument).

> Also, on z/OS we have a prefix message id before the error text as in
> this case:
> https://github.com/ZOSOpenTools/bashport/blob/main/patches/PR3/builtins.right.patch
>
> Is there a preferred approach for how to handle this? Should I create a
> builtins.right.zos?

That's a one-off patch to deal with a specific environment. I'd either just add a warning to the test script or keep it in your local branch. There are already some warnings to deal with different signal ordering and some other messages that differ between systems (ok, between Solaris and everything else).

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: `wait -n` returns 127 when it shouldn't
On Thu, 18 May 2023 at 02:13, Chet Ramey wrote:

> It's possible for the shell to reap both background jobs before `wait -n'
> is called. The underlying function returns < 0 when there aren't any
> unwaited-for jobs, which the wait builtin translates to 127.

I know that some platforms (used to?) lack all of the “waitpid()”, “wait3()”, “wait4()”, and “waitid()” syscalls. On those you need to use “wait()” repeatedly until you get the PID the script asked for, and keep track of the others until the script asks for them too. At least, this is what Perl and MSys did when running on older Windows.

However Linux has all 5 reaping syscalls available, and can provide the exit status to a signal handler (in the siginfo parameter) without calling any of them, and therefore without *actually* reaping the process.

If there is silent reaping going on (other than “wait -n” or “trap ... SIGCHLD”), shouldn't the exit status and pid of each silently reaped process be retained in a queue that “wait -n” can extract from, in order to maintain the reasonable expected semantics? Arguably this queue should be shared with “fg” when job control is enabled.

Would you care to speculate more precisely on where such silent reaping may occur, given the code as shown?

-Martin

PS: I'm not convinced that “trap ... SIGCHLD” needs to be in that list; it's the “wait” inside the trap that actually matters, and if you *don't* “wait” inside a SIGCHLD trap, things are going to get quite strange anyway.
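Until the question of retained statuses is settled, one way to avoid depending on `wait -n` is to record each child's PID from `$!` and wait on the PIDs by name, since the shell keeps the exit status of a background job so that a later `wait PID` can report it. The sketch below is not something proposed in this thread; the helper name `waitpids` and the `pids` array are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for each PID given as an argument and return
# the last non-zero exit status seen, or 0 if every job succeeded.
waitpids() {
    local status=0 code pid
    for pid in "$@"; do
        code=0
        # Waiting on a named PID reports that job's saved exit status.
        wait "$pid" || code=$?
        ((code == 0)) || status=$code
    done
    return "$status"
}

# Same two jobs as the reproducer, with their PIDs recorded as they start.
(
    pids=()
    true &
    pids+=("$!")
    false &
    pids+=("$!")
    waitpids "${pids[@]}"
)
echo "waitpids returned $?"
```

Because every job is waited for explicitly, the non-zero status of `false` should be seen regardless of the order in which the jobs finish. The trade-off is that statuses are collected in launch order, not completion order.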