On 1/31/24 2:35 PM, Robert Elz wrote:
> | Not quite. `new' in this sense is the opposite of `anything in the past'
> | as Dale described it -- already notified and removed from the jobs list.
>
> I guess the part about bash that I am not understanding here is how the
> "already notified" works. To me there are just two ways for that: either
> the user has done a "wait" which has already collected that pid (without
> -n and with no pid args; with pid args, one of which is the pid in
> question; or with -n, where the pid in question was the one whose status
> was returned), or the user/script ran the jobs command (or jobs -l) and
> the job in question was shown as completed.
>
> Is there some other way?

Notification after a job terminates due to a signal in a non-interactive
shell -- that runs the equivalent of `jobs'. As it turns out, this was the
problem with Steven Pelley's original report. I fixed one issue, but that
kind of notification will leave jobs marked as notified and eligible to
be removed from the jobs list.

> | Half the problem here is that bash aggressively marks dead jobs as being
> | notified in non-interactive shells without job control enabled, and moves
> | them out of the jobs table.
>
> That might be more than half the problem, it might be the entire problem.

It seems to be in this case. It's a good thing it's limited to processes
that terminate due to signals; a bad thing that processes terminating due
to signals was the entire subject of the original report.

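To make the scenario concrete, here is a minimal sketch of a non-interactive
script in which a background job dies from a signal before anything waits
for it. The variable names are illustrative and not taken from the original
report. The only behavior relied on is that plain `wait' with a pid argument
reports 128 plus the signal number for a signal-terminated child; what `jobs'
prints at that point can differ between bash versions, so the sketch just
displays it.

```shell
#!/usr/bin/env bash
# Sketch: a background job is killed by a signal in a non-interactive shell.
# pid and status are illustrative names, not taken from the original report.

sleep 30 &
pid=$!

kill -TERM "$pid"    # the job terminates due to SIGTERM (signal 15)
sleep 1              # a foreground command; gives the shell time to reap the job

# Show what the jobs list looks like now. Output depends on the bash version.
jobs -l

# Plain wait with a pid argument still finds the saved exit status.
wait "$pid"
status=$?
echo "wait $pid returned $status"    # 128 + 15 = 143 for SIGTERM
```

Run as a script (not typed at a prompt), so the shell is non-interactive and
job control is off, which is the configuration under discussion.
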
> | but if you
> | do, or if you use wait -n with pid/job arguments (which you've presumably
> | saved yourself) you're going to need slightly different semantics than we
> | have now to answer that reliably. And that will probably need a new option.
>
> That's a pity, particularly since the current semantics don't seem to
> be useful in general.

Shoehorning pid/job arguments into the previous behavior, which only dealt
with running jobs, resulted in the current semantics. I should probably
have made `wait -n' with pid arguments look at terminated and notified
processes, but I didn't change the `running job' semantics. Hindsight.

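The difference between the two lookups can be probed with a short script.
This is a sketch under assumptions: it needs a bash whose `wait -n' accepts
pid arguments, and the variable names are made up for illustration. Plain
`wait' with a pid argument returns the saved status of a child that has
already exited; the result of `wait -n' with a pid argument depends on the
bash version and on whether the job was already notified, so the sketch
prints it without assuming a value.

```shell
#!/usr/bin/env bash
# Sketch: compare plain wait and wait -n on children that have already exited.
# Assumes a bash whose wait -n accepts pid arguments; names are illustrative.

(exit 3) &
first=$!
(exit 4) &
second=$!

sleep 1    # both children have terminated by now

# Plain wait consults the saved statuses of terminated children.
wait "$first"
plain_status=$?
echo "wait $first -> $plain_status"    # 3

# wait -n with a pid argument: no particular result is assumed here, since it
# varies with the bash version and with whether the job was already notified.
wait -n "$second"
n_status=$?
echo "wait -n $second -> $n_status"
```

Two separate children are used so that one call cannot consume the status
the other call is looking for.
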
> Since the sole issue provoking that seems to be
> the wait over and over policy,

It's not a policy, per se, it's behavior that has historically worked that
way.

> rather than "wait once, and remove completely"

POSIX semantics.

> perhaps rather than a new, but different, -n like option, a better idea would
> be an "only once" option (i.e., if the option (-r (remove) or -c (cleanup) or
> -o (once only)) is set, then when the wait with that option returns status,
> or waits until termination without returning status (in the not -n case, with
> no pid args, or many pid args), then the processes are completely deleted
> from everywhere in the shell).

Or you could use posix mode with the recent change, already in devel, since
POSIX requires this behavior (but see below).

> Using that option would make a changed -n safe
> to use in loops. If you do that, also add an option (maybe the upper case
> version of whatever is selected for that one, or just some other letter) to
> mean "don't wait" (kind of like wait(2) WNOWAIT) - which in default bash would
> just be a no-op (except in posix mode, apparently - whereas the -[cor] option
> would be a no-op in posix mode).

You're not the only one to suggest some new option(s). Only one really
matters for this discussion.

> If you were to do that, other shells could add the same (except in probably
> all of them, -[cor] would always be the default, and the other one would be
> the one which changes behaviour).

That's always hit or miss.

> | > The one change that should be made is
> | > to allow wait -n to collect processes/jobs that have already terminated.
> |
> | Yes, that's one of the things we're talking about. I don't have any problem
> | with it, but should it take a new option to change those semantics?
>
> Good, though I think some more thought should go into that. In another
> thread you said (paraphrasing), correctly, that scripts should not be
> relying upon bugs, and the current wait -n behaviour is a bug - that it
> might have been intentionally coded that way doesn't make it any less so.

Trust me, there are people on the other side of that question.

> It isn't as if it was ever documented to work the way it does, or everyone
> would have known about it already.

You mean the behavior of `wait -n' with pid arguments, I presume. The
problem with your statement is that people do know about it. The question,
as above, is whether or not to avoid changing the behavior because they do.

There are two things that we could change:
1. wait -n needs to get access to the list of terminated pids (the ones
that satisfy POSIX's "CHILD_MAX processes known in the current shell
environment"), like wait without -n does. This can happen via a wait
option, a shell option, or a change in behavior controlled by the
compatibility level.
2. Some option to implement the po