Re: wait -n misses signaled subprocess

alex xmb sw ratchev Thu, 01 Feb 2024 01:21:00 -0800

On Thu, Feb 1, 2024, 09:09 alex xmb sw ratchev <fxmb...@gmail.com> wrote:


>
>
> On Wed, Jan 31, 2024, 20:36 Robert Elz <k...@munnari.oz.au> wrote:
>
>>     Date:        Wed, 31 Jan 2024 11:35:57 -0500
>>     From:        Chet Ramey <chet.ra...@case.edu>
>>     Message-ID:  <1e50aa99-8d53-4cdf-ba5e-6aaf3ccc6...@case.edu>
>>
>>   | Not quite. `new' in this sense is the opposite of `anything in the past'
>>   | as Dale described it -- already notified and removed from the jobs list.
>>
>> I guess the part about bash that I am not understanding here is how the
>> "already notified" works.   To me there are just two ways for that, either
>> the user has done a "wait" which has collected that pid already (either
>> without -n, and no pid args, or with pid args and one of those is the pid
>> in question) or with -n and the pid in question was the one whose status
>> was returned, or the user/script did the jobs command (or jobs -l) and the
>> job in question was shown as completed.
>>
>
> I'd say an additional data structure for that saving purpose ...
>

It'd need a new UID (a real unique ID), or some special hash of the
jobs/pids/cmdlines.
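Bash itself has no such ID. As a rough, hypothetical sketch of the idea (spawn, job_pid, job_cmd and next_id are invented names, not anything bash provides), a script can keep its own table keyed by a counter that is never reused, recording the pid and command line next to it:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a script-side job table keyed by an ID that is never
# reused, unlike pids. spawn, job_pid, job_cmd and next_id are invented names.
job_pid=()    # id -> pid
job_cmd=()    # id -> command line, kept for diagnostics
next_id=0

# spawn CMD [ARGS...]: start CMD in the background and store its new ID in
# SPAWNED_ID (a global, so spawn must not be called inside $(...)).
spawn() {
  "$@" &
  next_id=$((next_id + 1))
  job_pid[next_id]=$!
  job_cmd[next_id]="$*"
  SPAWNED_ID=$next_id
}

spawn sleep 0.1
first=$SPAWNED_ID
spawn sh -c 'exit 3'
second=$SPAWNED_ID

# Plain wait with an explicit pid still finds a child that has already exited.
wait "${job_pid[second]}"; echo "job $second (${job_cmd[second]}) exited $?"
wait "${job_pid[first]}";  echo "job $first (${job_cmd[first]}) exited $?"
```

The counter only lives in the script, so it says nothing about what the shell itself has already reaped; it just makes "which job was that?" unambiguous.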

Is there some other way?
>>
>>   | Half the problem here is that bash aggressively marks dead jobs as being
>>   | notified in non-interactive shells without job control enabled, and moves
>>   | them out of the jobs table.
>>
>> That might be more than half the problem, it might be the entire problem.
>>
>>   | If you use wait -n without arguments, you probably don't care,
>>
>> No you do, that just means any of the children ... the script could make
>> a list of all of them and supply that list, but if the list is just going
>> to contain all the existing children, why bother?    (With -n - and not
>> exactly one pid arg, -p is generally going to be required, but that option
>> has no bearing on which process is selected, or might be, which is the
>> issue here).
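For the no-argument case, a minimal sketch of how -p reports which child satisfied wait -n (assumes bash 5.1 or later, where -p exists; the sleep durations are arbitrary):

```shell
#!/usr/bin/env bash
# Sketch: wait -n with no pid arguments returns when any child finishes;
# -p (bash 5.1+) names the variable that receives that child's pid.
( sleep 0.2; exit 4 ) &
fast=$!
sleep 5 &
slow=$!

wait -n -p who      # returns as soon as either child terminates
rc=$?
echo "pid $who exited $rc (fast child is $fast)"

# Clean up the slow child so the script does not linger.
kill "$slow"
wait "$slow" 2>/dev/null
```

No foreground command runs between starting the children and calling wait -n here, so the fast child should still be in the jobs table when it exits.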
>>
>>   | but if you
>>   | do, or if you use wait -n with pid/job arguments (which you've presumably
>>   | saved yourself) you're going to need slightly different semantics than we
>>   | have now to answer that reliably. And that will probably need a new option.
>>
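Until something like that exists, one defensive pattern for scripts that pass pids to wait -n in a loop is to fall back to plain wait when wait -n reports that it knows none of the remaining pids. A sketch, assuming bash 5.1+ (for -p) and default (non-posix) mode, where, per this discussion, plain wait still finds saved statuses; pending and status are illustrative names:

```shell
#!/usr/bin/env bash
# Sketch: collect every child's status exactly once, even if bash has already
# moved some of them out of its jobs table before wait -n is called.
declare -A pending=()   # pid -> 1 while we still owe that child a wait
declare -A status=()    # pid -> exit status we collected

for code in 1 2 3; do
  ( exit "$code" ) &
  pending[$!]=1
done
sleep 0.5               # a foreground command; the children exit meanwhile

while (( ${#pending[@]} > 0 )); do
  unset reaped
  wait -n -p reaped "${!pending[@]}"
  rc=$?
  if [[ -n ${reaped-} ]]; then
    # wait -n found one of our children and told us which one.
    status[$reaped]=$rc
    unset "pending[$reaped]"
  else
    # wait -n found none of the remaining pids (typically status 127);
    # plain wait with an explicit pid also consults saved statuses.
    for pid in "${!pending[@]}"; do
      wait "$pid"
      status[$pid]=$?
      unset "pending[$pid]"
    done
  fi
done

for pid in "${!status[@]}"; do
  echo "pid $pid exited ${status[$pid]}"
done
```

Testing whether -p was set, rather than testing for 127, avoids confusing "no child found" with a child that really exited 127.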
>> That's a pity, particularly since the current semantics don't seem to
>> be useful in general.   Since the sole issue provoking that seems to be
>> the wait over and over policy, rather than "wait once, and remove
>> completely", perhaps rather than a new, but different, -n like option, a
>> better idea would be an "only once" option (ie: if the option
>> (-r (remove) or -c (cleanup) or -o (once only)) is set, then when the
>> wait with that option returns status, or waits until termination
>> without returning status (in the not -n case, with no pid args, or many
>> pid args), the processes are completely deleted from everywhere in the
>> shell).   Using that option would make a changed -n safe to use in
>> loops.   If you do that, also add an option (maybe the upper case
>> version of whatever is selected for that one, or just some other
>> letter) to mean "don't wait" (kind of like wait(2) WNOWAIT) - which in
>> default bash would just be a no-op (except in posix mode, apparently -
>> whereas the -[cor] option would be a no-op in posix mode).
>>
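The proposed option does not exist in bash. The following hypothetical shell function (wait_once and already_waited are invented names) only approximates the "wait once, then forget" idea at script level, answering a second wait for the same pid with 127, the status wait gives for a pid it does not know:

```shell
#!/usr/bin/env bash
# Hypothetical emulation of "wait once, then forget"; bash has no such option.
declare -A already_waited=()   # pid -> 1 once its status has been handed out

wait_once() {
  local pid=$1 rc
  if [[ -n ${already_waited[$pid]-} ]]; then
    echo "wait_once: pid $pid already collected" >&2
    return 127                 # same status wait uses for an unknown pid
  fi
  wait "$pid"
  rc=$?
  already_waited[$pid]=1
  return "$rc"
}

( exit 5 ) &
p=$!
wait_once "$p"; first=$?
wait_once "$p" 2>/dev/null; second=$?
echo "first=$first second=$second"
```

This only hides the repeat from the script; bash's own saved status is untouched, which is exactly the gap the proposed option would close.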
>> If you were to do that, other shells could add the same (except in
>> probably all of them, -[cor] would always be the default, and the other
>> one would be the one which changes behaviour).
>>
>>   | And that's why I used `more': there are several differences, so which
>>   | of those differences should we attempt to change?
>>
>> Just the one.
>>
>>   | > The one change that should be made is
>>   | > to allow wait -n to collect processes/jobs that have already terminated.
>>   |
>>   | Yes, that's one of the things we're talking about. I don't have any problem
>>   | with it, but should it take a new option to change those semantics?
>>
>> Good, though I think some more thought should go into that.   In another
>> thread you said (paraphrasing) correctly, that scripts should not be
>> relying upon bugs, and the current wait -n behaviour is a bug - that it
>> might have been intentionally coded that way doesn't make it any less so.
>> It isn't as if it was ever documented to work the way it does, or everyone
>> would have known about it already.
>>
>>   | > Changing it to wait for all the listed pids
>>   | It's never done that.
>>   | We're not going to change the return value from wait.
>>
>> Good, I only mentioned those possibilities because your earlier
>> message was unclear about what "more like wait without -n" meant.
>>
>>   | Yeah, but we're talking about bash here. It doesn't really matter what
>>   | the Bourne shell did; there are likely plenty of scripts that assume
>>   | the historical bash behavior.
>>
>> Really?   Why?   What's the point of collecting the status twice?
>> It can't change in the meantime can it, once a process has done exit(N)
>> its exit status should always be N, regardless of how often it is waited
>> upon.
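A quick way to see the "multiple waits" behaviour in question: in default mode bash is described in this thread as keeping the status after a successful wait, so a second wait on the same pid should report it again; posix mode may discard it, in which case the second wait reports 127. Only the first result is guaranteed by this sketch:

```shell
#!/usr/bin/env bash
# Sketch: wait twice for the same child. The status cannot change after
# exit(3); the open question is only whether the shell still remembers it.
( exit 3 ) &
p=$!
wait "$p"; first=$?
wait "$p" 2>/dev/null; second=$?   # expected 3 in default mode, 127 if discarded
echo "first wait: $first, second wait: $second"
```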
>>
>> [Aside: this should be obvious, but when one is collecting status changes,
>> rather than just "terminated" status, then the pid isn't removed if it
>> returns a "stopped" or "continued" status.]
>>
>>   | > I meant the distinction between processes
>>   | > that the shell has already collected status for, and those for which it
>>
>>   | You're not the first to propose something like that, but I'm not going to
>>   | be writing that code any time soon.
>>
>> Nor am I.   If you go back to the message where I first mentioned it
>> (which I can't locate at the minute), I am fairly sure I said that while
>> it might help in this case, I doubt it is worth the effort.   Or something
>> like that.
>>
>> Actually, found it eventually (this is quoting myself, earlier):
>>   >> But as long as it is just a matter of cleaning up, and jobs works for
>>   >> that, I don't currently see the need.
>>
>>   | It is, in fact, true in the current implementation, as long as the pid
>>   | is in the jobs list.
>>
>> That caveat is the problem.
>>
>>   | It's always been true. If there is a job marked
>>   | (internally, if you must) as dead for which the user has not yet received
>>   | notification, wait -n returns it and marks it as notified (and deletes
>>   | it from the jobs list).
>>
>> That part is good.
>>
>>   | Yes, that's one of the things we're talking about: whether wait -n should
>>   | consider pids/jobs *not* in the jobs list, the way wait without -n does.
>>   | That's about the only thing we're talking about changing here so far.
>>
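The situation being discussed can be reproduced roughly as below. This is a sketch: whether the child has actually left the jobs table by the time wait -n runs, and what wait -n then reports, depends on the bash version and on timing. Plain wait is called first so that its result does not depend on what wait -n does:

```shell
#!/usr/bin/env bash
# Sketch: a child that exits while the shell runs a foreground command.
# A non-interactive bash without job control may mark it notified and move
# it out of the jobs table before any wait is issued.
( exit 7 ) &
p=$!
sleep 0.3                      # the child is long gone when this returns

wait "$p";    plain_status=$?  # wait without -n also consults saved statuses
wait -n "$p"; n_status=$?      # version dependent: may be 127 ("not found")
echo "wait: $plain_status, wait -n: $n_status"
```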
>> Maybe a better discussion, and potential change, would be about whatever,
>> other than the use of the wait or jobs commands, can result in a job
>> moving out of the jobs list.   If there were nothing other than those
>> (and jobs list overflow or similar), then we'd be fine, and it seems to
>> me now that no change to the -n operation would be needed.
>>
>>   | That hasn't actually been true with bash running in default mode for a
>>   | very long time now. Bash has allowed multiple waits for the same pid for
>>   | many years, whether or not you or I think it's a good idea or the correct
>>   | semantics. Even if it was an accident of the implementation, and maybe you
>>   | could say it was, we are stuck with it.
>>
>> Which is why I suggested an option (just above) to turn that misfeature off.
>> Even better perhaps might be a bash shopt.
>>
>>   | It's ok, we got one.
>>
>> A kind of unlikely one.
>>
>> kre
>>
>>
>>
