Re: builtin man page for wait omits information from SIGNALS

Robert Elz Wed, 15 Jun 2022 15:46:45 -0700

    Date:        Wed, 15 Jun 2022 12:14:22 -0700
    From:        AA <aathan_git...@memeplex.com>
    Message-ID:  <023de9fa-06b2-95f1-bf49-7d8a416a8...@memeplex.com>


  | Therefore, if you 
  | could educate/direct me to the right place to ensure the patch at the 
  | link above is considered, I'd be happy to follow up.

I doubt you need to do any more than you have already done.

  | then I wish to wait for the child process to actually terminate. I do 
  | this by:
  |
  | _sigterm() { kill -TERM 6; 2>/dev/null; }; trap _sigterm SIGTERM;

The ';' after the pid in the kill is very unlikely to be what you want.
The command that follows it (a redirection only) is perfectly legal, but also
completely pointless.   I suspect you intended to redirect stderr of the
kill.
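
A small sketch of the difference, in case it helps.  Here "noisy" is a
made-up stand-in for a kill that complains on stderr; it is not from your
script, and the variable names are only for illustration.

```shell
# noisy is a hypothetical stand-in for a kill that complains on stderr
noisy() { echo "kill: no such process" >&2; }

# As quoted: the ';' ends the command, so "2>/dev/null" is a separate,
# empty command and the message still escapes.
with_semicolon=$( { noisy; 2>/dev/null; } 2>&1 )

# Probably intended: the redirection is part of the command itself.
without_semicolon=$( { noisy 2>/dev/null; } 2>&1 )

printf 'with ";": [%s]\nwithout:  [%s]\n' "$with_semicolon" "$without_semicolon"
```

Only the second form discards the diagnostic.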
 
  | python foo.py 2>&1 | rotatelogs -v -e -L foo.log 
  | foo.logfile.%Y-%m-%d-%H_%M_%S 1G 2>> rotatelogs.log & wait 6; wait 6; }

And I assume there was an unfortunate line wrapping inserted, and that
is really one long line.

I also assume there's something (probably earlier) missing, or that final '}'
is unbalanced.

  | (1) The wait 6 is a reference to the PID of python.

I suspect you know that's not the right way.

  | It's always the same because the pod startup is deterministic.

Famous last words.

  | Getting at that PID symbolically seems quite painful.

It shouldn't be, and while the mechanism is simple enough, how to
actually accomplish it depends upon how things get started.

  | It wasn't clear from the manpage if %- would give me that PID,

It gives a job number - but for the purposes here, that's sufficient
(both wait and kill can use job numbers instead of pids).   But %- is
not much better than 6 (a little better, but not a lot), and if I get
your intent, not the value you want either.   %- is the job number of
the previous (no longer) current background job (%+ or %% is the current
one, which is the last started, or the last resumed, or the most recent to
stop - whichever of those happened most recently).  %- is the job that was
%% before the %% shifted.   The % operators never produce pids, and so
certainly %- isn't the pid of the 2nd rightmost process in a pipeline.

  | nor did I find any good way to refer to the job 
  | IDs of members of the most recent pipe.

There isn't really, or not in standard sh, and this isn't something for
which bash has an extension that I'm aware of.   There might be some
magic bash only array var which contains that info though.   Perhaps.

  | If %- works then perhaps the syntax could be %-N to refer to the N'th 
  | back PID in the most recent pipe built by bash?

That would not be much better.

The % notation isn't really intended for scripts to use, it is more for
interactive use, where the user can use the job number printed when an
interactive background job is started, or use the jobs command to list
the current ones.   There are also more options than %% and %- that can
be used - but they always refer to job numbers, never pids, and they
only ever work as standalone args to a few particular built-in commands
(like wait, kill, fg, bg).

  | Building a named pipe so I can execute the components separately in 
  | order to access $! seems absurdly roundabout. I refuse! :)

I'm not surprised.   I wouldn't do it that way either.   Rather, I'd
just do the normal thing and signal the process group (which in a script
means enabling job control ("set -m") before starting the job in question).
Then simply signal the entire job.

If you don't want all the processes in the pipe affected, just have them
ignore the signal - usually that would only apply to processes later in the
pipe than the one you're killing, as any earlier ones would have nowhere
left to send their output once a later process in the pipe has terminated
(they will just get SIGPIPE fairly soon after, usually).

That is, something like

        process1 | process2 | { trap '' TERM; process3; }

Then you kill the job (and use $! to get the pid to which to send the
signal - that would be the pid of the final element of the pipeline, but
since job control is enabled, the signal will be sent to the process group,
not just that process, so they will all get it).   I am assuming there that
process2 is the one you want to signal, and that keeping process1 running
after process2 has gone away is pointless.

If there are lots of processes in the pipeline, and this way would be
tedious, you could instead use something more like

        trap '' TERM
        process1 | { trap - TERM; process2; } | process3
        trap - TERM
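
A minimal sketch of the half of this that is easy to demonstrate, namely
that a signal ignored in the script is also ignored by whatever is started
while the trap is in place.  The "sleep 1" is only a stand-in for a real
process, and the variable names are assumptions for illustration.  It does
not show the "trap - TERM" reset inside the pipeline.

```shell
trap '' TERM            # ignore TERM here; children started now inherit that
sleep 1 &               # stand-in for a process that must survive the signal
survivor=$!
trap - TERM             # back to the default for this shell

kill -TERM "$survivor"  # has no effect, the child is ignoring TERM
wait "$survivor"; status=$?
echo "survivor exited with status $status"
```

The child runs to completion, so the status reported is its own exit
status rather than one produced by the signal.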

  | (2) The double wait 6 is so that the parent bash doesn't immediately die 
  | after delivering the SIGTERM to the child python.

That's not the right way either.   What would be making the parent bash die?

  | It wasn't clear from the man page whether one can or should wait
  | inside a signal handler function.

The shell doesn't have signal handlers, it has traps.   Traps are invoked
because of signals, but not in the way a C program's signal handler is.   They
only get invoked at specific points of the script, and only when it is safe to
do so.   There are no restrictions on what you can do there.
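
As a rough illustration that a wait inside a trap action is fine, here is a
sketch in which the script sends itself the TERM (a stand-in for the signal
arriving from outside).  The names child, on_term and child_status are made
up for the example, and "sleep 30" stands in for the real process.

```shell
sleep 30 &
child=$!

on_term() {
    kill -TERM "$child" 2>/dev/null   # pass the signal along
    wait "$child"                     # waiting here is allowed
    child_status=$?
}
trap on_term TERM

kill -TERM "$$"     # stand-in for an external SIGTERM sent to this script
# The trap action runs once the kill builtin has finished, before the
# next command is started.
trap - TERM
echo "child status seen inside the trap: $child_status"
```

Because the stand-in child does not catch TERM, the status collected is the
"killed by signal" kind (above 128), which is the ambiguity discussed further
on.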

  | Would that have been the better place to put the second wait 6?

I suspect that what you're dealing with is the effect of the text that you
quoted in the original message - where a wait is interrupted when a trapped
signal arrives.   For that, you should test the status from the wait.
To do that entirely safely, you really need a very recent version of bash,
(even more recent than 5.2 beta I suspect).   But in 99% of cases, it can
be done just like

        while :
        do
                wait $pid; status=$?
                if [ "$status" -lt 128 ]
                then
                        break;
                fi
        done

(you could add more to that if needed - including testing which signal
actually interrupted the wait ... use "kill -l $status" after the "fi"
to get the signal name from the exit status of wait).
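
A minimal sketch of that "kill -l" decoding, assuming a child that is simply
killed by TERM ("sleep 30" and the variable names are stand-ins, not
anything from your script):

```shell
sleep 30 &
pid=$!
kill -TERM "$pid"           # stand-in for whatever ends the child

wait "$pid"; status=$?
if [ "$status" -gt 128 ]
then
    # kill -l maps an exit status above 128 back to a signal name
    signame=$(kill -l "$status")
    echo "status $status corresponds to SIG$signame"
fi
```

The same call works on the status of an interrupted wait, which is what
makes it useful inside the loop.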

That's usually fine, but can fail if the process ever does something
like exit(156);   (any exit status >= 128), as that leads to ambiguous
results from wait - in standard sh there's no way to tell the difference
between wait being interrupted by a signal, or the child process being
killed by a signal, or exiting with a value >128.  This is where recent
bash will help

        while :
        do
                wait -p var $pid; status=$?
                if [ "$var" = "$pid" ]
                then
                        break;
                fi
        done

Now you know that the status came from the child, not bash's internal
wait builtin being interrupted.   But to be safe, this needs a very very
recent bash version (any 5.2 has the -p var option, but it didn't always
work perfectly).

Don't shortcut scripts, do things properly.   They tend to last longer than
you'd imagine, and the rest of the world keeps changing under them.

  | Perhaps it's exactly equivalent.

If you mean that if you put a wait in the trap handler, then you wouldn't
need 2 outside, then no, that's not the right way.  The wait in there is
OK, but you should be checking the status after the process has ended, which
is not always going to be after a SIGTERM is sent, I presume.  Other
things can happen, or the signal might be sent directly to the process,
rather than to your script to pass along.

  | Does the answer warrant an additional doc PR?

Probably not.   Do remember that the manual page is a reference doc, not
a tutorial (there is at least one book about bash, and others that cover
non specific shell programming, which is what you actually seem to be
doing here).   That's why making the change you have requested so far might
not happen (once again, that decision has nothing to do with me) - the
explanation of what happens is there already after all, just not everywhere.

If the man page says something that's wrong, completely omits something
important, or is simply unclear, then you should submit a bug report.
That you have to hunt a bit to find the info that you want, perhaps not.

kre
