[PATCH] Fix foreground dead jobs in trap handlers reported like background ones in `jobs'

Koichi Murase Sun, 17 Jul 2022 20:54:02 -0700

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux chatoyancy 5.6.13-100.fc30.x86_64 #1 SMP Fri May
15 00:36:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu


Bash Version: 5.1.4, also in the current devel 87a6e89e

Description:

  In interactive shells, the foreground dead jobs (i.e., foreground
  jobs that have already terminated) in trap handlers, commands called
  through `bind -x', and `PROMPT_COMMAND' (let us call these three
  contexts as the affected context. Note: There might be even other
  similar contexts) are treated as if they are the background dead
  jobs.  This causes several unintended behaviors: (1) When `jobs' is
  called in the affected contexts, all the foreground jobs that have
  already terminated in the same call of the affected contexts are
  printed.  (2) When `wait -n' is called in the affected contexts, it
  immediately returns with the exit status of the previous foreground
  dead job.

  These behaviors are different from those in the normal case where
  `jobs' and `wait -n' are executed outside the trap handler, and I do
  not see the reasoning for this behavior difference between inside
  and outside of the affected contexts.  I hope the behavior in the
  affected contexts would be similar to that in the normal contexts
  where the foreground dead job information is not printed.

  A quick example is here:

    $ PROMPT_COMMAND='for a in {0..9};do (true);done;jobs'

  This reports every time 10 foreground dead job entries, which is
  unexpected.  Its detailed explanation and other examples are
  presented in the next section.  I have received related reports
  twice in different projects, once in 2022-12 and the other in the
  last week:

    https://github.com/akinomyoga/ble.sh/issues/78#issue-771281403
    https://github.com/petobens/trueline/pull/46#issuecomment-1179853850

  I have actually created a patch about one year ago after I first
  received the report, but they were pending because I was not sure
  what would be the intended behavior.  Here, I submit possible
  patches (depending on the intended behavior) with explanations.

Repeat-By:

  (1) `jobs' prints the information of all the foreground dead jobs in
  trap handlers.  This reproduces in all the versions of Bash that I
  tested from 1.14 to devel branch.  This one was originally reported
  by Fabio (@3ximus) at

    https://github.com/akinomyoga/ble.sh/issues/78#issue-771281403

  First consider the following function:

    $ bash --norc
    $ f(){(sleep 1;exit 1)&(exit 2);jobs;}

  If one executes the function normally, `jobs' called inside the
  function reports only the background job "%1 (sleep 1;exit 1)" and
  not the foreground job as expected:

    $ f
    [1] 62465
    [1]+  Running                 ( sleep 1; exit 1 ) &
    $      <-- (wait more than 1 sec and press RET)
    [1]+  Exit 1                  ( sleep 1; exit 1 )

  However, when this function is executed inside the trap handler, the
  the foreground job "%2 (exit 2)" is also reported by `jobs', which
  is not expected:

    $ trap f INT
    $ ^C          /* <-- Here type C-c */
    [1]+  Running                 ( sleep 1; exit 1 ) &
    [2]   Exit 2                  ( exit 2 )
    $      <-- (wait more than 1 sec and press RET)
    [1]+  Exit 1                  ( sleep 1; exit 1 )

  When there are many forks (subshells and external command
  executions), `jobs' will report as many job entries as forks in the
  trap handler which is annoying:

    $ trap 'for a in {0..99};do /bin/true;done;jobs' INT
    $ ^C
    [1]   Done      trap 'for a in {0..99};do /bin/true;done;jobs' INT
    [2]   Done      trap 'for a in {0..99};do /bin/true;done;jobs' INT

    [...]

    [98]   Done      trap 'for a in {0..99};do /bin/true;done;jobs' INT
    [99]   Done      trap 'for a in {0..99};do /bin/true;done;jobs' INT
    [100]   Done      trap 'for a in {0..99};do /bin/true;done;jobs' INT

  Actually, originally reported one had slightly different structure.
  `jobs' in the `bind -x' binding reports the foreground dead jobs in
  the immediately preceding invokation of trap handlers:

    $ trap 'for a in {0..9};do /bin/true;done' INT
    $ bind -x '"\C-t":jobs'
    $ /* <-- Here change the terminal size and type C-t */
    [1]   Done                    bind -x '"\C-t":jobs'
    [2]   Done                    bind -x '"\C-t":jobs'
    [3]   Done                    bind -x '"\C-t":jobs'
    [4]   Done                    bind -x '"\C-t":jobs'
    [5]   Done                    bind -x '"\C-t":jobs'
    [6]   Done                    bind -x '"\C-t":jobs'
    [7]   Done                    bind -x '"\C-t":jobs'
    [8]   Done                    bind -x '"\C-t":jobs'
    [9]   Done                    bind -x '"\C-t":jobs'
    [10]   Done                    bind -x '"\C-t":jobs'


  (2) The behavior of `wait -n' is also affected.

  Consider the following function which waits for the first
  termination of any background job.

    $ bash --norc
    $ g(){(sleep 1;exit 1)&(exit 2);wait -n;echo "wait status: $?";}

  When it is executed normally, it waits for the termination of
  `(sleep 1;exit 1)' for one second and report the exit status `1':

    $ g
    [1] 18082
    [1]+  Exit 1                  ( sleep 1; exit 1 )
    wait status: 1

  However, when it is executed in a trap handler, `wait -n'
  immediately returns and reports the status `2' which is the exit
  status of the already-terminated foreground job `(exit 2)':

    $ trap g INT
    $ ^C          /* <-- Here type C-c */
    wait status: 2
    $
    [1]+  Exit 1                  ( sleep 1; exit 1 )


  (3) The same also happens in PROMPT_COMMAND.  The `jobs` command in
  the following `PROMPT_COMMAND' always reports `(true)' as a
  terminated job.

    $ bash --norc
    $ PROMPT_COMMAND='(true);jobs'

  This was originally reported by Pedro Ferrari (@petobens) in the
  following discussion:

    https://github.com/petobens/trueline/pull/46

Fix:

  The reason that the behavior changes between the outside and inside
  of the affected contexts can be understood from the function body of
  `notify_and_cleanup ()' in `jobs.c'.

  > void
  > notify_and_cleanup ()
  > {
  >   if (jobs_list_frozen)
  >     return;
  >
  >   if (interactive || interactive_shell == 0 || sourcelevel)
  >     notify_of_job_status ();
  >
  >   cleanup_dead_jobs ();
  > }

  The condition of the second `if' statement is satisfied in normal
  contexts but unsatisfied in the affected contexts.  In a normal
  situation, `notify_of_job_status ()' just marks a foreground dead
  job as `J_NOTIFIED' without printing the job information unless the
  job is in the `WIFSIGNALED' state (terminated by a signal?).  Then,
  the subsequent call of `clean_up_dead_jobs ()' removes the jobs
  marked as `J_NOTIFIED'.  In this way, the foreground jobs are
  immediately removed from the job list in a normal context.  However,
  in the affected contexts, `notify_of_job_status ()' is not called,
  so foreground dead jobs are never marked as `J_NOTIFIED' and
  therefore survive after the call of `cleanup_dead_jobs ()'.
  Actually, any foreground jobs created in a trap handler would never
  be removed until the end of the trap handler.

  The actual work of `wait -n' is done in the function
  `wait_for_any_job (flags, ps)' in `jobs.c'.  In the description of
  `wait -n' in the manual and also the source-code comment of
  `wait_for_any_job ()', this function is supposed to wait for any
  background jobs.  However, in the implementation, it isn't actually
  checked whether the job is foreground or background.  This is the
  reason that `wait -n' immediately terminates with a foreground dead
  job in the affected contexts.  I think this implementation is based
  on the assumption that the foreground dead jobs would be immediately
  removed so that the dead jobs are only the background ones when
  `wait_for_any_job ()' is called.  In fact, this assumption is
  correct in normal cases, but this is not the case in the affected
  contexts.


  Option 1: I think the quickest fix is to remove the foreground dead
  jobs immediately in `cleanup_dead_jobs ()' regardless of
  `J_NOTIFIED'.  This is a one-line fix [see the attached patch
  `r0022-foreground-dead-job-fix1.patch.txt'].  There is mostly no use
  for the job entries of foreground dead jobs, so they can be removed
  safely in most cases.

  However, the problem with option 1 is that the job information of
  the foreground jobs of the `WIFSIGNALED' state (i.e., the foreground
  jobs terminated by a signal) will not be printed.

    $ bash-dev --norc    <-- the current devel
    $ trap 'echo begin;for a in {0..3}; do (kill -9 $BASHPID);
done;echo end' INT
    $ ^C     <-- press C-c
    begin
    end

    Killed
    Killed
    Killed
    Killed
    $ bash-r0022-fix1 --norc    <-- the devel patched with option 1
    $ trap 'echo begin;for a in {0..3}; do (kill -9 $BASHPID);
done;echo end' INT
    $ ^C
    begin
    end

  Option 2: Here, another naive fix is that we print the information
  of the foreground jobs terminated by a signal and removes the entry
  [see the attached patch `r0022-foreground-dead-job-fix2.patch.txt'].

  However, I guess this conflicts with the original reason that
  `notify_of_job_status ()' is not called in the affected contexts; I
  guess we do not want to print job information in the middle of trap
  handlers or the other affected contexts, but instead print them at
  the end of the trap handler. See the following example:

    $ bash-r0022-fix2 --norc    <-- the devel patched with option 2
    $ trap 'echo begin;for a in {0..3}; do (kill -9 $BASHPID);
done;echo end' INT
    $ ^C
    begin
    Killed
    Killed
    Killed
    Killed
    end

  Option 3: So, we can defer the deletion of the foreground jobs
  terminated by signals and only removes the foreground dead jobs that
  are not terminated by signals.  We may remove the foreground jobs
  terminated by signals later [see
  `r0022-foreground-dead-job-fix3.patch.txt'].  With this, the

    $ bash-r0022-fix3 --norc    <-- the devel patched with option 3
    $ trap 'echo begin;for a in {0..3}; do (kill -9 $BASHPID);
done;echo end' INT
    $ ^C
    begin
    end

    Killed
    Killed
    Killed
    Killed

--
Koichi

[PATCH] Fix foreground dead jobs in trap handlers reported like background ones in `jobs'

Reply via email to