commit:     4d435c6d22ac2a270c64aaef3e39aac2f1ccf38e
Author:     Kerin Millar <kfm <AT> plushkava <DOT> net>
AuthorDate: Fri Nov  7 00:15:59 2025 +0000
Commit:     Sam James <sam <AT> gentoo <DOT> org>
CommitDate: Sat Nov  8 00:36:46 2025 +0000
URL:        https://gitweb.gentoo.org/proj/portage.git/commit/?id=4d435c6d

ecompress: avoid losing track of find(1) in fix_symlinks()

Presently, the fix_symlinks() function is tasked with finding and
correcting symlinks that are dangling as a consequence of their targets
having been compressed. The routine that does so is structured in the
following way.

while IFS= read -rd '' link && IFS= read -rd '' target; do
        # Repair the symlinks. May execute rm(1) and ln(1).
done < <(
        # Find dangling symlinks.
        find "$top" -type l -xtype l -printf '%p\0%l\0'
)

# Reap the exit status of find(1), returning false upon failure.
wait "$!" || return

A user reported that this routine raised the following error.

ecompress: line 172: wait: pid 18991 is not a child of this shell

After some initial difficulty in reproducing it, I realised that bash
was losing track of find(1) in its capacity as a background job, owing
to PID reuse. The reasons for the user being able to consistently induce
this issue are as follows.

- both rm(1) and ln(1) are executed many times for openssl (5138 each)
- his kernel is subject to a limit of kernel.pid_max = 32768
- his system runs a large number of processes (or endures a high fork rate)

Here is a simple reproducer.

# sysctl -w kernel.pid_max=512
# while read -r; do /bin/true; done < <(seq 1 1024); wait "$!"
bash: wait: pid 320 is not a child of this shell

Interestingly, not all shells have this problem. Consider dash as a case
in point.

# dash -c 'mkfifo fifo
{ seq 1 1024; exit 123; } > fifo & pid=$!
while read -r line; do /bin/true; done < fifo; wait "$pid"
echo "$?"'
123

I found it odd that the ability to reap a single asynchronous command
would be affected by proceeding to issue any number of commands in the
foreground. I contacted Chet Ramey, whereupon he acknowledged the
problem and explained that bash is able to keep only one record of any
given PID throughout its job control code. Further, this limitation
extends to both background and foreground jobs, so as to simplify the
internal bookkeeping. Consequently, he is considering whether it would
be practical to have bash retain the status of a terminated background
process in the event that a foreground process recycles its PID.

In the meantime, ecompress must operate in a manner that is immune to
this issue. To that end, eschew the use of a process substitution by the
fix_symlinks() function. Instead, employ a conventional pipeline and
determine the exit status of find(1) by examining the PIPESTATUS array.
Also, to compensate for this change, enable the lastpipe shell option,
so that the assignment to the 'something_changed' variable is not lost
to a subshell.
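
As a minimal sketch of the technique, and not the code of ecompress
itself, the following shows a pipeline whose final element runs in the
current shell by way of lastpipe, with the status of the producer being
read from PIPESTATUS rather than reaped by wait. The names count_records,
seen and producer_status are hypothetical, as is the exit status of 7.
It assumes a non-interactive bash, since lastpipe takes effect only
where job control is inactive.

```shell
#!/usr/bin/env bash
# Illustrative only: count_records, seen and producer_status are
# hypothetical names that do not appear in ecompress.

# The function body is a subshell "( ... )", so the shopt change below
# does not leak into the caller.
count_records() (
        shopt -s lastpipe

        seen=0

        # The producer emits three NUL-terminated records, then exits 7.
        # The braces run in a subshell as a pipeline element, so "exit"
        # terminates only the producer.
        { printf '%s\0' a b c; exit 7; } \
        | while IFS= read -rd '' record; do
                # With lastpipe, this assignment survives the loop.
                (( ++seen ))
        done

        # Copy the producer's status at once; the next command executed
        # would overwrite PIPESTATUS.
        producer_status=${PIPESTATUS[0]}

        printf '%s %s\n' "$seen" "$producer_status"
)

result=$(count_records)
echo "$result"   # prints: 3 7
```

Without lastpipe, the loop would run in a subshell and the count would
be reported as 0, which is why the option accompanies the change.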

Reported-by: Dan Johansson <dan.johansson <AT> dmj.nu>
Fixes: 781732d87525469e311732d5418ffb0f3e419da8
Closes: https://bugs.gentoo.org/965423
Signed-off-by: Kerin Millar <kfm <AT> plushkava.net>
Signed-off-by: Sam James <sam <AT> gentoo.org>

 bin/ecompress | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/bin/ecompress b/bin/ecompress
index a59e699f07..4aa0363f26 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -144,14 +144,21 @@ guess_suffix() (
        done
 )
 
-fix_symlinks() {
+fix_symlinks() (
        local something_changed link target1 target2 i
 
+       # Run the last element of the ensuing pipeline in the current shell.
+       # This option shall not persist because the function is a forking list.
+       shopt -s lastpipe
+
        # Repeat until nothing changes, in order to handle multiple
        # levels of indirection (see bug #470916).
        while true ; do
                something_changed=0
-               while IFS= read -rd '' link && IFS= read -rd '' target1; do
+
+               printf '%s\0' "${ED}" \
+               | find0 -type l -xtype l -printf '%p\0%l\0' \
+               | while IFS= read -rd '' link && IFS= read -rd '' target1; do
                        target2=${target1}${PORTAGE_COMPRESS_SUFFIX}
 
                        if [[ ${target2} == /* ]]; then
@@ -166,10 +173,15 @@ fix_symlinks() {
                        rm -f -- "${link}" \
                        && ln -snf -- "${target2}" "${link}${PORTAGE_COMPRESS_SUFFIX}" \
                        || return
-               done < <(printf '%s\0' "${ED}" | find0 -type l -xtype l -printf '%p\0%l\0')
+               done
 
-               # Check whether the invocation of find(1) succeeded.
-               wait "$!" || return
+               # Check whether the invocation of find(1) succeeded. The use of
+               # the wait builtin is avoided here because the rm(1) and ln(1)
+               # utilities may be executed many times, with bash being prone
+               # to losing the status code of the last asynchronous command as
+               # a consequence of PID reuse. This is an issue most likely to
+               # affect OpenRC users. See bug #965423.
+               (( PIPESTATUS[1] == 0 )) || return
 
                if (( ! something_changed )); then
                        break
@@ -180,7 +192,7 @@ fix_symlinks() {
                        break
                fi
        done
-}
+)
 
 if [[ -z $1 ]] ; then
        __helpers_die "${0##*/}: at least one argument needed"
