Hello!

We sporadically hit a segfault across different kinds of hardware on Bash version 5.1.8(1)-release (x86_64-redhat-linux-gnu).

We found that run_sigchld_trap was run even though our bash script never explicitly sets a SIGCHLD trap. Indeed, the trap_list array is empty except for the EXIT trap at trap_list[0]. I can reproduce a similar segfault by manually setting catch_flag=1 and pending_traps[17]=1 (17 being SIGCHLD on this platform) in a gdb session. However, I cannot reproduce the bug on demand; we can only wait for it to show up, which happens about once a month across thousands of servers that each run this script every few minutes. We currently suspect that this is not memory corruption, both because we have seen it on different hardware and because every crash produces this same stack trace.
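
For anyone who wants to check the same thing from inside a running script rather than from a core dump, `trap -p SIGCHLD` prints the trap command for that signal, or nothing when none is set. A minimal sketch (the function name report_chld_trap is made up for illustration and is not part of our script):

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether this shell has a SIGCHLD trap set.
report_chld_trap() {
    local current
    # trap -p prints the trap command for the signal, or nothing if unset.
    current=$(trap -p SIGCHLD)
    if [[ -n $current ]]; then
        printf 'SIGCHLD trap set: %s\n' "$current"
    else
        printf 'no SIGCHLD trap set\n'
    fi
}

report_chld_trap
```

This only shows what the shell believes its trap table contains; it says nothing about the internal pending_traps state seen in the core dump.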

Here's the stack trace and the relevant function in our bash code that occasionally leads to this segfault:

#0  0x00007f9d3d42976b in kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x000055b6eaa5e920 in termsig_handler (sig=11)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/sig.c:617
#2  0x000055b6eaa5eb21 in termsig_handler (sig=<optimized out>)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/sig.c:484
#3  termsig_sighandler (sig=<optimized out>) at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/sig.c:539
#4  <signal handler called>
#5  __strlen_avx2_rtm () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#6  0x000055b6eaa3d554 in run_sigchld_trap (nchild=1)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/jobs.c:4205
#7  0x000055b6eaa5a4e7 in run_pending_traps () at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/trap.c:370
#8  0x000055b6eaa22060 in execute_command_internal (command=<optimized out>,
    asynchronous=<optimized out>, pipe_in=<optimized out>, pipe_out=<optimized out>,
    fds_to_close=<optimized out>) at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1141
#9  0x000055b6eaa233db in execute_connection (fds_to_close=0x55b6eb1940b0, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb2611b0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2727
#10 execute_command_internal (command=0x55b6eb2611b0, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb1940b0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#11 0x000055b6eaa24758 in execute_command (command=0x55b6eb2611b0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#12 0x000055b6eaa2339e in execute_connection (fds_to_close=0x55b6eb193ea0, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb266390)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2718
#13 execute_command_internal (command=0x55b6eb266390, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb193ea0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#14 0x000055b6eaa24758 in execute_command (command=0x55b6eb266390)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#15 0x000055b6eaa2339e in execute_connection (fds_to_close=0x55b6eb268840, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb266330)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2718
#16 execute_command_internal (command=0x55b6eb266330, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb268840)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#17 0x000055b6eaa24758 in execute_command (command=0x55b6eb266330)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#18 0x000055b6eaa2339e in execute_connection (fds_to_close=0x55b6eb193e50, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb266730)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2718
#19 execute_command_internal (command=0x55b6eb266730, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb193e50)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#20 0x000055b6eaa24758 in execute_command (command=0x55b6eb266730)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#21 0x000055b6eaa2339e in execute_connection (fds_to_close=0x55b6eb193d70, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb1e1480)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2718
#22 execute_command_internal (command=0x55b6eb1e1480, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb193d70)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#23 0x000055b6eaa24758 in execute_command (command=0x55b6eb1e1480)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#24 0x000055b6eaa2339e in execute_connection (fds_to_close=0x55b6eb24c240, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb1e1420)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2718
#25 execute_command_internal (command=0x55b6eb1e1420, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#26 0x000055b6eaa24e4b in execute_in_subshell (command=command@entry=0x55b6eb256f40,
    asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
    fds_to_close=fds_to_close@entry=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1697
#27 0x000055b6eaa21cb4 in execute_command_internal (command=0x55b6eb256f40,
    asynchronous=<optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:670
#28 0x000055b6eaa233db in execute_connection (fds_to_close=0x55b6eb24c240, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb25c820)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2727
#29 execute_command_internal (command=0x55b6eb25c820, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#30 0x000055b6eaa22193 in execute_command_internal (command=0x55b6eb26a790,
    asynchronous=<optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1024
#31 0x000055b6eaa2ada0 in execute_function (var=<optimized out>, words=0x55b6eb25b7e0,
    flags=<optimized out>, fds_to_close=0x55b6eb24c240, async=0, subshell=0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:5110
#32 0x000055b6eaa2139a in execute_builtin_or_function (flags=8, fds_to_close=0x55b6eb24c240,
    redirects=<optimized out>, var=0x55b6eb1f2cc0, builtin=0x0, words=0x55b6eb25b7e0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:5355
#33 execute_simple_command (simple_command=<optimized out>, pipe_in=-1, pipe_out=-1, async=0,
    fds_to_close=<optimized out>) at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:4609
#34 0x000055b6eaa225ba in execute_command_internal (command=0x55b6eb25d380,
    asynchronous=<optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:857
#35 0x000055b6eaa22193 in execute_command_internal (command=0x55b6eb269a30,
    asynchronous=<optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1024
#36 0x000055b6eaa2ada0 in execute_function (var=var@entry=0x55b6eb1e7a10,
    words=words@entry=0x55b6eb259e40, flags=flags@entry=72,
    fds_to_close=fds_to_close@entry=0x55b6eb24c240, async=async@entry=0, subshell=subshell@entry=1)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:5110
#37 0x000055b6eaa30162 in execute_subshell_builtin_or_function (words=0x55b6eb259e40, redirects=0x0,
    builtin=0x0, var=0x55b6eb1e7a10, pipe_in=-1, pipe_out=-1, async=0, fds_to_close=0x55b6eb24c240,
    flags=72) at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:5281
#38 0x000055b6eaa204a1 in execute_simple_command (simple_command=<optimized out>, pipe_in=-1,
    pipe_out=-1, async=0, fds_to_close=<optimized out>)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:4601
#39 0x000055b6eaa225ba in execute_command_internal (command=0x55b6eb25af80,
    asynchronous=<optimized out>, pipe_in=-1, pipe_out=5, fds_to_close=0x55b6eb24c240)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:857
#40 0x000055b6eaa255e6 in execute_pipeline (command=<optimized out>, asynchronous=0,
    pipe_in=<optimized out>, pipe_out=-1, fds_to_close=0x55b6eb25b160)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2555
#41 0x000055b6eaa23d84 in execute_connection (fds_to_close=0x55b6eb25b160, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb263570)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2739
#42 execute_command_internal (command=0x55b6eb263570, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb25b160)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#43 0x000055b6eaa24758 in execute_command (command=0x55b6eb263570)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#44 0x000055b6eaa22cae in execute_if_command (if_command=<optimized out>)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:3709
#45 execute_command_internal (command=0x55b6eb26cd00, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:984
#46 0x000055b6eaa233db in execute_connection (fds_to_close=0x55b6eb2573c0, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb257400)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2727
#47 execute_command_internal (command=0x55b6eb257400, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#48 0x000055b6eaa22193 in execute_command_internal (command=0x55b6eb25ba90,
    asynchronous=<optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1024
#49 0x000055b6eaa2ada0 in execute_function (var=var@entry=0x55b6eb201a80,
    words=words@entry=0x55b6eb274fc0, flags=flags@entry=64,
    fds_to_close=fds_to_close@entry=0x55b6eb2573c0, async=async@entry=0, subshell=subshell@entry=1)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:5110
#50 0x000055b6eaa30162 in execute_subshell_builtin_or_function (words=0x55b6eb274fc0, redirects=0x0,
    builtin=0x0, var=0x55b6eb201a80, pipe_in=-1, pipe_out=-1, async=0, fds_to_close=0x55b6eb2573c0,
    flags=64) at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:5281
#51 0x000055b6eaa204a1 in execute_simple_command (simple_command=<optimized out>, pipe_in=-1,
    pipe_out=-1, async=0, fds_to_close=<optimized out>)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:4601
#52 0x000055b6eaa225ba in execute_command_internal (command=0x55b6eb24f740,
    asynchronous=<optimized out>, pipe_in=4, pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:857
#53 0x000055b6eaa2591f in execute_pipeline (command=<optimized out>, asynchronous=0,
    pipe_in=<optimized out>, pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2605
#54 0x000055b6eaa23d84 in execute_connection (fds_to_close=0x55b6eb2573c0, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb24f7e0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2739
#55 execute_command_internal (command=0x55b6eb24f7e0, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#56 0x000055b6eaa233db in execute_connection (fds_to_close=0x55b6eb2573c0, pipe_out=-1, pipe_in=-1,
    asynchronous=0, command=0x55b6eb24f840)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:2727
#57 execute_command_internal (command=0x55b6eb24f840, asynchronous=<optimized out>, pipe_in=-1,
    pipe_out=-1, fds_to_close=0x55b6eb2573c0)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:1032
#58 0x000055b6eaa24758 in execute_command (command=0x55b6eb24f840)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#59 0x000055b6eaa224b0 in execute_case_command (case_command=<optimized out>)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:3588
#60 execute_command_internal (command=0x55b6eb257260, asynchronous=<optimized out>,
    pipe_in=<optimized out>, pipe_out=<optimized out>, fds_to_close=<optimized out>)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:966
#61 0x000055b6eaa24758 in execute_command (command=0x55b6eb257260)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/execute_cmd.c:399
#62 0x000055b6eaa160e9 in reader_loop () at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/eval.c:171
#63 0x000055b6eaa0795e in main (argc=4, argv=0x7ffdf47ebd68, env=0x7ffdf47ebd90)
    at /usr/src/debug/bash-5.1.8-6.el9_1.x86_64/shell.c:821

function validateHardwareCached()
{
    scxpath eval '/Platform/Storage/HostBusAdapter[@driver="nvme"][@pciScan][not(@no-scan)]/@pciAddress' /opt/scale/var/hardware/template.xml |
        while read pciAddress; do
            cd /sys/bus/pci/devices
            compgen -G "$pciAddress/nvme/nvme*" >& /dev/null && continue
            read pciScan < <(scxpath eval "/Platform/Storage/HostBusAdapter[@pciAddress='$pciAddress']/@pciScan" /opt/scale/var/hardware/template.xml)
            echo 1 > $pciAddress/remove
            echo 1 > $pciScan/rescan
        done
    [[ -z ${bypassCache:-} ]] || { validateHardwareRaw; return; }
    (
        mkdir -p /dev/shm/validateHardwareCache/tmp/
        shopt -s extglob
        read fingerprint < <(
            {
                set +eu
                shopt -s extglob
                if ! compgen -G "/dev/scale/slot*" >& /dev/null; then
                    date
                    dd if=/dev/urandom bs=8192 count=1 2> /dev/null | md5sum
                    exit
                fi
                exec {blockdev}< <(blockdev --getsize64 /dev/scale/slot*)
                exec {devs}< <(stat -L -c %n\ %t\ %T /dev/scale/slot*)
                exec {stat}< <(stat -c %n\ %Y /dev/scale/slot* /opt/scale/var/hardware/{template,model}.xml)
                exec {sdparm}< <(bash -O extglob -c "sdparm -6 --get=WCE=1 /dev/scale/slot+([0-9]) 2> /dev/null | paste -d ' ' - -")
                exec {rot}< <(bash -O extglob -c "grep -H . /sys/block/+(nvme|mmc|sd|vd)*/queue/rotational")
                exec {pci}< <(ls -ld /sys/bus/pci/devices/*)
                cat /dev/fd/{$blockdev,$stat,$sdparm,$devs,$rot,$pci} 2> /dev/null || true
                cat /opt/scale/var/node_uuid 2> /dev/null || true
                cat /opt/scale/var/hardware/validate-results.xml 2> /dev/null || true
                cat /etc/udev/rules.d/*scale* 2> /dev/null || true
            } |
                sort | tee /dev/shm/validateHardwareCache/tmp/lastContents.$$ | md5sum | awk '{print $1}'
        )
        trap "[[ ! -f /dev/shm/validateHardwareCache/tmp/lastContents.$$ ]] || /bin/rm /dev/shm/validateHardwareCache/tmp/lastContents.$$" EXIT
        : ${validateHardwareCacheFile:=/dev/shm/validateHardwareCache/xml.validateHardware.${fingerprint}}
        : ${validateHardwareCacheMD5:=/dev/shm/validateHardwareCache/md5.validateHardware.${fingerprint}}
        : ${validateHardwareCacheContent:=/dev/shm/validateHardwareCache/content.validateHardware.${fingerprint}}
        touch $validateHardwareCacheFile $validateHardwareCacheMD5
        (
            flock $flockFD
            if [[ -z ${bypassCache:-} ]] && md5sum $validateHardwareCacheFile | { read md5sum file; echo "$md5sum"; } | xargs -i grep -qsx {} $validateHardwareCacheMD5; then
                cat $validateHardwareCacheFile
            else
                (
                    validateHardwareRaw | tee $validateHardwareCacheFile
                    md5sum $validateHardwareCacheFile | awk '{print $1}' > $validateHardwareCacheMD5
                    mv -f /dev/shm/validateHardwareCache/tmp/lastContents.$$ $validateHardwareCacheContent
                )
            fi
        ) {flockFD}< $(dirname $validateHardwareCacheFile)
    )
}

At the time of the segfault, line_number corresponded to the third-to-last line, the one that reads ") {flockFD}< $(dirname $validateHardwareCacheFile)".
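
Stripped of the caching logic, the construct on that line is a subshell whose redirection opens a directory on a shell-allocated file descriptor, which the body then locks with flock(1). A minimal sketch of just that shape, assuming util-linux flock is installed (with_dir_lock and lockFD are illustrative names, and this sketch is not known to reproduce the crash):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command while holding an flock on a directory.
with_dir_lock() {
    local dir=$1
    shift
    (
        # lockFD is assigned by the {lockFD}< redirection below, which bash
        # performs inside the subshell before running this body.
        flock "$lockFD" || exit 1
        "$@"
    ) {lockFD}< "$dir"
}

scratch=$(mktemp -d)
with_dir_lock "$scratch" echo "lock held"
rmdir "$scratch"
```

The lock is released when the subshell exits and its descriptor is closed, which is the same point at which bash has just reaped the children started inside the subshell body.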

Any insight into what may be causing this would be much appreciated!
