Configuration Information:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' -DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -g -fvar-tracking-assignments -g -fvar-tracking-assignments -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' -DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' -DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -Wno-parentheses -Wno-format-security
uname output: Linux gmx 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 11
Release Status: release

Description:

I'm getting a mysterious hang on one of our Arch Linux machines for a
particular, rather simple script; getting a debugger attached to the
process after building some debugging symbols, I tracked the hang down
to this loop in bgp_delete (with some minor formatting):

    for (
        psi = *(pshash_getbucket (pid));
        psi != NO_PIDSTAT;
        psi = bgpids.storage[psi].bucket_next
    )
        if (bgpids.storage[psi].pid == pid)
            break;

...the problem is, according to my debugger:

    (gdb) p psi
    $1 = 11506
    (gdb) p bgpids.storage[psi].bucket_next
    $2 = 11506

...and so this just sits there wedging a core :)

I'm not entirely sure what circumstances cause this, but it feels
pretty racy; it takes, on average, a couple days to get this machine to
reliably repeat the issue. I'll leave this process alive for now if
you'd like me to gather more forensics. (I do have a core dump, but
it's ~5.5MB :)

For posterity, and reference below, here's a backtrace--sorry that my
UA tries to word wrap it:

#0  0x000000000043ff0e in bgp_delete (pid=pid@entry=15980) at jobs.c:868
#1  0x0000000000443f89 in make_child (command=0xb6b930 "/manage/totaldisk.sh",
    async_p=async_p@entry=1) at jobs.c:2093
#2  0x000000000042ff9c in execute_simple_command (simple_command=0x93bd10,
    pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, async=async@entry=1,
    fds_to_close=fds_to_close@entry=0xbbc4c0) at execute_cmd.c:4088
#3  0x0000000000431e5c in execute_command_internal (command=0x93bce0,
    asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:802
#4  0x0000000000433176 in execute_connection (fds_to_close=0xbbc4c0,
    pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93bd60)
    at execute_cmd.c:2576
#5  execute_command_internal (command=command@entry=0x93bd60,
    asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:971
#6  0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
    pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93be70)
    at execute_cmd.c:2564
#7  execute_command_internal (command=command@entry=0x93be70,
    asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:971
#8  0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
    pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93bfe0)
    at execute_cmd.c:2564
#9  execute_command_internal (command=command@entry=0x93bfe0,
    asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:971
#10 0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
    pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93c0f0)
    at execute_cmd.c:2564
#11 execute_command_internal (command=command@entry=0x93c0f0,
    asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:971
#12 0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
    pipe_out=-1, pipe_in=-1, asynchronous=1, command=0x93c200)
    at execute_cmd.c:2564
#13 execute_command_internal (command=command@entry=0x93c200,
    asynchronous=asynchronous@entry=1, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:971
#14 0x0000000000433105 in execute_connection (fds_to_close=0xbbc4c0,
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x93c370)
    at execute_cmd.c:2564
#15 execute_command_internal (command=command@entry=0x93c370,
    asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0xbbc4c0)
    at execute_cmd.c:971
#16 0x0000000000433a7e in execute_command (command=0x93c370)
    at execute_cmd.c:405
#17 0x0000000000433b2f in execute_while_or_until (while_command=0x93c3a0,
    type=type@entry=0) at execute_cmd.c:3509
#18 0x0000000000431cad in execute_while_command (while_command=<optimized out>)
    at execute_cmd.c:3450
#19 execute_command_internal (command=command@entry=0x93c3c0,
    asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x93c3f0)
    at execute_cmd.c:911
#20 0x0000000000433a7e in execute_command (command=0x93c3c0)
    at execute_cmd.c:405
#21 0x000000000041c4a2 in reader_loop () at eval.c:180
#22 0x000000000041b1c2 in main (argc=2, argv=0x7fff79e96c78,
    env=0x7fff79e96c90) at shell.c:792

...and here is xxd /proc/3127/cmdline--the hanging process. I was asked
by gdb when I dumped core to note that there are embedded NULs in here:

00000000: 2f62 696e 2f62 6173 6800 2f6d 616e 6167  /bin/bash./manag
00000010: 652f 7275 6e2e 7368 00                   e/run.sh.

Repeat-By:

Attached is a tarball with the current master of (1) the script 3127
was running, (2) all of the scripts and program sources it calls, and
(3) a systemd service which invoked 3127. Assuming your build is
susceptible to the bug--exact conditions are quite unclear--extract
this tarball's contents, rename the directory in its root to "/manage",
and execute "/bin/bash /manage/run.sh"--or, more faithfully, install
manage.service into a systemd unit path of your choosing and start that
service.

Let me know if there's anything else I can provide :)

Thanks,
Graham
manage3client-master-5e9652e7eb2f53545244aa180427f019ca3e92d6.tar.bz2
Description: Binary data
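
For reference, the failure mode in the bgp_delete loop quoted in the Description can be modeled in isolation. The sketch below is not bash's code and not a proposed patch. It assumes a simplified index-linked bucket chain: the struct, the ps_index_t typedef, the NO_PIDSTAT value, CHAIN_CORRUPT, and bounded_chain_find are all hypothetical stand-ins. It shows that an entry whose bucket_next equals its own index makes an unbounded walk spin forever for any pid not found earlier in the chain, and that bounding the walk by the table size turns the hang into a detectable error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the names in the report. */
typedef int ps_index_t;
#define NO_PIDSTAT    ((ps_index_t)-1)  /* assumed end-of-chain sentinel */
#define CHAIN_CORRUPT ((ps_index_t)-2)  /* hypothetical error value */

struct pidstat_model {
    int pid;
    ps_index_t bucket_next;  /* index of the next entry in the same bucket */
};

/*
 * Walk a bucket chain looking for pid, visiting at most nslots entries.
 * A well-formed chain cannot hold more than nslots entries, so exceeding
 * that bound means the chain loops back on itself (for example
 * storage[i].bucket_next == i, as in the gdb output in the report).
 * Returns the matching index, NO_PIDSTAT if pid is absent, or
 * CHAIN_CORRUPT if a cycle is detected.
 */
ps_index_t
bounded_chain_find(const struct pidstat_model *storage, size_t nslots,
                   ps_index_t head, int pid)
{
    ps_index_t psi = head;
    size_t steps;

    for (steps = 0; psi != NO_PIDSTAT;
         psi = storage[psi].bucket_next, steps++) {
        if (steps >= nslots)
            return CHAIN_CORRUPT;
        assert(psi >= 0 && (size_t)psi < nslots);
        if (storage[psi].pid == pid)
            return psi;
    }
    return NO_PIDSTAT;
}
```

With a three-entry table whose last slot points at itself, a lookup for a pid stored earlier in the chain still succeeds, while a lookup for a missing pid returns CHAIN_CORRUPT instead of looping. That matches the observed behavior, where the hang only shows up once the walk reaches the self-referential entry.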