On 2/25/10 7:38 AM, wer...@suse.de wrote:

> Configuration Information [Automatically generated, do not change]:
> Machine: i586
> OS: linux-gnu
> Compiler: gcc -I/usr/src/packages/BUILD/bash-4.1
> -L/usr/src/packages/BUILD/bash-4.1/../readline-6.1
> Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i586'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i586-suse-linux-gnu'
> -DCONF_VENDOR='suse' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL
> -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -O2 -march=i586 -mtune=i686
> -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
> -funwind-tables -fasynchronous-unwind-tables -g -D_LARGEFILE64_SOURCE
> -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DRECYCLES_PIDS -Wall -g -std=gnu89
> -Wextra -Wno-unprototyped-calls -Wno-switch-enum -Wno-unused-variable
> -Wno-unused-parameter -ftree-loop-linear -pipe -fprofile-use
> uname output: Linux boole 2.6.27.19-3.2-pae #1 SMP 2009-02-25 15:40:44 +0100
> i686 i686 i386 GNU/Linux
> Machine Type: i586-suse-linux-gnu
>
> Bash Version: 4.1, 4.0, 3.2, 3.1, 3.0
> Patch Level: all
> Release Status: release
>
> Description:
>       Signal handler may hang in futex_wait() on fast multi processor systems.

This doesn't mean much.

>       This seems to caused by using stdio within signal handlers in some cases
>       where glibc uses malloc()/free() internal.

OK, let's set some baselines here.

Most of the time, a received signal causes bash to set an internal flag and
defer the actual handling of the signal until later. This includes signals
for which a user has set a trap.

The problem appears to be that bash sets an internal flag indicating that a
signal should be processed immediately instead of waiting for a "good time"
under certain circumstances and, when it receives a signal for which it has
set a trap, running the trap handler immediately causes glibc to execute
functions that are not "signal safe" and it is not prepared to accommodate.

This report doesn't include the most basic information: the signal bash
receives that causes this (not all signals are treated identically), and the
contents of the trap handler. It doesn't even say whether bash is
interactive or not, or under what circumstances it's executing. Let's start
there. (Since I cheated and looked back at previous reports from Novell,
I'm going to assume the shell is not interactive while this is happening.)

Bash sets this flag under two basic circumstances: when it will potentially
block in a state that will not be interruptible (e.g., reading from a remote
file system), or when reading from the keyboard, when users expect read(2)
to be interrupted and any trap to be taken immediately.

The first case doesn't seem to apply. The second case is primarily used
when the shell is interactive, so those uses don't work either. The
remaining potential places where the "interrupt_immediately" variable is set
are during the execution of the wait and read builtins and the
unwind-protect framework.

It should be possible for those folks who can reproduce this issue to
instrument bash in such a way as to track the value of
"interrupt_immediately" and notify when it changes.
Whether that means outputting some message when it's incremented and
decremented or using something like gdb's watchpoints, if we're going to
assume that the variable is being inappropriately set (or not reset), the
way to a robust fix is to find out where and why that's happening.

One strategy I've used in the past is to assign a numeric tag to each place
where the variable is modified, and write a message that includes the tag
and the variable's value when the variable is modified. It's never been a
problem for me to use stdio to do this, but it may be different on Linux (I
don't do the majority of my development on Linux). The increments and
decrements should match, and there should always be a corresponding
assignment of 0 after an assignment of 1.

> Fix:
>       For the malloc()/free() used by the bash (confgured with --without-gnu-malloc
>       and --without-bash-malloc) I use the patch below but this does not work for
>       the in glibc internal used malloc()/free() calls. A real solution could be
>       the way done in tcsh or ksh where only flags will be set from the signal
>       handlers whereas the real work is done within the main loop its self.
>
> --- parse.y
> +++ parse.y 2010-01-20 13:51:39.000000000 +0000
> @@ -1434,10 +1434,11 @@ yy_readline_get ()
>                                  current_readline_prompt : "");
>
>        terminate_immediately = 0;
> -      if (signal_is_ignored (SIGINT) == 0 && old_sigint)
> +      if (signal_is_ignored (SIGINT) == 0)
>         {
>           interrupt_immediately--;
> -         set_signal_handler (SIGINT, old_sigint);
> +         if (old_sigint)
> +           set_signal_handler (SIGINT, old_sigint);
>         }

This patch is ok, in that it makes the code more symmetric, but it's
probably not relevant to this issue. This code is used when the shell is
interactive, at which point the SIGINT handler has already been set to a
known value and will not be NULL.
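
For anyone who wants to try the tagging strategy described above, here is a
rough sketch of what it might look like. It is not taken from the bash
sources: the helper names (append_str, append_int, format_trace,
trace_interrupt_immediately), the II_INCR/II_DECR macros, and
demo_call_site are all made up for illustration, and the variable here is
only a stand-in for bash's "interrupt_immediately". Since the report
suspects stdio, this version formats the message by hand and uses write(2)
rather than fprintf.

```c
#include <stddef.h>
#include <unistd.h>

/* Stand-in for bash's variable of the same name (assumption: plain int). */
static volatile int interrupt_immediately = 0;

/* Hypothetical helper: copy a NUL-terminated string into buf at pos. */
static size_t
append_str (char *buf, size_t pos, const char *s)
{
  while (*s)
    buf[pos++] = *s++;
  return pos;
}

/* Hypothetical helper: append the decimal form of n without using stdio. */
static size_t
append_int (char *buf, size_t pos, int n)
{
  char digits[12];
  size_t i = 0;
  unsigned int u = (n < 0) ? 0u - (unsigned int) n : (unsigned int) n;

  if (n < 0)
    buf[pos++] = '-';
  do
    {
      digits[i++] = (char) ('0' + u % 10);
      u /= 10;
    }
  while (u != 0);
  while (i > 0)
    buf[pos++] = digits[--i];
  return pos;
}

/* Build "interrupt_immediately: tag=T value=V\n" in buf (at least 64 bytes).
   Returns the number of bytes stored; the result is not NUL-terminated. */
static size_t
format_trace (char *buf, int tag, int value)
{
  size_t pos = 0;

  pos = append_str (buf, pos, "interrupt_immediately: tag=");
  pos = append_int (buf, pos, tag);
  pos = append_str (buf, pos, " value=");
  pos = append_int (buf, pos, value);
  buf[pos++] = '\n';
  return pos;
}

/* Report the current value along with the tag of the call site. */
static void
trace_interrupt_immediately (int tag)
{
  char buf[64];
  size_t len = format_trace (buf, tag, interrupt_immediately);
  ssize_t unused = write (STDERR_FILENO, buf, len);

  (void) unused;
}

/* Each place that modifies the variable gets its own numeric tag. */
#define II_INCR(tag) \
  do { interrupt_immediately++; trace_interrupt_immediately (tag); } while (0)
#define II_DECR(tag) \
  do { interrupt_immediately--; trace_interrupt_immediately (tag); } while (0)

/* Hypothetical call site standing in for code that brackets a blocking
   operation; returns the value left behind, which should be 0. */
static int
demo_call_site (void)
{
  II_INCR (1);
  /* ... the blocking operation would go here ... */
  II_DECR (2);
  return interrupt_immediately;
}
```

With something like this in place, a trace in which a tag's increment is
never followed by the matching decrement points at the code path that leaves
the variable set.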

> #if 0
> --- xmalloc.c
> +++ xmalloc.c 2010-02-24 08:32:51.452626384 +0000
> @@ -35,6 +35,11 @@
>  # include "ansi_stdlib.h"
>  #endif /* HAVE_STDLIB_H */

This isn't a fix.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU c...@case.edu http://cnswww.cns.cwru.edu/~chet/