Oleg Nesterov has distilled a very simple (and reproducible) testcase below for what appears to be a long-standing bash bug. This is a problem that triggers on Linux quite frequently. (I can also send the configs.tar.bz2 testcase I made, but I think Oleg's is far simpler.)

I used bash-3.2-19.fc8 for my tests, on Linux 2.6.24-0.39.rc3.git1.fc9.

    Ingo

----- Forwarded message from Oleg Nesterov <[EMAIL PROTECTED]> -----

Date: Mon, 3 Dec 2007 18:42:51 +0300
From: Oleg Nesterov <[EMAIL PROTECTED]>
To: Ingo Molnar <[EMAIL PROTECTED]>
Subject: Re: weird script behavior, signals?
Cc: Jan Kratochvil <[EMAIL PROTECTED]>, Roland McGrath <[EMAIL PROTECTED]>

On 12/03, Ingo Molnar wrote:
>
> here's a fresh incident that is 100% reproducible. I constructed the
> following simple oneliner script to analyze saved kernel config files:
>
>   for N in `grep 'is not set' config* | cut -d\# -f2- | cut -d' ' -f2 |
>   sort | uniq`; do printf "%10d %s\n" `grep "$N=y" config* | wc -l` $N; done
>
> the script starts printing results like this:
>
> [...]
>         30 CONFIG_B43LEGACY_DEBUG
>         15 CONFIG_B43LEGACY_DMA_AND_PIO_MODE
>         18 CONFIG_B43LEGACY_DMA_MODE
>         19 CONFIG_B43LEGACY_PIO_MODE
>         21 CONFIG_B43_DEBUG
>         15 CONFIG_B43_DMA_AND_PIO_MODE
>         17 CONFIG_B43_DMA_MODE
>          6 CONFIG_B43_PCMCIA
> [...]
>
> now if i Ctrl-C the script, i get:
>
>   -bash: printf: CONFIG_AFS_FS: invalid number
>
> if i Ctrl-Z the script, i get hung output, due to:
>
>   |-login(2068)---bash(2306)---bash(10838)-+-grep(10839)
>   |                                        `-wc(10840)
>
> both grep and wc are in T+ state:
>
>   mingo    10839  0.0  0.0   6088   676 tty2     T+   06:14
>   mingo    10840  0.0  0.0   3800   428 tty2     T+   06:14   0:00 wc -l
>
> is this signal behavior really expected? I cannot kill the script - i

I assume you can still kill it by running "kill" on another console, yes?

> have to manually kill the wc and grep tasks and then have to wait until
> its finished. Is this normal?

Looks like a bash bug to me.

    $ echo `echo >&2 XXX; sleep 10000`

    $ ps ax
    ...
     2549 tty1     S      0:00 -bash
     2550 tty1     S+     0:00 sleep 10000
    ...

Small note: the job control rules are black magic to me, so I assume it is correct that "sleep" is in the "foreground process group", but "bash" is not. This -bash, btw, is the child of the login shell; it executes `...`.

    $ cat /proc/2549/status
    ...
    ShdPnd: 0000000000000000
    SigBlk: 0000000000010000
    ...

No pending signals, but SIGCHLD is blocked. I think this is the reason.

    $ cat /proc/2549/wchan; echo
    do_wait

Now I press Ctrl-Z; SIGTSTP goes to "sleep" and stops it.

    $ cat /proc/2550/status
    ...
    State:  T (stopped)
    ...

"sleep" notifies the parent,

    $ cat /proc/2549/status
    ...
    ShdPnd: 0000000000010000
    SigBlk: 0000000000010000
    ...

note the pending SIGCHLD. But it is blocked, so signal_pending() is not true.

do_notify_parent_cldstop() does __wake_up_parent() anyway, but this doesn't help because, according to strace, the "bash" does waitpid(-1, 0xafd37628, 0). So do_wait() was called with options == WEXITED, and it blocks again after the wakeup. This is correct because !signal_pending().

Unless I missed something, perhaps this should be reported to the bash developers?

Oleg.

----- End forwarded message -----
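
For readers who want to check the ShdPnd and SigBlk values quoted in the message, here is a minimal sketch of how such a hex mask can be decoded in plain POSIX shell. The function name decode_sigmask is hypothetical (it is not part of bash or procps), and the sketch assumes the usual Linux layout in which bit (n - 1) of the mask stands for signal number n.

```shell
# decode_sigmask HEXMASK
# Print the number of every signal whose bit is set in HEXMASK, one per line.
# HEXMASK is the bare hex string as shown in /proc/<pid>/status (no "0x").
# Assumption: bit (n - 1) of the mask corresponds to signal number n.
decode_sigmask() {
    mask=$(( 0x$1 ))
    sig=1
    while [ "$sig" -le 64 ]; do
        # Test bit (sig - 1) of the mask.
        if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
            echo "$sig"
        fi
        sig=$(( sig + 1 ))
    done
}
```

Running `decode_sigmask 0000000000010000` prints 17, which is SIGCHLD on x86 Linux; `kill -l 17` in bash should confirm the name there. Signal numbers differ between platforms, so the number-to-name step is best left to the local `kill -l`.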