Re: [BUG] Bash not reacting to Ctrl-C
On Wed, Feb 9, 2011 at 1:18 PM, Bob Proulx wrote: > > Since the exit status of /bin/true is ignored then I think that test > case is flawed. I think at the least needs to check the exit status > of the /bin/true process. > > bash -c 'while true; do /bin/true || exit 1; done' The "|| exit 1" doesn't make any sense. If you seriously claim that that is needed for ^C to work reliably, you're just totally mistaken. Your whole premise that you should look at the error return code is total and utter crap. Lookie here: while : ; do sleep 1; done which is *exactly* the same case, and dammit, if ^C doesn't break out of that loop, then the shell is a broken POS. Agreed? If you tell me that it needs a "|| exit 1", you're just broken. Try it. And now go back to the original case. The same "^C should break out" is true when you replace "sleep' with "/bin/true" or with anything else. It had better break out every single time, on the first try. And it really doesn't. And it's a bash bug. I don't understand why bash people can't accept that. People even debugged it to the particular line of source code in bash. I just tried it: [torvalds@i5 ~]$ while : ; do /bin/sleep 1; done ^C [torvalds@i5 ~]$ while : ; do /bin/true; done ^C^C^C [torvalds@i5 ~]$ while : ; do /bin/true; done ^C [torvalds@i5 ~]$ while : ; do /bin/true; done ^C [torvalds@i5 ~]$ while : ; do /bin/true; done ^C^C^C and the thing to notice is that it clearly is very much about some race condition. Sometimes it works on the first try, sometimes it doesn't. Why are you arguing? Why are you bringing up totally idiotic arguments, while others are ignoring it because they can't reproduce it. There were people who reproduced this on OS X too, btw, so it clearly is not a Linux issue, even if you put your blinders on and ignore the fact that it was already root-caused by Oleg. The problem is that 'set_job_status_and_cleanup()' does that if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) && .. which just looks totally buggy and racy. There's even a comment about it in the bash source code, for chrissake! Here's the scenario: - wait_for() sets wait_sigint_received to zero (look for the comment here!), and installs the sigint handler - it does other things too, but it does waitchld() that does the actual waitpid() system call - now, imagine the following scenario: the ^C happens just as the child already exited successfully! - so bash itself gets the sigint, and sets wait_sigint_received to 1 So what happens? child->status will be successful (the child was not interrupted by the signal, it exited at just the right time), but bash saw the SIGINT. But because it thinks it needs to see *both* the sigint _and_ the WTERMSIG(child->status)==SIGINT, bash essentially ignores the ^C. Note how bash magically would have worked correctly if the child process had taken one extra millisecond, and also seen the ^C and died of it. Notice how bash acts differently based on that millisecond difference? So it's a bug. Please don't make inane and incorrect excuses for it ("you didn't have an '|| exit 1' there), and please don't say "I can't reproduce it". Even without reproducing it, just looking at the source code should be good enough, no? Especially as Oleg already pinpointed the exact line for you. Now, it does look like the problem is at least partly because bash has a horrible time trying to figure out a truly ambigious case: did the child process explicitly ignore the ^C or not? It looks like bash is trying to basically ignore the ^C in the case the child ignored it. I think that's misguided, but that does seem to be what bash is trying to do. It's misguided exactly because there is absolutely no way to know whether the child returned successfully because it just happened to exit just before the ^C came in, or whether it blocked ^C and ignored it. So even _trying_ to make that judgement call seems to be a bad idea. And no, I don't know bash sources all that well. I played around with them a long time ago, and for this I only glanced at it quickly to get more of a view into what bash is trying to do (all thanks should go to Oleg who already pinpointed the line that breaks). Maybe there are subtle issues, maybe there are broken historical shell semantics here. But please don't ignore this bug just because you cannot reproduce it. Linus
Re: [BUG] Bash not reacting to Ctrl-C
On Fri, Feb 11, 2011 at 8:57 AM, Illia Bobyr wrote: > On 2/9/2011 3:57 PM, Linus Torvalds wrote: >> >> Here's the scenario: I'll quote the scenario again, because you clearly didn't bother to read it. Please _READ_ it this time before you answer, ok? >> - wait_for() sets wait_sigint_received to zero (look for the comment >> here!), and installs the sigint handler >> - it does other things too, but it does waitchld() that does the >> actual waitpid() system call >> - now, imagine the following scenario: the ^C happens just as the >> child already exited successfully! >> - so bash itself gets the sigint, and sets wait_sigint_received to 1 Let me repeat: THE CHILD ALREADY EXITED SUCCESSFULLY. IT NEVER SAW THE ^C. So 'bash' saw the ^C that came from the keyboard. And the exit status from the child does NOT show SIGINT, because the child never saw it. > Do we really need to check wait_sigint_received here? Jesus wept. YES. Because the ONLY thing that has seen the SIGINT is bash. If bash then ignores "wait_sigint_received", then bash has lost the ^C. > If the child exits because of SIGINT was indeed received [..] The amount of pure denial in this thread is astounding. The bug has been analyzed. I told you exactly what happened. Please don't make up some unrelated and totally pointless scenarios. The child exited BECAUSE THE CHILD EXITED. The example child is "/bin/true", for chissake. It exits immediately. It never saw the ^C. The "waitpid()" will return a successful exit, so WTERMSIG() will be zero. If bash ignores 'wait_sigint_received' in this case, then ^C has been lost. Which is the bug. > Maybe the patch should be No. > I'm not an expert on Unix signals, but it seems that it is possible to > tell. Via WTERMSIG (child->status). Christ, NO! You didn't read the bug report. Go back and read it. It has a full analysis. The point is, and let me repeat this for the FIFTH time, since apparently it's not really being appreciated: the child exited before the ^C happened. There is no WTERMSIG(). > Again according to the "Proper handling of SIGINT/SIGQUIT" (*) the > child is supposed to "rekill" itself if it wishes to do any cleanup .. and this is just empty blathering, because you didn't read, and didn't understand, the problem. The child was never killed to begin with. t wasn't even RUNNING when the ^C happened. The only thing running was 'bash'. And bash ignores it, exactly because bash has the buggy if (wait_sigint_received&& (WTERMSIG (child->status) == SIGINT) && .. test. The WTERMSIG() part of the test is broken. It's really simple: wait_sigint_received means that bash got a SIGINT. And ff bash got a SIGINT, then the user pressed ^C. Any random value of WTERMSIG() and the fact that the child didn't see it is TOTALLY AND UTTERLY IMMATERIAL. So ignoring ^C just because it didn't happen while the child was running is wrong. And telling people that you should check the exit status of the child is WRONG. Can we please stop with the stupidity already? Can we please stop with the "I can't reproduce this" excuses? Can we please stop this "I'm running OS X and I don't see it" crap? Can we please stop with the "I didn't even read the bug report, but I'm going to make excuses anyway"? Can we please stop with the denial? It's a bug in bash. It really is that simple. Stop making excuses. Read the source code, and read the bug report. Linus
Re: [BUG] Bash not reacting to Ctrl-C
On Fri, Feb 11, 2011 at 12:21 PM, Chet Ramey wrote: > > You do realize that this case is indistinguishable from the original > scenario in question: the child gets the SIGINT, handles it, and exits > successfully (or not). Have you actually not followed the discussion? Umm. I'm the one who brought that ambiguity up in the first place. Go back and read the end of the original analysis posting (not just the part I had to quote AGAIN, just to get people to admit there is a bug). Illia is then the one who claimed it wasn't at all ambiguous, and suggested the (totally wrong) patch. So please, somebody from the bash camp finally just admit that I (and Oleg) actually know what we are talking about, instead of trying to derail the bug-report by "can't reproduce" or "I'm on OS X, so I don't see it" or "haven't you followed the discussion" answers? Will somebody _finally_ admit there is a race, and that bash has a bug? Please? Instead of this continuing denial? Here's a suggestion: only consider the child program to have "blocked" SIGINT if you actually got an EINTR to the waitpid() system call. Because if the child had exited _before_ you even did a waitpid(), then the waitpid() will - even if bash gets a ^C - return the success report rather than returning EINTR. Now, there's still some room for a race and a bug (^C happens just as the child is in the middle of exiting and the child exits with no WIFSIGNALED but still isn't "ready" yet by the time the waitpid happens), but at least the race condition should hopefully be much much smaller. Linus
Re: [BUG] Bash not reacting to Ctrl-C
On Fri, Feb 11, 2011 at 1:16 PM, Chet Ramey wrote: > > In the meantime, read Martin Cracauer's description of the issue. > http://www.cons.org/cracauer/sigint.html. This is now the second time in the thread that this has been quoted, but bash doesn't even FOLLOW the recommendations in that web-page, so I don't understand why you and Illia point to it. You violate the thing egregiously. Here, let me quote the relevant bit: Take no action when SIGINT is received while you wait for a child. You make a note of the fact that you did receive it, though. and note the "while you wait for a child" part. And the fact is, that is not at all what bash does. What bash does is "Take no action when SIGINT is received _even_if_you_didn't_wait_for_the_child_". Not "during the wait". That's the problem. There's a HUGE race condition in there, and in particular, if the child was quick and already exited, but bash just hadn't gotten around to noticing it it yet, bash will still ignore the ^C that happened after the child had already exited. See? So let's imagine that you're waiting for GNU emacs, and GNU emacs uses ^C for its own handling. Yes, in that case you need to ignore it. But in that case, when you get SIGINT, you'll actually get an EINTR from the waitpid system call. The other case is that the child process was quick and already exited. You get ^C, but the child never did. When you do the waitpid(), you'll never get the EINTR, because there was no actual wait. See the difference? And see how bash doesn't actually do what that "Proper handling of SIGINT" page says to do. Linus
Re: [BUG] Bash not reacting to Ctrl-C
On Fri, Feb 11, 2011 at 1:30 PM, Linus Torvalds wrote: > > The other case is that the child process was quick and already exited. > You get ^C, but the child never did. When you do the waitpid(), you'll > never get the EINTR, because there was no actual wait. Ok, so here's a suggested patch. It took much longer than expected, because I found another independent race while trying to fix this: bash did the "restore_sigint_handler()" _after_ having already tested the "wait_sigint_received" value, so what could happen is that the SIGINT would happen _after_ the test of that flag, but before the signal handler was restored, so bash would (once more) drop the ^C on the floor. So there are two fixes here, and while I have by no means done exhaustive testing, this patch seems to make the problem with dropped ^C really really hard for me to trigger. I do believe it's still racy, it's just that I think it is much less racy than it used to be. And maybe I introduced some new bug. There may be a reason why the sigint-handler was restored too late. But it not only passes my ^C tests, it seems to pass "make test" too, so it can't be horribly broken. And bash _was_ horribly broken wrt ^C before. I tried to make the "child_blocked_sigint" logic very trivial and obvious, and I added comments for it. But if there are any questions about the patch, feel free to holler. The basic rule is: - we only think that a child blocked sigint if we actually get EINTR from a system call (and the signal handler set wait_sigint_received) - if the child returns with WSIGNALLED and WTERMSIG()==SIGINT, then we say "it clearly didn't block it after all, or it emulated dying with SIGINT, so we clear that previous heuristic". - then, instead of checking how the child died, we just use the new 'child_blocked_sigint' flag instead. So instead of testing wait_sigint_received && WIFSIGNALED (s) && WTERMSIG (s) == SIGINT it now tests wait_sigint_received && !child_blocked_sigint which seems to match the comments better anyway. Long story short: maybe the patch is buggy. But the issue is real, and the patch looks largely sensible to me. Linus jobs.c | 35 +++ 1 files changed, 27 insertions(+), 8 deletions(-) diff --git a/jobs.c b/jobs.c index df13ad9..8b9320f 100644 --- a/jobs.c +++ b/jobs.c @@ -2193,7 +2193,7 @@ restore_sigint_handler () } } -static int wait_sigint_received; +static int wait_sigint_received, child_blocked_sigint; /* Handle SIGINT while we are waiting for children in a script to exit. The `wait' builtin should be interruptible, but all others should be @@ -2365,6 +2365,7 @@ wait_for (pid) /* This is possibly a race condition -- should it go in stop_pipeline? */ wait_sigint_received = 0; + child_blocked_sigint = 0; if (job_control == 0 || (subshell_environment&SUBSHELL_COMSUB)) { old_sigint_handler = set_signal_handler (SIGINT, wait_sigint_handler); @@ -2424,6 +2425,18 @@ wait_for (pid) sigaction (SIGCHLD, &oact, (struct sigaction *)NULL); sigprocmask (SIG_SETMASK, &chldset, (sigset_t *)NULL); # endif + /* If the waitchld returned EINTR, and the shell got a SIGINT, + then the child has not died yet, and we assume that the + child has blocked SIGINT. In that case, we require that the + child return with WSIGTERM() == SIGINT to actually consider + the ^C relevant. This is racy (the child may be in the + process of exiting and bash reacted to the EINTR first), + but this makes the race window much much smaller */ + if (r == -1 && errno == EINTR && wait_sigint_received) + { + child_blocked_sigint = 1; + } + queue_sigchld = 0; if (r == -1 && errno == ECHILD && this_shell_builtin == wait_builtin) { @@ -2461,6 +2474,16 @@ wait_for (pid) } while (PRUNNING (child) || (job != NO_JOB && RUNNING (job))); + /* If the child explicitly exited with a SIGINT, we dismiss the + fact that we guessed it blocked sigint earlier */ + if (WIFSIGNALED(child->status) && WTERMSIG(child->status) == SIGINT) +{ + child_blocked_sigint = 0; +} + + /* Restore the original SIGINT signal handler. */ + restore_sigint_handler (); + /* The exit state of the command is either the termination state of the child, or the termination state of the job. If a job, the status of the last child in the pipeline is the significant one. If the command @@ -2556,10 +2579,9 @@ if (job == NO_JOB) pass the status back to our parent. */ s = job_signal_status (job); - if (WIFSIGNALED (s) && WTERMSIG (s) == SIGINT && signal_is_trapped (SIGINT) == 0) + if (!child_blocked_sigint && signal_is_trapped (SIGINT) == 0)
Re: [BUG] Bash not reacting to Ctrl-C
On Mon, Feb 28, 2011 at 6:20 PM, Chet Ramey wrote: > > The patch looks good. I'll take a closer look and probably produce a > patch for bash-4.2 based on it. Thanks for taking a look. So I think that Oleg Nesterov is correct in that the -1 return with errno==EINTR will never actually trigger, because it is re-tried by the loop in "waitchld()". So I don't think my patch is really doing what it _intends_ to do. So in wait_for() (after my patch): r = waitchld (pid, 1); ... if (r == -1 && errno == EINTR && wait_sigint_received) { child_blocked_sigint = 1; } that child_blocked_sigint may never actually trigger. Which certainly explains why I couldn't reproduce the lost ^C any more: it really fixes the race condition, but it does it by basically never considering a child to block ^C. So I think the attached patch is better - it moves all the child_blocked_sigint logic down into waitchld() itself, so that it can really see the right error values. HOWEVER! When I do this, I can see the "lost ^C" issue again. It seems to be a bit harder than before, but I have a very hard time really judging it, it's subjective. But I get traces like this: ... 22:13:02.434759 rt_sigaction(SIGINT, {0x43c980, [], SA_RESTORER, 0x301f833140}, {SIG_DFL, [ 22:13:02.434790 wait4(-1, 0x7fffaa90befc, 0, NULL) = ? ERESTARTSYS (To be restarted) 22:13:02.434945 --- SIGINT (Interrupt) @ 0 (0) --- 22:13:02.434957 rt_sigreturn(0x2) = -1 EINTR (Interrupted system call) 22:13:02.434980 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 4623 22:13:02.435005 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x301f833140}, {0x43c980, [ 22:13:02.435041 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 22:13:02.435063 --- SIGCHLD (Child exited) @ 0 (0) --- 22:13:02.435074 wait4(-1, 0x7fffaa90ba5c, WNOHANG, NULL) = -1 ECHILD (No child processes) 22:13:02.435095 rt_sigreturn(0x) = 0 ... ie we hit the exact case where the ^C happens just as the child is exiting, so the child has already done the "exit(0)" system call, but the exit() takes long enough that bash has time to react to the ^C and see that EINTR. Note the timing: the SIGINT happens at 02.434945, with the wait returning successfully at 02.434980. We're talking microseconds, but the whole "fork+exec+wait" things are all very fast. So it's just the luck of the draw just where the ^C happens. I can't get it to happen reliably, but it is not entirely rare either. It probably needs a lot of luck, and SMP with some timing bad luck to trigger. And it probably helps that the Linux fork/exec cost is quite low, so being able to hit it just at the exit() is easier. Anyway - to recap: looking at EINTR isn't sufficient, and doesn't close the race. It's _really_ hard to try to decide on the ambiguous case of "child exited without errors on its own just as ^C came in" and "child blocked ^C and exited afterwards without errors". Anybody have any other heuristics to try to disambiguate the two cases? If we really are talking about "interactive programs that catch ^C" here, then it is possible that the only real heuristic is one that is based on time. How _long_ did it take for the process to exit? If the process exits without WIFSIGNALED a long time after ^C (where "long" is obviously only in computer terms), then we might assume it really blocked it and considered it actual input. I don't much like the idea of time-based heuristics, but right now I think bash resolves the ambiguity the wrong way around: bash prefers to err on the "drop ^C" side, even though it's likely to be the rare case. A real interactive program that uses ^C (like an editor) isn't actually ever going to see a SIGINT _at_all_, since it will set the tty state to -isig, and actually read the ^C as the character '\003' rather than have any SIGINT issues). I dunno. Linus jobs.c | 36 1 files changed, 28 insertions(+), 8 deletions(-) diff --git a/jobs.c b/jobs.c index df13ad9..95ad995 100644 --- a/jobs.c +++ b/jobs.c @@ -2193,7 +2193,7 @@ restore_sigint_handler () } } -static int wait_sigint_received; +static int wait_sigint_received, child_blocked_sigint; /* Handle SIGINT while we are waiting for children in a script to exit. The `wait' builtin should be interruptible, but all others should be @@ -2365,6 +2365,7 @@ wait_for (pid) /* This is possibly a race condition -- should it go in stop_pipeline? */ wait_sigint_received = 0; + child_blocked_sigint = 0; if (job_control == 0 || (subshell_environment&SUBSHELL_COMSUB)) { old_sigint_handler = set_signal_handler (SIGINT, wait_sigint_handler); @@ -2461,6 +2462,9 @@ wait_for (pid) } while (PRUNNING (child) || (job != NO_JOB && RUNNING (job))); + /* Restore the original SIGINT signal handler. */ + restore_sigint_handler (); + /* The exit state of the co
Re: [BUG] Bash not reacting to Ctrl-C
On Mon, Mar 7, 2011 at 7:52 AM, Chet Ramey wrote: > > Let's take a step back and approach this a different way. Instead of > trying to intuit whether or not the child did anything with the SIGINT, > let's try to make the race condition smaller. I can still easily see the porblem. I think your patch is an improvement over plain bash, but I can still see the problem. How much harder it is to see is hard to say - it's pretty subjective. But I think my patch (the second one, the one that had the "working" EINTR logic) made it _harder_ to trigger than yours does. The trace I just captured looks like this: 08:12:54.424327 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdf2ad8e9f0) = 15397 08:12:54.424394 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 08:12:54.424432 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 08:12:54.424463 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 08:12:54.424486 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 08:12:54.424510 rt_sigaction(SIGINT, {0x43c990, [], SA_RESTORER, 0x301f833140}, {SIG_DFL, [], SA_RESTORER, 0x301f833140}, 8) = 0 08:12:54.424541 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 15397 08:12:54.424668 --- SIGINT (Interrupt) @ 0 (0) --- 08:12:54.424680 rt_sigreturn(0x2) = 15397 08:12:54.424702 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x301f833140}, {0x43c990, [], SA_RESTORER, 0x301f833140}, 8) = 0 08:12:54.424739 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 08:12:54.424761 --- SIGCHLD (Child exited) @ 0 (0) --- 08:12:54.424772 wait4(-1, 0x7fff0a9956dc, WNOHANG, NULL) = -1 ECHILD (No child processes) 08:12:54.424793 rt_sigreturn(0x) = 0 08:12:54.424824 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 08:12:54.424850 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 08:12:54.424881 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0 and I think one reason why the race is hard to get rid of is simply that system call return is _the_ common point of signal handling in UNIX (technically, obviously any return to user space, but there are no appreciable interrupts etc going on, and there _are_ a lot of system calls). The above trace is one that my patch would have handled correctly (it has no EINTR). And there aren't a lot of different system calls in the tight loop here: that's literally the whole loop of the "while : ; do /bin/true; done" thing. So most of the time, SIGINT will happen at _one_ of those system calls, and so even if you set that "waiting_for_child" flag right around the call to "wait4()", the race is actually much easier to hit than you'd think from "it's just a few CPU instructions". So the above trace is one that my patch would have handled correctly (since it has no EINTR). Maybe a combination of the two approaches would work even better? Linus
Re: [BUG] Bash not reacting to Ctrl-C
On Mon, Mar 7, 2011 at 8:24 AM, Linus Torvalds wrote: > > and I think one reason why the race is hard to get rid of is simply > that system call return is _the_ common point of signal handling in > UNIX (technically, obviously any return to user space, but there are > no appreciable interrupts etc going on, and there _are_ a lot of > system calls). The above trace is one that my patch would have handled > correctly (it has no EINTR). Linux also ends up making this race easier to see probably because all system calls that are interruptible by signals are all "greedy": they try to do as much real work as possible, rather than return EINTR. That means that if there is work pending (like characters in a tty buffer for "read()", or a child that has exited for "wait()"), then system calls under Linux (and likely all other Unixes too, but Linux is the one I can guarantee works this way) will always do as much real work as possible, and return that real work rather than return with EINTR. So the _common_ case will be: - the system call returns with success ("read a few characters" or "found this child") - but the signal handler will be executed immediately at the return point, so user space won't really even "see" the success before the signal handler is executed. In other words, when you do waiting_for_child++; pid = WAITPID (-1, &status, waitpid_flags); waiting_for_child--; even if the "waiting_for_child--" were to be compiled to be one single instruction, and at the exact return point of the system call, the signal handler would still happen right in between the system call return and that instruction. Linus
Re: [BUG] Bash not reacting to Ctrl-C
On Mon, Mar 7, 2011 at 8:45 AM, Chet Ramey wrote: > > The one thing that jumps out at me here is the number of signal-handling > system calls. wait_sigint_handler gets installed and removed as the > SIGINT handler a lot. I wonder if we would see an improvement if I > used a global SIGINT handler that understood what to do with > wait_sigint_received. That's a change for down the road. Signal handling is a pain. There's a reason people hate UNIX signals. It's damn well nearly impossible to handle signals well and without races. Many server applications in particular end up basically exposing them as synchronous events rather than as signals, and it's why we have all the crazy "pselect()" etc extensions to various waiting system calls to say "I want to wait for events and still be interrupted by signals, but I want to then actually *handle* the signal synchronously". And they often do it for efficiency reasons - exactly so that they can avoid the cost of blocking and unblocking them all the time around critical regions. But even if there was some "pwait()" system call interface, I don't think it would really help this particular problem. You might be able to avoid a system call or two (and that would look prettier in traces), and some things might be faster (which is why the server people do it) and might be easier to understand (asynchronous events are just painful as hell). But the fundamental issue is still simply that the whole situation is ambiguous. It really isn't possible to tell the difference between "somebody blocked SIGINT and then returned success later _despite_ the SIGINT" and "somebody returned successfully before SIGINT even happened". Except for timing. The "catch EINTR" was a try at catching a timing issue: "did we see the dead child before we saw the SIGINT". And I think it's why it works pretty well (I really had to do quite a lot of testing to catch the lost ^C behavior with that patch - in contrast, with your latest patch I think I caught it on the fifth or sixth try). Painful. Linus
Re: bash kill(1) doesn't report errors when $(ulimit -i) is exceeded
On Tue, Jul 16, 2013 at 1:31 PM, Lionel Cons wrote: > > Either your ulimit -i is greater than 63000 or we have a Linux bug. If > ulimit -i is reached then kill(1) should fail. Traditionally kill() has never returned errors for things like this. In fact, quite arguably POSIX actively disallows kill() from returning errors for queue overflow: "The kill() function is successful if the process has permission to send sig to any of the processes specified by pid. If kill() fails, no signal shall be sent." Notice how "is successful" is not dependent on whether a signal was sent or not, it is dependent on whether you have _permission_ to send the signal to the specified process. Now, I don't think "POSIX requires" is all that big a deal, and there's a lot of gray areas where POSIX just doesn't talk about everything that could go wrong. So I don't think the above is a very strong argument for not possibly changing semantics, but I do argue that it's an argument for what traditional behavior is. I think you could quite validly argue for changing the Linux kernel semantics, but it has to come from that direction: talk about why you need it, show that it doesn't break other applications, maybe study what different versions of Unix actually do or don't do wrt this. But for Linux you'd need to do it on the kernel mailing list and probably particularly with the people involved in the whole signal semantics (Oleg Nesterov, Al Viro, Andrew Morton, with me probably cc'd). And if you have a patch, that obviously would be a great addition to that discussion too. Linus