Re: bash randomly logs out on VT / serial console
On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote: > On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote: > > Bash Version: 4.3 > > Patch Level: 30 > > Release Status: release > > > > Description: > > Running: > > while :; do s=$(sleep .005 | cat); echo $s; done | uniq > > in a login shell on a virtual terminal or serial console results > > in the shell randomly logging out after ^C. > > I believe this is fixed in bash-4.4. Thank you for your reply. However, while testing bash-4.4 with the five additional patches (bringing it to 4.4.5) shows that it is not fixed in bash-4.4 - see below. The line from __tty_check_change() is from the debug I added to the kernel, which shows that the reason for the EIO error is because the tty pgrp doesn't match the process' pgrp. # ps -ef | grep ttyS0 root 29257 1 0 10:44 ttyS000:00:00 /bin/login -- root 29293 29257 0 10:45 ttyS000:00:00 -bash root@mcbin:~# while :; do s=$(sleep .005 | cat); echo $s; done | uniq ^C root@mcbin:~# while :; do s=$(sleep .005 | cat); echo $s; done | uniq ^C root@mcbin:~# __tty_check_change: tty=ffc0e74bd800 current pgrp=29293(ffc0e795fc00) tty pgrp=29257(ffc0e795f080) logout -- Russell King
Re: bash randomly logs out on VT / serial console
On Mon, Jan 02, 2017 at 02:06:01PM -0500, Chet Ramey wrote: > On 1/2/17 6:09 AM, Russell King wrote: > > On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote: > >> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote: > >>> Bash Version: 4.3 > >>> Patch Level: 30 > >>> Release Status: release > >>> > >>> Description: > >>> Running: > >>> while :; do s=$(sleep .005 | cat); echo $s; done | uniq > >>> in a login shell on a virtual terminal or serial console results > >>> in the shell randomly logging out after ^C. > >> > >> I believe this is fixed in bash-4.4. > > > > Thank you for your reply. > > > > However, while testing bash-4.4 with the five additional patches (bringing > > it to 4.4.5) shows that it is not fixed in bash-4.4 - see below. The line > > from __tty_check_change() is from the debug I added to the kernel, which > > shows that the reason for the EIO error is because the tty pgrp doesn't > > match the process' pgrp. > > OK. I can't reproduce it on Fedora 25. Thank you for your reply. Here's more information from the strace logs in case it helps - these are from the "useful" command rather than the one which I quoted in the bug report which reproduces it as well - "mii-diag" for the "sleep" and "sed" for the "cat" part - otherwise identical. Note that I'm running a modified mii-diag here... PID 10928 - the top level login shell: = 17:38:38.716472 pipe2([3, 4], 0)= 0 17:38:38.716555 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0 17:38:38.716617 pipe2([5, 6], 0)= 0 17:38:38.716677 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fae66c0d0) = 14119 17:38:38.717076 setpgid(14119, 14119) = 0 17:38:38.717185 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 17:38:38.717249 close(4)= 0 17:38:38.717301 close(4)= -1 EBADF (Bad file descriptor) 17:38:38.717398 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0 17:38:38.717463 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fae66c0d0) = 14120 17:38:38.717936 setpgid(14120, 14119) = 0 17:38:38.719654 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 17:38:38.719764 close(3)= 0 17:38:38.722508 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 17:38:38.722576 close(5)= 0 17:38:38.722626 close(6)= 0 17:38:38.745323 ioctl(255, TIOCGPGRP, [14119]) = 0 17:38:38.745381 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 17:38:38.745501 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 17:38:38.745560 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], WSTOPPED|WCONTINUED, NULL) = 14120 17:38:39.477455 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], WSTOPPED|WCONTINUED, NULL) = 14119 Here, the two sub-shells (pid 14119 and 14120) that bash has spawned were terminated by SIGINT. 17:38:39.477971 rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN TTOU], [CHLD], 8) = 0 17:38:39.478119 ioctl(255, SNDRV_TIMER_IOCTL_SELECT or TIOCSPGRP, [10928]) = 0 17:38:39.478250 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 17:38:39.478395 ioctl(255, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B115200 opost isig icanon echo ...}) = 0 17:38:39.478626 ioctl(255, SNDCTL_TMR_STOP or SNDRV_TIMER_IOCTL_GINFO or TCSETSW, {B115200 opost isig icanon echo ...}) = 0 17:38:39.478782 ioctl(255, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B115200 opost isig icanon echo ...}) = 0 17:38:39.479323 ioctl(255, TIOCGWINSZ, {ws_row=25, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0 17:38:39.479488 write(1, "\n", 1) = 1 17:38:39.479583 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 17:38:39.479653 close(3)= -1 EBADF (Bad file descriptor) 17:38:39.479711 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 17:38:39.479802 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=14120, si_status=SIGINT, si_utime=0, si_stime=0} --- 17:38:39.479857 wait4(-1, 0x7fc81f8300, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes) 17:38:39.479921 rt_sigreturn() = 0 17:38:39.480088 rt_sigaction(SIGINT, {0x45e080, [], 0}, {0x45e080, [], 0}, 8) = 0 17:38:39.480311 rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN TTOU], [], 8) = 0 17:38:39.480458 ioctl(255, SNDRV_TIMER_IOCTL_SELECT or TIOCSPGRP, [10928]) = 0 Here, the top level shell sets the TTY PGRP to 10928. Note the time stamp - 17:38:39.480458. 17:38:39.480614 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 17:38:39.480740 rt_sigaction(SIGINT, {0x45e080, [], 0}, {0x45e080, [], 0}, 8) = 0 17:38:39.480941 ioctl(0, TIOCGWINSZ, {ws_row=25, w
Re: bash randomly logs out on VT / serial console
On Mon, Jan 02, 2017 at 02:06:01PM -0500, Chet Ramey wrote: > On 1/2/17 6:09 AM, Russell King wrote: > > On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote: > >> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote: > >>> Bash Version: 4.3 > >>> Patch Level: 30 > >>> Release Status: release > >>> > >>> Description: > >>> Running: > >>> while :; do s=$(sleep .005 | cat); echo $s; done | uniq > >>> in a login shell on a virtual terminal or serial console results > >>> in the shell randomly logging out after ^C. > >> > >> I believe this is fixed in bash-4.4. > > > > Thank you for your reply. > > > > However, while testing bash-4.4 with the five additional patches (bringing > > it to 4.4.5) shows that it is not fixed in bash-4.4 - see below. The line > > from __tty_check_change() is from the debug I added to the kernel, which > > shows that the reason for the EIO error is because the tty pgrp doesn't > > match the process' pgrp. > > OK. I can't reproduce it on Fedora 25. Digging into why Fedora doesn't show it comes down to distro choices. Fedora ships util-linux login, which does this when starting the shell: setsid()= 14301 open("/dev/pts/4", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3 ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0 fcntl64(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE) fcntl64(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0 close(0)= 0 close(1)= 0 close(2)= 0 dup2(3, 0) = 0 dup2(3, 1) = 1 dup2(3, 2) = 2 close(3)= 0 ioctl(0, TIOCSCTTY, 1) = 0 rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTART}, {SIG_IGN, [], 0}, 8) = 0 setuid32(0) = 0 chdir("/root") = 0 execve("/home/rmk/bash/bash-4.4/build-pristine/bash", ["-bash"], [/* 8 vars */]) = 0 This has the effect that bash sees that the existing PGRP is the same as the PID - so bash's "original_pgrp" and "shell_pgrp" end up as the same thing. Debian and Ubuntu ship shadow-utils login, which makes no calls to setsid, no TIOC*PGRP calls, no TIOCSCTTY calls. It also makes no attempt to open the tty itself, using the stdin/stderr supplied to print it's prompts and read the login name and password. This results in bash's "original_pgrp" being the PID of the login process, and "shell_pgrp" the PID of the top level bash - and that is a requirement for triggering the bug. So, systems using util-linux login will not show this bug. Systems using shadow-utils login do show the bug. Now also reproduced it on a single-CPU machine, also using the shadow-utils login. -- Russell King
Re: bash randomly logs out on VT / serial console
On Tue, Jan 03, 2017 at 09:03:52AM +, Russell King wrote: > On Mon, Jan 02, 2017 at 02:06:01PM -0500, Chet Ramey wrote: > > On 1/2/17 6:09 AM, Russell King wrote: > > > On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote: > > >> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote: > > >>> Bash Version: 4.3 > > >>> Patch Level: 30 > > >>> Release Status: release > > >>> > > >>> Description: > > >>> Running: > > >>> while :; do s=$(sleep .005 | cat); echo $s; done | uniq > > >>> in a login shell on a virtual terminal or serial console results > > >>> in the shell randomly logging out after ^C. > > >> > > >> I believe this is fixed in bash-4.4. > > > > > > Thank you for your reply. > > > > > > However, while testing bash-4.4 with the five additional patches (bringing > > > it to 4.4.5) shows that it is not fixed in bash-4.4 - see below. The line > > > from __tty_check_change() is from the debug I added to the kernel, which > > > shows that the reason for the EIO error is because the tty pgrp doesn't > > > match the process' pgrp. > > > > OK. I can't reproduce it on Fedora 25. > > Digging into why Fedora doesn't show it comes down to distro choices. > > Fedora ships util-linux login, which does this when starting the shell: > > setsid()= 14301 > open("/dev/pts/4", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3 > ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0 > fcntl64(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE) > fcntl64(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0 > close(0)= 0 > close(1)= 0 > close(2)= 0 > dup2(3, 0) = 0 > dup2(3, 1) = 1 > dup2(3, 2) = 2 > close(3)= 0 > ioctl(0, TIOCSCTTY, 1) = 0 > rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTART}, {SIG_IGN, [], 0}, 8) = 0 > setuid32(0) = 0 > chdir("/root") = 0 > execve("/home/rmk/bash/bash-4.4/build-pristine/bash", ["-bash"], [/* 8 vars > */]) = 0 > > This has the effect that bash sees that the existing PGRP is the same > as the PID - so bash's "original_pgrp" and "shell_pgrp" end up as the > same thing. > > Debian and Ubuntu ship shadow-utils login, which makes no calls to > setsid, no TIOC*PGRP calls, no TIOCSCTTY calls. It also makes no > attempt to open the tty itself, using the stdin/stderr supplied > to print it's prompts and read the login name and password. > > This results in bash's "original_pgrp" being the PID of the login > process, and "shell_pgrp" the PID of the top level bash - and that > is a requirement for triggering the bug. > > So, systems using util-linux login will not show this bug. Systems > using shadow-utils login do show the bug. Sorry for the third mail since you last responded. Having built the util-linux login on one of my Ubuntu boxes, when logging in using the shadow-utils login, with the bash shell printing its internal pgrp state in initialize_job_control() via the handy function already provided: TRACE: pid 6112: original_pgrp = 6076 shell_pgrp = 6112 terminal_pgrp = 6112 TRACE: pid 6112: tcgetpgrp(255) -> 6112, getpgid(0) -> 6112 With the util-linux flavour of login (used by Fedora): TRACE: pid 6217: original_pgrp = 6217 shell_pgrp = 6217 terminal_pgrp = 6217 TRACE: pid 6217: tcgetpgrp(255) -> 6217, getpgid(0) -> 6217 Note - util-linux login is not offered as an alternative on debian or ubuntu installations - this is a custom modification. The bash problem is unreproducable when the util-linux login flavour is used due to original_pgrp == shell_pgrp, which effectively hides the bug. -- Russell King