Re: bash randomly logs out on VT / serial console

2017-01-02 Thread Russell King
On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote:
> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote:
> > Bash Version: 4.3
> > Patch Level: 30
> > Release Status: release
> > 
> > Description:
> > Running:
> > while :; do s=$(sleep .005 | cat); echo $s; done | uniq
> > in a login shell on a virtual terminal or serial console results
> > in the shell randomly logging out after ^C.
> 
> I believe this is fixed in bash-4.4.

Thank you for your reply.

However, while testing bash-4.4 with the five additional patches (bringing
it to 4.4.5) shows that it is not fixed in bash-4.4 - see below.  The line
from __tty_check_change() is from the debug I added to the kernel, which
shows that the reason for the EIO error is because the tty pgrp doesn't
match the process' pgrp.

# ps -ef | grep ttyS0
root 29257 1  0 10:44 ttyS000:00:00 /bin/login --
root 29293 29257  0 10:45 ttyS000:00:00 -bash

root@mcbin:~# while :; do s=$(sleep .005 | cat); echo $s; done | uniq

^C
root@mcbin:~# while :; do s=$(sleep .005 | cat); echo $s; done | uniq

^C
root@mcbin:~# __tty_check_change: tty=ffc0e74bd800 current 
pgrp=29293(ffc0e795fc00) tty pgrp=29257(ffc0e795f080)
logout

-- 
Russell King



Re: bash randomly logs out on VT / serial console

2017-01-02 Thread Russell King
On Mon, Jan 02, 2017 at 02:06:01PM -0500, Chet Ramey wrote:
> On 1/2/17 6:09 AM, Russell King wrote:
> > On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote:
> >> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote:
> >>> Bash Version: 4.3
> >>> Patch Level: 30
> >>> Release Status: release
> >>>
> >>> Description:
> >>>   Running:
> >>>   while :; do s=$(sleep .005 | cat); echo $s; done | uniq
> >>>   in a login shell on a virtual terminal or serial console results
> >>>   in the shell randomly logging out after ^C.
> >>
> >> I believe this is fixed in bash-4.4.
> > 
> > Thank you for your reply.
> > 
> > However, while testing bash-4.4 with the five additional patches (bringing
> > it to 4.4.5) shows that it is not fixed in bash-4.4 - see below.  The line
> > from __tty_check_change() is from the debug I added to the kernel, which
> > shows that the reason for the EIO error is because the tty pgrp doesn't
> > match the process' pgrp.
> 
> OK.  I can't reproduce it on Fedora 25.

Thank you for your reply.

Here's more information from the strace logs in case it helps - these
are from the "useful" command rather than the one which I quoted in
the bug report which reproduces it as well - "mii-diag" for the "sleep"
and "sed" for the "cat" part - otherwise identical.  Note that I'm
running a modified mii-diag here...


PID 10928 - the top level login shell:
=

17:38:38.716472 pipe2([3, 4], 0)= 0
17:38:38.716555 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
17:38:38.716617 pipe2([5, 6], 0)= 0
17:38:38.716677 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7fae66c0d0) = 14119
17:38:38.717076 setpgid(14119, 14119)   = 0
17:38:38.717185 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
17:38:38.717249 close(4)= 0
17:38:38.717301 close(4)= -1 EBADF (Bad file descriptor)
17:38:38.717398 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
17:38:38.717463 clone(child_stack=0, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7fae66c0d0) = 14120
17:38:38.717936 setpgid(14120, 14119)   = 0
17:38:38.719654 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
17:38:38.719764 close(3)= 0
17:38:38.722508 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
17:38:38.722576 close(5)= 0
17:38:38.722626 close(6)= 0
17:38:38.745323 ioctl(255, TIOCGPGRP, [14119]) = 0
17:38:38.745381 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
17:38:38.745501 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
17:38:38.745560 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], 
WSTOPPED|WCONTINUED, NULL) = 14120
17:38:39.477455 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], 
WSTOPPED|WCONTINUED, NULL) = 14119

Here, the two sub-shells (pid 14119 and 14120) that bash has
spawned were terminated by SIGINT.

17:38:39.477971 rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN TTOU], [CHLD], 8) = 0
17:38:39.478119 ioctl(255, SNDRV_TIMER_IOCTL_SELECT or TIOCSPGRP, [10928]) = 0
17:38:39.478250 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
17:38:39.478395 ioctl(255, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE 
or TCGETS, {B115200 opost isig icanon echo ...}) = 0
17:38:39.478626 ioctl(255, SNDCTL_TMR_STOP or SNDRV_TIMER_IOCTL_GINFO or 
TCSETSW, {B115200 opost isig icanon echo ...}) = 0
17:38:39.478782 ioctl(255, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE 
or TCGETS, {B115200 opost isig icanon echo ...}) = 0
17:38:39.479323 ioctl(255, TIOCGWINSZ, {ws_row=25, ws_col=80, ws_xpixel=0, 
ws_ypixel=0}) = 0
17:38:39.479488 write(1, "\n", 1)   = 1
17:38:39.479583 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
17:38:39.479653 close(3)= -1 EBADF (Bad file descriptor)
17:38:39.479711 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
17:38:39.479802 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, 
si_pid=14120, si_status=SIGINT, si_utime=0, si_stime=0} ---
17:38:39.479857 wait4(-1, 0x7fc81f8300, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 
ECHILD (No child processes)
17:38:39.479921 rt_sigreturn()  = 0
17:38:39.480088 rt_sigaction(SIGINT, {0x45e080, [], 0}, {0x45e080, [], 0}, 8) = 0
17:38:39.480311 rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN TTOU], [], 8) = 0
17:38:39.480458 ioctl(255, SNDRV_TIMER_IOCTL_SELECT or TIOCSPGRP, [10928]) = 0

Here, the top level shell sets the TTY PGRP to 10928.  Note
the time stamp - 17:38:39.480458.

17:38:39.480614 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
17:38:39.480740 rt_sigaction(SIGINT, {0x45e080, [], 0}, {0x45e080, [], 0}, 8) = 0
17:38:39.480941 ioctl(0, TIOCGWINSZ, {ws_row=25, w

Re: bash randomly logs out on VT / serial console

2017-01-03 Thread Russell King
On Mon, Jan 02, 2017 at 02:06:01PM -0500, Chet Ramey wrote:
> On 1/2/17 6:09 AM, Russell King wrote:
> > On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote:
> >> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote:
> >>> Bash Version: 4.3
> >>> Patch Level: 30
> >>> Release Status: release
> >>>
> >>> Description:
> >>>   Running:
> >>>   while :; do s=$(sleep .005 | cat); echo $s; done | uniq
> >>>   in a login shell on a virtual terminal or serial console results
> >>>   in the shell randomly logging out after ^C.
> >>
> >> I believe this is fixed in bash-4.4.
> > 
> > Thank you for your reply.
> > 
> > However, while testing bash-4.4 with the five additional patches (bringing
> > it to 4.4.5) shows that it is not fixed in bash-4.4 - see below.  The line
> > from __tty_check_change() is from the debug I added to the kernel, which
> > shows that the reason for the EIO error is because the tty pgrp doesn't
> > match the process' pgrp.
> 
> OK.  I can't reproduce it on Fedora 25.

Digging into why Fedora doesn't show it comes down to distro choices.

Fedora ships util-linux login, which does this when starting the shell:

setsid()= 14301
open("/dev/pts/4", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3
ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0
fcntl64(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
fcntl64(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0
close(0)= 0
close(1)= 0
close(2)= 0
dup2(3, 0)  = 0
dup2(3, 1)  = 1
dup2(3, 2)  = 2
close(3)= 0
ioctl(0, TIOCSCTTY, 1)  = 0
rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTART}, {SIG_IGN, [], 0}, 8) = 0
setuid32(0) = 0
chdir("/root")  = 0
execve("/home/rmk/bash/bash-4.4/build-pristine/bash", ["-bash"], [/* 8 vars 
*/]) = 0

This has the effect that bash sees that the existing PGRP is the same
as the PID - so bash's "original_pgrp" and "shell_pgrp" end up as the
same thing.

Debian and Ubuntu ship shadow-utils login, which makes no calls to
setsid, no TIOC*PGRP calls, no TIOCSCTTY calls.  It also makes no
attempt to open the tty itself, using the stdin/stderr supplied
to print it's prompts and read the login name and password.

This results in bash's "original_pgrp" being the PID of the login
process, and "shell_pgrp" the PID of the top level bash - and that
is a requirement for triggering the bug.

So, systems using util-linux login will not show this bug.  Systems
using shadow-utils login do show the bug.

Now also reproduced it on a single-CPU machine, also using the
shadow-utils login.

-- 
Russell King



Re: bash randomly logs out on VT / serial console

2017-01-03 Thread Russell King
On Tue, Jan 03, 2017 at 09:03:52AM +, Russell King wrote:
> On Mon, Jan 02, 2017 at 02:06:01PM -0500, Chet Ramey wrote:
> > On 1/2/17 6:09 AM, Russell King wrote:
> > > On Sun, Jan 01, 2017 at 09:14:45PM -0500, Chet Ramey wrote:
> > >> On 1/1/17 4:01 PM, r...@armlinux.org.uk wrote:
> > >>> Bash Version: 4.3
> > >>> Patch Level: 30
> > >>> Release Status: release
> > >>>
> > >>> Description:
> > >>> Running:
> > >>> while :; do s=$(sleep .005 | cat); echo $s; done | uniq
> > >>> in a login shell on a virtual terminal or serial console results
> > >>> in the shell randomly logging out after ^C.
> > >>
> > >> I believe this is fixed in bash-4.4.
> > > 
> > > Thank you for your reply.
> > > 
> > > However, while testing bash-4.4 with the five additional patches (bringing
> > > it to 4.4.5) shows that it is not fixed in bash-4.4 - see below.  The line
> > > from __tty_check_change() is from the debug I added to the kernel, which
> > > shows that the reason for the EIO error is because the tty pgrp doesn't
> > > match the process' pgrp.
> > 
> > OK.  I can't reproduce it on Fedora 25.
> 
> Digging into why Fedora doesn't show it comes down to distro choices.
> 
> Fedora ships util-linux login, which does this when starting the shell:
> 
> setsid()= 14301
> open("/dev/pts/4", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3
> ioctl(3, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> fcntl64(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
> fcntl64(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0
> close(0)= 0
> close(1)= 0
> close(2)= 0
> dup2(3, 0)  = 0
> dup2(3, 1)  = 1
> dup2(3, 2)  = 2
> close(3)= 0
> ioctl(0, TIOCSCTTY, 1)  = 0
> rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTART}, {SIG_IGN, [], 0}, 8) = 0
> setuid32(0) = 0
> chdir("/root")  = 0
> execve("/home/rmk/bash/bash-4.4/build-pristine/bash", ["-bash"], [/* 8 vars 
> */]) = 0
> 
> This has the effect that bash sees that the existing PGRP is the same
> as the PID - so bash's "original_pgrp" and "shell_pgrp" end up as the
> same thing.
> 
> Debian and Ubuntu ship shadow-utils login, which makes no calls to
> setsid, no TIOC*PGRP calls, no TIOCSCTTY calls.  It also makes no
> attempt to open the tty itself, using the stdin/stderr supplied
> to print it's prompts and read the login name and password.
> 
> This results in bash's "original_pgrp" being the PID of the login
> process, and "shell_pgrp" the PID of the top level bash - and that
> is a requirement for triggering the bug.
> 
> So, systems using util-linux login will not show this bug.  Systems
> using shadow-utils login do show the bug.

Sorry for the third mail since you last responded.

Having built the util-linux login on one of my Ubuntu boxes, when
logging in using the shadow-utils login, with the bash shell printing
its internal pgrp state in initialize_job_control() via the handy
function already provided:

TRACE: pid 6112: original_pgrp = 6076 shell_pgrp = 6112 terminal_pgrp = 6112
TRACE: pid 6112: tcgetpgrp(255) -> 6112, getpgid(0) -> 6112

With the util-linux flavour of login (used by Fedora):

TRACE: pid 6217: original_pgrp = 6217 shell_pgrp = 6217 terminal_pgrp = 6217
TRACE: pid 6217: tcgetpgrp(255) -> 6217, getpgid(0) -> 6217

Note - util-linux login is not offered as an alternative on debian or
ubuntu installations - this is a custom modification.

The bash problem is unreproducable when the util-linux login flavour
is used due to original_pgrp == shell_pgrp, which effectively hides the
bug.

-- 
Russell King