Hi!

On Thu, 9 Oct 2014 14:02:39 +0200, I wrote:
> [CCing the Hurd developers having written or worked on the term server.
> Would appreciate your comments, if you have any.]
>
> On Wed, 20 Aug 2014 01:24:36 +0200, I wrote:
> > Matthias Klose has recently re-enabled the GCC testsuite for GNU/Hurd, and
> > while it now runs to completion (hooray!) there are a number of
> > unexpected test failures (search for »FAIL:«):
> >
> > On Fri, 15 Aug 2014 14:32:01 +0200, Matthias Klose <d...@ubuntu.com> wrote:
> > > https://buildd.debian.org/status/fetch.php?pkg=gcc-snapshot&arch=hurd-i386&ver=20140814-1&stamp=1408093330
> >
> > I once began analyzing this.  After one system upgrade, many months ago
> > (exact timing lost; hardware failure, system set up again, etc.), most of
> > these FAILs suddenly appeared (especially those testing the basic
> > functionality of GCC, which he rightly considered worrying).  When
> > running the FAILing test cases manually, there are no failures.  What
> > stands out is that often it's only the later checks for warnings/errors
> > of one test case that FAIL (where the previous ones have PASSed), and I
> > think I concluded that GCC's output on stdout gets truncated once it's
> > reached a limit of 1 KiB of data (or similar) -- but only if running the
> > testing through »make check«, DejaGnu (runtest), and not when running GCC
> > (xgcc) manually, where everything works fine.  Also, I got different
> > results depending on whether I was running in a screen session or not,
> > and/or whether I had been running the testsuite and/or a screen session
> > on the same PTY before, or on a fresh one.  (To sum it up: a mess to
> > diagnose.)
>
> The failure mode is that the (expected) errors output (as "seen" by the
> testing framework) is truncated:
>
>     spawn [xgcc]
>     /media/erich/home/thomas/tmp/gcc/trunk/gcc/testsuite/gcc.dg/cpp/pr33466.c:8:18: error: invalid suffix "rh" on floating constant
>     [...]
>     /media/erich/home/thomas/tmp/gcc/trunk/gcc/testsuite/gcc.dg/cpp/pr33466.c:53:19: error: invalid suffix "ddf"[truncated here]
>
> This is expected to show additional errors until line 64, but is cut off
> after 6000-something characters.
>
> > It may be something "simple" like the SA_RESTART bug we recently fixed in
> > dash: maybe something similar to that in GCC, or DejaGnu (runtest,
> > expect, TCL), or screen, or something "funny" happening in the Hurd's PTY
> > machinery (or FIFO?)...
>
> Turns out it is an issue more of the latter kind...  That is, an
> "incompatibility" of some kind, deep in TCL's buffering implementation
> when reading from PTYs and/or the expect program's usage of these TCL
> interfaces -- I cannot claim to understand this code.

It is, after all, a regression, due to a fix "recently" applied by
Richard:

    commit 1cfdceba98c380ad1cebb3a6b3d1f141d852c691
    Author: Richard Braun <rbr...@sceen.net>
    Date:   Mon Oct 14 20:48:25 2013 +0200

        term: fix read on a closed PTY

        * term/ptyio.c (pty_io_read): Return EIO if the terminal has been closed.

..., which addresses the issue filed at the end of
<http://www.gnu.org/software/hurd/hurd/translator/term.html>, »screen
Logout Hang«.  (That's the only reference I could find for this patch.)

By the looks of it (but without having verified any details), Richard's
patch seems reasonable, and does evidently fix an annoying issue -- that
is, if I revert Richard's patch, the expect/TCL issue goes away, but the
»screen logout hang« issue is back.  I will try to figure out what's
going wrong (but not now).

> Anyhow, what can
> be observed with the Linux kernel, when stracing the following code:
>
>     #!/usr/bin/expect -f
>
>     # Doesn't seem to matter.
>     #stty -cooked
>     stty cooked
>
>     #spawn sh -c "/media/erich/home/thomas/tmp/gcc/755295.build/gcc/xgcc.real -B/media/erich/home/thomas/tmp/gcc/755295.build/gcc/ /media/erich/home/thomas/tmp/gcc/755295/gcc/testsuite/gcc.dg/cpp/pr33466.c -fno-diagnostics-show-caret -fdiagnostics-color=never -std=gnu99 -S -o pr33466.s 2> /tmp/e; cat < /tmp/e"
>     spawn sh -c "cat < /tmp/e"
>     #spawn sh -c "for i in \$(seq 1 99); do echo \$i \$(seq 0 50); done > /tmp/d; cat < /tmp/d"
>     #spawn sh -c "for i in \$(seq 1 99); do echo \$i --------------------------------------------------------------------------; done > /tmp/d; cat < /tmp/d"
>     #spawn printf "%4095d\r\nabc" 1
>     #spawn printf "%4096d%4096d\r\nabc" 1 2
>     interact
>
> ..., is that the read syscall (reading from the spawned process) never
> returns more than 4095 bytes (that is, does a "short read"), even though
> 4096 bytes have been requested.
> The buffering implementation in TCL
> recognizes this, and presumably assumes that there is a chance for the
> next read syscall to block, and so first returns that data for the expect
> script to process.  On GNU Hurd in turn, the term server returns the full
> 4096 bytes, and the buffering implementation in TCL continues to read
> another 2000-something bytes (its buffer having been configured for
> 6000-something bytes), and then returns all that data to the expect
> script, which processes it fine, but then fails to continue reading
> the next chunk of data.  This is what is causing the truncation of the
> error messages.
>
> Now, my knowledge about Unix TTY/PTYs is terribly limited.  However, I
> have read somewhere that the full 4096 bytes, even if available, indeed
> cannot be returned, because one character must be reserved for a
> trailing newline (or similar...).  If there is such a protocol to be
> obeyed, then our non-conformance might be what is confusing the buffering
> implementation in TCL.  (Alternatively, if every other Unix but GNU Hurd
> always returns "short reads", then maybe there really is a bug in the
> buffering implementation in TCL that has not been noticed until now.)
> (Also, I cannot tell which change in GNU Hurd made this issue appear
> -- as an experiment, I downgraded all related packages to the old
> versions that I used before ever noticing this, and this didn't help, so
> it can't be a regression in TCL or the expect program itself, for
> example.)
>
> With the following hack applied to Hurd's term server, the GCC testsuite
> again works as expected, and no regressions are seen with the GDB
> testsuite, another heavy user of the expect program.  Obviously, this
> doesn't quite look like a proper fix...
>
> diff --git term/ptyio.c term/ptyio.c
> index 211e70a..ac7fb85 100644
> --- term/ptyio.c
> +++ term/ptyio.c
> @@ -26,6 +26,8 @@
>  #include "term.h"
>  #include "tioctl_S.h"
>
> +#define READ_MAX 4095
> +
>  /* Set if we need a wakeup when tty output has been done */
>  static int pty_read_blocked = 0;
>
> @@ -350,6 +352,10 @@ pty_io_read (struct trivfs_protid *cred,
>        size++;
>      }
>
> +  if (!packet_mode && !user_ioctl_mode)
> +    if (amount > READ_MAX)
> +      amount = READ_MAX;
> +
>    if (size > amount)
>      size = amount;
>    if (size > *datalen)
> @@ -446,6 +452,7 @@ pty_io_write (struct trivfs_protid *cred,
>  }
>
>  /* Validation has already been done by trivfs_S_io_readable */
> +//TODO: have to consider READ_MAX here?
>  error_t
>  pty_io_readable (size_t *amt)
>  {
> diff --git term/users.c term/users.c
> index 9bd51d0..e2ab473 100644
> --- term/users.c
> +++ term/users.c
> @@ -1545,6 +1545,7 @@ S_tioctl_tiocsti (struct trivfs_protid *cred,
>  }
>
>  /* TIOCOUTQ -- return output queue size */
> +//TODO: have to consider ptyio.c:READ_MAX here?
>  kern_return_t
>  S_tioctl_tiocoutq (struct trivfs_protid *cred,
>                     int *queue_size)


Regards,
 Thomas
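
To illustrate what the READ_MAX cap in the hack above does to a single
read, here is a minimal stand-alone sketch in C.  It is not term server
code: pty_read_size and reads_to_drain are hypothetical names made up for
this illustration, and the sketch deliberately ignores the control byte
handling and the *datalen check that the real pty_io_read performs.  It
only models "hand back at most READ_MAX bytes per read unless packet mode
or user ioctl mode is active".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as in the hack: one byte less than the 4096 bytes that
   expect/TCL was seen to request.  */
#define READ_MAX 4095

/* Hypothetical helper (not in the Hurd sources): how many bytes one read
   hands back when QUEUED bytes are pending and the caller asked for
   AMOUNT bytes.  */
size_t
pty_read_size (size_t queued, size_t amount,
               bool packet_mode, bool user_ioctl_mode)
{
  /* As in the hack, the cap only applies outside packet mode and user
     ioctl mode.  */
  if (!packet_mode && !user_ioctl_mode && amount > READ_MAX)
    amount = READ_MAX;

  /* Never return more than is pending.  */
  return queued < amount ? queued : amount;
}

/* Hypothetical helper: number of capped reads of AMOUNT bytes each that
   are needed to drain QUEUED pending bytes.  */
unsigned
reads_to_drain (size_t queued, size_t amount)
{
  unsigned reads = 0;

  assert (amount > 0);  /* a zero-sized request would never make progress */
  while (queued > 0)
    {
      queued -= pty_read_size (queued, amount, false, false);
      reads++;
    }
  return reads;
}
```

Under these assumptions, a request for 4096 bytes with 6000 bytes pending
gets 4095 bytes back, so the reader always sees a "short read" first and
needs a second read for the remaining 1905 bytes, which matches what was
observed with the Linux kernel.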