Hi! On Thu, 9 Oct 2014 14:02:39 +0200, I wrote: > [CCing the Hurd developers having written or worked on the term server. > Would appreciate your comments, if you have any.] > > On Wed, 20 Aug 2014 01:24:36 +0200, I wrote: > > Matthias Klose has recently re-enable the GCC testsuite for GNU/Hurd, and > > while it now runs to completion (hooray!) there are a number of > > unexpected test failures (search for »FAIL:«): > > > > On Fri, 15 Aug 2014 14:32:01 +0200, Matthias Klose <[email protected]> wrote: > > > https://buildd.debian.org/status/fetch.php?pkg=gcc-snapshot&arch=hurd-i386&ver=20140814-1&stamp=1408093330 > > > > I once began analyzing this. After one system upgrade, many months ago > > (exact timing lost; hardware failure, system set up again, etc.), most of > > these FAILs suddenly appeared (especially those testing the basic > > functionality of GCC, which he rightly considered worrying). When > > running the FAILing test cases manually, there are no failures. What > > strikes out is that often it's only the later checks for warnings/errors > > of one test case that FAIL (where the previous ones have PASSed), and I > > think I concluded that GCC's output on stdout gets truncated once it's > > reached a limit of 1 KiB of data (or similar) -- but only if running the > > testing through »make check«, DejaGnu (runtest), and not when running GCC > > (xgcc) manually, where everything works fine. Also. I got different > > results whether I was running in a screen session or not, and/or when I > > had been running the testsuite and/or a screen session on the same PTY > > before, or a fresh one. (To sum it up: a mess to diagnose.) > > The failure mode is that the (expected) errors output (as "seen" by the > testing framework) is truncated: > > spawn [xgcc] > > /media/erich/home/thomas/tmp/gcc/trunk/gcc/testsuite/gcc.dg/cpp/pr33466.c:8:18: > error: invalid suffix "rh" on floating constant > [...] > > /media/erich/home/thomas/tmp/gcc/trunk/gcc/testsuite/gcc.dg/cpp/pr33466.c:53:19: > error: invalid suffix "ddf"[truncated here] > > This is expected to show additional errors until line 64, but is cut off > after 6000-something characters. > > > It may be something "simple" like the SA_RESTART bug we recently fixed in > > dash: maybe something similar to that in GCC, or DejaGnu (runtest, > > expect, TCL), or screen, or something "funny" happening in the Hurd's PTY > > machinery (or FIFO?)... > > Turns out it is an issue more of the latter kind... That is, an > "incompatibility" of some kind, deep in TCL's buffering implementation > when reading from PTYs and/or the expect program's usage of these TCL > interfaces -- I cannot claim to understand this code.
It is, after all, a regression, due to a fix "recently" applied by
Richard:
commit 1cfdceba98c380ad1cebb3a6b3d1f141d852c691
Author: Richard Braun <[email protected]>
Date: Mon Oct 14 20:48:25 2013 +0200
term: fix read on a closed PTY
* term/ptyio.c (pty_io_read): Return EIO if the terminal has been
closed.
..., which addresses the issue filed at the end of
<http://www.gnu.org/software/hurd/hurd/translator/term.html>, »screen
Logout Hang«. (That's the very only reference I could find for this
patch.) By the looks of it (but without having verified any details),
Richard's patch seems reasonable, and does evidently fix an annoying
issue -- that is, if I revert Richard's patch, the expect/TCL issues goes
away, but the »screen logout hang« issue is back. I will try to figure
out what's going wrong (but not now).
> Anyhow, what can
> be observed with the Linux kernel, when stracing the following code:
>
> #!/usr/bin/expect -f
>
> # Doesn't seem to matter.
> #stty -cooked
> stty cooked
>
> #spawn sh -c "/media/erich/home/thomas/tmp/gcc/755295.build/gcc/xgcc.real
> -B/media/erich/home/thomas/tmp/gcc/755295.build/gcc/
> /media/erich/home/thomas/tmp/gcc/755295/gcc/testsuite/gcc.dg/cpp/pr33466.c
> -fno-diagnostics-show-caret -fdiagnostics-color=never -std=gnu99 -S -o
> pr33466.s 2> /tmp/e; cat < /tmp/e"
> spawn sh -c "cat < /tmp/e"
> #spawn sh -c "for i in \$(seq 1 99); do echo \$i \$(seq 0 50); done >
> /tmp/d; cat < /tmp/d"
> #spawn sh -c "for i in \$(seq 1 99); do echo \$i
> --------------------------------------------------------------------------;
> done > /tmp/d; cat < /tmp/d"
> #spawn printf "%4095d\r\nabc" 1
> #spawn printf "%4096d%4096d\r\nabc" 1 2
> interact
>
> ..., is that the read syscall (reading from the spawned process) never
> returns more than 4095 bytes (that is, does a "short read"), even though
> 4096 bytes have been requested. The buffering implementation in TCL
> recognizes this, and presumably assumes that there is a chance for the
> next read syscall to block, and so first returns that data for the except
> script to process. On GNU Hurd in turn, the term server returns the full
> 4096 bytes, and the buffering implementation in TCL continues to read
> another 2000-something bytes (its buffer having been configured for
> 6000-something bytes), and then returns all that data to the except
> script, which that does process fine, but then fails to continue reading
> the next chunk of data. This is what is causing the truncation of the
> error messages.
>
> Now, my knowledge about Unix TTY/PTYs is terribly limited. However, I
> have read somewhere that indeed not the full 4096 bytes, even if
> available, can be returned for there must be one character reserved for a
> trailing newline (or similar...). If there is such a protocol to be
> obeyed, then our non-conformance might be what is confusing the buffering
> implementation in TCL. (Alternatively, if every other Unix but GNU Hurd
> always returns "short reads", then maybe there really is a bug in the
> buffering implementation in TCL that has not been noticed until now.)
> (Also I cannot tell what change in GNU Hurd it is that this issue now
> appears -- as an experiment, I downgraded all related packages to old
> versions, that I used before ever noticing this, and this didn't help, so
> it can't be a regression in TCL or the expect program itself, for
> example.)
>
> With the following hack applied to Hurd's term server, the GCC testsuite
> again works as expected, and no regressions are seen with the GDB
> testsuite, another heavy user of the expect program. Obviously, this
> doesn't quite look like a proper fix...
>
> diff --git term/ptyio.c term/ptyio.c
> index 211e70a..ac7fb85 100644
> --- term/ptyio.c
> +++ term/ptyio.c
> @@ -26,6 +26,8 @@
> #include "term.h"
> #include "tioctl_S.h"
>
> +#define READ_MAX 4095
> +
> /* Set if we need a wakeup when tty output has been done */
> static int pty_read_blocked = 0;
>
> @@ -350,6 +352,10 @@ pty_io_read (struct trivfs_protid *cred,
> size++;
> }
>
> + if (!packet_mode && !user_ioctl_mode)
> + if (amount > READ_MAX)
> + amount = READ_MAX;
> +
> if (size > amount)
> size = amount;
> if (size > *datalen)
> @@ -446,6 +452,7 @@ pty_io_write (struct trivfs_protid *cred,
> }
>
> /* Validation has already been done by trivfs_S_io_readable */
> +//TODO: have to consider READ_MAX here?
> error_t
> pty_io_readable (size_t *amt)
> {
> diff --git term/users.c term/users.c
> index 9bd51d0..e2ab473 100644
> --- term/users.c
> +++ term/users.c
> @@ -1545,6 +1545,7 @@ S_tioctl_tiocsti (struct trivfs_protid *cred,
> }
>
> /* TIOCOUTQ -- return output queue size */
> +//TODO: have to consider ptyio.c:READ_MAX here?
> kern_return_t
> S_tioctl_tiocoutq (struct trivfs_protid *cred,
> int *queue_size)
Grüße,
Thomas
pgpBFcWcp1nXW.pgp
Description: PGP signature
