Re: Hurd term server (was: Hurd GCC ping)

Thomas Schwinge Thu, 09 Oct 2014 07:51:07 -0700

Hi!

On Thu, 9 Oct 2014 14:02:39 +0200, I wrote:
> [CCing the Hurd developers having written or worked on the term server.
> Would appreciate your comments, if you have any.]
> 
> On Wed, 20 Aug 2014 01:24:36 +0200, I wrote:
> > Matthias Klose has recently re-enabled the GCC testsuite for GNU/Hurd, and
> > while it now runs to completion (hooray!) there are a number of
> > unexpected test failures (search for »FAIL:«):
> > 
> > On Fri, 15 Aug 2014 14:32:01 +0200, Matthias Klose <[email protected]> wrote:
> > > https://buildd.debian.org/status/fetch.php?pkg=gcc-snapshot&arch=hurd-i386&ver=20140814-1&stamp=1408093330
> > 
> > I once began analyzing this.  After one system upgrade, many months ago
> > (exact timing lost; hardware failure, system set up again, etc.), most of
> > these FAILs suddenly appeared (especially those testing the basic
> > functionality of GCC, which he rightly considered worrying).  When
> > running the FAILing test cases manually, there are no failures.  What
> > stands out is that often it's only the later checks for warnings/errors
> > of one test case that FAIL (where the previous ones have PASSed), and I
> > think I concluded that GCC's output on stdout gets truncated once it's
> > reached a limit of 1 KiB of data (or similar) -- but only if running the
> > testing through »make check«, DejaGnu (runtest), and not when running GCC
> > (xgcc) manually, where everything works fine.  Also, I got different
> > results depending on whether I was running in a screen session or not,
> > and/or whether I had been running the testsuite and/or a screen session
> > on the same PTY before, or a fresh one.  (To sum it up: a mess to diagnose.)
> 
> The failure mode is that the (expected) errors output (as "seen" by the
> testing framework) is truncated:
> 
>     spawn [xgcc]
>     /media/erich/home/thomas/tmp/gcc/trunk/gcc/testsuite/gcc.dg/cpp/pr33466.c:8:18: error: invalid suffix "rh" on floating constant
>     [...]
>     /media/erich/home/thomas/tmp/gcc/trunk/gcc/testsuite/gcc.dg/cpp/pr33466.c:53:19: error: invalid suffix "ddf"[truncated here]
> 
> This is expected to show additional errors until line 64, but is cut off
> after 6000-something characters.
> 
> > It may be something "simple" like the SA_RESTART bug we recently fixed in
> > dash: maybe something similar to that in GCC, or DejaGnu (runtest,
> > expect, TCL), or screen, or something "funny" happening in the Hurd's PTY
> > machinery (or FIFO?)...
> 
> Turns out it is an issue more of the latter kind...  That is, an
> "incompatibility" of some kind, deep in TCL's buffering implementation
> when reading from PTYs and/or the expect program's usage of these TCL
> interfaces -- I cannot claim to understand this code.


It is, after all, a regression, due to a fix "recently" applied by
Richard:

    commit 1cfdceba98c380ad1cebb3a6b3d1f141d852c691
    Author: Richard Braun <[email protected]>
    Date:   Mon Oct 14 20:48:25 2013 +0200
    
        term: fix read on a closed PTY
    
        * term/ptyio.c (pty_io_read): Return EIO if the terminal has been closed.

..., which addresses the issue filed at the end of
<http://www.gnu.org/software/hurd/hurd/translator/term.html>, »screen
Logout Hang«.  (That's the only reference I could find for this
patch.)  By the looks of it (but without having verified any details),
Richard's patch seems reasonable, and does evidently fix an annoying
issue -- that is, if I revert Richard's patch, the expect/TCL issue goes
away, but the »screen logout hang« issue is back.  I will try to figure
out what's going wrong (but not now).
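Richard's patch makes a read on a closed PTY fail with EIO, which presumably is what unblocks screen at logout. Linux behaves similarly: once the slave side of a PTY has been closed and any pending data has been read, reads on the master side fail with EIO, so programs that drain a PTY commonly treat EIO like end-of-file. A minimal client-side sketch, assuming plain POSIX read(2); `drain_fd` is a hypothetical helper, not code taken from screen, expect, or TCL:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read from FD into BUF (capacity CAP) until end of
   stream or until the buffer is full.  A zero-length read (ordinary
   end-of-file) and EIO (how a PTY master typically reports that the slave
   side has gone away) both end the loop; EINTR is retried.  Returns the
   number of bytes read, or -1 on any other error. */
static ssize_t
drain_fd (int fd, char *buf, size_t cap)
{
  size_t total = 0;

  assert (fd >= 0);
  while (total < cap)
    {
      ssize_t n = read (fd, buf + total, cap - total);
      if (n > 0)
        total += (size_t) n;
      else if (n == 0)
        break;                  /* end of file */
      else if (errno == EINTR)
        continue;               /* interrupted by a signal: retry */
      else if (errno == EIO)
        break;                  /* PTY hangup: treat like end of file */
      else
        return -1;              /* any other error */
    }
  return (ssize_t) total;
}
```

The same loop works unchanged on a pipe, where end of stream is signalled by a zero-length read instead of EIO.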

> Anyhow, what can
> be observed with the Linux kernel, when stracing the following code:
> 
>     #!/usr/bin/expect -f
>     
>     # Doesn't seem to matter.
>     #stty -cooked
>     stty cooked
>     
>     #spawn sh -c "/media/erich/home/thomas/tmp/gcc/755295.build/gcc/xgcc.real -B/media/erich/home/thomas/tmp/gcc/755295.build/gcc/ /media/erich/home/thomas/tmp/gcc/755295/gcc/testsuite/gcc.dg/cpp/pr33466.c -fno-diagnostics-show-caret -fdiagnostics-color=never -std=gnu99 -S -o pr33466.s 2> /tmp/e; cat < /tmp/e"
>     spawn sh -c "cat < /tmp/e"
>     #spawn sh -c "for i in \$(seq 1 99); do echo \$i \$(seq 0 50); done > /tmp/d; cat < /tmp/d"
>     #spawn sh -c "for i in \$(seq 1 99); do echo \$i --------------------------------------------------------------------------; done > /tmp/d; cat < /tmp/d"
>     #spawn printf "%4095d\r\nabc" 1
>     #spawn printf "%4096d%4096d\r\nabc" 1 2
>     interact
> 
> ..., is that the read syscall (reading from the spawned process) never
> returns more than 4095 bytes (that is, does a "short read"), even though
> 4096 bytes have been requested.  The buffering implementation in TCL
> recognizes this, and presumably assumes that there is a chance for the
> next read syscall to block, and so first returns that data for the expect
> script to process.  On GNU Hurd in turn, the term server returns the full
> 4096 bytes, and the buffering implementation in TCL continues to read
> another 2000-something bytes (its buffer having been configured for
> 6000-something bytes), and then returns all that data to the expect
> script, which it does process fine, but then fails to continue reading
> the next chunk of data.  This is what is causing the truncation of the
> error messages.
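The buffering behaviour described above can be illustrated with a small model. This is a hypothetical sketch, not TCL's actual channel code: `struct source`, `source_read`, `fill_buffer`, and the 6144-byte buffer size (standing in for "6000-something") are all assumptions. The reader requests at most 4096 bytes at a time and hands data over early only when a read comes back short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data source: REMAINING bytes are pending, and one read never
   returns more than MAX_PER_READ bytes (4095 mimics the Linux behaviour
   described above, 4096 the Hurd term server's). */
struct source
{
  size_t remaining;
  size_t max_per_read;
};

/* Return how many bytes one read of REQUESTED bytes yields. */
static size_t
source_read (struct source *s, size_t requested)
{
  size_t n = requested;
  if (n > s->max_per_read)
    n = s->max_per_read;
  if (n > s->remaining)
    n = s->remaining;
  s->remaining -= n;
  return n;
}

/* Hypothetical buffering heuristic (not TCL's code): issue reads of at most
   CHUNK bytes until BUFSIZE bytes are buffered, but stop early after a short
   read, on the assumption that the next read might block.  Returns the
   number of bytes buffered and stores the number of reads issued in *CALLS. */
static size_t
fill_buffer (struct source *s, size_t bufsize, size_t chunk, int *calls)
{
  size_t filled = 0;

  assert (chunk > 0);  /* a zero-sized request would never make progress */
  *calls = 0;
  while (filled < bufsize)
    {
      size_t want = bufsize - filled;
      if (want > chunk)
        want = chunk;
      size_t got = source_read (s, want);
      ++*calls;
      filled += got;
      if (got < want)
        break;  /* short read: hand over what we have */
    }
  return filled;
}
```

With a 6144-byte buffer and 4096-byte requests, a source capped at 4095 bytes per read is handed over after a single short read (4095 bytes), whereas a source that returns the full 4096 bytes triggers a second read of 2048 bytes before all 6144 bytes are handed over at once.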
> 
> Now, my knowledge about Unix TTY/PTYs is terribly limited.  However, I
> have read somewhere that the full 4096 bytes indeed cannot be returned,
> even if available, because one character must be reserved for a trailing
> newline (or similar...).  If there is such a protocol to be obeyed, then
> our non-conformance might be what is confusing the buffering
> implementation in TCL.  (Alternatively, if every other Unix but GNU Hurd
> always returns "short reads", then maybe there really is a bug in the
> buffering implementation in TCL that has not been noticed until now.)
> (Also, I cannot tell which change in GNU Hurd made this issue appear --
> as an experiment, I downgraded all related packages to the old versions
> that I used before ever noticing this, and this didn't help, so it can't
> be a regression in TCL or the expect program itself, for example.)
> 
> With the following hack applied to Hurd's term server, the GCC testsuite
> again works as expected, and no regressions are seen with the GDB
> testsuite, another heavy user of the expect program.  Obviously, this
> doesn't quite look like a proper fix...
> 
> diff --git term/ptyio.c term/ptyio.c
> index 211e70a..ac7fb85 100644
> --- term/ptyio.c
> +++ term/ptyio.c
> @@ -26,6 +26,8 @@
>  #include "term.h"
>  #include "tioctl_S.h"
>  
> +#define READ_MAX 4095
> +
>  /* Set if we need a wakeup when tty output has been done */
>  static int pty_read_blocked = 0;
>  
> @@ -350,6 +352,10 @@ pty_io_read (struct trivfs_protid *cred,
>       size++;
>      }
>  
> +  if (!packet_mode && !user_ioctl_mode)
> +    if (amount > READ_MAX)
> +      amount = READ_MAX;
> +
>    if (size > amount)
>      size = amount;
>    if (size > *datalen)
> @@ -446,6 +452,7 @@ pty_io_write (struct trivfs_protid *cred,
>  }
>  
>  /* Validation has already been done by trivfs_S_io_readable */
> +//TODO: have to consider READ_MAX here?
>  error_t
>  pty_io_readable (size_t *amt)
>  {
> diff --git term/users.c term/users.c
> index 9bd51d0..e2ab473 100644
> --- term/users.c
> +++ term/users.c
> @@ -1545,6 +1545,7 @@ S_tioctl_tiocsti (struct trivfs_protid *cred,
>  }
>  
>  /* TIOCOUTQ -- return output queue size */
> +//TODO: have to consider ptyio.c:READ_MAX here?
>  kern_return_t
>  S_tioctl_tiocoutq (struct trivfs_protid *cred,
>                  int *queue_size)
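To make the effect of the hack concrete, here is a hypothetical standalone model of the clamping it adds to pty_io_read. The function name `pty_reply_size` is made up, `size` is taken to be roughly the number of bytes queued for the client, and only the READ_MAX logic is copied from the diff; the term server's own buffer handling is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Cap introduced by the hack above. */
#define READ_MAX 4095

/* Hypothetical model (not the term server's code) of how the hack limits a
   single read reply: SIZE is the number of bytes available, AMOUNT the
   number of bytes the client requested.  The cap only applies when neither
   packet mode nor user ioctl mode is enabled. */
static size_t
pty_reply_size (size_t size, size_t amount, int packet_mode,
                int user_ioctl_mode)
{
  if (!packet_mode && !user_ioctl_mode && amount > READ_MAX)
    amount = READ_MAX;
  if (size > amount)
    size = amount;
  return size;
}
```

With 10000 bytes queued and a 4096-byte request, the model yields 4095 bytes in normal mode but 4096 in packet mode; requests of up to 4095 bytes are unaffected.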


Grüße,
 Thomas
