On Wed, Jul 20, 2016 at 3:07 PM, Andrew Pinski <pins...@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 2:57 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Wed, Jul 20, 2016 at 2:20 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 07/20/2016 03:09 PM, Andrew Pinski wrote:
>>>>
>>>> On Wed, Jul 20, 2016 at 2:02 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>
>>>>> On Wed, Jul 20, 2016 at 1:32 PM, Jeff Law <l...@redhat.com> wrote:
>>>>>>
>>>>>> On 07/20/2016 02:21 PM, Segher Boessenkool wrote:
>>>>>>>
>>>>>>> On Wed, Jul 20, 2016 at 12:48:09PM +0530, Senthil Kumar Selvaraj wrote:
>>>>>>>>>
>>>>>>>>> I see this for some of the larger C frontend tests with lots of
>>>>>>>>> expected errors/warnings as well.
>>>>>>>
>>>>>>> I also see this for tests with small output, but it happens more
>>>>>>> often for tests with big output.
>>>>>>>
>>>>>>>> Are you guys getting this everytime or is it sporadic?
>>>>>>>
>>>>>>> Not always, but usually. It seems related to how busy the system is,
>>>>>>> but that could be my imagination.
>>>>>>>
>>>>>>> It usually happens for a bunch of tests at the same time.
>>>>>>>
>>>>>>> Not always is the output just cut short: some random characters
>>>>>>> are inserted sometimes (in the individual log files, the consolidated
>>>>>>> log file does not have those; it looks like pointers).
>>>>>>
>>>>>> Hmm, there was a kernel bug a while back which had similar behavior.
>>>>>> What kernel version are you running?
>>>>>
>>>>> I am running with 4.2. Let me find the old email and see I should
>>>>> include it in our tree (I thought we did).
>>>>
>>>> Found it. It looks like it was not put in yet:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=96311
>>>>
>>>> I know Jakub asked about a year ago saying it was not 4.1-rc1 yet.
>>>> Can someone look to see if it even made it into newer kernels?
>>>
>>> I know "a" fix is in modern Fedora kernels and I thought it came from
>>> upstream.
>>>
>>> Jeff
>>>
>>
>> commit 1a48632ffed61352a7810ce089dc5a8bcd505a60
>> Author: Peter Hurley <pe...@hurleysoftware.com>
>> Date: Mon Apr 13 13:24:34 2015 -0400
>>
>>     pty: Fix input race when closing
>>
>>     A read() from a pty master may mistakenly indicate EOF (errno == -EIO)
>>     after the pty slave has closed, even though input data remains to be
>>     read.
>>     For example,
>>
>
> I definitely have this patch in my kernel which I am using and still
> seeing failures.
> I wonder if the memory barriers are still wrong. Since I see this on
> aarch64 where the memory ordering is weak with respect to stores.
>
> Thanks,
> Andrew
>>
>> --
>> H.J.

There may be another fix:

commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa
Author: Brian Bloniarz <brian.bloni...@gmail.com>
Date: Sun Mar 6 13:16:30 2016 -0800

    Fix OpenSSH pty regression on close

    OpenSSH expects the (non-blocking) read() of pty master to return
    EAGAIN only if it has received all of the slave-side output after
    it has received SIGCHLD. This used to work on pre-3.12 kernels.

    This fix effectively forces non-blocking read() and poll() to
    block for parallel i/o to complete for all ttys. It also unwinds
    these changes:

    1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
       tty: Fix pty master read() after slave closes

    2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
       pty, n_tty: Simplify input processing on final close

    3) 1a48632ffed61352a7810ce089dc5a8bcd505a60
       pty: Fix input race when closing

--
H.J.
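
For anyone who wants to poke at this on their own kernel, here is a minimal sketch of the read pattern the commits above are about. It assumes Linux pty semantics, where read() on the master fails with EIO once the slave is closed and nothing is left in the buffer. The names drain_master and pty_roundtrip are hypothetical helpers for illustration; they are not taken from either commit, nor from DejaGnu or expect.

```python
import errno
import os


def drain_master(fd, chunk_size=4096):
    """Read from fd until end-of-stream and return everything read.

    End-of-stream is either a zero-length read (pipes, files) or an
    OSError with errno EIO, which is how a Linux pty master reports
    that the slave side has been closed.
    """
    chunks = []
    while True:
        try:
            data = os.read(fd, chunk_size)
        except OSError as exc:
            if exc.errno == errno.EIO:
                break  # slave closed and buffer drained
            raise
        if not data:
            break  # ordinary EOF
        chunks.append(data)
    return b"".join(chunks)


def pty_roundtrip(payload):
    """Write payload to a pty slave, close it, then drain the master.

    Assumption: payload is small enough to fit in the pty buffer, since
    nothing reads the master while the write is in progress. With the
    default termios settings a "\n" in payload comes back as "\r\n".
    """
    master, slave = os.openpty()
    try:
        try:
            os.write(slave, payload)
        finally:
            os.close(slave)
        return drain_master(master)
    finally:
        os.close(master)
```

On a kernel without the race, pty_roundtrip(b"hello") should hand back the full payload; a short or empty result would be consistent with the truncated logs described earlier in the thread, though it would not by itself say which of the fixes is missing.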