Re: lost output from asynchronous lists

Ralf Wildenhues Tue, 28 Oct 2008 14:51:52 -0700

Hi Stephane,

* Stephane Chazelas wrote on Tue, Oct 28, 2008 at 11:26:18AM CET:
> 
> I have to admit I would have thought the code above to be safe
> as well and I wonder if it's the same on all systems. But I can
> reproduce the problem on Linux. As far as I can tell, if you
> don't use O_APPEND, the system doesn't guarantee the write(2) to
> be atomic, so I suppose you can get this kind of behavior if a
> context switch occurs in the middle of a write(2) system call.
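One way to see the O_APPEND distinction described above is to look at the file status flags the kernel records for each kind of redirection. This is a minimal, Linux-specific sketch. It assumes /proc is mounted and that O_APPEND has the octal value 02000 (as on x86 and ARM Linux). The file names trunc.txt and append.txt are arbitrary.

```shell
#!/bin/sh
# Linux-specific sketch: compare the file status flags that '>' and '>>'
# give a descriptor.  Assumes /proc is mounted and that O_APPEND is 02000
# (octal), as on x86 and ARM Linux.
exec 3>trunc.txt    # '>'  opens with O_WRONLY|O_CREAT|O_TRUNC
exec 4>>append.txt  # '>>' opens with O_WRONLY|O_CREAT|O_APPEND

# /proc/self/fdinfo/FD prints "flags:" followed by an octal number with a
# leading 0.  /proc/self is the sed process itself, which inherits
# descriptors 3 and 4 (and therefore their flags) from this shell.
trunc_flags=$(sed -n 's/^flags:[[:space:]]*//p' /proc/self/fdinfo/3)
append_flags=$(sed -n 's/^flags:[[:space:]]*//p' /proc/self/fdinfo/4)

# The leading 0 makes shell arithmetic parse the values as octal.
trunc_append=$((trunc_flags & 02000))
append_append=$((append_flags & 02000))

echo "'>'  flags: $trunc_flags (O_APPEND bit: $trunc_append)"
echo "'>>' flags: $append_flags (O_APPEND bit: $append_append)"

exec 3>&- 4>&-  # close both descriptors
```

With '>' the O_APPEND bit should be 0; with '>>' it should be set. That bit is what makes the kernel move the file offset to the end of the file before each write(2), with no intervening modification between the seek and the write.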


Thanks for the feedback; that looks spot-on!

This is supported by the log at
> > <http://buildbot.proulx.com:9003/amd64-gnu-linux/builds/961/step-test/0>

which shows that the per-test testsuite.log file contained all the output,
while the 'stdout' file did not.  The former is always written in append
mode, by either
  tee -a testsuite.log
or
  cat >> testsuite.log

Also, I have not been able to provoke lossage when standard output is
not redirected (running ./micro-suite manually in the test directory).

> That wouldn't have anything to do with the shell.

Yep.

> Replacing foo.sh > stdout 2> stderr with
> : > stdout > stderr
> ./foo.sh >> stdout 2>> stderr
> 
> should be guaranteed to work.

Yes, though for shell portability I'll split the first line into two
commands:
  : > stdout
  : > stderr
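Here is a minimal sketch of the truncate-then-append pattern with several concurrent writers. The writer() function is a hypothetical stand-in for one parallel job of ./foo.sh, and the line counts are arbitrary. Every job writes through descriptors opened with O_APPEND, so no job's output can overwrite another's. Each file should therefore end up with exactly 1500 lines (3 jobs of 500 lines each).

```shell
#!/bin/sh
# Sketch of the truncate-then-append pattern with concurrent writers.
# writer() is a hypothetical stand-in for one parallel job of ./foo.sh.
writer() {
  i=0
  while [ "$i" -lt 500 ]; do
    echo "$1 $i"          # one short line to standard output
    echo "$1 err $i" >&2  # and one to standard error
    i=$((i + 1))
  done
}

# Truncate each file once, as two separate commands for portability.
: > stdout
: > stderr

# All jobs inherit descriptors opened with O_APPEND, so each write(2)
# lands at the current end of file instead of at a shared, racy offset.
{
  writer a &
  writer b &
  writer c &
  wait
} >> stdout 2>> stderr

out_lines=$(wc -l < stdout)
err_lines=$(wc -l < stderr)
echo "stdout: $out_lines lines, stderr: $err_lines lines"
```

The lines from different jobs may interleave in any order, but none should be lost.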

> I think
> 
> { ./foo.sh | cat > stdout; } 2>&1 | cat > stderr
> 
> should be OK as well as write(2)s to a pipe are meant to be
> atomic as long as they are less than PIPE_BUF bytes (a page size
> on Linux) and even if they were not atomic, I would still
> consider it a bug if one process' output to a pipe was to
> overwrite another one's.
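The pipe-based alternative can be sketched in the same way, again with hypothetical writer() and foo() functions in place of ./foo.sh and its parallel jobs. Here only the two cat processes write to the files. The jobs write short lines, well under PIPE_BUF, to pipes, so each file should receive 1000 complete lines.

```shell
#!/bin/sh
# Sketch of the pipe-based alternative: each output file has exactly one
# writer (a cat process); the concurrent jobs only ever write to pipes.
writer() {  # hypothetical stand-in for one parallel job of ./foo.sh
  i=0
  while [ "$i" -lt 500 ]; do
    echo "$1 $i"
    echo "$1 err $i" >&2
    i=$((i + 1))
  done
}

foo() {  # hypothetical stand-in for ./foo.sh running jobs in parallel
  writer a &
  writer b &
  wait
}

# The inner pipe carries foo's stdout to one cat.  The braced group's
# stderr is redirected into the outer pipe, which a second cat copies
# into the stderr file.
{ foo | cat > stdout; } 2>&1 | cat > stderr

wc -l stdout stderr
```

The cost of this approach is the extra cat processes, plus the subshell some shells create for the braced group in a pipeline.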

I agree.  However, this solution requires two or three more processes
than the first one.

Consequently, I think the patch below should fix the failure.  I've
tried it out on a couple of GNU/Linux systems and have been unable to
provoke the failure in an hour or so of testing.  I've pushed the
change and added Stéphane to THANKS.

Cheers,
Ralf, a lot less worried about parallel Autotest now  :-)


        Fix parallel test execution output lossage.
        * lib/autotest/general.m4 (_AT_CHECK): Truncate files to hold
        standard output and standard error before the test, use append
        mode for writing.
        * THANKS: Update.
        Caught by Bob Proulx' build daemons, analysis and suggested fix
        by Stephane Chazelas.

diff --git a/lib/autotest/general.m4 b/lib/autotest/general.m4
index 4d7c0f5..03d3902 100644
--- a/lib/autotest/general.m4
+++ b/lib/autotest/general.m4
@@ -1893,16 +1893,22 @@ m4_define([AT_DIFF_STDOUT()],
 #
 #  ( $at_traceon; $1 ) >at-stdout 2>at-stder1
 #
+# Note that we truncate and append to the output files, to avoid losing
+# output from multiple concurrent processes, e.g., an inner testsuite
+# with parallel jobs.
 m4_define([_AT_CHECK],
 [{ $at_traceoff
 AS_ECHO(["$at_srcdir/AT_LINE: AS_ESCAPE([$1])"])
 echo AT_LINE >"$at_check_line_file"
 
+: >"$at_stdout"
 if _AT_DECIDE_TRACEABLE([$1]); then
-  ( $at_traceon; $1 ) >"$at_stdout" 2>"$at_stder1"
+  : >"$at_stder1"
+  ( $at_traceon; $1 ) >>"$at_stdout" 2>>"$at_stder1"
   at_func_filter_trace $?
 else
-  ( :; $1 ) >"$at_stdout" 2>"$at_stderr"
+  : >"$at_stderr"
+  ( :; $1 ) >>"$at_stdout" 2>>"$at_stderr"
 fi
 at_status=$?
 at_failed=false
