Re: [bug #33138] .PARLLELSYNC enhancement with patch

Paul Smith Tue, 16 Apr 2013 04:47:45 -0700

On Tue, 2013-04-16 at 11:30 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 Apr 2013 05:54:13 +0000
> > From: "Paul D. Smith" <invalid.nore...@gnu.org>
> > 
> > I did a little bit of code rearrangement, but I still think this code will 
> > not
> > work on Windows and might possibly not compile on Windows.
> 
> Indeed, it will not.  Some cursory comments below.


I was hoping that if OUTPUT_SYNC is not #defined, the Windows code would
compile OK (although obviously without this feature), until we get it
working.

> > Hopefully we can fix that.
> 
> We shall see...
> 
> Here's what I see in the changes that is not friendly to Windows:
> 
>  . STREAM_OK uses fcntl and F_GETFD which both don't exist on Windows.

This is to test if somehow make's stdout or stderr has been closed (not
just redirected to /dev/null, but closed).

>  . FD_NOT_EMPTY will only work if its argument descriptor is connected
>    to a file, not the console.  is this condition always true?

Yes, because here we're testing the temporary file that we're saving
output in.  Either the fd will be -1 (not redirected to a temp file), or
it will be a temp file.

>  . open_tmpfd will need close examination on Windows, especially since
>    it closes the stream (the issue is whether the file will still be
>    automatically deleted when the dup'ed file descriptor is closed).

Yes, I suspected this would not work well.  On UNIX the file is actually
deleted FIRST, by tmpfile(), because on UNIX a file is not actually
deleted until the last file descriptor using it is closed, even if it's
not visible in the filesystem anymore.  This is a very nice way to get a
truly anonymous temporary file that cannot be accessed by anything other
than the process that created it.

I'm not sure what the semantics of tmpfile() are on Windows.

>  . pump_from_tmp_fd will need to switch to_fd to binary mode, since
>    this is how tmpfile opens the temporary file; we need output mode
>    match the input mode.

That's a good point.

>  . acquire_semaphore uses fcntl and F_SETLKW, which don't exist on
>    Windows.  the commentary to that function is not revealing its
>    purpose, so I'm unsure how to implement its equivalent on Windows.
>    can someone explain why is this needed and what should it do?  the
>    name seems to imply that using fcntl is an implementation detail,
>    as a crude semaphore -- is that right?  similarly for
>    release_semaphore.  (see also the next item.)

Yes, this is the guts of the feature.  It ensures that only one make
process is writing output at a time.  On other systems like Windows a
different method might be more appropriate.  Since the resource we're
locking on is the output, on UNIX we lock the output fd.  This saves us
from having to create a separate lock file, etc.

I'm pretty convinced that it works fine, even if stdout/stderr are
redirected.  For example, a recursive make which is redirected to a
different file will work OK; the locking for the sub-make will happen on
that file which is different than the locking for other make instances,
but that's OK because they're writing to different places anyway.  The
sub-make's output will be internally consistent, and not interfere with
the parent make's output which is what you want.

Windows has LockFileEx() but we'd need to examine the semantics to
verify it will do what we want.

>  . is there any significance to the fact that sync_handle is either
>    stdout or stderr?  is that important for the synchronization to
>    work, or was that just a convenient handle around?  also, the
>    documentation I have indicates that locking works only on files;
>    is it guaranteed that these handles are redirected to files before
>    acquire_semaphore is called?

They are definitely NOT guaranteed to be redirected to files.  The lock
is taken on stdout (if open) before any redirection happens, so normally
it would be taken on stdout going to the console.  On Linux it works
fine.  I'll need to read the standard more closely and maybe do some
testing on other systems.  It's just a convenient handle... but if it
doesn't always work and we have to create our own handle then that's
some extra work as we have to communicate the handle info to the child
make processes.

>  . calculation of combined_output in start_job_command will need to be
>    reimplemented for Windows, since the reliance on st_dev and st_ino
>    makes assumptions that are false on Windows.

Yes, not surprising.

> Other notes:
> 
>  . outputting stdout and stderr separately is IMO a misfeature: it
>    breaks the temporal proximity of messages written to these streams,
>    and this makes it harder to understand failures.

I agree 100%; that's what the combined output test above is supposed to
handle.  If stdout and stderr are going to the same place then we
redirect them to the same temporary file so they will ultimately appear
in the same order as they would have on the terminal.  Only if they are
not going to the same place anyway do we keep them separate.  This seems
to always be the behavior you want.

>  . why is "leaving directory FOO" displayed _before_ releasing the
>    stderr output? it should be after, I think.

The enter/leave output needs investigation for sure.  I think we're
doing too much here.  I would have left it alone but I think some tools
that parse make results expect to see that output to help them figure
out what's going on.

>  . AFAIU, the call to fcntl inside sync_output can be interrupted by a
>    signal, and will return -1.  shouldn't the code do something
>    non-trivial to avoid a total failure in this case?

Hm, well, this is the unlock which you would not expect to be waiting.
But you're right it doesn't hurt to use EINTRLOOP here just to be safe.

>  . suggest to use "acquire_semaphore" and "release_semaphore" in the
>    call to perror in those two functions, as "fcntl" is not specific
>    enough, and also reveals implementation details.

Good point.

>  . likewise, "read()" and "write()" in diagnostics from
>    pump_from_tmp_fd are less specific than they could be; how about
>    "read from temporary file" instead?

Yes.

>  . btw, why an error in acquire_semaphore and release_semaphore causes
>    Make to bail out?  isn't it better to fall back on the normal
>    operation mode instead?

I'll look at this.  I guess the problem is that if some instances of
make can take the semaphore and some can't, for some reason, should we
continue anyway?

>  . I don't understand why PARLLEL_SYNC is turned on unconditionally in
>    job.h based on POSIX being defined.  it should IMO be in config.h
>    instead, set at configure time.

Yes I agree: that needs to be worked out.  Some configure.ac magic needs
to happen here.  It's not clear how hard it will be to test for all the
things this feature requires.  Some are straightforward but not all.

>  . the new argument to log_working_directory is not documented in the
>    commentary to the function.

All good points.

Thanks for the review; I'll look into this.


_______________________________________________
Bug-make mailing list
Bug-make@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-make

Re: [bug #33138] .PARLLELSYNC enhancement with patch

Reply via email to