On Wed, 2022-07-20 at 20:33 -0400, Dmitry Goncharov wrote:
> if we take a step back, what is the problem? The problem is the
> presence of jobserver-auth in MAKEFLAGS in the non recursive case.
> You already implemented a solution which sets jobserver-auth=-2,-2.
> Another option is to remove jobserver-auth from MAKEFLAGS. This would
> rob make of a chance to notify the user that the program cannot
> participate in the job server protocol, the file descriptors cannot
> be opened. That may be better than the subtlety that you described
> above. On the other hand, if we were to rewrite the current impl
> with e.g. named pipes, the problem of jobserver-auth in MAKEFLAGS
> would stay, would it not?

No, because the problem is not really with jobserver-auth itself.

The problem is fundamentally that make is using open file descriptors
to pass information from parent makes to sub-makes, combined with the
fact that we don't actually know with 100% accuracy which processes we
start are really sub-makes, and which are not. We have a heuristic
which is not always accurate.

This leads to the following problems:

1) We have to be very careful about close-on-exec: when we invoke a
   sub-make we need to disable close-on-exec for these fds, and when
   we invoke a process that is not a sub-make we must enable
   close-on-exec.

2) Even though a sub-process may invoke a sub-make, it may do other
   things as well. Often a recipe does multiple things in the same
   subshell, one of which is to invoke a sub-make. It's impossible for
   us to ensure, in these cases, that the jobserver is available only
   to make.

3) If other processes see the open fds and start reading/writing them
   (this has happened before) then they'll mess up the jobserver
   completely.

4) Some processes just close all fds and this causes problems with the
   jobserver. For example we have an issue with Python where it will,
   by default, set close-on-exec on all open file descriptors before
   it runs any subprocess. If you don't realize this and you use
   Python as part of your build system as an intermediary between
   parent and child makes, it breaks things in inscrutable ways.

5) The Savannah bugs also mention other issues:

   https://savannah.gnu.org/bugs/index.php?62397
   https://savannah.gnu.org/bugs/index.php?57242

6) There is another issue with setting blocking / non-blocking read on
   the jobserver fds. I can't remember if there's a Savannah bug or
   not about this, but changing the blocking/non-blocking status on an
   fd is not local to a given process and this has caused problems for
   some applications used with make. I'll have to try to locate the
   info on this issue.

So, why do named pipes help?

They help because we're not expecting the open fds to be passed down;
in fact we can set close-on-exec for ALL our (non-standard) fds, which
is what you'd want. The parent make doesn't need to use a heuristic to
figure out whether the child process is a make or not.

Instead, any sub-process can look at MAKEFLAGS, see the value of
jobserver-auth, find the name of the pipe, open it, and start to use
it. If the sub-process doesn't know anything about MAKEFLAGS, it will
never know that the jobserver is relevant. If the sub-process knows
about it, it can participate. Everything is up to the sub-process and
nothing is required of the parent make.

This also allows an arbitrary number of intermediate processes to come
between the parent make and the sub-make, since we don't need to force
all the processes to preserve some resource (open fds) across exec
calls to make sure everything continues to work.

Semaphores would be the same: we pass down the NAME of the semaphore,
and each sub-process that cares would use it. But as I discovered we
can't use semaphores because we don't have a reliable way to create an
event handler that works both with semaphores and with SIGCHLD (that I
could find), other than polling which I don't like.

The downside is we need to write code to manage the named pipe
resource.
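
To make the sub-process side concrete, here is a rough Python sketch of
what "look at MAKEFLAGS, find the name of the pipe, open it, and start
to use it" could look like. It is only an illustration under several
assumptions that are not settled above: that the value takes the form
jobserver-auth=fifo:PATH, that the path contains no whitespace, that a
token is a single byte read from and written back to the pipe, and that
opening a FIFO with O_RDWR behaves as it does on Linux. The names
find_jobserver_fifo and with_job_token are made up for this example.

```python
import os

def find_jobserver_fifo(makeflags):
    """Return the named-pipe path advertised in MAKEFLAGS, or None.

    Assumes a "--jobserver-auth=fifo:PATH" value (an assumption of this
    sketch). Any other value, such as an "R,W" fd pair, is treated as
    "no named pipe available".
    """
    path = None
    for word in makeflags.split():
        if word.startswith("--jobserver-auth="):
            value = word.split("=", 1)[1]
            # Sketch's choice: if the option appears more than once,
            # the last occurrence decides.
            path = value[len("fifo:"):] if value.startswith("fifo:") else None
    return path

def with_job_token(path, work):
    """Take one token from the pipe, run work(), then put the token back."""
    # O_RDWR keeps open() from blocking while waiting for a peer;
    # this is Linux behavior and an assumption of the sketch.
    fd = os.open(path, os.O_RDWR)
    try:
        token = os.read(fd, 1)       # blocks until a token is available
        try:
            return work()
        finally:
            os.write(fd, token)      # return the same byte we took
    finally:
        os.close(fd)

if __name__ == "__main__":
    fifo = find_jobserver_fifo(os.environ.get("MAKEFLAGS", ""))
    if fifo is None:
        print("no named-pipe jobserver advertised; running serially")
    else:
        with_job_token(fifo, lambda: print("running one extra job"))
```

Note that nothing here depends on inherited fds: a process that never
reads MAKEFLAGS is unaffected, and any number of intermediate processes
can sit between the parent make and this one.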