Hi,

In git commit c9e6ab9ac73, the '--jobserver-fds' parameter was changed to '--jobserver-auth', with the intent to "publish" the interface as stable for interoperation with other tools. This commit was included in GNU Make 4.2 and newer releases. Presumably, this should mean that GNU Make 4.2, 4.2.1, and 4.3, which all use the same --jobserver-auth interface, are compatible with each other.

Unfortunately, this doesn't seem to be the case: git commit b552b0525 added logic to set the O_NONBLOCK flag both when creating the file descriptors in the parent make instance (jobserver_setup()) and when inheriting them in a child make instance (jobserver_parse_auth()). This flag takes effect in both the parent make instance and all processes which inherit the descriptor, even if a child process is the one invoking fcntl(). That commit exists in the GNU Make 4.3 release, but not in 4.2 or 4.2.1.

If the tool interoperating with version 4.3 is not prepared for read(2) to fail with EAGAIN, then setting the O_NONBLOCK flag will cause it to fail. Affected tools include GNU Make versions 4.2 and 4.2.1.

Here is an example makefile that demonstrates the problem:

$ cat Makefile
print-version:
	@echo "At $(MAKELEVEL): $(shell $(MAKE) --version 2>&1 | head -n 1)"

SUB_MAKE ?=
ifneq ($(SUB_MAKE),)
run-sub-make:
	+@$(SUB_MAKE) do-work
ifneq ($(MAKELEVEL),1)
do-work: run-sub-make
endif
endif

define make_work
do-work$(1): print-version
	@sleep 1
do-work: do-work$(1)
.PHONY: do-work$(1)
endef
$(foreach i,1 2 3 4 5 6,$(eval $(call make_work,$(i))))

.PHONY: print-version do-work

When invoked with GNU Make 4.2.1 or GNU Make 4.3 with a submake using the same version, everything works as expected:

$ ./make-4.2.1 --no-print-directory -j2 do-work SUB_MAKE=./make-4.2.1
At 0: GNU Make 4.2.1
At 1: GNU Make 4.2.1
$ ./make-4.3 --no-print-directory -j2 do-work SUB_MAKE=./make-4.3
At 0: GNU Make 4.3
At 1: GNU Make 4.3

But when invoked with GNU Make 4.2.1 with GNU Make 4.3 as a child or vice versa, it fails pretty reliably:

$ ./make-4.2.1 --no-print-directory -j2 do-work SUB_MAKE=./make-4.3
At 0: GNU Make 4.2.1
At 1: GNU Make 4.3
make-4.2.1: *** read jobs pipe: Resource temporarily unavailable. Stop.
make-4.2.1: *** Waiting for unfinished jobs....
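
As an aside on the mechanism behind this failure: O_NONBLOCK is a file status flag stored on the open file description, which parent and child share after fork(), rather than on each process's own descriptor. Below is a minimal C sketch of that behavior. It is not taken from the GNU Make sources, and the function name is my own invention for illustration. A child sets O_NONBLOCK on an inherited pipe, and the parent's subsequent read() on the empty pipe then fails with EAGAIN, which is the errno behind "Resource temporarily unavailable".

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of GNU Make).
   Returns 1 if a child's fcntl(O_NONBLOCK) on an inherited pipe is visible
   to the parent and the parent's read() fails with EAGAIN, 0 if not,
   and -1 if the setup itself failed. */
int nonblock_set_by_child_reaches_parent(void)
{
    int fds[2];
    int result = -1;

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: mimic a newer tool switching the inherited read end
           to non-blocking mode, then exit. */
        int flags = fcntl(fds[0], F_GETFL);
        if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: wait for the child, then look at our own descriptor. */
    int status;
    if (waitpid(pid, &status, 0) == pid
        && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        int flags = fcntl(fds[0], F_GETFL);
        char token;

        /* The pipe is empty and its write end is still open, so a blocking
           read would hang here; a non-blocking one fails with EAGAIN. */
        errno = 0;
        ssize_t n = read(fds[0], &token, 1);

        result = (flags >= 0 && (flags & O_NONBLOCK)
                  && n < 0 && errno == EAGAIN) ? 1 : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling this function from a trivial main() should yield 1 on a POSIX system. That matches what is seen above: the parent never called fcntl() itself, yet its read fails once any child has changed the flag.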

With the versions swapped (4.3 parent, 4.2.1 child):

$ ./make-4.3 --no-print-directory -j2 do-work SUB_MAKE=./make-4.2.1
At 0: GNU Make 4.3
At 1: GNU Make 4.2.1
make-4.2.1[1]: *** read jobs pipe: Resource temporarily unavailable. Stop.
make-4.2.1[1]: *** Waiting for unfinished jobs....
make-4.3: *** [Makefile:7: run-sub-make] Error 2
make-4.3: *** Waiting for unfinished jobs....

I have a fairly complex production codebase that ends up in the situation of GNU Make 4.2.1 calling GNU Make 4.3, which is running into this problem. (These are logically distinct components in separate source code repositories, both of which bootstrap themselves with a specific version of make in source control, in order to make the builds more reproducible across a wide range of build environments while still allowing the use of newer Make features.) It is not feasible to change both of these in lockstep.

I put together a simple workaround to address this specific case of a parent GNU Make 4.2.1 with a child GNU Make 4.3: it simply removes the call to 'set_blocking (job_fds[0], 0);' from jobserver_parse_auth() in the GNU Make 4.3 build, while leaving GNU Make 4.2.1 alone.

With this patch, when GNU Make 4.2.1 invokes GNU Make 4.3, both processes will use blocking reads. This exposes the patched GNU Make 4.3 process to the race condition, possibly leading to a hang, that commit b552b0525 was originally implemented to address, but I think the risk of this happening shouldn't be demonstrably worse than just running GNU Make 4.2.1 in all instances. And, in the case that the patched GNU Make 4.3 is used for the parent process and all child processes, non-blocking reads will still be used, so the patch should have no effect.

This doesn't address the inverse case of a version 4.3 parent process and version 4.2.1 child, but at least for my needs that's enough.

My questions for this list are:

- Is there a better way to handle the compatibility break in this stable interface?
  It looks like the latest git master version of doc/make.texi still documents:

      Note that the read side of the jobserver pipe is set to ``blocking'' mode.

  How are other tools expected to deal with this?

- Is there some reason I'm missing that 'set_blocking (job_fds[0], 0);' is called from jobserver_parse_auth()? Putting aside all the mixed-version considerations, with a purely version 4.3 configuration this seems completely unnecessary since the parent's flags will be inherited. This may be a worthwhile patch to apply just for simplicity's sake.

- Is there some reason that using GNU Make 4.3 with a blocking jobserver-auth FD (inherited as described) would be more susceptible to the race condition that was closed in commit b552b0525 than GNU Make 4.2.1?

Thanks,
Robert