I still think there must be an answer at the system-call level, i.e. using strace. It seems certain to me that some system call is taking much longer and that finding it would point to the solution. The fact that strace hasn't shown it yet implies that it's losing the thread at some point. Maybe try something like "SHELL := strace ... $(SHELL)", or find some other creative way to force it in at the leaves? The problem has to be visible at the system call layer.
Alternatively, have you tried building make 4.2.1 with custom instrumentation added? David On Sat, Jul 30, 2016 at 11:13 AM, Gardell, Steven <sgard...@sonusnet.com> wrote: > Thanks for the insights and apologies for the delayed response. My > measurements seem to bear out the inference about Orecurse. These were done > in succession on the same HP G9 server with 28 physical cores (56 > hyper-threads) and TMDIR set to /tmp: > > > > Build with leaf makefiles at j=32, and parallel suppressed in higher level > Makefiles – 1:37 (hours:minutes) > > Build with low level makefiles at j=32, parallel enabled for the “list of > makefiles” makefiles and -Orecurse – 1:11 > > Build with low level makefiles at j=32, parallel enabled for the “list of > makefiles” makefiles and -Onone– 1:04 > > Build with low level makefiles at j=32, parallel enabled for the “list of > makefiles” makefiles and -Otarget – 8:55 > > Build with j=1 – 4:45 > > > > These are wall clock times on a shared machine, so certainly subject to > some modest variation; it is likely that the Onone and Orecurse times are > effectively the same. > > > > Somewhat naively, I guess, the most surprising thing here is the -Otarget > time. I would not be awfully surprised to see it asymptotically approach > the no-parallel-building time. But to see it repeatably much longer just > seems a bit bizarre. > > > > *From:* Brian Vandenberg [mailto:phant...@gmail.com] > *Sent:* Wednesday, July 27, 2016 4:59 PM > *To:* Gardell, Steven <sgard...@sonusnet.com> > *Cc:* Bug-make@gnu.org > *Subject:* Re: Parallel builds across makefiles > > > > I haven't seen similar issues but I have a hypothesis: make is > single-threaded and therefore it consumes output from jobs in the same > thread it uses to reap/spawn new jobs. If make is spending a large enough > amount of time consuming output then this will impacts the rate at which it > can spawn/reap new jobs. > > > > I ran some tests to see when make attempts to consume output from child > processes and here's what I learned: > > > > (a) For jobs executing within a single instance of make (not recursive) it > doesn't appear to begin consuming text until a recipe completes (I only > tested this with *--output-sync=target*; I expect similar to be true of > other sync modes excluding *none*). > > (b) If you use *--output-sync=target* and a recipe contains the string > *${MAKE}* at all -- even if it doesn't actually start a recursive make > instance -- the way it consumes output is different (more on that below) > > (c) make will not stop reading job output until there's nothing left to > consume. During that time it cannot reap/spawn new jobs. If your jobs > produce a large enough amount of text this could create enough small delays > that aggregate into the noticeable delays you're seeing. > > > > As an example to demonstrate (b): > > > > .RECIPEPREFIX := < > > SHELL := bash > > OUTER := $(shell echo -n {0..15}) > > INNER := $(shell echo -n {16..32}) > > all: outer > > outer: ${OUTER} > > inner: ${INNER} > > > > ${OUTER}: > > <@${MAKE} -f /tmp/makefile inner > > > > ${INNER}: > > <@perl -e 'my $$tmp = "${MAKE}"; map { printf "x\n"; sleep 1 } (0..100)' > > > > > > I executed with (make 4.1): > > > > make -j64 --output-sync=target > > > > ... and it begins printing output almost immediately in spite of the fact > none of the recipes could have finished. If you take *${MAKE}* out of > the recipe then it behaves as expected. > > > > Now consider what I said w/regard to (c): it will only move on to > spawning/reaping other jobs (or consuming other inputs) if there's nothing > to read for the job it's currently consuming text from. > > > > When I use *--output-sync=recurse* I don't see the same behavior, but if > your build recurses more than 1 level then perhaps each sub-make attempts > to consume output from its recursive make calls, possibly eating away at > time it could spend spawning/reaping jobs. > > > > If I'm right this could be completely avoided for those of us who would > rather have log files instead of having everything dumped to the terminal. > If the parent make still needs to consume the output text, though, then the > limiting factor is: make is single-threaded and consumes input from > recursive jobs in the same thread it spawns/reaps new jobs. > > > > -brian > > _______________________________________________ > Bug-make mailing list > Bug-make@gnu.org > https://lists.gnu.org/mailman/listinfo/bug-make > >
_______________________________________________ Bug-make mailing list Bug-make@gnu.org https://lists.gnu.org/mailman/listinfo/bug-make