https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96794
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 26 Aug 2020, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96794
>
> --- Comment #3 from Jan Hubicka <hubicka at ucw dot cz> ---
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96794
> >
> > --- Comment #2 from Martin Liška <marxin at gcc dot gnu.org> ---
> > (In reply to Jan Hubicka from comment #1)
> > > > As seen here:
> > > > https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/7e0363da60dcddbfde4ab68fa3be755515166297/gcc-10-with-zstd.svg
> > > >
> > > > each blocking linking of a GCC front-end leads to a wasted jobserver
> > > > worker.
> > > Hmm, I am not sure how to interpret the graph. I can see that there is a
> > > funny staircase of LTRANS jobs, but how does that relate to jobserver
> > > workers?
> >
> > Yes, I mean the staircase of LTRANS jobs, because at the beginning N-1
> > links are waiting for the lock:
> >
> > [ 299s] lock-and-run.sh: (PID 7351) waiting 0 sec to acquire linkfe.lck from PID 7347
> > ...
> >
> > For the jobserver they are still running even though they sleep.
>
> Aha, so it is an extra locking mechanism we add without the jobserver's
> knowledge.
>
> > > We limit the makefile to link one binary at a time to avoid Richi's box
> > > getting out of memory, right?
> >
> > No. It's because we want to have a reasonable constraint, which is right
> > now 8GB. Without --enable-link-mutex, we would consume ~ 10 x 1.3 GB
> > (plus the WPA parallel streaming peak), which is probably not desired.
>
> 10 x 1.3 GB will get consumed only if the building machine has 10 threads.
> I wonder if the jobserver WPA streaming integration will happen this
> year; with that and some patches to WPA memory use we could fit in 8GB
> unless very large parallelism is configured.

Note that even without LTO the link step will consume about 1GB for each FE;
this is enough to make my box with 6GB of RAM swap and die miserably when
bootstrapping with all languages enabled. Yes, even without LTO bootstrap,
ld.bfd (and also gold) really is that memory hungry.

> I suppose the only really effective solution would be to teach the jobserver
> that some jobs are "big" and consume multiple tokens, that is WPA, while
> other jobs like LTRANS compiles and streaming are small.

The only effective solution would be to make the glue with the waiting
mechanism pass on its token? Hmm, I guess since lto-wrapper invokes make
again and also waits, lto-wrapper itself still sits on the original token
from the parent jobserver; thus this isn't really dependent on the waiting
mechanism but is an inherent "bug" in the way we execute the LTRANS
compiles in jobserver mode.
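For reference, a minimal sketch of the GNU make jobserver token protocol the
discussion above is about. This is not GCC's or lto-wrapper's actual code;
the MAKEFLAGS parsing is deliberately simplified (it only handles the
"--jobserver-auth=R,W" / "--jobserver-fds=R,W" pipe form, not the newer
fifo form), and the function names are made up for illustration. The point
it tries to show: a process started by make holds one implicit token, and
every additional parallel sub-job needs an explicit token read from the
jobserver pipe; a process that merely sleeps (on linkfe.lck, or lto-wrapper
waiting for its sub-make) keeps its implicit token and is still counted as
running.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pull the jobserver read/write fds out of MAKEFLAGS.
   Returns 0 on success.  Simplified parser, illustration only.  */
static int
jobserver_fds (int *rfd, int *wfd)
{
  const char *flags = getenv ("MAKEFLAGS");
  if (!flags)
    return -1;
  const char *p = strstr (flags, "--jobserver-auth=");
  if (!p)
    p = strstr (flags, "--jobserver-fds=");
  if (!p)
    return -1;
  p = strchr (p, '=') + 1;
  return sscanf (p, "%d,%d", rfd, wfd) == 2 ? 0 : -1;
}

int
main (void)
{
  int rfd, wfd;
  char token;

  if (jobserver_fds (&rfd, &wfd) != 0)
    {
      fprintf (stderr, "no jobserver available; running serially\n");
      return 0;
    }

  /* Acquire one extra token: blocks until some other job finishes and
     writes its token back into the pipe.  Note that while a process only
     sleeps on a lock it never gets here -- it keeps sitting on its implicit
     token, which is the waste described in the comments above.  */
  if (read (rfd, &token, 1) != 1)
    return 1;

  /* ... run one parallel sub-job here ... */

  /* Release the token so other waiting jobs can proceed.  */
  if (write (wfd, &token, 1) != 1)
    return 1;
  return 0;
}

In these terms, the idea floated above would mean lto-wrapper writing its
implicit token back to the pipe before it blocks waiting for the LTRANS
sub-make and re-acquiring a token before it continues; whether that is safe
with respect to make's accounting is exactly the open question here.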