[PATCH] D136701: [LinkerWrapper] Perform device linking steps in parallel

Joseph Huber via Phabricator via cfe-commits Tue, 25 Oct 2022 11:15:32 -0700

jhuber6 added a comment.

In D136701#3883218 <https://reviews.llvm.org/D136701#3883218>, @tra wrote:


> I would argue that parallel compilation and linking may need to be disabled 
> by default. I believe similar patches were discussed in the past regarding 
> sub-compilations, but they are relevant for parallel linking, too.
> Google search shows D52193 <https://reviews.llvm.org/D52193>, but I believe 
> there were other attempts in the past. 
> @yaxunl - I vaguely recall that we did discuss parallel HIP/CUDA compilation 
> in the past, but I can't find the details.

I think parallel compilation might be desirable as well, but in my opinion it's 
a harder sell than parallel linking. As an opt-in feature, however, it would be 
very helpful in some cases. Consider someone creating a static library that 
supports every GPU architecture LLVM supports; it would be nice to be able to 
optionally turn on parallelism in the driver for a build like this:

  clang lib.c -fopenmp -O3 -fvisibility=hidden -foffload-lto -nostdlib \
    --offload-arch=gfx700,gfx701,gfx801,gfx803,gfx900,gfx902,gfx906,gfx908,gfx90a,gfx90c,gfx940,gfx1010,gfx1030,gfx1031,gfx1032,gfx1033,gfx1034,gfx1035,gfx1036,gfx1100,gfx1101,gfx1102,gfx1103,sm_35,sm_37,sm_50,sm_52,sm_53,sm_60,sm_61,sm_62,sm_70,sm_72,sm_75,sm_80,sm_86

This is something we might be doing more often as we start trying to provide 
standard library features on the GPU via static libraries. It might be wasteful 
to compile for every architecture, but I think it's the soundest approach if we 
want compatibility.

> These days most of the builds are parallel already and it's very likely that 
> the build system already launches as many jobs as there are CPUs available. 
> Making each compilation launch multiple parallel subcompilations would likely 
> result in way too many simultaneously running processes.
> Granted, linking is done less often than compilation, so having parallel 
> linking may be lucky to be the last remaining process in the parallel build, 
> but it's not unusual to have multiple linker processes running simultaneously 
> during the build either. Linking is often the most resource-heavy part of the 
> build, so I would not be surprised if even a few linker instances would cause 
> problems if they spawn parallel sub-linking jobs.

`lld` already uses all available threads for its parallel linking, and the 
linker wrapper runs before the host linker invocation, so it shouldn't 
interfere with that either. My only concern is that in the future we may try to 
support faster LTO linking via thin-LTO or some other parallel implementation. 
I think there's reasonable precedent for parallel linking already.

> Having parallel subcompilations may be useful in some cases -- e.g. 
> distributed compilation with one compilation per remote worker w/ multiple 
> CPUs available on the worker, but that's unlikely to be a common scenario. 
> Having deterministic output is also very important, both for the build 
> repeatability/provenance tracking and for the build system's cache hit rates. 
> Reliably cached slow repeatable compilation will be a net win over fast, but 
> unstable compilation that causes cache churn and triggers more things to be 
> rebuilt.

The only non-determinism is in the order in which linking jobs for different 
targets and architectures run. If the user links only a single architecture, it 
should behave as before. The average case is still probably going to be one or 
two architectures at once, in which case this change won't make much of a 
difference.
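
To illustrate the ordering point, here is a minimal sketch of how jobs can run 
in parallel while their results are still collected in a fixed order. This is 
not the code from the patch: `linkForArch` and `linkAllInParallel` are 
hypothetical names, and the "link" step is a placeholder that only builds a 
string. It assumes one job per architecture and uses plain `std::async` rather 
than any LLVM threading utility.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a single device link job. A real job would
// invoke a device linker; here it only fabricates an image name.
std::string linkForArch(const std::string &Arch) {
  return "image-" + Arch;
}

// Launch one job per architecture, then wait on the futures in input order.
// The jobs may finish in any order, but the returned vector always follows
// the order of Archs.
std::vector<std::string>
linkAllInParallel(const std::vector<std::string> &Archs) {
  std::vector<std::future<std::string>> Jobs;
  Jobs.reserve(Archs.size());
  for (const std::string &Arch : Archs)
    Jobs.push_back(std::async(std::launch::async, linkForArch, Arch));

  std::vector<std::string> Images;
  Images.reserve(Jobs.size());
  for (std::future<std::string> &Job : Jobs)
    Images.push_back(Job.get());
  return Images;
}
```

Calling `linkAllInParallel({"gfx90a", "sm_70"})` in this sketch yields the 
images in the same order as the input list, regardless of which job finishes 
first. With a single architecture there is only one job, so nothing changes.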


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136701/new/

https://reviews.llvm.org/D136701

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
