tra added a comment.

In D136701#3883300 <https://reviews.llvm.org/D136701#3883300>, @jhuber6 wrote:

> However, as an opt-in feature it would be very helpful in some cases.

I'm OK with the explicit opt-in.

> Like consider someone creating a static library that supports every GPU 
> architecture LLVM supports, it would be nice to be able to optionally turn on 
> parallelism in the driver.

Yes, but the implicit assumption here is that you have sufficient resources. If 
you build N libraries, each for M architectures, your build machine may not 
have enough memory for N*M linkers.
Having N*M processes may or may not be an issue in itself, but if each of those 
linkers is an `lld` that wants to run its own K parallel threads, the added 
parallelism would not help anything.
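
To make the scaling concrete, a back-of-the-envelope sketch with purely 
hypothetical numbers (N, M, K are assumptions, not measurements; M is roughly 
the size of the architecture list quoted below):

```python
# Purely hypothetical numbers, just to illustrate how quickly this scales.
N = 20   # libraries being built concurrently (assumed)
M = 35   # offload architectures per library (assumed)
K = 16   # threads each lld instance might spawn (assumed)

print(N * M)      # 700 concurrent linker processes in the worst case
print(N * M * K)  # 11200 threads if each linker also parallelizes
```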

In other words, I agree that it may be helpful in some cases, but I can also 
see how it may actually hurt the build, possibly catastrophically.

>   clang lib.c -fopenmp -O3 -fvisibility=hidden -foffload-lto -nostdlib 
> --offload-arch=gfx700,gfx701,gfx801,gfx803,gfx900,gfx902,gfx906,gfx908,gfx90a,gfx90c,gfx940,gfx1010,gfx1030,gfx1031,gfx1032,gfx1033,gfx1034,gfx1035,gfx1036,gfx1100,gfx1101,gfx1102,gfx1103,sm_35,sm_37,sm_50,sm_52,sm_53,sm_60,sm_61,sm_62,sm_70,sm_72,sm_75,sm_80,sm_86
>
> This is something we might be doing more often as we start trying to provide 
> standard library features on the GPU via static libraries. It might be 
> wasteful to compile for every architecture but I think it's the soundest 
> approach if we want compatibility.

My point is that grabbing resources will likely break the build system's 
assumptions about their availability. How that would affect the build is 
anyone's guess. With infinite resources, parallel-everything would win, but in 
practice it's a big maybe. It would likely be a win for small builds, and 
probably a wash or a regression for a larger build with multiple such targets.

Ideally there would be a way to cooperate with the build system and let it 
manage the scheduling, but I don't think we have a good way of doing that.
E.g. for CUDA compilation I was thinking of exposing per-GPU sub-compilations 
(well, we already do with --cuda-device-only/--cuda-host-only), providing a way 
to create a combined object from them, and then letting the build system manage 
how those per-GPU compilations are launched. The problem there is that the 
build system would need to know our under-the-hood implementation details, so 
such an approach would be very fragile. The way the new driver does things may 
be a bit more suitable for this, but I suspect it would still be hard to do.
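
Something along these lines (a sketch, not an existing tool; `--cuda-device-only` 
and `--offload-arch` are real clang options, but the file names, the job limit, 
and the combining step are made up for illustration):

```python
# Sketch only: a stand-in for a build system that schedules per-GPU
# device-only compilations with a bounded number of concurrent jobs.
import subprocess
from concurrent.futures import ThreadPoolExecutor

ARCHS = ["sm_60", "sm_70", "sm_80", "sm_90"]  # example subset
MAX_JOBS = 4                                  # the build system's job limit

def compile_for(arch: str) -> str:
    out = f"kernel.{arch}.o"                  # placeholder naming scheme
    subprocess.run(
        ["clang", "--cuda-device-only", f"--offload-arch={arch}",
         "-O3", "-c", "kernel.cu", "-o", out],
        check=True,
    )
    return out

with ThreadPoolExecutor(max_workers=MAX_JOBS) as pool:
    device_objs = list(pool.map(compile_for, ARCHS))

# Combining device_objs with the host object is the missing piece: there is
# no stable interface for it today, which is the fragility mentioned above.
```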

> `lld` already uses all available threads for its parallel linking, the linker 
> wrapper runs before the host linker invocation so it shouldn't interfere 
> either.

You do have a point here. As long as we don't end up with too many threads 
(e.g. we guarantee that each per-offload linker instance does not run its own 
parallel threads), offload linking may be similar to parallel lld.
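
One way to enforce that guarantee (a sketch; the plumbing around it is made up, 
but `--threads=1` is a real ld.lld option) would be for the wrapper to cap each 
per-architecture link explicitly:

```python
# Sketch: each per-architecture device link gets exactly one thread, so the
# only parallelism is the wrapper's fan-out across architectures.
import subprocess

def link_single_threaded(objs: list[str], out: str) -> None:
    # "--threads=1" caps ld.lld's internal parallelism; the rest of the
    # command line is placeholder plumbing for illustration.
    subprocess.run(["ld.lld", "--threads=1", "-o", out, *objs], check=True)
```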

> This is only non-deterministic for the order of linking jobs between several 
> targets and architectures. If the user only links a single architecture it 
> should behave as before.

I'm not sure what you mean. Are you saying that linking with 
`--offload-arch=gfx700` is repeatable, but with `--offload-arch=gfx700,gfx701` 
it's not? That would still be a problem.
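
For what it's worth, launching the per-architecture jobs in parallel does not 
have to make the result order-dependent: if the wrapper collects the outputs 
and assembles them in a fixed order (e.g. sorted by architecture) rather than 
in completion order, the final image can stay bit-for-bit reproducible. A 
sketch with made-up names:

```python
# Sketch: run per-arch link jobs in parallel, but assemble the results in a
# fixed order so completion timing never affects the output.
from concurrent.futures import ThreadPoolExecutor

def build_image(arch: str) -> bytes:
    return f"<device image for {arch}>".encode()  # placeholder for the real link step

archs = ["gfx701", "gfx700"]
with ThreadPoolExecutor() as pool:
    images = dict(zip(archs, pool.map(build_image, archs)))

# Deterministic: iterate in sorted order, not in completion order.
packaged = b"".join(images[a] for a in sorted(archs))
```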

> The average case is still probably going to be one or two architectures at 
> once, in which case this change won't make much of a difference.

Any difference is a difference, as far as content-based caching and provenance 
tracking are concerned.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136701/new/

https://reviews.llvm.org/D136701
