jhuber6 wrote:

> I think I'm with Art on this one.
> 
> > > Problem #2 [...] The arch=native will create a working configuration, but 
> > > would build more than necessary.
> > 
> > 
> > It will target the first GPU it finds. We could maybe change the behavior 
> > to detect the newest, but the idea is just to target the user's system.
> 
> OK, but I think this is worse.
> 
> Now it's basically always incorrect to ship a build system which uses 
> arch=native, because the people running the build might very reasonably have 
> multiple GPUs in their system, and which GPU clang picks is unspecified.

It's not unspecified per se; it just picks whichever GPU the CUDA driver assigns to 
device ID zero, so it corresponds to what a layman gets when using the default 
device in CUDA.
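
For reference, here is a rough sketch of how to check which GPU that is using the 
CUDA runtime API. It just queries device 0 and prints its compute capability; note 
the ordering the runtime reports can still be influenced by things like 
`CUDA_VISIBLE_DEVICES`, so this is only illustrative:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: print the GPU the CUDA runtime exposes as device 0, i.e. the one
// -march=native would resolve to under the behavior described above.
int main() {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, /*device=*/0) != cudaSuccess) {
    std::fprintf(stderr, "no CUDA device found\n");
    return 1;
  }
  // e.g. "NVIDIA A100 -> sm_80"
  std::printf("%s -> sm_%d%d\n", prop.name, prop.major, prop.minor);
  return 0;
}
```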

The AMDGPU version emits a warning when multiple GPUs are found. I should probably 
add the same thing here, as it would make this behavior explicit.
 
> But we all know people are going to do it anyway.
> 
> Given that this feature cannot correctly be used with a build system, and 
> given that 99.99% of invocations of clang are from a build system that the 
> user running the build did not write, it seems to me that we should not add a 
> feature that is such a footgun when used with a build system.
> 
> (A non-CUDA C++ file compiled with march=native will almost surely run on 
> your computer, whereas this won't, and it's unpredictable whether or not it 
> will, depending on the order the nvidia driver returns GPUs in. So there's no 
> good analogy here.)
> 
> If we were going to add this, I think we should compile for all the GPUs in 
> your system, like Art had assumed. I think that's better, but it has other 
> problems, like slow builds and also the fact that your graphics GPU is likely 
> less powerful than your compute GPU, so now compilation is going to fail 
> because you're e.g. using tensorcores and compiling for a GPU that doesn't 
> have them. So again you can't really use arch=native in a build system, even 
> if you say "requires an sm80 GPU", because really the requirement is "has an 
> sm80 GPU and no others in the machine".

We already do this for CUDA with `--offload-arch=native`. This handling is for 
targeting NVPTX directly, similar to OpenCL. In that mode there is no concept of 
multiple device passes; there can only be a single target architecture, just like 
when compiling standard C++ code. I'd like to have `-march=native` because it makes 
it easy to build something that works for testing purposes, and it's consistent 
with the native handling everywhere else, since the NVPTX target is the only one 
that doesn't support it, to my knowledge.
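
To illustrate the direct-compilation mode I mean (the file name and function are 
made up, and the exact flags may vary by toolchain), something like this compiles a 
single translation unit straight to PTX for exactly one architecture:

```cpp
// scale.cpp -- plain C++ compiled directly for the NVPTX target, e.g.:
//   clang++ --target=nvptx64-nvidia-cuda -march=native -S scale.cpp -o scale.ptx
//
// There is only one device pass in this mode, so a single sm_XX is chosen per
// invocation, exactly like -march=native when compiling ordinary host C++.
extern "C" float scale(float x, float a) {
  return a * x;
}
```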

https://github.com/llvm/llvm-project/pull/79373