jhuber6 wrote:

Some interesting points, I'll try to clarify some things.

> This option may not work as well as one would hope.
> 
> Problem #1 is that it will drastically slow down compilation for some users. 
> NVIDIA GPU drivers are loaded on demand, and the process takes a while 
> (O(second), depending on the kind and number of GPUs). If you build on a 
> headless machine, they will get loaded during GPU probing step, and they will 
> get unloaded after that. For each compilation. This will also affect folks 
> who use AMD GPUs to run graphics, but use NVIDIA GPUs for compute (my current 
> machine is set up that way). It can be worked around by enabling driver 
> persistence, but there would be no obvious cues for the user that they would 
> need to do so.

On my machine, with the GPUs already loaded, calling `nvptx-arch` takes about 
15ms. For the headless situation, I've noticed that if I have not started Xorg 
on my server it can take up to 250ms, which is what I'm assuming you're 
referring to. I think this latency is reasonable, but we'd probably want to 
document what it does under the hood.
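For anyone who hits the driver-load latency, a sketch of how to check for it and enable the persistence workaround mentioned above (`nvidia-smi` persistence mode and the `nvidia-persistenced` daemon are standard NVIDIA tooling; exact behavior varies by driver version):

```shell
# Time the detection step itself; with the driver already resident
# this should be on the order of 15ms.
time nvptx-arch

# Check whether persistence mode is enabled (keeps the driver loaded
# between processes, avoiding the per-compilation load cost).
nvidia-smi --query-gpu=persistence_mode --format=csv

# Enable it, via either the legacy flag or the persistence daemon.
sudo nvidia-smi -pm 1      # legacy persistence mode
sudo nvidia-persistenced   # daemon-based approach, preferred by NVIDIA
```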

> Problem #2 is that it will likely result in unnecessary compilation for 
> nontrivial subset of users who have separate GPUs dedicated to compute and do 
> not care to compile for a separate GPU they use for graphics. The 
> `arch=native` will create a working configuration, but would build more than 
> necessary. Again, the end user would not be aware of that.

It will target the first GPU it finds. We could maybe change the behavior to 
detect the newest, but the idea is just to target the user's system. I suppose 
this is somewhat different from the existing `--offload-arch=native`, which 
will correctly compile for all supported GPUs.
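For comparison, a sketch of how the two behaviors look on the command line (the `--offload-arch=native` spelling is the existing support; treat the exact invocations as illustrative):

```shell
# Existing behavior: detects every visible GPU and builds one image per
# architecture (e.g. both sm_70 and sm_89 on a mixed system).
clang++ -x cuda --offload-arch=native foo.cu

# Option under discussion: direct NVPTX compilation targeting only the
# first GPU that detection finds.
clang++ --target=nvptx64-nvidia-cuda -march=native foo.cu
```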

> Problem #3 -- it adds an extra step to the reproducibility/debugging process. 
> If/when someone reports an issue with a compilation done with `-mnative`, 
> we'll inevitably have to start with clarifying questions -- what exactly was 
> the hardware configuration of the machine where the compilation was done.

I'm not so sure: the actual architecture will show up with `-v` or in an LLVM 
stack dump, so unless the bug report is really unhelpful it should be visible 
somewhere. But I suppose it's possible. I think the current behavior is much 
less intuitive, where we just default to `sm_52` and then fail to execute 
anything when that fails to load, or else JIT the PTX we may or may not 
include.
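To illustrate the point about `-v`: the resolved architecture appears in the `-cc1` line that verbose output echoes, so a bug report produced with `-v` would already contain it (`foo.c` is a placeholder; the exact output shape varies by clang version):

```shell
# The -cc1 invocation echoed by -v includes the resolved GPU, e.g.
#   ... -triple nvptx64-nvidia-cuda ... -target-cpu sm_89 ...
clang --target=nvptx64-nvidia-cuda -march=native -v -c foo.c 2>&1 | grep -- -target-cpu
```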
 
> With my "GPU support dude for nontrivial number of users" hat on, I 
> personally would really like not to open this can of worms. It's not a very 
> big deal, but my gut is telling me that I will see all three cases once the 
> option makes it into the tribal knowledge (hi, reddit & stack overflow!).
> 
> So, in short, the source code changes are OK, but I'm not a huge fan of 
> `-mnative` in principle (both CPU and GPU variants). If others find it 
> useful, I'm OK with adding the option, but it should probably come with 
> documented caveats so affected users have a chance to find the answer if/when 
> they run into trouble.

There's some argument against the `native` options in general, but since users 
are accustomed to them on the CPU side, I feel it's helpful for the GPU to 
behave the same way for cases where the user just wants something that's 
guaranteed to work. It's also inconsistent for this not to work, considering 
that `-mcpu=native` already works for AMDGPU and `--offload-arch=native` 
already works for CUDA, HIP, and OpenMP.


https://github.com/llvm/llvm-project/pull/79373
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits