tra added a subscriber: jhuber6.
tra added a comment.

In D137154#3917692 <https://reviews.llvm.org/D137154#3917692>, @hdelan wrote:

> Thanks for feedback. Instead of adding `__nvvm_reflect` as a clang builtin, 
> would it be acceptable if I modified the NVVMReflect pass

That would be less problematic, but I'm still concerned that it would tacitly 
endorse the use of `__nvvm_reflect` by LLVM users.

> so that it works with addrspace casting as well? This would allow us to use 
> `__nvvm_reflect` in openCL

Relaxing the argument type checks on the `__nvvm_reflect` function would be fine with 
me.
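
For context, here's roughly how such a bitcode library ends up using `__nvvm_reflect` today (a minimal sketch in CUDA-style C++; the `lib_exp*` helpers are made up for illustration, not taken from libclc or libdevice):

```
// Minimal sketch -- helper names below are hypothetical.
extern "C" int __nvvm_reflect(const char *);

// Hypothetical per-arch implementations.
float lib_exp_sm70(float x);
float lib_exp_generic(float x);

float lib_exp(float x) {
  // NVVMReflect folds the "__CUDA_ARCH" query to the target's SM
  // version * 10 (e.g. 700 for sm_70), so this branch is resolved at
  // compile time even though the library ships as arch-agnostic bitcode.
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    return lib_exp_sm70(x);    // hypothetical sm_70+ fast path
  return lib_exp_generic(x);   // hypothetical generic fallback
}
```

If I understand the OpenCL case correctly, the string literal there lands in the `__constant` address space, so the argument reaches the NVVMReflect pass through an `addrspacecast` back to generic, and that's what the current argument check trips over.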

That said,...

TBH, I'm still not quite convinced that compiler changes are the right solution 
for making it possible for *one* library to rely on something that was never 
intended to be exposed to compiler users.

Perhaps we should take a step back, figure out the fundamental problem you need 
to solve (as opposed to figuring out how to make a tactical hack work) and then 
figure out a more principled solution.

> In DPC++ for CUDA we use libclc as a wrapper around CUDA SDK's libdevice. 
> Like libdevice we want to precompile libclc to bc for the CUDA backend 
> without specializing for a particular arch, so that we can call different 
> __nv funcs based on the arch. For this reason we use the __nvvm_reflect llvm 
> intrinsic.

For starters, libdevice by itself is something that's not quite intended for 
the end user. It was a rather poor stop-gap solution to address the fact that 
there used to be no linking phase for GPU binaries and no 'standard' math 
library the code could rely on. The library itself does not have anything 
particularly interesting in it. Its major advantage is that it exists, while we 
don't have our own GPU-side libm yet. We do want to get rid of libdevice and 
replace it with an open-source math library of our own. With the recent 
improvements in offloading support in the clang driver, we're getting closer to 
making that possible.

As for the code specialization, why not build for individual GPUs? To me it 
looks like this use case is a good match for the "new-driver" offloading that's 
been recently implemented in clang. It allows compiling and linking GPU-side 
code, which should obviate the need to ship bitcode and rely on 
`__nvvm_reflect` for specialization.
The downside is that it's a recent feature, so it would not be available in 
older clang versions. @jhuber6: I'm also not sure whether OpenCL is supported by 
the new driver.

With the new driver, you should in theory be able to compile the source with 
`--offload-arch=A --offload-arch=B` and get an object file containing 
GPU-specific bitcode or machine code for each architecture. That object can 
then be transparently linked into the final executable, with clang performing 
the final linking of the GPU binaries as well.
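
For concreteness, the flow would look something like this (a sketch only; `sm_70`/`sm_80` stand in for whatever set of GPUs you build for, and the usual CUDA include/library paths are left out):

```
# Compile once, generating device code for each requested architecture.
clang++ foo.cu -c -o foo.o \
    --offload-new-driver --offload-arch=sm_70 --offload-arch=sm_80

# Link; clang's linker wrapper also performs the final link of the
# per-architecture GPU binaries embedded in foo.o.
clang++ foo.o -o foo --offload-link -lcudart
```

Whether the same works for the OpenCL/SYCL path is exactly the part I'm not sure about, per the question to @jhuber6 above.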

I realize that this may be hard or not feasible for your project right now. I'm 
OK with allowing limited `__nvvm_reflect` use for the time being, but please do 
consider making things work without it or libdevice, if possible.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137154/new/

https://reviews.llvm.org/D137154
