Artem-B wrote:

@AlexMaclean, who authored #89417, and possibly other NVIDIA folks may have 
some thoughts on this.

In general, making it a per-function attribute makes sense at the LLVM level.

We will also need to reconcile it with the existing handling in 
NVPTXISelLowering.cpp:
https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b02e080645c68f821a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L94-L96

However, propagating it to the NVVMReflect pass complicates things, as the 
libdevice we're linking with is linked once per module.

I think we may need to disentangle libdevice from the IR generated by clang.

Currently, in CUDA compilation, a call to `sqrtf()` maps to `__nv_sqrtf(__a)`, 
which is served by libdevice bitcode and which chooses the precise or 
approximate version of the LLVM intrinsic based on NVVMReflect.
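
A rough host-side model of that dispatch, for illustration only. This is not 
the libdevice source: `ReflectSettings`, `reflect()`, and `lowerNvSqrtf()` are 
hypothetical names, and the `"__CUDA_PREC_SQRT"` key is an assumption about 
which reflect parameter is consulted. The point is that a single module-wide 
setting decides the lowering for every caller:

```cpp
#include <map>
#include <string>

enum class SqrtLowering { Precise, Approx };

// Hypothetical stand-in for the module-wide NVVMReflect parameters.
using ReflectSettings = std::map<std::string, int>;

// Mimics a reflect query; unknown keys read as 0 in this model.
int reflect(const ReflectSettings &settings, const std::string &key) {
  auto it = settings.find(key);
  return it == settings.end() ? 0 : it->second;
}

// Models the choice made inside __nv_sqrtf(): it depends only on the
// module-wide setting, never on which function contains the call.
// The key name is an assumption made for this sketch.
SqrtLowering lowerNvSqrtf(const ReflectSettings &settings) {
  return reflect(settings, "__CUDA_PREC_SQRT") != 0 ? SqrtLowering::Precise
                                                    : SqrtLowering::Approx;
}
```

Because `lowerNvSqrtf()` takes no per-caller input in this model, two 
functions in the same module cannot end up with different sqrt lowerings.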

What we need to do is change `sqrtf()` to use clang builtins so we retain 
per-function control over lowering it.
Once we have that in place, we can control `sqrtf()` precision via function 
and/or module attributes, independently of the choice we make via NVVMReflect 
for `__nv_sqrtf()`.
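
A minimal sketch of how such a resolution could look, assuming a per-function 
attribute simply overrides a module-level default. `ModuleInfo`, 
`FunctionInfo`, and `resolveSqrt()` are hypothetical names for this model and 
do not correspond to existing LLVM or clang APIs:

```cpp
#include <optional>

enum class SqrtPrecision { Precise, Approx };

// Hypothetical module-level default (e.g. from a module attribute or flag).
struct ModuleInfo {
  SqrtPrecision defaultSqrt;
};

// Hypothetical per-function attribute; empty means "not specified".
struct FunctionInfo {
  std::optional<SqrtPrecision> sqrtAttr;
};

// The function attribute wins when present; otherwise fall back to the
// module default. The NVVMReflect setting is deliberately not consulted.
SqrtPrecision resolveSqrt(const ModuleInfo &m, const FunctionInfo &f) {
  return f.sqrtAttr.value_or(m.defaultSqrt);
}
```

With this shape, two functions in one module can lower `sqrtf()` differently, 
while `__nv_sqrtf()` inside libdevice keeps following whatever NVVMReflect 
selects.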


https://github.com/llvm/llvm-project/pull/134244
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
