[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

Artem Belevich via cfe-commits Mon, 07 Apr 2025 12:21:48 -0700

Artem-B wrote:

@AlexMaclean, who authored #89417, and possibly other NVIDIA folks may have
some thoughts on this.


In general, making it a per-function attribute makes sense at the LLVM level.

We will also need to reconcile it with the existing code at
https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b02e080645c68f821a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L94-L96

However, propagating it to the NVVMReflect pass complicates things, as the
libdevice we link with is linked once per module.

I think we may need to disentangle libdevice from the IR generated by clang.

Currently, in CUDA compilation, a call to `sqrtf()` maps to `__nv_sqrtf(__a)`,
which is served by libdevice bitcode and which chooses the precise or
approximate version of the LLVM intrinsic based on NVVMReflect.

What we need to do is change `sqrtf()` to use clang builtins, so that we retain
per-function control over its lowering.
Once we have that in place, we can control `sqrtf()` precision via function
and/or module attributes, independently of the choice we make via NVVMReflect
for `__nv_sqrtf()`.
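
To illustrate the difference, here is a rough host-side sketch in plain C++.
It is not clang, LLVM, or libdevice code: `ModuleConfig`, `FunctionAttrs`,
`libdeviceSqrtKind`, and `builtinSqrtKind` are hypothetical names made up for
this example. It only models the point that a reflect-style value is fixed once
per module, while an attribute-based choice could differ per function.

```cpp
#include <cassert>
#include <optional>

// Which lowering a sqrtf() call would end up with.
enum class SqrtKind { Precise, Approx };

// Hypothetical stand-in for the value NVVMReflect substitutes.
// It is resolved once for the whole module.
struct ModuleConfig {
  bool precSqrt;
};

// Hypothetical stand-in for a per-function attribute.
// An empty optional means "no attribute, fall back to the module default".
struct FunctionAttrs {
  std::optional<bool> precSqrt;
};

// Models the current path: the choice inside libdevice depends only on the
// module-wide value, so every caller in the module gets the same answer.
SqrtKind libdeviceSqrtKind(const ModuleConfig &m) {
  return m.precSqrt ? SqrtKind::Precise : SqrtKind::Approx;
}

// Models the proposed path: a builtin lowered per function, where a function
// attribute (if present) overrides the module default.
SqrtKind builtinSqrtKind(const ModuleConfig &m, const FunctionAttrs &f) {
  return f.precSqrt.value_or(m.precSqrt) ? SqrtKind::Precise
                                         : SqrtKind::Approx;
}

// Small self-check of the two models.
void demo() {
  ModuleConfig module{true};     // module asks for precise sqrt
  FunctionAttrs fastFn{false};   // one function opts out
  FunctionAttrs plainFn{};       // another has no attribute

  // Module-wide selection cannot tell the two functions apart.
  assert(libdeviceSqrtKind(module) == SqrtKind::Precise);

  // Per-function selection can.
  assert(builtinSqrtKind(module, fastFn) == SqrtKind::Approx);
  assert(builtinSqrtKind(module, plainFn) == SqrtKind::Precise);
}
```

In this model the attribute only overrides the module default; how the real
attribute and the existing NVPTX option should interact is exactly the part
that still needs to be reconciled.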


https://github.com/llvm/llvm-project/pull/134244
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
