Artem-B wrote: @AlexMaclean who authored #89417 and possibly other NVIDIA folks may have some thoughts on this.
In general, making it a per-function attribute makes sense at the LLVM level. We will also need to reconcile it with https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b02e080645c68f821a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L94-L96

However, propagating it to the NVVMReflect pass complicates things, as the libdevice we're linking with is linked once per module. I think we may need to disentangle libdevice from the IR generated by clang.

Currently, in CUDA compilation, a call to `sqrtf()` maps to `__nv_sqrtf(__a)`, which is served by libdevice bitcode and which chooses the precise or approximate version of the LLVM intrinsic based on NVVMReflect. What we need to do is change `sqrtf()` to use clang builtins, so that we retain per-function control over lowering it.

Once we have that in place, we can control `sqrtf` precision via function and/or module attributes, independently of the choice we make via NVVMReflect for `__nv_sqrtf()`.

https://github.com/llvm/llvm-project/pull/134244
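To make the distinction concrete, here is a rough host-side C++ model of the two lowering paths described above. It is only a sketch, not actual clang or LLVM code: `ModuleSettings`, `FunctionAttrs`, `AttrValue`, `lowerViaLibdevice` and `lowerViaBuiltin` are all hypothetical names made up for illustration. The module-wide flag stands in for whatever NVVMReflect resolves when libdevice is linked, and the per-function field stands in for the proposed attribute.

```cpp
// Which flavor of sqrt the lowering ends up selecting.
enum class SqrtLowering { Precise, Approximate };

// Hypothetical per-function attribute value; Inherit means "not set".
enum class AttrValue { Inherit, Precise, Approximate };

// Hypothetical stand-in for the single module-wide choice that
// NVVMReflect would resolve for the linked libdevice bitcode.
struct ModuleSettings {
  bool preciseSqrt;
};

// Hypothetical stand-in for the attributes of one function.
struct FunctionAttrs {
  AttrValue sqrtPrecision;
};

// Models the current path: sqrtf() -> __nv_sqrtf() -> libdevice.
// The per-function attribute is ignored because the choice was
// already made once for the whole module.
SqrtLowering lowerViaLibdevice(const ModuleSettings &mod,
                               const FunctionAttrs &) {
  return mod.preciseSqrt ? SqrtLowering::Precise
                         : SqrtLowering::Approximate;
}

// Models the proposed path: sqrtf() -> clang builtin.
// A function-level attribute, when present, overrides the module default.
SqrtLowering lowerViaBuiltin(const ModuleSettings &mod,
                             const FunctionAttrs &fn) {
  switch (fn.sqrtPrecision) {
  case AttrValue::Precise:
    return SqrtLowering::Precise;
  case AttrValue::Approximate:
    return SqrtLowering::Approximate;
  case AttrValue::Inherit:
    break;
  }
  return mod.preciseSqrt ? SqrtLowering::Precise
                         : SqrtLowering::Approximate;
}
```

In this model, a function marked `AttrValue::Approximate` inside a module with `preciseSqrt = true` still gets the precise version through the libdevice path, but gets the approximate one through the builtin path. That is the per-function control the libdevice route cannot provide.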