AlexMaclean wrote:

It seems like we already have perhaps too many mechanisms to control how sqrt 
gets lowered. There is the `__nv_sqrtf` libdevice function, which chooses 
between specific (1:1 to PTX) intrinsics based on NVVMReflect, and then there 
are also `llvm.sqrt` and `nvvm.sqrt.f`, which are lowered and optimized based on 
command-line options and function- and instruction-level flags, each in its own 
way. 
I think for more fine-grained responsiveness to instruction- and function-level 
options it makes sense to use the existing intrinsics, while it is consistent 
with the existing design to treat NVVMReflect as operating globally across the 
entire module. I'm not sure it makes sense to introduce a new module flag and 
clang command-line option, though...

I personally agree with @Artem-B that `__nv_sqrtf`+NVVMReflect may not be the 
way to go. Using one of the intrinsics seems like a better approach but I may 
be missing something.

https://github.com/llvm/llvm-project/pull/134244
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
