pasaulais wrote:

Thanks for the review. I agree that function-level attributes are not ideal for 
solving this issue and instruction-level metadata would work better with things 
like inlining. Is the incomplete patch you mentioned something I could take on 
and complete?

Regarding int vs. floating-point, I'm afraid there is a need to toggle one 
independently of the other (or at least to special-case operations like XOR that 
are not supported by PCIe 3.0). As a comment on the issue linked in the 
description explains 
(https://github.com/RadeonOpenCompute/ROCm/issues/2481#issuecomment-1725874765), 
there are configurations where FP atomics like add work whereas XOR does not, 
because the PCIe 3.0 spec lacks support for it. I have reproduced this on a 
system with an RX 6700 XT GPU, where `global_atomic_add_f32` works as expected 
with fine-grained allocations and `global_atomic_xor` doesn't.

https://github.com/llvm/llvm-project/pull/69229
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
