pasaulais wrote: Thanks for the review. I agree that function-level attributes are not ideal for solving this issue and that instruction-level metadata would work better with things like inlining. Is the incomplete patch you mentioned something I could take on and complete?
Regarding int vs. floating-point, I'm afraid there is a need to toggle one independently of the other (or at least to special-case operations like XOR that are not supported by PCIe 3.0). As the link I posted in the description mentions (see this comment: https://github.com/RadeonOpenCompute/ROCm/issues/2481#issuecomment-1725874765), there are configurations where FP atomics such as add work, whereas XOR does not, because it is missing from the PCIe 3.0 spec. I have reproduced this on a system with an RX 6700 XT GPU, where `global_atomic_add_f32` works as expected with fine-grained allocations and `global_atomic_xor` does not.

https://github.com/llvm/llvm-project/pull/69229
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
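For context on why XOR is the problem case: the PCIe AtomicOp set covers FetchAdd, Swap and CAS (compare-and-swap), but no bitwise operations, so an atomic XOR on memory across the link needs some other strategy. One generic fallback is a CAS loop. The C++ below is a minimal host-side sketch of that technique only, assuming the CAS itself is performed atomically; `cas_fetch_xor` is a hypothetical helper, not an LLVM, HIP or AMDGPU API, and not necessarily what the backend would emit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: emulate an atomic fetch-XOR using only
// compare-and-swap. Returns the value held before the XOR, matching the
// semantics of std::atomic<T>::fetch_xor.
std::uint32_t cas_fetch_xor(std::atomic<std::uint32_t>& target,
                            std::uint32_t mask) {
    std::uint32_t expected = target.load(std::memory_order_relaxed);
    // If another writer changed the value in the meantime, the CAS fails
    // and compare_exchange_weak stores the current value into `expected`,
    // so the next iteration retries with fresh data.
    while (!target.compare_exchange_weak(expected, expected ^ mask,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}
```

The loop only relies on CAS, which is why it can stand in for an unsupported read-modify-write operation; the cost is retries under contention, which is one reason a native instruction is preferred where the interconnect allows it.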