[clang] [llvm] [AMDGPU][True16][MC] true16 for v_alignbyte_b32 (PR #119750)

Ivan Kosarev via cfe-commits Fri, 03 Jan 2025 04:51:36 -0800

================
@@ -2353,8 +2353,8 @@ def int_amdgcn_writelane :
   [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]
 >;
 
-def int_amdgcn_alignbyte : ClangBuiltin<"__builtin_amdgcn_alignbyte">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
+def int_amdgcn_alignbyte : DefaultAttrsIntrinsic<[llvm_i32_ty],
+  [llvm_i32_ty, llvm_i32_ty, llvm_anyint_ty],
----------------
kosarev wrote:


This one is tricky. I see the operand is 16-bit on, e.g., GFX10 as well, and
SP3 doesn't mind assembling, say, `v_alignbyte_b32 v5, vcc_hi, lit(0xaf123456),
sel_hi(v255)`, but that's packed math, not true16. So for GFX10 it would
probably be most natural for the intrinsic to take a 32-bit operand, and maybe
at some point even support an op_sel_hi kind of flag, whereas on GFX11 it
would/should be just a 16-bit value. In other words, it's not just about the
type of the operand.

Maybe we should either keep it taking llvm_i32_ty and do the
conversion/truncation where needed, thus letting the intrinsic mask the
subtarget specifics, or alternatively have two separate intrinsics that
reflect these specifics properly?
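The first option (keep `llvm_i32_ty` and truncate where needed) relies on truncation not changing the result. The small Python model below illustrates why that holds. It assumes the ISA-level semantics `D = ({S0, S1} >> (8 * S2[1:0])) & 0xffffffff`, i.e. only the two low bits of the selector matter. The function names are hypothetical; this is not LLVM code, and it does not model the `sel_hi`/op_sel_hi form mentioned above.

```python
MASK32 = 0xFFFF_FFFF


def alignbyte_b32(src0: int, src1: int, sel: int) -> int:
    """Hypothetical model of v_alignbyte_b32, assuming
    D = ({src0, src1} >> (8 * sel[1:0])) & 0xffffffff."""
    # Concatenate the two 32-bit sources into one 64-bit value, src0 in the high half.
    wide = ((src0 & MASK32) << 32) | (src1 & MASK32)
    # Only bits [1:0] of the selector are used: a byte shift of 0..3.
    shift = 8 * (sel & 0x3)
    return (wide >> shift) & MASK32


def truncate_selector_to_16(sel32: int) -> int:
    """Hypothetical lowering step: truncate an i32 selector to the 16-bit
    operand a true16 encoding would take (low half only)."""
    return sel32 & 0xFFFF
```

For example, `alignbyte_b32(0x11223344, 0x55667788, 1)` returns `0x44556677`. Because truncation to 16 bits preserves bits [1:0], passing `truncate_selector_to_16(sel)` instead of `sel` yields the same result for any 32-bit selector, which is what would let an i32-typed intrinsic hide the GFX11 operand width.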

Just switching to llvm_anyint_ty feels a bit like masking the actual issue.

https://github.com/llvm/llvm-project/pull/119750
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
