[llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
mbrkusanin wrote: Note that the first commit in this PR is: https://github.com/llvm/llvm-project/pull/77785 https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[mlir] [llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
mbrkusanin wrote: Rebased and reverted bfloat https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
mbrkusanin wrote: Rebased and updated after https://github.com/llvm/llvm-project/pull/76143 https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
https://github.com/mbrkusanin closed https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
@@ -423,6 +423,67 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts") +//===--===// +// WMMA builtins. +// Postfix w32 indicates the builtin requires wavefront size of 32. +// Postfix w64 indicates the builtin requires wavefront size of 64. +// +// Some of these are very similar to their GFX11 counterparts, but they don't +// require replication of the A,B matrices, so they use fewer vector elements. +// Therefore, we add an "_gfx12" suffix to distinguish them from the existing +// builtins. +//===--===// +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32_gfx12, "V8fV8hV8hV8f", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32_gfx12, "V8fV8sV8sV8f", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32_gfx12, "V8hV8hV8hV8h", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32_gfx12, "V8sV8sV8sV8s", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32_gfx12, "V8iIbV2iIbV2iV8iIb", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32_gfx12, "V8iIbiIbiV8iIb", "nc", "gfx12-insts,wavefrontsize32") +// These are gfx12-only, but for consistency with the other WMMA variants we're +// keeping the "_gfx12" suffix. +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_fp8_fp8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_fp8_bf8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf8_fp8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf8_bf8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x32_iu4_w32_gfx12, "V8iIbV2iIbV2iV8iIb", "nc", "gfx12-insts,wavefrontsize32") + +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64_gfx12, "V4fV4hV4hV4f", "nc", "gfx12-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64_gfx12, "V4fV4sV4sV4f", "nc", "gfx12-insts,wavefrontsize64") mbrkusanin wrote: Updated to bfloat but GlobalISel does not handle it properly yet. Should we use i16 for now until we update GlobalISel? https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang-tools-extra] [AMDGPU] Update uses of new VOP2 pseudos for GFX12 (PR #78155)
https://github.com/mbrkusanin approved this pull request. https://github.com/llvm/llvm-project/pull/78155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (PR #78709)
https://github.com/mbrkusanin approved this pull request. https://github.com/llvm/llvm-project/pull/78709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
mbrkusanin wrote: Rebased. https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [flang] [compiler-rt] [llvm] [clang-tools-extra] [lldb] [clang] [libcxx] [lld] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
mbrkusanin wrote: > > Why is so there so much special casing in the assembler/disassembler? > > I'm not an original author of these change, but from what I understand it is > a workaround to handle VOP3 instructions which have a single source but > require the use of two bits from OPSEL. `V_CVT_F32_FP8` has one source but is > using two bits from OPSEL to specify which part from 32 bit register to > convert ([7:0], [15:8], [23: 16] or 31 : 24]). And since OPSELs are > correlated with sources/destination (one bit from OPSEL with one > soruce/destination) these is required without any deeper changes to TableGen. > > I'm open to change TableGen, but I would prefer to create new ticket and do > it with new PR. These change may take longer than one day and we would like > to have these PR merged before LLVM branching. Correct some of these instructions use opsel[1] which in LLVM in stored in src1_modifiers so a dummy src1 is used. And as far as I know we can not have src1_modfiers without src1 operand. https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [clang] [libcxx] [llvm] [lld] [clang-tools-extra] [flang] [compiler-rt] [lldb] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
mbrkusanin wrote: > > > Why is so there so much special casing in the assembler/disassembler? > > > > > > I'm not an original author of these change, but from what I understand it > > is a workaround to handle VOP3 instructions which have a single source but > > require the use of two bits from OPSEL. `V_CVT_F32_FP8` has one source but > > is using two bits from OPSEL to specify which part from 32 bit register to > > convert ([7:0], [15:8], [23: 16] or 31 : 24]). And since OPSELs are > > correlated with sources/destination (one bit from OPSEL with one > > soruce/destination) these is required without any deeper changes to > > TableGen. > > I'm open to change TableGen, but I would prefer to create new ticket and do > > it with new PR. These change may take longer than one day and we would like > > to have these PR merged before LLVM branching. > > Correct, some of these instructions use opsel[1] which in LLVM in stored in > src1_modifiers so a dummy src1 is used. And as far as I know we can not have > src1_modfiers without src1 operand. Similarly V_CVT_SR_BF8_F32 for example uses opsel[2] and opsel[3] so we need src2_modifiers and src2. https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-tools-extra] [compiler-rt] [llvm] [flang] [libc] [lld] [lldb] [libcxx] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
@@ -626,11 +629,82 @@ class Cvt_PK_F32_F8_Pat; -foreach Index = [0, -1] in { - def : Cvt_PK_F32_F8_Pat; - def : Cvt_PK_F32_F8_Pat; +let SubtargetPredicate = isGFX9Only in { + foreach Index = [0, -1] in { +def : Cvt_PK_F32_F8_Pat; +def : Cvt_PK_F32_F8_Pat; + } +} + + +// Similar to VOPProfile_Base_CVT_F32_F8, but for VOP3 instructions. +def VOPProfile_Base_CVT_PK_F32_F8_OpSel : VOPProfileI2F { + let InsVOP3OpSel = (ins Src0Mod:$src0_modifiers, Src0RC64:$src0, + clampmod:$clamp, omod:$omod, op_sel0:$op_sel); + + let HasOpSel = 1; + let HasExtVOP3DPP = 0; +} + +def VOPProfile_Base_CVT_F32_F8_OpSel : VOPProfile<[f32, i32, i32, untyped]> { + let InsVOP3OpSel = (ins Src0Mod:$src0_modifiers, Src0RC64:$src0, + Src1Mod:$src1_modifiers, Src1RC64:$src1, + clampmod:$clamp, omod:$omod, op_sel0:$op_sel); + let AsmVOP3OpSel = !subst(", $src1_modifiers", "", getAsmVOP3OpSel<2, 0, 0, 1, 1, 0>.ret); + + let HasOpSel = 1; + let HasExtDPP = 1; + let HasExtVOP3DPP = 1; + + let Src1VOP3DPP = Src1RC64; + let AsmVOP3DPP8 = getAsmVOP3DPP8.ret; + let AsmVOP3DPP16 = getAsmVOP3DPP16.ret; +} + +let SubtargetPredicate = isGFX12Plus, mayRaiseFPException = 0, +SchedRW = [WriteFloatCvt] in { + defm V_CVT_F32_FP8_OP_SEL: VOP1Inst<"v_cvt_f32_fp8_op_sel", VOPProfile_Base_CVT_F32_F8_OpSel>; + defm V_CVT_F32_BF8_OP_SEL: VOP1Inst<"v_cvt_f32_bf8_op_sel", VOPProfile_Base_CVT_F32_F8_OpSel>; + defm V_CVT_PK_F32_FP8_OP_SEL : VOP1Inst<"v_cvt_pk_f32_fp8_op_sel", VOPProfile_Base_CVT_PK_F32_F8_OpSel>; + defm V_CVT_PK_F32_BF8_OP_SEL : VOP1Inst<"v_cvt_pk_f32_bf8_op_sel", VOPProfile_Base_CVT_PK_F32_F8_OpSel>; +} + +class Cvt_F32_F8_Pat_OpSel index, +VOP1_Pseudo inst_e32, VOP3_Pseudo inst_e64> : GCNPat< +(f32 (node i32:$src, index)), +!if (index, + (inst_e64 !if(index{0}, SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), $src, + !if(index{1}, SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), (i32 0), mbrkusanin wrote: Looks like SRCMODS.OP_SEL_1 does nothing here. These should be 0. Tests are in llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll: test_cvt_f32_fp8_byte0 test_cvt_f32_fp8_byte1 test_cvt_f32_fp8_byte2 test_cvt_f32_fp8_byte3 https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [clang-tools-extra] [lld] [lldb] [clang] [flang] [libc] [libcxx] [llvm] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
mbrkusanin wrote: > > Correct, some of these instructions use opsel[1] which in LLVM in stored in > > src1_modifiers so a dummy src1 is used. > > Why can't we just use `SRCMODS.OP_SEL_1` with src0? That could work. We would have to make custom encoding classes then since OP_SEL_1 would have different meaning for these instructions. https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f //===-===// // WMMA intrinsics -class ROCDL_Wmma_IntrOp traits = []> : +class ROCDL_Wmma_IntrOp overloadedOperands> : LLVM_IntrOpBase, + [0], overloadedOperands, [], 1>, Arguments<(ins Variadic:$args)> { let assemblyFormat = "$args attr-dict `:` functional-type($args, $res)"; } // Available on RDNA3 -def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16">; -def ROCDL_wmma_f32_16x16x16_bf16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.bf16">; -def ROCDL_wmma_f16_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f16.16x16x16.f16">; -def ROCDL_wmma_bf16_16x16x16_bf16 : ROCDL_Wmma_IntrOp<"wmma.bf16.16x16x16.bf16">; -def ROCDL_wmma_i32_16x16x16_iu8 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu8">; -def ROCDL_wmma_i32_16x16x16_iu4 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu4">; +def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16", [0]>; mbrkusanin wrote: Sure. I removed it because it was unused. Not sure what purpose it serves. Also there is unused argument warning. https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [llvm] [clang] [lldb] [libcxx] [compiler-rt] [lld] [clang-tools-extra] [flang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : VOP3e_gfx10 { class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10; +class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 { + let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0); + let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0); mbrkusanin wrote: @kosarev Is this what you had in mind? This uses OP_SEL_1 from src0_modifiers as opsel[1] so we can avoid using src1/src1_modifiers. https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[mlir] [clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
mbrkusanin wrote: If there are no further comments, should I merge this? https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f //===-===// // WMMA intrinsics -class ROCDL_Wmma_IntrOp traits = []> : +class ROCDL_Wmma_IntrOp overloadedOperands> : LLVM_IntrOpBase, + [0], overloadedOperands, [], 1>, Arguments<(ins Variadic:$args)> { let assemblyFormat = "$args attr-dict `:` functional-type($args, $res)"; } // Available on RDNA3 -def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16">; -def ROCDL_wmma_f32_16x16x16_bf16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.bf16">; -def ROCDL_wmma_f16_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f16.16x16x16.f16">; -def ROCDL_wmma_bf16_16x16x16_bf16 : ROCDL_Wmma_IntrOp<"wmma.bf16.16x16x16.bf16">; -def ROCDL_wmma_i32_16x16x16_iu8 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu8">; -def ROCDL_wmma_i32_16x16x16_iu4 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu4">; +def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16", [0]>; mbrkusanin wrote: Sorry, that was my mistake. If we pass it to inherited class LLVM_IntrOpBase there is no warning. I updated it already: [actually use traits](https://github.com/llvm/llvm-project/pull/77795/commits/732186bc2d41bf94e63b950685216a5f73dc89b8) https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [lldb] [libc] [clang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : VOP3e_gfx10 { class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10; +class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 { + let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0); + let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0); mbrkusanin wrote: https://github.com/llvm/llvm-project/pull/79122 <- This should help with the tests in llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.mir https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [lldb] [libc] [clang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : VOP3e_gfx10 { class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10; +class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 { + let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0); + let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0); mbrkusanin wrote: and llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll https://github.com/llvm/llvm-project/pull/78414 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [mlir] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
mbrkusanin wrote: Ping https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[mlir] [llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
@@ -2601,67 +2601,73 @@ def int_amdgcn_ds_bvh_stack_rtn : [ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; +def int_amdgcn_s_wait_event_export_ready : + ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">, + Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects, IntrWillReturn] +>; + // WMMA (Wave Matrix Multiply-Accumulate) intrinsics // // These operations perform a matrix multiplication and accumulation of // the form: D = A * B + C . class AMDGPUWmmaIntrinsic : Intrinsic< -[CD], // %D +[CD], // %D [ AB, // %A - AB, // %B + LLVMMatchType<1>, // %B LLVMMatchType<0>, // %C ], [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree] >; class AMDGPUWmmaIntrinsicOPSEL : Intrinsic< -[CD], // %D +[CD], // %D [ AB, // %A - AB, // %B + LLVMMatchType<1>, // %B LLVMMatchType<0>, // %C - llvm_i1_ty, // %high + llvm_i1_ty, // %high (op_sel) for GFX11, 0 for GFX12 ], [IntrNoMem, IntrConvergent, ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; class AMDGPUWmmaIntrinsicIU : Intrinsic< -[CD], // %D +[CD], // %D [ llvm_i1_ty, // %A_sign AB, // %A llvm_i1_ty, // %B_sign - AB, // %B + LLVMMatchType<1>, // %B LLVMMatchType<0>, // %C llvm_i1_ty, // %clamp ], [IntrNoMem, IntrConvergent, ImmArg>, ImmArg>, ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; -def int_amdgcn_wmma_f32_16x16x16_f16 : AMDGPUWmmaIntrinsic; -def int_amdgcn_wmma_f32_16x16x16_bf16 : AMDGPUWmmaIntrinsic; -// The regular, untied f16/bf16 wmma intrinsics only write to one half -// of the registers (set via the op_sel bit). -// The content of the other 16-bit of the registers is undefined. -def int_amdgcn_wmma_f16_16x16x16_f16 : AMDGPUWmmaIntrinsicOPSEL; -def int_amdgcn_wmma_bf16_16x16x16_bf16 : AMDGPUWmmaIntrinsicOPSEL; -// The tied versions of the f16/bf16 wmma intrinsics tie the destination matrix -// registers to the input accumulator registers. -// Essentially, the content of the other 16-bit is preserved from the input. -def int_amdgcn_wmma_f16_16x16x16_f16_tied : AMDGPUWmmaIntrinsicOPSEL; -def int_amdgcn_wmma_bf16_16x16x16_bf16_tied : AMDGPUWmmaIntrinsicOPSEL; -def int_amdgcn_wmma_i32_16x16x16_iu8 : AMDGPUWmmaIntrinsicIU; -def int_amdgcn_wmma_i32_16x16x16_iu4 : AMDGPUWmmaIntrinsicIU; +// WMMA GFX11Only -def int_amdgcn_s_wait_event_export_ready : - ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">, - Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects, IntrWillReturn] ->; +// The OPSEL intrinsics read from and write to one half of the registers, selected by the op_sel bit. +// The tied versions of the f16/bf16 wmma intrinsics tie the destination matrix registers to the input accumulator registers. +// The content of the other 16-bit half is preserved from the input. +def int_amdgcn_wmma_f16_16x16x16_f16_tied : AMDGPUWmmaIntrinsicOPSEL; +def int_amdgcn_wmma_bf16_16x16x16_bf16_tied : AMDGPUWmmaIntrinsicOPSEL; + +// WMMA GFX11Plus + +def int_amdgcn_wmma_f32_16x16x16_f16 : AMDGPUWmmaIntrinsic; +def int_amdgcn_wmma_f32_16x16x16_bf16 : AMDGPUWmmaIntrinsic; +def int_amdgcn_wmma_i32_16x16x16_iu8 : AMDGPUWmmaIntrinsicIU; +def int_amdgcn_wmma_i32_16x16x16_iu4 : AMDGPUWmmaIntrinsicIU; + +// GFX11: The OPSEL intrinsics read from and write to one half of the registers, selected by the op_sel bit. +//The content of the other 16-bit half is undefined. +// GFX12: The op_sel bit must be 0. +def int_amdgcn_wmma_f16_16x16x16_f16 : AMDGPUWmmaIntrinsicOPSEL; +def int_amdgcn_wmma_bf16_16x16x16_bf16 : AMDGPUWmmaIntrinsicOPSEL; mbrkusanin wrote: Sizes are halved. GFX11 basically contained same matrix twice. This is how intrinsics look like at the moment: gfx11: declare <16 x i16> @llvm.amdgcn.wmma.bf16.16x16x16.bf16(<16 x i16>, <16 x i16> , <16 x i16>, i1 immarg) gfx12: declare <8 x bfloat> @llvm.amdgcn.wmma.bf16.16x16x16.bf16(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i1 immarg) https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)
https://github.com/mbrkusanin closed https://github.com/llvm/llvm-project/pull/122277 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)
https://github.com/mbrkusanin created https://github.com/llvm/llvm-project/pull/122277 None From 27d929a270ea1d8d3fa885f00794a092af12e50e Mon Sep 17 00:00:00 2001 From: Mirko Brkusanin Date: Mon, 23 Dec 2024 12:25:25 +0100 Subject: [PATCH] [AMDGPU] Remove s_wakeup_barrier instruction --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 - llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 6 - .../AMDGPU/AMDGPUInstructionSelector.cpp | 5 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp | 1 - .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 2 -- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 27 --- llvm/lib/Target/AMDGPU/SOPInstructions.td | 12 - llvm/test/CodeGen/AMDGPU/s-barrier.ll | 9 --- llvm/test/MC/AMDGPU/gfx12_asm_sop1.s | 9 --- .../Disassembler/AMDGPU/gfx12_dasm_sop1.txt | 9 --- 10 files changed, 5 insertions(+), 76 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 14c1746716cdd6..1b29a8e359c205 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -489,7 +489,6 @@ TARGET_BUILTIN(__builtin_amdgcn_s_barrier_wait, "vIs", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_signal_isfirst, "bIi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_init, "vv*i", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_join, "vv*", "n", "gfx12-insts") -TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vv*", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "vIs", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_named_barrier_state, "Uiv*", "n", "gfx12-insts") diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 92418b9104ad14..b930d6983e2251 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -284,12 +284,6 @@ def int_amdgcn_s_barrier_join : ClangBuiltin<"__builtin_amdgcn_s_barrier_join">, Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>; -// void @llvm.amdgcn.s.wakeup.barrier(ptr addrspace(3) %barrier) -// The %barrier argument must be uniform, otherwise behavior is undefined. -def int_amdgcn_s_wakeup_barrier : ClangBuiltin<"__builtin_amdgcn_s_wakeup_barrier">, - Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, IntrConvergent, IntrWillReturn, -IntrNoCallback, IntrNoFree]>; - // void @llvm.amdgcn.s.barrier.wait(i16 %barrierType) def int_amdgcn_s_barrier_wait : ClangBuiltin<"__builtin_amdgcn_s_barrier_wait">, Intrinsic<[], [llvm_i16_ty], [ImmArg>, IntrNoMem, IntrHasSideEffects, IntrConvergent, diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp index 041b9b4d66f63f..50e0faef9e7c27 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp @@ -2239,7 +2239,6 @@ bool AMDGPUInstructionSelector::selectG_INTRINSIC_W_SIDE_EFFECTS( case Intrinsic::amdgcn_s_barrier_signal_var: return selectNamedBarrierInit(I, IntrinsicID); case Intrinsic::amdgcn_s_barrier_join: - case Intrinsic::amdgcn_s_wakeup_barrier: case Intrinsic::amdgcn_s_get_named_barrier_state: return selectNamedBarrierInst(I, IntrinsicID); case Intrinsic::amdgcn_s_get_barrier_state: @@ -5839,8 +5838,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, Intrinsic::ID IntrID) { llvm_unreachable("not a named barrier op"); case Intrinsic::amdgcn_s_barrier_join: return AMDGPU::S_BARRIER_JOIN_IMM; -case Intrinsic::amdgcn_s_wakeup_barrier: - return AMDGPU::S_WAKEUP_BARRIER_IMM; case Intrinsic::amdgcn_s_get_named_barrier_state: return AMDGPU::S_GET_BARRIER_STATE_IMM; }; @@ -5850,8 +5847,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, Intrinsic::ID IntrID) { llvm_unreachable("not a named barrier op"); case Intrinsic::amdgcn_s_barrier_join: return AMDGPU::S_BARRIER_JOIN_M0; -case Intrinsic::amdgcn_s_wakeup_barrier: - return AMDGPU::S_WAKEUP_BARRIER_M0; case Intrinsic::amdgcn_s_get_named_barrier_state: return AMDGPU::S_GET_BARRIER_STATE_M0; }; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp index 2df068d8fb007b..0406ba9c68ccd3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp @@ -326,7 +326,6 @@ bool isReallyAClobber(const Value *Ptr, MemoryDef *Def, AAResults *AA) { case Intrinsic::amdgcn_s_barrier_wait: case Intrinsic::amdgcn_s_barrier_leave: case Intrins
[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)
mbrkusanin wrote: > Context? Instruction is unused. https://github.com/llvm/llvm-project/pull/122277 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)
https://github.com/mbrkusanin updated https://github.com/llvm/llvm-project/pull/122277 From 27d929a270ea1d8d3fa885f00794a092af12e50e Mon Sep 17 00:00:00 2001 From: Mirko Brkusanin Date: Mon, 23 Dec 2024 12:25:25 +0100 Subject: [PATCH 1/2] [AMDGPU] Remove s_wakeup_barrier instruction --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 - llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 6 - .../AMDGPU/AMDGPUInstructionSelector.cpp | 5 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp | 1 - .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 2 -- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 27 --- llvm/lib/Target/AMDGPU/SOPInstructions.td | 12 - llvm/test/CodeGen/AMDGPU/s-barrier.ll | 9 --- llvm/test/MC/AMDGPU/gfx12_asm_sop1.s | 9 --- .../Disassembler/AMDGPU/gfx12_dasm_sop1.txt | 9 --- 10 files changed, 5 insertions(+), 76 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 14c1746716cdd6..1b29a8e359c205 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -489,7 +489,6 @@ TARGET_BUILTIN(__builtin_amdgcn_s_barrier_wait, "vIs", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_signal_isfirst, "bIi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_init, "vv*i", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_join, "vv*", "n", "gfx12-insts") -TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vv*", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "vIs", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_named_barrier_state, "Uiv*", "n", "gfx12-insts") diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 92418b9104ad14..b930d6983e2251 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -284,12 +284,6 @@ def int_amdgcn_s_barrier_join : ClangBuiltin<"__builtin_amdgcn_s_barrier_join">, Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>; -// void @llvm.amdgcn.s.wakeup.barrier(ptr addrspace(3) %barrier) -// The %barrier argument must be uniform, otherwise behavior is undefined. -def int_amdgcn_s_wakeup_barrier : ClangBuiltin<"__builtin_amdgcn_s_wakeup_barrier">, - Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, IntrConvergent, IntrWillReturn, -IntrNoCallback, IntrNoFree]>; - // void @llvm.amdgcn.s.barrier.wait(i16 %barrierType) def int_amdgcn_s_barrier_wait : ClangBuiltin<"__builtin_amdgcn_s_barrier_wait">, Intrinsic<[], [llvm_i16_ty], [ImmArg>, IntrNoMem, IntrHasSideEffects, IntrConvergent, diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp index 041b9b4d66f63f..50e0faef9e7c27 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp @@ -2239,7 +2239,6 @@ bool AMDGPUInstructionSelector::selectG_INTRINSIC_W_SIDE_EFFECTS( case Intrinsic::amdgcn_s_barrier_signal_var: return selectNamedBarrierInit(I, IntrinsicID); case Intrinsic::amdgcn_s_barrier_join: - case Intrinsic::amdgcn_s_wakeup_barrier: case Intrinsic::amdgcn_s_get_named_barrier_state: return selectNamedBarrierInst(I, IntrinsicID); case Intrinsic::amdgcn_s_get_barrier_state: @@ -5839,8 +5838,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, Intrinsic::ID IntrID) { llvm_unreachable("not a named barrier op"); case Intrinsic::amdgcn_s_barrier_join: return AMDGPU::S_BARRIER_JOIN_IMM; -case Intrinsic::amdgcn_s_wakeup_barrier: - return AMDGPU::S_WAKEUP_BARRIER_IMM; case Intrinsic::amdgcn_s_get_named_barrier_state: return AMDGPU::S_GET_BARRIER_STATE_IMM; }; @@ -5850,8 +5847,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, Intrinsic::ID IntrID) { llvm_unreachable("not a named barrier op"); case Intrinsic::amdgcn_s_barrier_join: return AMDGPU::S_BARRIER_JOIN_M0; -case Intrinsic::amdgcn_s_wakeup_barrier: - return AMDGPU::S_WAKEUP_BARRIER_M0; case Intrinsic::amdgcn_s_get_named_barrier_state: return AMDGPU::S_GET_BARRIER_STATE_M0; }; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp index 2df068d8fb007b..0406ba9c68ccd3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp @@ -326,7 +326,6 @@ bool isReallyAClobber(const Value *Ptr, MemoryDef *Def, AAResults *AA) { case Intrinsic::amdgcn_s_barrier_wait: case Intrinsic::amdgcn_s_barrier_leave: case Intrinsic