[llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-11 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

Note that the first commit in this PR is: 
https://github.com/llvm/llvm-project/pull/77785

https://github.com/llvm/llvm-project/pull/77795
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[mlir] [llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-24 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

Rebased and reverted bfloat

https://github.com/llvm/llvm-project/pull/77795


[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-24 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

Rebased and updated after https://github.com/llvm/llvm-project/pull/76143

https://github.com/llvm/llvm-project/pull/77795


[llvm] [clang] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-24 Thread Mirko Brkušanin via cfe-commits

https://github.com/mbrkusanin closed 
https://github.com/llvm/llvm-project/pull/77795


[clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-18 Thread Mirko Brkušanin via cfe-commits


@@ -423,6 +423,67 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
+//===----------------------------------------------------------------------===//
+// WMMA builtins.
+// Postfix w32 indicates the builtin requires wavefront size of 32.
+// Postfix w64 indicates the builtin requires wavefront size of 64.
+//
+// Some of these are very similar to their GFX11 counterparts, but they don't
+// require replication of the A,B matrices, so they use fewer vector elements.
+// Therefore, we add an "_gfx12" suffix to distinguish them from the existing
+// builtins.
+//===----------------------------------------------------------------------===//
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32_gfx12, "V8fV8hV8hV8f", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32_gfx12, "V8fV8sV8sV8f", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32_gfx12, "V8hV8hV8hV8h", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32_gfx12, "V8sV8sV8sV8s", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32_gfx12, "V8iIbV2iIbV2iV8iIb", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32_gfx12, "V8iIbiIbiV8iIb", "nc", "gfx12-insts,wavefrontsize32")
+// These are gfx12-only, but for consistency with the other WMMA variants we're
+// keeping the "_gfx12" suffix.
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_fp8_fp8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_fp8_bf8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf8_fp8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf8_bf8_w32_gfx12, "V8fV2iV2iV8f", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x32_iu4_w32_gfx12, "V8iIbV2iIbV2iV8iIb", "nc", "gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64_gfx12, "V4fV4hV4hV4f", "nc", "gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64_gfx12, "V4fV4sV4sV4f", "nc", "gfx12-insts,wavefrontsize64")

mbrkusanin wrote:

Updated to bfloat but GlobalISel does not handle it properly yet. Should we use 
i16 for now until we update GlobalISel?

https://github.com/llvm/llvm-project/pull/77795


[clang] [llvm] [clang-tools-extra] [AMDGPU] Update uses of new VOP2 pseudos for GFX12 (PR #78155)

2024-01-18 Thread Mirko Brkušanin via cfe-commits

https://github.com/mbrkusanin approved this pull request.


https://github.com/llvm/llvm-project/pull/78155


[llvm] [clang] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (PR #78709)

2024-01-19 Thread Mirko Brkušanin via cfe-commits

https://github.com/mbrkusanin approved this pull request.


https://github.com/llvm/llvm-project/pull/78709


[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-19 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

Rebased.

https://github.com/llvm/llvm-project/pull/77795


[libc] [flang] [compiler-rt] [llvm] [clang-tools-extra] [lldb] [clang] [libcxx] [lld] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

> > Why is there so much special casing in the assembler/disassembler?
> 
> I'm not the original author of these changes, but from what I understand it is 
> a workaround to handle VOP3 instructions which have a single source but 
> require the use of two bits from OPSEL. `V_CVT_F32_FP8` has one source but 
> uses two bits from OPSEL to specify which part of the 32-bit register to 
> convert ([7:0], [15:8], [23:16] or [31:24]). And since OPSEL bits are 
> correlated with sources/destination (one OPSEL bit per 
> source/destination), this is required without any deeper changes to TableGen.
> 
> I'm open to changing TableGen, but I would prefer to create a new ticket and do 
> it in a new PR. This change may take longer than one day and we would like 
> to have this PR merged before LLVM branching.

Correct, some of these instructions use opsel[1], which in LLVM is stored in 
src1_modifiers, so a dummy src1 is used. And as far as I know we cannot have 
src1_modifiers without a src1 operand.
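
To make the byte selection in the quoted explanation concrete, here is a minimal sketch (not LLVM code) of what a two-bit op_sel value picks out of a 32-bit source register. The helper name selectByte is hypothetical, and the mapping of op_sel value 0 to bits [7:0] and value 3 to bits [31:24] is an assumption for illustration; the thread does not state which op_sel bit is the low bit of the index.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// Assumption: opSel is treated as a 2-bit byte index, where 0 selects
// bits [7:0], 1 selects [15:8], 2 selects [23:16] and 3 selects [31:24].
constexpr std::uint8_t selectByte(std::uint32_t reg, unsigned opSel) {
  return static_cast<std::uint8_t>((reg >> (8u * (opSel & 3u))) & 0xFFu);
}

// Compile-time checks on a sample register value.
static_assert(selectByte(0xAABBCCDDu, 0) == 0xDD, "bits [7:0]");
static_assert(selectByte(0xAABBCCDDu, 1) == 0xCC, "bits [15:8]");
static_assert(selectByte(0xAABBCCDDu, 2) == 0xBB, "bits [23:16]");
static_assert(selectByte(0xAABBCCDDu, 3) == 0xAA, "bits [31:24]");
```

Four possible bytes need two selector bits, which is why a single-source instruction ends up consuming more op_sel bits than it has source operands.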

https://github.com/llvm/llvm-project/pull/78414


[libc] [clang] [libcxx] [llvm] [lld] [clang-tools-extra] [flang] [compiler-rt] [lldb] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

> > > Why is there so much special casing in the assembler/disassembler?
> > 
> > 
> > I'm not the original author of these changes, but from what I understand it 
> > is a workaround to handle VOP3 instructions which have a single source but 
> > require the use of two bits from OPSEL. `V_CVT_F32_FP8` has one source but 
> > uses two bits from OPSEL to specify which part of the 32-bit register to 
> > convert ([7:0], [15:8], [23:16] or [31:24]). And since OPSEL bits are 
> > correlated with sources/destination (one OPSEL bit per 
> > source/destination), this is required without any deeper changes to 
> > TableGen.
> > I'm open to changing TableGen, but I would prefer to create a new ticket and do 
> > it in a new PR. This change may take longer than one day and we would like 
> > to have this PR merged before LLVM branching.
> 
> Correct, some of these instructions use opsel[1], which in LLVM is stored in 
> src1_modifiers, so a dummy src1 is used. And as far as I know we cannot have 
> src1_modifiers without a src1 operand.

Similarly, V_CVT_SR_BF8_F32, for example, uses opsel[2] and opsel[3], so we need 
src2_modifiers and src2.

https://github.com/llvm/llvm-project/pull/78414


[clang] [clang-tools-extra] [compiler-rt] [llvm] [flang] [libc] [lld] [lldb] [libcxx] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Mirko Brkušanin via cfe-commits


@@ -626,11 +629,82 @@ class Cvt_PK_F32_F8_Pat;
 
-foreach Index = [0, -1] in {
-  def : Cvt_PK_F32_F8_Pat;
-  def : Cvt_PK_F32_F8_Pat;
+let SubtargetPredicate = isGFX9Only in {
+  foreach Index = [0, -1] in {
+def : Cvt_PK_F32_F8_Pat;
+def : Cvt_PK_F32_F8_Pat;
+  }
+}
+
+
+// Similar to VOPProfile_Base_CVT_F32_F8, but for VOP3 instructions.
+def VOPProfile_Base_CVT_PK_F32_F8_OpSel : VOPProfileI2F  {
+  let InsVOP3OpSel = (ins Src0Mod:$src0_modifiers, Src0RC64:$src0,
+  clampmod:$clamp, omod:$omod, op_sel0:$op_sel);
+
+  let HasOpSel = 1;
+  let HasExtVOP3DPP = 0;
+}
+
+def VOPProfile_Base_CVT_F32_F8_OpSel : VOPProfile<[f32, i32, i32, untyped]> {
+  let InsVOP3OpSel = (ins Src0Mod:$src0_modifiers, Src0RC64:$src0,
+  Src1Mod:$src1_modifiers, Src1RC64:$src1,
+  clampmod:$clamp, omod:$omod, op_sel0:$op_sel);
+  let AsmVOP3OpSel = !subst(", $src1_modifiers", "", getAsmVOP3OpSel<2, 0, 0, 1, 1, 0>.ret);
+
+  let HasOpSel = 1;
+  let HasExtDPP = 1;
+  let HasExtVOP3DPP = 1;
+
+  let Src1VOP3DPP = Src1RC64;
+  let AsmVOP3DPP8 = getAsmVOP3DPP8.ret;
+  let AsmVOP3DPP16 = getAsmVOP3DPP16.ret;
+}
+
+let SubtargetPredicate = isGFX12Plus, mayRaiseFPException = 0,
+SchedRW = [WriteFloatCvt] in {
+  defm V_CVT_F32_FP8_OP_SEL: VOP1Inst<"v_cvt_f32_fp8_op_sel", VOPProfile_Base_CVT_F32_F8_OpSel>;
+  defm V_CVT_F32_BF8_OP_SEL: VOP1Inst<"v_cvt_f32_bf8_op_sel", VOPProfile_Base_CVT_F32_F8_OpSel>;
+  defm V_CVT_PK_F32_FP8_OP_SEL : VOP1Inst<"v_cvt_pk_f32_fp8_op_sel", VOPProfile_Base_CVT_PK_F32_F8_OpSel>;
+  defm V_CVT_PK_F32_BF8_OP_SEL : VOP1Inst<"v_cvt_pk_f32_bf8_op_sel", VOPProfile_Base_CVT_PK_F32_F8_OpSel>;
+}
+
+class Cvt_F32_F8_Pat_OpSel index,
+VOP1_Pseudo inst_e32, VOP3_Pseudo inst_e64> : GCNPat<
+(f32 (node i32:$src, index)),
+!if (index,
+ (inst_e64 !if(index{0}, SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), $src,
+   !if(index{1}, SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), (i32 0),

mbrkusanin wrote:

Looks like SRCMODS.OP_SEL_1 does nothing here. These should be 0.

Tests are in llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll:

test_cvt_f32_fp8_byte0
test_cvt_f32_fp8_byte1
test_cvt_f32_fp8_byte2
test_cvt_f32_fp8_byte3



https://github.com/llvm/llvm-project/pull/78414


[compiler-rt] [clang-tools-extra] [lld] [lldb] [clang] [flang] [libc] [libcxx] [llvm] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

> > Correct, some of these instructions use opsel[1], which in LLVM is stored in 
> > src1_modifiers, so a dummy src1 is used.
> 
> Why can't we just use `SRCMODS.OP_SEL_1` with src0?

That could work. We would then have to make custom encoding classes, since 
OP_SEL_1 would have a different meaning for these instructions.

https://github.com/llvm/llvm-project/pull/78414


[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-22 Thread Mirko Brkušanin via cfe-commits


@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : 
ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f
 
 //===-===//
 // WMMA intrinsics
-class ROCDL_Wmma_IntrOp traits = []> :
+class ROCDL_Wmma_IntrOp overloadedOperands> :
   LLVM_IntrOpBase,
+  [0], overloadedOperands, [], 1>,
   Arguments<(ins Variadic:$args)> {
   let assemblyFormat =
 "$args attr-dict `:` functional-type($args, $res)";
 }
 
 // Available on RDNA3
-def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16">;
-def ROCDL_wmma_f32_16x16x16_bf16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.bf16">;
-def ROCDL_wmma_f16_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f16.16x16x16.f16">;
-def ROCDL_wmma_bf16_16x16x16_bf16 : 
ROCDL_Wmma_IntrOp<"wmma.bf16.16x16x16.bf16">;
-def ROCDL_wmma_i32_16x16x16_iu8 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu8">;
-def ROCDL_wmma_i32_16x16x16_iu4 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu4">;
+def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16", 
[0]>;

mbrkusanin wrote:

Sure. I removed it because it was unused. I'm not sure what purpose it serves.
Also, there is an unused argument warning.

https://github.com/llvm/llvm-project/pull/77795


[libc] [llvm] [clang] [lldb] [libcxx] [compiler-rt] [lld] [clang-tools-extra] [flang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Mirko Brkušanin via cfe-commits


@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : 
VOP3e_gfx10 {
 
 class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10;
 
+class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 
{
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

mbrkusanin wrote:

@kosarev Is this what you had in mind? This uses OP_SEL_1 from src0_modifiers 
as opsel[1] so we can avoid using src1/src1_modifiers.
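
As a rough illustration of the encoding in the hunk above (Inst{11} taken from src0_modifiers{2}, Inst{12} from src0_modifiers{3}), the following sketch packs both op_sel bits from a single source-modifier value. The function name encodeFP8OpSel and the constants OpSel0 and OpSel1 are hypothetical; their values are an assumption chosen to match bit indices 2 and 3 used in the hunk, and this is not the generated encoder.

```cpp
#include <cstdint>

// Assumption: op_sel flags occupy bits 2 and 3 of a source-modifier operand,
// matching the src0_modifiers{2} and src0_modifiers{3} indices in the hunk.
constexpr unsigned OpSel0 = 1u << 2;
constexpr unsigned OpSel1 = 1u << 3;

// Hypothetical encoder sketch: clears instruction bits 11 and 12, then copies
// modifier bits 2 and 3 into them. Without a src0 operand both bits stay 0.
constexpr std::uint64_t encodeFP8OpSel(std::uint64_t inst, unsigned src0Mods,
                                       bool hasSrc0) {
  return (inst & ~(static_cast<std::uint64_t>(3) << 11)) |
         (hasSrc0 ? (static_cast<std::uint64_t>((src0Mods >> 2) & 3u) << 11)
                  : static_cast<std::uint64_t>(0));
}

static_assert(encodeFP8OpSel(0, OpSel0, true) == 0x0800, "opsel[0] -> Inst{11}");
static_assert(encodeFP8OpSel(0, OpSel1, true) == 0x1000, "opsel[1] -> Inst{12}");
static_assert(encodeFP8OpSel(0, OpSel0 | OpSel1, false) == 0, "no src0, no bits");
```

Because both bits come from src0_modifiers, no dummy src1 operand is needed to carry opsel[1].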

https://github.com/llvm/llvm-project/pull/78414


[mlir] [clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-22 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

If there are no further comments, should I merge this?

https://github.com/llvm/llvm-project/pull/77795


[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Mirko Brkušanin via cfe-commits


@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : 
ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f
 
 //===-===//
 // WMMA intrinsics
-class ROCDL_Wmma_IntrOp traits = []> :
+class ROCDL_Wmma_IntrOp overloadedOperands> :
   LLVM_IntrOpBase,
+  [0], overloadedOperands, [], 1>,
   Arguments<(ins Variadic:$args)> {
   let assemblyFormat =
 "$args attr-dict `:` functional-type($args, $res)";
 }
 
 // Available on RDNA3
-def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16">;
-def ROCDL_wmma_f32_16x16x16_bf16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.bf16">;
-def ROCDL_wmma_f16_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f16.16x16x16.f16">;
-def ROCDL_wmma_bf16_16x16x16_bf16 : 
ROCDL_Wmma_IntrOp<"wmma.bf16.16x16x16.bf16">;
-def ROCDL_wmma_i32_16x16x16_iu8 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu8">;
-def ROCDL_wmma_i32_16x16x16_iu4 : ROCDL_Wmma_IntrOp<"wmma.i32.16x16x16.iu4">;
+def ROCDL_wmma_f32_16x16x16_f16 : ROCDL_Wmma_IntrOp<"wmma.f32.16x16x16.f16", 
[0]>;

mbrkusanin wrote:

Sorry, that was my mistake. If we pass it to the inherited class LLVM_IntrOpBase, 
there is no warning. I have already updated it: [actually use 
traits](https://github.com/llvm/llvm-project/pull/77795/commits/732186bc2d41bf94e63b950685216a5f73dc89b8)
 

https://github.com/llvm/llvm-project/pull/77795


[lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [lldb] [libc] [clang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Mirko Brkušanin via cfe-commits


@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : 
VOP3e_gfx10 {
 
 class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10;
 
+class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 
{
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

mbrkusanin wrote:

https://github.com/llvm/llvm-project/pull/79122 <- This should help with the 
tests in llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.mir



https://github.com/llvm/llvm-project/pull/78414


[lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [lldb] [libc] [clang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Mirko Brkušanin via cfe-commits


@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : 
VOP3e_gfx10 {
 
 class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10;
 
+class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 
{
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

mbrkusanin wrote:

and llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll

https://github.com/llvm/llvm-project/pull/78414


[llvm] [mlir] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

Ping

https://github.com/llvm/llvm-project/pull/77795


[mlir] [llvm] [clang] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Mirko Brkušanin via cfe-commits


@@ -2601,67 +2601,73 @@ def int_amdgcn_ds_bvh_stack_rtn :
 [ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree]
   >;
 
+def int_amdgcn_s_wait_event_export_ready :
+  ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">,
+  Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects, IntrWillReturn]
+>;
+
 // WMMA (Wave Matrix Multiply-Accumulate) intrinsics
 //
 // These operations perform a matrix multiplication and accumulation of
 // the form: D = A * B + C .
 
 class AMDGPUWmmaIntrinsic :
   Intrinsic<
-[CD],   // %D
+[CD], // %D
 [
   AB,   // %A
-  AB,   // %B
+  LLVMMatchType<1>, // %B
   LLVMMatchType<0>, // %C
 ],
 [IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]
 >;
 
 class AMDGPUWmmaIntrinsicOPSEL :
   Intrinsic<
-[CD],   // %D
+[CD], // %D
 [
   AB,   // %A
-  AB,   // %B
+  LLVMMatchType<1>, // %B
   LLVMMatchType<0>, // %C
-  llvm_i1_ty,   // %high
+  llvm_i1_ty,   // %high (op_sel) for GFX11, 0 for GFX12
 ],
 [IntrNoMem, IntrConvergent, ImmArg>, IntrWillReturn, 
IntrNoCallback, IntrNoFree]
 >;
 
 class AMDGPUWmmaIntrinsicIU :
   Intrinsic<
-[CD],   // %D
+[CD], // %D
 [
   llvm_i1_ty,   // %A_sign
   AB,   // %A
   llvm_i1_ty,   // %B_sign
-  AB,   // %B
+  LLVMMatchType<1>, // %B
   LLVMMatchType<0>, // %C
   llvm_i1_ty,   // %clamp
 ],
 [IntrNoMem, IntrConvergent, ImmArg>, ImmArg>, 
ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree]
 >;
 
-def int_amdgcn_wmma_f32_16x16x16_f16   : AMDGPUWmmaIntrinsic;
-def int_amdgcn_wmma_f32_16x16x16_bf16  : AMDGPUWmmaIntrinsic;
-// The regular, untied f16/bf16 wmma intrinsics only write to one half
-// of the registers (set via the op_sel bit).
-// The content of the other 16-bit of the registers is undefined.
-def int_amdgcn_wmma_f16_16x16x16_f16   : 
AMDGPUWmmaIntrinsicOPSEL;
-def int_amdgcn_wmma_bf16_16x16x16_bf16 : 
AMDGPUWmmaIntrinsicOPSEL;
-// The tied versions of the f16/bf16 wmma intrinsics tie the destination matrix
-// registers to the input accumulator registers.
-// Essentially, the content of the other 16-bit is preserved from the input.
-def int_amdgcn_wmma_f16_16x16x16_f16_tied   : 
AMDGPUWmmaIntrinsicOPSEL;
-def int_amdgcn_wmma_bf16_16x16x16_bf16_tied : 
AMDGPUWmmaIntrinsicOPSEL;
-def int_amdgcn_wmma_i32_16x16x16_iu8   : AMDGPUWmmaIntrinsicIU;
-def int_amdgcn_wmma_i32_16x16x16_iu4   : AMDGPUWmmaIntrinsicIU;
+// WMMA GFX11Only
 
-def int_amdgcn_s_wait_event_export_ready :
-  ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">,
-  Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects, IntrWillReturn]
->;
+// The OPSEL intrinsics read from and write to one half of the registers, selected by the op_sel bit.
+// The tied versions of the f16/bf16 wmma intrinsics tie the destination matrix registers to the input accumulator registers.
+// The content of the other 16-bit half is preserved from the input.
+def int_amdgcn_wmma_f16_16x16x16_f16_tied   : AMDGPUWmmaIntrinsicOPSEL;
+def int_amdgcn_wmma_bf16_16x16x16_bf16_tied : AMDGPUWmmaIntrinsicOPSEL;
+
+// WMMA GFX11Plus
+
+def int_amdgcn_wmma_f32_16x16x16_f16   : AMDGPUWmmaIntrinsic;
+def int_amdgcn_wmma_f32_16x16x16_bf16  : AMDGPUWmmaIntrinsic;
+def int_amdgcn_wmma_i32_16x16x16_iu8   : AMDGPUWmmaIntrinsicIU;
+def int_amdgcn_wmma_i32_16x16x16_iu4   : AMDGPUWmmaIntrinsicIU;
+
+// GFX11: The OPSEL intrinsics read from and write to one half of the registers, selected by the op_sel bit.
+//The content of the other 16-bit half is undefined.
+// GFX12: The op_sel bit must be 0.
+def int_amdgcn_wmma_f16_16x16x16_f16   : AMDGPUWmmaIntrinsicOPSEL;
+def int_amdgcn_wmma_bf16_16x16x16_bf16 : AMDGPUWmmaIntrinsicOPSEL;

mbrkusanin wrote:

Sizes are halved. GFX11 basically contained the same matrix twice.

This is what the intrinsics look like at the moment:
gfx11:
declare <16 x i16> @llvm.amdgcn.wmma.bf16.16x16x16.bf16(<16 x i16>, <16 x i16>, <16 x i16>, i1 immarg)
gfx12:
declare <8 x bfloat> @llvm.amdgcn.wmma.bf16.16x16x16.bf16(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i1 immarg)
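
The halving can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: it assumes a 16x16 operand tile is spread evenly over the lanes of a wave on GFX12, and it models the GFX11 wave32 case as holding the same data twice, as described above. The function names are hypothetical.

```cpp
// Assumption: a 16x16 tile (256 elements) is divided evenly among the lanes.
constexpr unsigned elementsPerLane(unsigned waveSize) {
  return (16u * 16u) / waveSize;
}

// Assumption: GFX11 wave32 operands carry the same matrix data twice.
constexpr unsigned gfx11Wave32ElementsPerLane() {
  return 2u * elementsPerLane(32u);
}

static_assert(elementsPerLane(32u) == 8u, "gfx12 w32: <8 x bfloat>, V8h");
static_assert(elementsPerLane(64u) == 4u, "gfx12 w64: V4h");
static_assert(gfx11Wave32ElementsPerLane() == 16u, "gfx11 w32: <16 x i16>");
```

These counts line up with the vector widths in the GFX12 builtin signatures ("V8h" for w32, "V4h" for w64) and with the two declarations shown above.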

https://github.com/llvm/llvm-project/pull/77795


[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)

2025-01-10 Thread Mirko Brkušanin via cfe-commits

https://github.com/mbrkusanin closed 
https://github.com/llvm/llvm-project/pull/122277


[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)

2025-01-09 Thread Mirko Brkušanin via cfe-commits

https://github.com/mbrkusanin created 
https://github.com/llvm/llvm-project/pull/122277

None

From 27d929a270ea1d8d3fa885f00794a092af12e50e Mon Sep 17 00:00:00 2001
From: Mirko Brkusanin 
Date: Mon, 23 Dec 2024 12:25:25 +0100
Subject: [PATCH] [AMDGPU] Remove s_wakeup_barrier instruction

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  1 -
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  6 -
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |  5 
 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp  |  1 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  2 --
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 27 ---
 llvm/lib/Target/AMDGPU/SOPInstructions.td | 12 -
 llvm/test/CodeGen/AMDGPU/s-barrier.ll |  9 ---
 llvm/test/MC/AMDGPU/gfx12_asm_sop1.s  |  9 ---
 .../Disassembler/AMDGPU/gfx12_dasm_sop1.txt   |  9 ---
 10 files changed, 5 insertions(+), 76 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 14c1746716cdd6..1b29a8e359c205 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -489,7 +489,6 @@ TARGET_BUILTIN(__builtin_amdgcn_s_barrier_wait, "vIs", "n", 
"gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_signal_isfirst, "bIi", "n", 
"gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_init, "vv*i", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_join, "vv*", "n", "gfx12-insts")
-TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vv*", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "vIs", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_named_barrier_state, "Uiv*", "n", 
"gfx12-insts")
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 92418b9104ad14..b930d6983e2251 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -284,12 +284,6 @@ def int_amdgcn_s_barrier_join : 
ClangBuiltin<"__builtin_amdgcn_s_barrier_join">,
   Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, 
IntrConvergent, IntrWillReturn,
 IntrNoCallback, IntrNoFree]>;
 
-// void @llvm.amdgcn.s.wakeup.barrier(ptr addrspace(3) %barrier)
-// The %barrier argument must be uniform, otherwise behavior is undefined.
-def int_amdgcn_s_wakeup_barrier : 
ClangBuiltin<"__builtin_amdgcn_s_wakeup_barrier">,
-  Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, 
IntrConvergent, IntrWillReturn,
-IntrNoCallback, IntrNoFree]>;
-
 // void @llvm.amdgcn.s.barrier.wait(i16 %barrierType)
 def int_amdgcn_s_barrier_wait : 
ClangBuiltin<"__builtin_amdgcn_s_barrier_wait">,
   Intrinsic<[], [llvm_i16_ty], [ImmArg>, IntrNoMem, 
IntrHasSideEffects, IntrConvergent,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 041b9b4d66f63f..50e0faef9e7c27 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -2239,7 +2239,6 @@ bool 
AMDGPUInstructionSelector::selectG_INTRINSIC_W_SIDE_EFFECTS(
   case Intrinsic::amdgcn_s_barrier_signal_var:
 return selectNamedBarrierInit(I, IntrinsicID);
   case Intrinsic::amdgcn_s_barrier_join:
-  case Intrinsic::amdgcn_s_wakeup_barrier:
   case Intrinsic::amdgcn_s_get_named_barrier_state:
 return selectNamedBarrierInst(I, IntrinsicID);
   case Intrinsic::amdgcn_s_get_barrier_state:
@@ -5839,8 +5838,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, 
Intrinsic::ID IntrID) {
   llvm_unreachable("not a named barrier op");
 case Intrinsic::amdgcn_s_barrier_join:
   return AMDGPU::S_BARRIER_JOIN_IMM;
-case Intrinsic::amdgcn_s_wakeup_barrier:
-  return AMDGPU::S_WAKEUP_BARRIER_IMM;
 case Intrinsic::amdgcn_s_get_named_barrier_state:
   return AMDGPU::S_GET_BARRIER_STATE_IMM;
 };
@@ -5850,8 +5847,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, 
Intrinsic::ID IntrID) {
   llvm_unreachable("not a named barrier op");
 case Intrinsic::amdgcn_s_barrier_join:
   return AMDGPU::S_BARRIER_JOIN_M0;
-case Intrinsic::amdgcn_s_wakeup_barrier:
-  return AMDGPU::S_WAKEUP_BARRIER_M0;
 case Intrinsic::amdgcn_s_get_named_barrier_state:
   return AMDGPU::S_GET_BARRIER_STATE_M0;
 };
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
index 2df068d8fb007b..0406ba9c68ccd3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
@@ -326,7 +326,6 @@ bool isReallyAClobber(const Value *Ptr, MemoryDef *Def, 
AAResults *AA) {
 case Intrinsic::amdgcn_s_barrier_wait:
 case Intrinsic::amdgcn_s_barrier_leave:
 case Intrins

[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)

2025-01-09 Thread Mirko Brkušanin via cfe-commits

mbrkusanin wrote:

> Context?

The instruction is unused.

https://github.com/llvm/llvm-project/pull/122277


[clang] [llvm] [AMDGPU] Remove s_wakeup_barrier instruction (PR #122277)

2025-01-09 Thread Mirko Brkušanin via cfe-commits

https://github.com/mbrkusanin updated 
https://github.com/llvm/llvm-project/pull/122277

From 27d929a270ea1d8d3fa885f00794a092af12e50e Mon Sep 17 00:00:00 2001
From: Mirko Brkusanin 
Date: Mon, 23 Dec 2024 12:25:25 +0100
Subject: [PATCH 1/2] [AMDGPU] Remove s_wakeup_barrier instruction

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  1 -
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  6 -
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |  5 
 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp  |  1 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  2 --
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 27 ---
 llvm/lib/Target/AMDGPU/SOPInstructions.td | 12 -
 llvm/test/CodeGen/AMDGPU/s-barrier.ll |  9 ---
 llvm/test/MC/AMDGPU/gfx12_asm_sop1.s  |  9 ---
 .../Disassembler/AMDGPU/gfx12_dasm_sop1.txt   |  9 ---
 10 files changed, 5 insertions(+), 76 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 14c1746716cdd6..1b29a8e359c205 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -489,7 +489,6 @@ TARGET_BUILTIN(__builtin_amdgcn_s_barrier_wait, "vIs", "n", 
"gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_signal_isfirst, "bIi", "n", 
"gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_init, "vv*i", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_join, "vv*", "n", "gfx12-insts")
-TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vv*", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "vIs", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_named_barrier_state, "Uiv*", "n", 
"gfx12-insts")
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 92418b9104ad14..b930d6983e2251 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -284,12 +284,6 @@ def int_amdgcn_s_barrier_join : 
ClangBuiltin<"__builtin_amdgcn_s_barrier_join">,
   Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, 
IntrConvergent, IntrWillReturn,
 IntrNoCallback, IntrNoFree]>;
 
-// void @llvm.amdgcn.s.wakeup.barrier(ptr addrspace(3) %barrier)
-// The %barrier argument must be uniform, otherwise behavior is undefined.
-def int_amdgcn_s_wakeup_barrier : 
ClangBuiltin<"__builtin_amdgcn_s_wakeup_barrier">,
-  Intrinsic<[], [local_ptr_ty], [IntrNoMem, IntrHasSideEffects, 
IntrConvergent, IntrWillReturn,
-IntrNoCallback, IntrNoFree]>;
-
 // void @llvm.amdgcn.s.barrier.wait(i16 %barrierType)
 def int_amdgcn_s_barrier_wait : 
ClangBuiltin<"__builtin_amdgcn_s_barrier_wait">,
   Intrinsic<[], [llvm_i16_ty], [ImmArg>, IntrNoMem, 
IntrHasSideEffects, IntrConvergent,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 041b9b4d66f63f..50e0faef9e7c27 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -2239,7 +2239,6 @@ bool 
AMDGPUInstructionSelector::selectG_INTRINSIC_W_SIDE_EFFECTS(
   case Intrinsic::amdgcn_s_barrier_signal_var:
 return selectNamedBarrierInit(I, IntrinsicID);
   case Intrinsic::amdgcn_s_barrier_join:
-  case Intrinsic::amdgcn_s_wakeup_barrier:
   case Intrinsic::amdgcn_s_get_named_barrier_state:
 return selectNamedBarrierInst(I, IntrinsicID);
   case Intrinsic::amdgcn_s_get_barrier_state:
@@ -5839,8 +5838,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, 
Intrinsic::ID IntrID) {
   llvm_unreachable("not a named barrier op");
 case Intrinsic::amdgcn_s_barrier_join:
   return AMDGPU::S_BARRIER_JOIN_IMM;
-case Intrinsic::amdgcn_s_wakeup_barrier:
-  return AMDGPU::S_WAKEUP_BARRIER_IMM;
 case Intrinsic::amdgcn_s_get_named_barrier_state:
   return AMDGPU::S_GET_BARRIER_STATE_IMM;
 };
@@ -5850,8 +5847,6 @@ unsigned getNamedBarrierOp(bool HasInlineConst, 
Intrinsic::ID IntrID) {
   llvm_unreachable("not a named barrier op");
 case Intrinsic::amdgcn_s_barrier_join:
   return AMDGPU::S_BARRIER_JOIN_M0;
-case Intrinsic::amdgcn_s_wakeup_barrier:
-  return AMDGPU::S_WAKEUP_BARRIER_M0;
 case Intrinsic::amdgcn_s_get_named_barrier_state:
   return AMDGPU::S_GET_BARRIER_STATE_M0;
 };
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
index 2df068d8fb007b..0406ba9c68ccd3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
@@ -326,7 +326,6 @@ bool isReallyAClobber(const Value *Ptr, MemoryDef *Def, 
AAResults *AA) {
 case Intrinsic::amdgcn_s_barrier_wait:
 case Intrinsic::amdgcn_s_barrier_leave:
 case Intrinsic