[clang] 813e521 - [AMDGPU] Add gfx11 subtarget ELF definition

2022-04-29 Thread Joe Nash via cfe-commits

Author: Joe Nash
Date: 2022-04-29T12:27:17-04:00
New Revision: 813e521e55b11165138b071f446eda94b14570dc

URL: https://github.com/llvm/llvm-project/commit/813e521e55b11165138b071f446eda94b14570dc
DIFF: https://github.com/llvm/llvm-project/commit/813e521e55b11165138b071f446eda94b14570dc.diff

LOG: [AMDGPU] Add gfx11 subtarget ELF definition

This is the first patch of a series to upstream support for the new
subtarget.

Contributors:
Jay Foad 
Konstantin Zhuravlyov 

Patch 1/N for upstreaming AMDGPU gfx11 architectures.

Reviewed By: foad, kzhuravl, #amdgpu

Differential Revision: https://reviews.llvm.org/D124536

Added: 


Modified: 
clang/test/Misc/target-invalid-cpu-note.c
llvm/docs/AMDGPUUsage.rst
llvm/include/llvm/BinaryFormat/ELF.h
llvm/include/llvm/Support/TargetParser.h
llvm/lib/Object/ELFObjectFile.cpp
llvm/lib/ObjectYAML/ELFYAML.cpp
llvm/lib/Support/TargetParser.cpp
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
llvm/tools/llvm-readobj/ELFDumper.cpp

Removed: 




diff  --git a/clang/test/Misc/target-invalid-cpu-note.c b/clang/test/Misc/target-invalid-cpu-note.c
index 8b9409336d1cb..4248090cb9feb 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -37,7 +37,7 @@
 
 // RUN: not %clang_cc1 -triple amdgcn--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix AMDGCN
 // AMDGCN: error: unknown target CPU 'not-a-cpu'
-// AMDGCN-NEXT: note: valid target CPU values are: gfx600, tahiti, gfx601, pitcairn, verde, gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702, gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo, gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11, gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, gfx906, gfx908, gfx909, gfx90a, gfx90c, gfx940, gfx1010, gfx1011, gfx1012, gfx1013, gfx1030, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1036{{$}}
+// AMDGCN-NEXT: note: valid target CPU values are: gfx600, tahiti, gfx601, pitcairn, verde, gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702, gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo, gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11, gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, gfx906, gfx908, gfx909, gfx90a, gfx90c, gfx940, gfx1010, gfx1011, gfx1012, gfx1013, gfx1030, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1036, gfx1100, gfx1101, gfx1102, gfx1103{{$}}
 
 // RUN: not %clang_cc1 -triple wasm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix WEBASM
 // WEBASM: error: unknown target CPU 'not-a-cpu'

diff  --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index e962674867df5..1cc342f85c659 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -444,6 +444,36 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following

 Add product

 names.
 
+ **GCN GFX11** [AMD-GCN-GFX11]_
+ 
---
+ ``gfx1100`` ``amdgcn``   dGPU  - cumode  - 
Architected   - *pal-amdpal*  *TBA*
+- wavefrontsize64   flat
+
scratch   .. TODO::
+  - Packed
+
work-item   Add product
+IDs
 names.
+
+ ``gfx1101`` ``amdgcn``   dGPU  - cumode  - 
Architected   *TBA*
+- wavefrontsize64   flat
+
scratch   .. TODO::
+  - Packed
+
work-item   Add product
+IDs
 names.
+
+ ``gfx1102`` ``amdgcn``   dGPU  - cumode  - 
Architected   *TBA*
+- wavefrontsize64   flat
+ 

[clang] 8bdfc73 - [AMDGPU][clang] Definition of gfx11 subtarget

2022-04-29 Thread Joe Nash via cfe-commits

Author: Joe Nash
Date: 2022-04-29T13:55:56-04:00
New Revision: 8bdfc73f633dca9859123b8596bcb521700c6a7f

URL: https://github.com/llvm/llvm-project/commit/8bdfc73f633dca9859123b8596bcb521700c6a7f
DIFF: https://github.com/llvm/llvm-project/commit/8bdfc73f633dca9859123b8596bcb521700c6a7f.diff

LOG: [AMDGPU][clang] Definition of gfx11 subtarget

Contributors:
Jay Foad 
Konstantin Zhuravlyov 

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537

Added: 


Modified: 
clang/include/clang/Basic/Cuda.h
clang/lib/Basic/Cuda.cpp
clang/lib/Basic/Targets/AMDGPU.cpp
clang/lib/Basic/Targets/NVPTX.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/test/CodeGenOpenCL/amdgpu-features.cl
clang/test/Driver/amdgpu-macros.cl
clang/test/Driver/amdgpu-mcpu.cl
clang/test/Misc/target-invalid-cpu-note.c

Removed: 




diff  --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h
index 147b04eb57459..18ef373784e5b 100644
--- a/clang/include/clang/Basic/Cuda.h
+++ b/clang/include/clang/Basic/Cuda.h
@@ -97,6 +97,10 @@ enum class CudaArch {
   GFX1034,
   GFX1035,
   GFX1036,
+  GFX1100,
+  GFX1101,
+  GFX1102,
+  GFX1103,
   Generic, // A processor model named 'generic' if the target backend defines a
// public one.
   LAST,

diff  --git a/clang/lib/Basic/Cuda.cpp b/clang/lib/Basic/Cuda.cpp
index adc61a567dbef..412f5c3f45e36 100644
--- a/clang/lib/Basic/Cuda.cpp
+++ b/clang/lib/Basic/Cuda.cpp
@@ -125,6 +125,10 @@ static const CudaArchToStringMap arch_names[] = {
 GFX(1034), // gfx1034
 GFX(1035), // gfx1035
 GFX(1036), // gfx1036
+GFX(1100), // gfx1100
+GFX(1101), // gfx1101
+GFX(1102), // gfx1102
+GFX(1103), // gfx1103
 {CudaArch::Generic, "generic", ""},
 // clang-format on
 };
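The `GFX(...)` rows above presumably expand to one table entry per architecture, pairing each `CudaArch` enumerator with its `gfxNNNN` name. The sketch below assumes that shape; the `Arch` enum, the two-field `ArchName` struct, and the `archFromName`/`archName` helpers are hypothetical simplifications, not clang's definitions (the real table may carry more fields per row).

```cpp
#include <string_view>

// Hypothetical stand-in for a subset of clang's CudaArch enumerators.
enum class Arch { GFX1036, GFX1100, GFX1101, GFX1102, GFX1103, Generic, Unknown };

struct ArchName {
  Arch Kind;
  std::string_view Name;
};

// Assumed shape of the helper: paste the number onto the enumerator and
// stringize it, so "gfx" "1100" concatenates to "gfx1100".
#define GFX(gpu) {Arch::GFX##gpu, "gfx" #gpu}

constexpr ArchName ArchNames[] = {
    GFX(1036), GFX(1100), GFX(1101), GFX(1102), GFX(1103),
    {Arch::Generic, "generic"},
};
#undef GFX

// Name -> enumerator; unrecognized names map to Arch::Unknown.
constexpr Arch archFromName(std::string_view S) {
  for (const ArchName &E : ArchNames)
    if (E.Name == S)
      return E.Kind;
  return Arch::Unknown;
}

// Enumerator -> name; anything missing from the table maps to "unknown".
constexpr std::string_view archName(Arch A) {
  for (const ArchName &E : ArchNames)
    if (E.Kind == A)
      return E.Name;
  return "unknown";
}

static_assert(archFromName("gfx1102") == Arch::GFX1102);
static_assert(archName(Arch::GFX1103) == "gfx1103");
```

With this layout, adding a part is one enumerator plus one table row, which matches the shape of the Cuda.h and Cuda.cpp hunks.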

diff  --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 32eacc871093e..c13aec4b2cae5 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -183,6 +183,26 @@ bool AMDGPUTargetInfo::initFeatureMap(
   // XXX - What does the member GPU mean if device name string passed here?
   if (isAMDGCN(getTriple())) {
 switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) {
+case GK_GFX1103:
+case GK_GFX1102:
+case GK_GFX1101:
+case GK_GFX1100:
+  Features["ci-insts"] = true;
+  Features["dot1-insts"] = true;
+  Features["dot5-insts"] = true;
+  Features["dot6-insts"] = true;
+  Features["dot7-insts"] = true;
+  Features["dot8-insts"] = true;
+  Features["dl-insts"] = true;
+  Features["flat-address-space"] = true;
+  Features["16-bit-insts"] = true;
+  Features["dpp"] = true;
+  Features["gfx8-insts"] = true;
+  Features["gfx9-insts"] = true;
+  Features["gfx10-insts"] = true;
+  Features["gfx10-3-insts"] = true;
+  Features["gfx11-insts"] = true;
+  break;
 case GK_GFX1036:
 case GK_GFX1035:
 case GK_GFX1034:
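In `initFeatureMap`, the four gfx11 `case` labels share one body through fallthrough, so every gfx11 CPU starts from the same default feature set. The following is a constexpr sketch of that control flow only; `GPUKind` here is a hypothetical subset, and `defaultFeatureEnabled` is an illustrative helper rather than the clang function, which fills a feature map instead of answering queries. Kinds other than gfx11 are deliberately left out and report no features.

```cpp
#include <string_view>

// Hypothetical subset of the GPU kinds returned by parseArchAMDGCN.
enum GPUKind { GK_NONE, GK_GFX1036, GK_GFX1100, GK_GFX1101, GK_GFX1102, GK_GFX1103 };

// The feature names the gfx11 case enables in the hunk above.
constexpr std::string_view kGfx11Features[] = {
    "ci-insts",   "dot1-insts",  "dot5-insts",         "dot6-insts",
    "dot7-insts", "dot8-insts",  "dl-insts",           "flat-address-space",
    "16-bit-insts", "dpp",       "gfx8-insts",         "gfx9-insts",
    "gfx10-insts", "gfx10-3-insts", "gfx11-insts"};

// True if the default feature set for Kind enables Feature. The gfx11
// labels fall through to one shared body, as in the patch.
constexpr bool defaultFeatureEnabled(GPUKind Kind, std::string_view Feature) {
  switch (Kind) {
  case GK_GFX1103:
  case GK_GFX1102:
  case GK_GFX1101:
  case GK_GFX1100:
    for (std::string_view F : kGfx11Features)
      if (F == Feature)
        return true;
    return false;
  default:
    return false; // other kinds are outside this sketch
  }
}

static_assert(defaultFeatureEnabled(GK_GFX1102, "gfx11-insts"));
```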

diff  --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index f03d5c600e039..ffd69983a0be5 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -217,6 +217,10 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
   case CudaArch::GFX1034:
   case CudaArch::GFX1035:
   case CudaArch::GFX1036:
+  case CudaArch::GFX1100:
+  case CudaArch::GFX1101:
+  case CudaArch::GFX1102:
+  case CudaArch::GFX1103:
   case CudaArch::Generic:
   case CudaArch::LAST:
 break;

diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index f4228cfb3086e..85efe93d6bd98 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3946,6 +3946,10 @@ void CGOpenMPRuntimeGPU::processRequiresDirective(
   case CudaArch::GFX1034:
   case CudaArch::GFX1035:
   case CudaArch::GFX1036:
+  case CudaArch::GFX1100:
+  case CudaArch::GFX1101:
+  case CudaArch::GFX1102:
+  case CudaArch::GFX1103:
   case CudaArch::Generic:
   case CudaArch::UNUSED:
   case CudaArch::UNKNOWN:

diff  --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 0967e932868eb..cb3a3eff01f70 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -37,6 +37,10 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1034 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1034 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1035 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1035 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1036 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1036 %s
+// 

[lldb] [clang-tools-extra] [libcxx] [compiler-rt] [libc] [clang] [lld] [llvm] [flang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Joe Nash via cfe-commits
Mirko Brkušanin, Mariusz Sikora


https://github.com/Sisyph commented:

DPP changes look good, and functionally I'm fine with the patch.

I don't think the tablegen 'bit IsFP8' version of managing the op_sel bits is 
any better than adding a fake src1. It doesn't scale to any more op_sel bits 
(hence we can't use it for V_CVT_SR_BF8_F32_e64_dpp_gfx12), and it is a new 
abstraction, whereas we already have many instances of fake src operands. 
Consider it a +1 but not a +2 from me as is, based on that.

https://github.com/llvm/llvm-project/pull/78414
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [lld] [llvm] [compiler-rt] [clang] [libc] [libcxx] [flang] [lldb] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-22 Thread Joe Nash via cfe-commits
Mirko Brkušanin



@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : VOP3e_gfx10 {
 
 class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10;
 
+class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 {
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

Sisyph wrote:

A couple of points related to this:
- I don't think the rules for forming op_sel with dpp are currently being 
checked correctly. In GCNDPPCombine.cpp:369, we check the named op_sel and 
op_sel_hi operands. However, we use the src_modifiers operands through most of 
the compiler, and typically (if ever?) don't copy the bits to the named op_sel 
operands. This should be fixed regardless of this patch.
- The rule we should enforce for dpp with op_sel is that, for the alu to be 
combined, op_sel must be 0 and op_sel_hi must be 0b111.
- My conclusion is that it is only safe to form the dpp alu with these 
instructions if the op_sel bits are all zero.
- With this patch, we are emitting alu instructions that probably shouldn't be 
allowed (e.g. v_cvt_f32_fp8_e64_dpp v0, v0 op_sel:[1,1] quad_perm:[0,1,2,3] 
row_mask:0xf bank_mask:0xf bound_ctrl:1).
- The cleanup Ivan suggested, using the dst_op_sel bit of src0 (equivalent to 
the op_sel_hi bit of src0), would require a special case in GCNDPPCombine to 
check the correct bit. Otherwise, we might allow an alu to be formed with 
op_sel:[0,1].
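The rule in the second bullet can be written as a small predicate. This is a hedged sketch, not GCNDPPCombine code: it assumes op_sel and op_sel_hi live in bits 2 and 3 of each source's modifier operand (as the `Inst{11}`/`Inst{12}` lines in the hunk above suggest for src0), and `gatherOpSel` and `vop3pOpSelAllowsDPP` are hypothetical names. It reads the bits from the source modifiers, as the first bullet argues the real check should.

```cpp
#include <cstdint>

// Assumed per-source modifier bits (bit 2 -> op_sel, bit 3 -> op_sel_hi).
constexpr uint32_t ModOpSel = 1u << 2;
constexpr uint32_t ModOpSelHi = 1u << 3;

// Gather one modifier bit from each of the three sources into a mask in
// op_sel order (bit 0 = src0, bit 1 = src1, bit 2 = src2).
constexpr uint32_t gatherOpSel(uint32_t Src0Mods, uint32_t Src1Mods,
                               uint32_t Src2Mods, uint32_t Bit) {
  return ((Src0Mods & Bit) ? 1u : 0u) | ((Src1Mods & Bit) ? 2u : 0u) |
         ((Src2Mods & Bit) ? 4u : 0u);
}

// Hypothetical legality check for folding a DPP mov into a VOP3P alu op:
// op_sel must be 0 and op_sel_hi must be 0b111.
constexpr bool vop3pOpSelAllowsDPP(uint32_t Src0Mods, uint32_t Src1Mods,
                                   uint32_t Src2Mods) {
  return gatherOpSel(Src0Mods, Src1Mods, Src2Mods, ModOpSel) == 0 &&
         gatherOpSel(Src0Mods, Src1Mods, Src2Mods, ModOpSelHi) == 0b111;
}

// Default packed operands: op_sel_hi set on all three sources.
static_assert(vop3pOpSelAllowsDPP(ModOpSelHi, ModOpSelHi, ModOpSelHi));
// Any op_sel bit set blocks the combine.
static_assert(!vop3pOpSelAllowsDPP(ModOpSel | ModOpSelHi, ModOpSelHi, ModOpSelHi));
```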


https://github.com/llvm/llvm-project/pull/78414


[flang] [clang] [compiler-rt] [libcxx] [llvm] [lld] [lldb] [clang-tools-extra] [libc] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Joe Nash via cfe-commits
Mirko Brkušanin



@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10 op, VOPProfile p> : VOP3e_gfx10 {
 
 class VOP3OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3OpSel_gfx10;
 
+class VOP3FP8OpSel_gfx11_gfx12 op, VOPProfile p> : VOP3e_gfx10 {
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

Sisyph wrote:

Thanks! I do think that patch will help a lot. I also think it handles the case 
where we use dst_op_sel to store the other bit instead of src1. If the 
CVT_F32_FP8 instruction were VOP3P, we would need a special case, but since it 
is VOP3, we want all the op_sel bits to be zero and we want dst_op_sel to be 
zero.
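A minimal sketch of the encoding and the VOP3 rule from this reply, under stated assumptions: `setFP8OpSelBits` mirrors the two `let Inst{...}` lines of `VOP3FP8OpSel_gfx11_gfx12` on a plain 64-bit word, src0_modifiers bit 3 is assumed to double as dst_op_sel (per the suggestion quoted in the earlier comment), and `vop3FP8AllowsDPP` is a hypothetical name for the "everything must be zero" check.

```cpp
#include <cstdint>

// Assumed src0_modifiers bit positions (see the TableGen hunk above).
constexpr uint32_t Src0OpSel0 = 1u << 2;
constexpr uint32_t Src0OpSel1 = 1u << 3; // assumed to double as dst_op_sel

// Mirrors VOP3FP8OpSel_gfx11_gfx12:
//   Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0)
//   Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0)
constexpr uint64_t setFP8OpSelBits(uint64_t Inst, bool HasSrc0,
                                   uint32_t Src0Mods) {
  const uint64_t Bit11 = uint64_t{1} << 11;
  const uint64_t Bit12 = uint64_t{1} << 12;
  Inst &= ~(Bit11 | Bit12);
  if (HasSrc0) {
    if (Src0Mods & Src0OpSel0)
      Inst |= Bit11;
    if (Src0Mods & Src0OpSel1)
      Inst |= Bit12;
  }
  return Inst;
}

// Hypothetical VOP3 rule: forming a DPP variant is only safe when no
// op_sel bit (including dst_op_sel) is set.
constexpr bool vop3FP8AllowsDPP(uint32_t Src0Mods) {
  return (Src0Mods & (Src0OpSel0 | Src0OpSel1)) == 0;
}

// In this model, op_sel:[1,1] sets both instruction bits and is rejected.
static_assert(setFP8OpSelBits(0, true, Src0OpSel0 | Src0OpSel1) ==
              ((uint64_t{1} << 11) | (uint64_t{1} << 12)));
static_assert(!vop3FP8AllowsDPP(Src0OpSel0 | Src0OpSel1));
```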

https://github.com/llvm/llvm-project/pull/78414


[clang] 2d43de1 - [AMDGPU] gfx11 new dot instruction codegen support

2022-06-16 Thread Joe Nash via cfe-commits

Author: Joe Nash
Date: 2022-06-16T14:19:34-04:00
New Revision: 2d43de13df03eab0fda1023b22b335b207afc507

URL: https://github.com/llvm/llvm-project/commit/2d43de13df03eab0fda1023b22b335b207afc507
DIFF: https://github.com/llvm/llvm-project/commit/2d43de13df03eab0fda1023b22b335b207afc507.diff

LOG: [AMDGPU] gfx11 new dot instruction codegen support

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904

Added: 
clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sudot4.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sudot8.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f16.f16.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sudot4.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sudot8.ll

Modified: 
clang/include/clang/Basic/BuiltinsAMDGPU.def
clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
llvm/lib/Target/AMDGPU/VOP3Instructions.td
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 19e4ea998aa47..bd188c7f34371 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -222,12 +222,17 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "gfx9
 
//===--===//
 
 TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot8-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", "dot8-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sudot4, "iIbiIbiiIb", "nc", "dot8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dot1-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sudot8, "iIbiIbiiIb", "nc", "dot8-insts")
 
 
//===--===//
 // GFX10+ only builtins.
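The new `dot8-insts` builtins include mixed-signedness forms. Reading the type string `"iIbiIbiiIb"` as `int(bool, int, bool, int, int, bool)`, with the `I`-prefixed `bool`s required to be compile-time constants, `__builtin_amdgcn_sudot4` takes a signedness flag before each packed operand, an accumulator, and a clamp flag. The reference model below is a hedged sketch under assumed semantics, not taken from the ISA documentation: four 8-bit lanes per operand, each sign- or zero-extended per its flag, products summed into the accumulator, with clamp assumed to saturate to the int32 range. `lane8` and `sudot4Ref` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Extract 8-bit lane I of V, sign- or zero-extended.
constexpr int32_t lane8(uint32_t V, int I, bool Signed) {
  int32_t B = int32_t((V >> (8 * I)) & 0xFFu);
  return (Signed && B >= 0x80) ? B - 0x100 : B;
}

// Hypothetical reference model of __builtin_amdgcn_sudot4 (assumed semantics).
constexpr int32_t sudot4Ref(bool ASigned, int32_t A, bool BSigned, int32_t B,
                            int32_t C, bool Clamp) {
  int64_t Sum = C;
  for (int I = 0; I < 4; ++I)
    Sum += int64_t{lane8(uint32_t(A), I, ASigned)} *
           lane8(uint32_t(B), I, BSigned);
  if (Clamp) {
    if (Sum > std::numeric_limits<int32_t>::max())
      return std::numeric_limits<int32_t>::max();
    if (Sum < std::numeric_limits<int32_t>::min())
      return std::numeric_limits<int32_t>::min();
  }
  // Without clamp, keep the low 32 bits (two's-complement wrap).
  return int32_t(uint32_t(uint64_t(Sum)));
}

// All-ones bytes read as -1 when signed and as 255 when unsigned.
static_assert(sudot4Ref(true, -1, true, 0x01010101, 0, false) == -4);
static_assert(sudot4Ref(false, -1, true, 0x01010101, 0, false) == 1020);
```

The per-operand flag is what separates these from the existing `sdot4`/`udot4` builtins, whose signedness is fixed by the builtin itself.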

diff  --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
index e7a71b5158859..ac732952b390b 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
@@ -8,29 +8,45 @@ typedef half __attribute__((ext_vector_type(2))) half2;
 typedef short __attribute__((ext_vector_type(2))) short2;
 typedef unsigned short __attribute__((ext_vector_type(2))) ushort2;
 
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
 kernel void builtins_amdgcn_dl_insts_err(
 global float *fOut, global int *siOut, global uint *uiOut,
-half2 v2hA, half2 v2hB, float fC,
-short2 v2ssA, short2 v2ssB, int siA, int siB, int siC,
-ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC) {
-  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
-  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);  // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
+global short *sOut, global int *iOut, global half *hOut,
+half2 v2hA, half2 v2hB, float fC, half hC,
+short2 v2ssA, short2 v2ssB, short sC, int siA, int siB, int siC,
+ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC,
+int A, int B, int C) {
+  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false);  // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
+  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);   // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
 
-  siOut[0] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, false); // expected-error {{'__builtin_amdgcn_sdot2' needs target feature dot2-insts}}
-  siOut[1] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, si

[clang] [AMDGPU] Add another SIFoldOperands instance after shrink (PR #67878)

2023-10-03 Thread Joe Nash via cfe-commits

Sisyph wrote:

> I've just tested this on 1 graphics shaders and it seems to make no 
> difference at all. I tried gfx900 and gfx1100. Can anyone else from the 
> graphics team confirm this?

I can confirm no difference on gfx1102

https://github.com/llvm/llvm-project/pull/67878



[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)

2025-01-30 Thread Joe Nash via cfe-commits

https://github.com/Sisyph commented:

LGTM but please wait for the other reviewers. 

https://github.com/llvm/llvm-project/pull/119750


[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)

2025-01-30 Thread Joe Nash via cfe-commits

https://github.com/Sisyph edited 
https://github.com/llvm/llvm-project/pull/119750


[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)

2025-01-30 Thread Joe Nash via cfe-commits


@@ -1,10 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mtriple=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11-TRUE16 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11-FAKE16 %s
 
 declare i32 @llvm.amdgcn.alignbyte(i32, i32, i32) #0
 
-; GCN-LABEL: {{^}}v_alignbyte_b32:
-; GCN: v_alignbyte_b32 {{[vs][0-9]+}}, {{[vs][0-9]+}}, {{[vs][0-9]+}}
 define amdgpu_kernel void @v_alignbyte_b32(ptr addrspace(1) %out, i32 %src1, i32 %src2, i32 %src3) #1 {
+; GCN-LABEL: v_alignbyte_b32:
+; GCN:   ; %bb.0:
+; GCN-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0xb
+; GCN-NEXT:s_load_dwordx2 s[4:5], s[4:5], 0x9
+; GCN-NEXT:s_mov_b32 s7, 0xf000
+; GCN-NEXT:s_mov_b32 s6, -1
+; GCN-NEXT:s_waitcnt lgkmcnt(0)
+; GCN-NEXT:v_mov_b32_e32 v0, s1
+; GCN-NEXT:v_mov_b32_e32 v1, s2
+; GCN-NEXT:v_alignbyte_b32 v0, s0, v0, v1
+; GCN-NEXT:buffer_store_dword v0, off, s[4:7], 0
+; GCN-NEXT:s_endpgm
+;
+; GFX11-TRUE16-LABEL: v_alignbyte_b32:
+; GFX11-TRUE16:   ; %bb.0:
+; GFX11-TRUE16-NEXT:s_clause 0x1
+; GFX11-TRUE16-NEXT:s_load_b128 s[0:3], s[4:5], 0x2c
+; GFX11-TRUE16-NEXT:s_load_b64 s[4:5], s[4:5], 0x24
+; GFX11-TRUE16-NEXT:v_mov_b32_e32 v1, 0
+; GFX11-TRUE16-NEXT:s_waitcnt lgkmcnt(0)
+; GFX11-TRUE16-NEXT:v_mov_b16_e32 v0.l, s2
+; GFX11-TRUE16-NEXT:s_delay_alu instid0(VALU_DEP_1)
+; GFX11-TRUE16-NEXT:v_alignbyte_b32 v0, s0, s1, v0.l

Sisyph wrote:

Nit: Can you add another test in this file where s0 and s1 are vgpr arguments 
instead, so we can see if s2 can be folded into src2 of alignbyte?
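When reading these checks, it may help to have a model of what `llvm.amdgcn.alignbyte` computes. The sketch below assumes the usual description of `v_alignbyte_b32`: form a 64-bit value with src0 in the high half and src1 in the low half, shift it right by `8 * (src2 & 3)` bits, and keep the low 32 bits. `alignbyteRef` is a hypothetical name, not an LLVM helper.

```cpp
#include <cstdint>

// Hypothetical reference model of v_alignbyte_b32 (assumed semantics).
constexpr uint32_t alignbyteRef(uint32_t Src0, uint32_t Src1, uint32_t Src2) {
  uint64_t Concat = (uint64_t{Src0} << 32) | Src1; // Src0 is the high half
  return uint32_t(Concat >> (8 * (Src2 & 3)));     // low 32 bits survive
}

static_assert(alignbyteRef(0x11223344u, 0x55667788u, 1) == 0x44556677u);
```

In this model only the low two bits of src2 are read, so feeding src2 from a 16-bit register half (as `v0.l` does in the GFX11-TRUE16 output) would not change the result.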

https://github.com/llvm/llvm-project/pull/119750


[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)

2025-01-30 Thread Joe Nash via cfe-commits


@@ -3802,6 +3802,26 @@ def : FPMinCanonMaxPat, fmaximum_oneuse>;
 }
 
+let True16Predicate = UseFakeTrue16Insts in
+def : GCNPat <
+(i32 (int_amdgcn_alignbyte (i32 (VOP3OpSelMods i32:$src0, i32:$src0_modifiers)),
+   (i32 (VOP3OpSelMods i32:$src1, i32:$src1_modifiers)),
+   (i32 (VOP3OpSelMods i32:$src2, i32:$src2_modifiers,
+(V_ALIGNBYTE_B32_fake16_e64 i32:$src0_modifiers, VSrc_b32:$src0,
+i32:$src1_modifiers, VSrc_b32:$src1,
+i32:$src2_modifiers, VGPR_32:$src2)
+>;
+
+let True16Predicate = UseRealTrue16Insts in

Sisyph wrote:

I would put this pattern in VOP3Instructions.td

https://github.com/llvm/llvm-project/pull/119750


[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)

2025-01-30 Thread Joe Nash via cfe-commits


@@ -3802,6 +3802,26 @@ def : FPMinCanonMaxPat, fmaximum_oneuse>;
 }
 
+let True16Predicate = UseFakeTrue16Insts in
+def : GCNPat <
+(i32 (int_amdgcn_alignbyte (i32 (VOP3OpSelMods i32:$src0, i32:$src0_modifiers)),

Sisyph wrote:

Instead of this fake16 pattern, can you put int_amdgcn_alignbyte in the 
V_ALIGNBYTE_B32_fake16 definition, just like for the NotHasTrue16 one?

https://github.com/llvm/llvm-project/pull/119750


[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Joe Nash via cfe-commits

https://github.com/Sisyph created 
https://github.com/llvm/llvm-project/pull/138141

Fix the logic in rewriteBuiltinFunctionDecl to work when the builtin
has a pointer parameter with an address space and one without a fixed
address space. A builtin fitting these criteria was recently added.
Change the builtin's attribute string so that its signature is type-checked;
without the Sema change, compilation would then fail with a
wrong-number-of-arguments error.
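The failure mode described above can be modelled per parameter. In this hedged sketch (toy types, not the Sema API), a parameter is either kept with its declared type, rewritten to take the argument's address space, or, in the pre-fix control flow, dropped by the bare `continue` that skipped parameters whose pointee already had an address space. It assumes, as the unshown surrounding code suggests, that the first branch keeps the declared parameter and the code after `NeedsNewDecl = true` appends the rewritten one; a dropped parameter would then plausibly explain the wrong-number-of-arguments error. The pointer-level `hasAddressSpace()` and `isPtrSizeAddressSpace` checks are omitted, and `Action`, `Ptr`, `decideOld`, and `decideNew` are hypothetical names.

```cpp
// Per-parameter outcome when building the rewritten declaration.
enum class Action { KeepParam, RewriteParam, DropParam };

// Toy stand-in for a (possibly pointer) type.
struct Ptr {
  bool IsPointer;
  bool PointeeHasAS; // pointee type carries an explicit address space
};

// Before the fix: a pointee that already had an address space reached a
// bare `continue`, so the parameter was never added to the new signature.
constexpr Action decideOld(Ptr Param, Ptr Arg) {
  if (!Param.IsPointer || !Arg.IsPointer || !Arg.PointeeHasAS)
    return Action::KeepParam;
  if (Param.PointeeHasAS)
    return Action::DropParam; // the bug: parameter silently lost
  return Action::RewriteParam;
}

// After the fix: the pointee check joins the "keep as declared" condition.
constexpr Action decideNew(Ptr Param, Ptr Arg) {
  if (!Param.IsPointer || Param.PointeeHasAS || !Arg.IsPointer ||
      !Arg.PointeeHasAS)
    return Action::KeepParam;
  return Action::RewriteParam;
}

// __builtin_amdgcn_load_to_lds ("vv*v*3IUiIiIUi"): parameter 0 is a void*
// with no fixed address space, parameter 1 points into address space 3.
constexpr Ptr GenericPtrParam{true, false};
constexpr Ptr LdsPtrParam{true, true};
constexpr Ptr GlobalPtrArg{true, true}; // e.g. an OpenCL global pointer
constexpr Ptr LdsPtrArg{true, true};

static_assert(decideOld(LdsPtrParam, LdsPtrArg) == Action::DropParam);
static_assert(decideNew(LdsPtrParam, LdsPtrArg) == Action::KeepParam);
```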

From 96e94b5662c613fd80f712080751076254a73524 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak 
Date: Sat, 26 Apr 2025 00:20:22 +
Subject: [PATCH 1/3] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic

This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads
to LDS from global (address space 1) pointers and buffer fat
pointers (address space 7), since they use the same API and "gather
from a pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and
7, and updates the MLIR wrappers to use it (loosening up the
restrictions on loads to LDS along the way to match the ground truth
from target features).

It also plumbs the intrinsic through to clang.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp   |   4 +
 clang/lib/Sema/SemaAMDGPU.cpp |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   |  30 +++
 .../builtins-amdgcn-load-to-lds.cl|  60 +
 llvm/docs/ReleaseNotes.md |   8 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  21 ++
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   5 +
 .../AMDGPU/AMDGPULowerBufferFatPointers.cpp   |  20 ++
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   2 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   8 +-
 .../AMDGPU/llvm.amdgcn.load.to.lds.gfx950.ll  |  75 ++
 .../CodeGen/AMDGPU/llvm.amdgcn.load.to.lds.ll | 220 ++
 .../lower-buffer-fat-pointers-mem-transfer.ll |  18 ++
 mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td |  12 +-
 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td  |  35 ++-
 .../AMDGPUToROCDL/AMDGPUToROCDL.cpp   |  15 +-
 mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp  |  21 +-
 .../Conversion/AMDGPUToROCDL/load_lds.mlir|  67 --
 mlir/test/Dialect/LLVMIR/rocdl.mlir   |  17 +-
 mlir/test/Target/LLVMIR/rocdl.mlir|  11 +-
 21 files changed, 598 insertions(+), 53 deletions(-)
 create mode 100644 clang/test/CodeGenOpenCL/builtins-amdgcn-load-to-lds.cl
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.to.lds.gfx950.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.to.lds.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 39fef9e4601f8..730fd15913c11 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -257,6 +257,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
+TARGET_BUILTIN(__builtin_amdgcn_load_to_lds, "vv*v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")
 TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")
 
 
//===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index ad012d98635ff..a32ef1c2a5a12 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -564,6 +564,10 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy});
 return Builder.CreateCall(F, {Addr});
   }
+  case AMDGPU::BI__builtin_amdgcn_load_to_lds: {
+return emitBuiltinWithOneOverloadedType<5>(*this, E,
+   Intrinsic::amdgcn_load_to_lds);
+  }
   case AMDGPU::BI__builtin_amdgcn_get_fpenv: {
 Function *F = CGM.getIntrinsic(Intrinsic::get_fpenv,
{llvm::Type::getInt64Ty(getLLVMContext())});
diff --git a/clang/lib/Sema/SemaAMDGPU.cpp b/clang/lib/Sema/SemaAMDGPU.cpp
index a6366aceec2a6..e6414a623b929 100644
--- a/clang/lib/Sema/SemaAMDGPU.cpp
+++ b/clang/lib/Sema/SemaAMDGPU.cpp
@@ -36,6 +36,7 @@ bool SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID,
 
   switch (BuiltinID) {
   case AMDGPU::BI__builtin_amdgcn_raw_ptr_buffer_load_lds:
+  case AMDGPU::BI__builtin_amdgcn_load_to_lds:
   case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
 constexpr const int SizeIdx = 2;
 llvm::APSInt Size;
diff --git a/clang/test/C

[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Joe Nash via cfe-commits

Sisyph wrote:

As far as I know, there is no existing builtin in tree that covers this 
behavior. The one in https://github.com/llvm/llvm-project/pull/137425 does. If 
that builtin doesn't land, I can land this without a test. Or perhaps, is there 
a way to register a 'unit test' builtin?

https://github.com/llvm/llvm-project/pull/138141


[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Joe Nash via cfe-commits

Sisyph wrote:

Sorry, the file changes in this PR are confusing. This PR contains 
https://github.com/llvm/llvm-project/pull/137425 plus the fix on top of it.

https://github.com/llvm/llvm-project/pull/138141


[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Joe Nash via cfe-commits

Sisyph wrote:

> The patch to SemaExpr looks reasonable to me. I'd suggest that goes in 
> separate from the amdgpu intrinsic stuff.
> 
> I'd test this by tweaking the code to do the current lowering _and_ the 
> proposed and check that they do exactly the same thing on all the existing 
> builtins, then drop the current code path, but ymmv in terms of how that 
> strategy interacts with our code review system.

Thanks. I will wait for https://github.com/llvm/llvm-project/pull/137425 to be 
resolved, then rebase or convert this PR as needed.

https://github.com/llvm/llvm-project/pull/138141


[clang] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-30 Thread Joe Nash via cfe-commits

Sisyph wrote:

> Is there a test that needs to be added here?

With just the change to BuiltinsAMDGPU.def, we will get errors in 

  Clang :: CodeGenOpenCL/builtins-amdgcn-gfx950.cl
  Clang :: CodeGenOpenCL/builtins-amdgcn-load-to-lds.cl
  Clang :: SemaOpenCL/builtins-amdgcn-load-to-lds-err.cl

which the patch to Sema fixes. 

I don't know of a reasonable way to write a test only target builtin and use of 
it, to guard against something like __builtin_amdgcn_load_to_lds being removed 
and then the Sema code being changed.


https://github.com/llvm/llvm-project/pull/138141


[clang] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-30 Thread Joe Nash via cfe-commits

https://github.com/Sisyph updated 
https://github.com/llvm/llvm-project/pull/138141

From f5cdefe8200d9c9f567d6b4a276a5587e44ac1fa Mon Sep 17 00:00:00 2001
From: Joe Nash 
Date: Thu, 1 May 2025 10:00:47 -0400
Subject: [PATCH] [Sema] Fix bug in builtin AS override

Fix the logic in rewriteBuiltinFunctionDecl to work when the builtin
has a pointer parameter with an address space and one without a fixed
address space. A builtin fitting these criteria was recently added.
Change the builtin's attribute string so that its signature is type-checked;
without the Sema change, compilation would then fail with a
wrong-number-of-arguments error.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def | 2 +-
 clang/lib/Sema/SemaExpr.cpp  | 6 ++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 730fd15913c11..802b4be42419d 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -257,7 +257,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
-TARGET_BUILTIN(__builtin_amdgcn_load_to_lds, "vv*v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")
+TARGET_BUILTIN(__builtin_amdgcn_load_to_lds, "vv*v*3IUiIiIUi", "", "vmem-to-lds-load-insts")
 TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")
 
 
//===--===//
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 21bd7315e3dd4..85c7f995233a7 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6395,7 +6395,8 @@ static FunctionDecl *rewriteBuiltinFunctionDecl(Sema *Sema, ASTContext &Context,
   return nullptr;
 Expr *Arg = ArgRes.get();
 QualType ArgType = Arg->getType();
-if (!ParamType->isPointerType() || ParamType.hasAddressSpace() ||
+if (!ParamType->isPointerType() ||
+ParamType->getPointeeType().hasAddressSpace() ||
 !ArgType->isPointerType() ||
 !ArgType->getPointeeType().hasAddressSpace() ||
 isPtrSizeAddressSpace(ArgType->getPointeeType().getAddressSpace())) {
@@ -6404,9 +6405,6 @@ static FunctionDecl *rewriteBuiltinFunctionDecl(Sema *Sema, ASTContext &Context,
 }
 
 QualType PointeeType = ParamType->getPointeeType();
-if (PointeeType.hasAddressSpace())
-  continue;
-
 NeedsNewDecl = true;
 LangAS AS = ArgType->getPointeeType().getAddressSpace();
 



[clang] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-30 Thread Joe Nash via cfe-commits

Sisyph wrote:

Rebased, PTAL

https://github.com/llvm/llvm-project/pull/138141


[clang] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-30 Thread Joe Nash via cfe-commits

Sisyph wrote:

The `t` attribute is documented in clang/include/clang/Basic/Builtins.def:
`//  t -> signature is meaningless, use custom typechecking`
It essentially disables builtin signature typechecking, though I haven't looked 
into all the details.

https://github.com/llvm/llvm-project/pull/138141


[clang] [Sema] Fix bug in builtin AS override (PR #138141)

2025-06-06 Thread Joe Nash via cfe-commits

Sisyph wrote:

> Can you also remove all `t`? They don't seem to be necessary here.

I'll put this on my todo list.

https://github.com/llvm/llvm-project/pull/138141


[clang] [Sema] Fix bug in builtin AS override (PR #138141)

2025-06-06 Thread Joe Nash via cfe-commits

https://github.com/Sisyph closed 
https://github.com/llvm/llvm-project/pull/138141