[clang] 813e521 - [AMDGPU] Add gfx11 subtarget ELF definition
Author: Joe Nash
Date: 2022-04-29T12:27:17-04:00
New Revision: 813e521e55b11165138b071f446eda94b14570dc

URL: https://github.com/llvm/llvm-project/commit/813e521e55b11165138b071f446eda94b14570dc
DIFF: https://github.com/llvm/llvm-project/commit/813e521e55b11165138b071f446eda94b14570dc.diff

LOG: [AMDGPU] Add gfx11 subtarget ELF definition

This is the first patch of a series to upstream support for the new subtarget.

Contributors:
Jay Foad
Konstantin Zhuravlyov

Patch 1/N for upstreaming AMDGPU gfx11 architectures.

Reviewed By: foad, kzhuravl, #amdgpu

Differential Revision: https://reviews.llvm.org/D124536

Added:

Modified:
    clang/test/Misc/target-invalid-cpu-note.c
    llvm/docs/AMDGPUUsage.rst
    llvm/include/llvm/BinaryFormat/ELF.h
    llvm/include/llvm/Support/TargetParser.h
    llvm/lib/Object/ELFObjectFile.cpp
    llvm/lib/ObjectYAML/ELFYAML.cpp
    llvm/lib/Support/TargetParser.cpp
    llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
    llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
    llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
    llvm/tools/llvm-readobj/ELFDumper.cpp

Removed:


diff --git a/clang/test/Misc/target-invalid-cpu-note.c b/clang/test/Misc/target-invalid-cpu-note.c
index 8b9409336d1cb..4248090cb9feb 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -37,7 +37,7 @@

 // RUN: not %clang_cc1 -triple amdgcn--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix AMDGCN
 // AMDGCN: error: unknown target CPU 'not-a-cpu'
-// AMDGCN-NEXT: note: valid target CPU values are: gfx600, tahiti, gfx601, pitcairn, verde, gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702, gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo, gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11, gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, gfx906, gfx908, gfx909, gfx90a, gfx90c, gfx940, gfx1010, gfx1011, gfx1012, gfx1013, gfx1030, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1036{{$}}
+// AMDGCN-NEXT: note: valid target CPU values are: gfx600, tahiti, gfx601, pitcairn, verde, gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702, gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo, gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11, gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, gfx906, gfx908, gfx909, gfx90a, gfx90c, gfx940, gfx1010, gfx1011, gfx1012, gfx1013, gfx1030, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1036, gfx1100, gfx1101, gfx1102, gfx1103{{$}}

 // RUN: not %clang_cc1 -triple wasm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix WEBASM
 // WEBASM: error: unknown target CPU 'not-a-cpu'

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index e962674867df5..1cc342f85c659 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -444,6 +444,36 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
                                                                                         Add product
                                                                                         names.

+     **GCN GFX11** [AMD-GCN-GFX11]_
+     ---
+     ``gfx1100``  ``amdgcn``  dGPU  - cumode           - Architected  - *pal-amdpal*  *TBA*
+                                    - wavefrontsize64    flat
+                                                         scratch                      .. TODO::
+                                                       - Packed
+                                                         work-item                      Add product
+                                                         IDs                            names.
+
+     ``gfx1101``  ``amdgcn``  dGPU  - cumode           - Architected                  *TBA*
+                                    - wavefrontsize64    flat
+                                                         scratch                      .. TODO::
+                                                       - Packed
+                                                         work-item                      Add product
+                                                         IDs                            names.
+
+     ``gfx1102``  ``amdgcn``  dGPU  - cumode           - Architected                  *TBA*
+                                    - wavefrontsize64    flat
+
[clang] 8bdfc73 - [AMDGPU][clang] Definition of gfx11 subtarget
Author: Joe Nash
Date: 2022-04-29T13:55:56-04:00
New Revision: 8bdfc73f633dca9859123b8596bcb521700c6a7f

URL: https://github.com/llvm/llvm-project/commit/8bdfc73f633dca9859123b8596bcb521700c6a7f
DIFF: https://github.com/llvm/llvm-project/commit/8bdfc73f633dca9859123b8596bcb521700c6a7f.diff

LOG: [AMDGPU][clang] Definition of gfx11 subtarget

Contributors:
Jay Foad
Konstantin Zhuravlyov

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537

Added:

Modified:
    clang/include/clang/Basic/Cuda.h
    clang/lib/Basic/Cuda.cpp
    clang/lib/Basic/Targets/AMDGPU.cpp
    clang/lib/Basic/Targets/NVPTX.cpp
    clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
    clang/test/CodeGenOpenCL/amdgpu-features.cl
    clang/test/Driver/amdgpu-macros.cl
    clang/test/Driver/amdgpu-mcpu.cl
    clang/test/Misc/target-invalid-cpu-note.c

Removed:


diff --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h
index 147b04eb57459..18ef373784e5b 100644
--- a/clang/include/clang/Basic/Cuda.h
+++ b/clang/include/clang/Basic/Cuda.h
@@ -97,6 +97,10 @@ enum class CudaArch {
   GFX1034,
   GFX1035,
   GFX1036,
+  GFX1100,
+  GFX1101,
+  GFX1102,
+  GFX1103,
   Generic, // A processor model named 'generic' if the target backend defines a
            // public one.
   LAST,

diff --git a/clang/lib/Basic/Cuda.cpp b/clang/lib/Basic/Cuda.cpp
index adc61a567dbef..412f5c3f45e36 100644
--- a/clang/lib/Basic/Cuda.cpp
+++ b/clang/lib/Basic/Cuda.cpp
@@ -125,6 +125,10 @@ static const CudaArchToStringMap arch_names[] = {
     GFX(1034), // gfx1034
     GFX(1035), // gfx1035
     GFX(1036), // gfx1036
+    GFX(1100), // gfx1100
+    GFX(1101), // gfx1101
+    GFX(1102), // gfx1102
+    GFX(1103), // gfx1103
     {CudaArch::Generic, "generic", ""},
     // clang-format on
 };

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 32eacc871093e..c13aec4b2cae5 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -183,6 +183,26 @@ bool AMDGPUTargetInfo::initFeatureMap(
   // XXX - What does the member GPU mean if device name string passed here?
   if (isAMDGCN(getTriple())) {
     switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) {
+    case GK_GFX1103:
+    case GK_GFX1102:
+    case GK_GFX1101:
+    case GK_GFX1100:
+      Features["ci-insts"] = true;
+      Features["dot1-insts"] = true;
+      Features["dot5-insts"] = true;
+      Features["dot6-insts"] = true;
+      Features["dot7-insts"] = true;
+      Features["dot8-insts"] = true;
+      Features["dl-insts"] = true;
+      Features["flat-address-space"] = true;
+      Features["16-bit-insts"] = true;
+      Features["dpp"] = true;
+      Features["gfx8-insts"] = true;
+      Features["gfx9-insts"] = true;
+      Features["gfx10-insts"] = true;
+      Features["gfx10-3-insts"] = true;
+      Features["gfx11-insts"] = true;
+      break;
     case GK_GFX1036:
     case GK_GFX1035:
     case GK_GFX1034:

diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index f03d5c600e039..ffd69983a0be5 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -217,6 +217,10 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
       case CudaArch::GFX1034:
       case CudaArch::GFX1035:
       case CudaArch::GFX1036:
+      case CudaArch::GFX1100:
+      case CudaArch::GFX1101:
+      case CudaArch::GFX1102:
+      case CudaArch::GFX1103:
       case CudaArch::Generic:
       case CudaArch::LAST:
         break;

diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index f4228cfb3086e..85efe93d6bd98 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3946,6 +3946,10 @@ void CGOpenMPRuntimeGPU::processRequiresDirective(
       case CudaArch::GFX1034:
       case CudaArch::GFX1035:
       case CudaArch::GFX1036:
+      case CudaArch::GFX1100:
+      case CudaArch::GFX1101:
+      case CudaArch::GFX1102:
+      case CudaArch::GFX1103:
       case CudaArch::Generic:
       case CudaArch::UNUSED:
       case CudaArch::UNKNOWN:

diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 0967e932868eb..cb3a3eff01f70 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -37,6 +37,10 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1034 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1034 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1035 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1035 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1036 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1036 %s
+//
[lldb] [clang-tools-extra] [libcxx] [compiler-rt] [libc] [clang] [lld] [llvm] [flang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
Mirko Brkušanin, Mirko Brkušanin, Mirko Brkusanin, Mariusz Sikora

https://github.com/Sisyph commented:

DPP changes look good, and functionally I'm fine with the patch. I don't think the tablegen 'bit IsFP8' version of managing the op_sel bits is any better than adding a fake src1. It doesn't scale up to any more op_sel bits (hence why we can't use it for V_CVT_SR_BF8_F32_e64_dpp_gfx12) and it is a new abstraction, whereas we have many instances of fake src operands already. Consider it a +1 but not +2 from me as is, based on that.

https://github.com/llvm/llvm-project/pull/78414
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [lld] [llvm] [compiler-rt] [clang] [libc] [libcxx] [flang] [lldb] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
Mirko Brkušanin, Mirko Brkušanin

@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
 class VOP3OpSel_gfx11_gfx12<bits<10> op, VOPProfile p> : VOP3OpSel_gfx10<op, p>;

+class VOP3FP8OpSel_gfx11_gfx12<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

Sisyph wrote:

A couple of points related to this.
- I don't think the rules for forming op_sel with dpp are currently being checked correctly. In GCNDPPCombine.cpp:369, we check the named op_sel operand and op_sel_hi operand. We use src_modifier operands through most of the compiler, and typically don't (if ever?) copy the bits to the named op_sel operands. This should be fixed regardless of this patch.
- The rule we should enforce for dpp with op_sel is that, for the alu to be combined, op_sel must be 0 and op_sel_hi must be 0b111.
- My conclusion is that it is only safe to form the dpp alu with these instructions if the op_sel bits are all zero.
- Based on this patch, we are emitting alu that probably shouldn't be allowed (e.g. v_cvt_f32_fp8_e64_dpp v0, v0 op_sel:[1,1] quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf bound_ctrl:1).
- The cleanup Ivan suggested, using the dst_op_sel bit of src0 (equivalent to the op_sel_hi bit of src0), would require a special case in GCNDPPCombine to check the correct bit. Otherwise, we might allow an alu to be formed if op_sel:[0,1].

https://github.com/llvm/llvm-project/pull/78414
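The combine rule proposed in the comment above can be sketched as a small standalone predicate. This is only an illustration under stated assumptions, not the code in GCNDPPCombine.cpp: the struct `OpSelBits`, its fields, and the function `canFormDPP` are hypothetical names, and the 3-bit op_sel_hi mask assumes an instruction with three sources.

```cpp
#include <cstdint>

// Hypothetical summary of the op_sel state of one VALU instruction.
// In the real compiler these bits live in src*_modifiers operands.
struct OpSelBits {
  std::uint8_t OpSel;   // op_sel, one bit per operand (0 = low half)
  std::uint8_t OpSelHi; // op_sel_hi, one bit per source operand
  bool HasOpSelHi;      // assumption: only packed forms carry op_sel_hi
};

// Returns true when the rule from the review comment is satisfied:
// op_sel must be all zero and, where present, op_sel_hi must be 0b111.
inline bool canFormDPP(const OpSelBits &B) {
  if (B.OpSel != 0)
    return false;
  if (B.HasOpSelHi && (B.OpSelHi & 0x7) != 0x7)
    return false;
  return true;
}
```

Under this sketch, the example from the comment (`op_sel:[1,1]`, i.e. `OpSel == 0b11`) is rejected, while an instruction with all op_sel bits clear is accepted.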
[flang] [clang] [compiler-rt] [libcxx] [llvm] [lld] [lldb] [clang-tools-extra] [libc] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
Mirko Brkušanin, Mirko Brkušanin

@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
 class VOP3OpSel_gfx11_gfx12<bits<10> op, VOPProfile p> : VOP3OpSel_gfx10<op, p>;

+class VOP3FP8OpSel_gfx11_gfx12<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);

Sisyph wrote:

Thanks! I do think that patch will help a lot. I also think it handles the case where we use dst_op_sel to store the other bit instead of src1. If the CVT_F32_FP8 instruction were VOP3P, we would need a special case, but since it is VOP3, we want all the op_sel bits to be zero and we want dst_op_sel to be zero.

https://github.com/llvm/llvm-project/pull/78414
[clang] 2d43de1 - [AMDGPU] gfx11 new dot instruction codegen support
Author: Joe Nash
Date: 2022-06-16T14:19:34-04:00
New Revision: 2d43de13df03eab0fda1023b22b335b207afc507

URL: https://github.com/llvm/llvm-project/commit/2d43de13df03eab0fda1023b22b335b207afc507
DIFF: https://github.com/llvm/llvm-project/commit/2d43de13df03eab0fda1023b22b335b207afc507.diff

LOG: [AMDGPU] gfx11 new dot instruction codegen support

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904

Added:
    clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sudot4.ll
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sudot8.ll
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f16.f16.ll
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sudot4.ll
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sudot8.ll

Modified:
    clang/include/clang/Basic/BuiltinsAMDGPU.def
    clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
    llvm/include/llvm/IR/IntrinsicsAMDGPU.td
    llvm/lib/Target/AMDGPU/AMDGPUGISel.td
    llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
    llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
    llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
    llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
    llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
    llvm/lib/Target/AMDGPU/VOP3Instructions.td
    llvm/lib/Target/AMDGPU/VOP3PInstructions.td
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll

Removed:


diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 19e4ea998aa47..bd188c7f34371 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -222,12 +222,17 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "gfx9
 //===--===//

 TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot8-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", "dot8-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sudot4, "iIbiIbiiIb", "nc", "dot8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dot1-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot7-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sudot8, "iIbiIbiiIb", "nc", "dot8-insts")

 //===--===//
 // GFX10+ only builtins.

diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
index e7a71b5158859..ac732952b390b 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
@@ -8,29 +8,45 @@ typedef half __attribute__((ext_vector_type(2))) half2;
 typedef short __attribute__((ext_vector_type(2))) short2;
 typedef unsigned short __attribute__((ext_vector_type(2))) ushort2;

+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
 kernel void builtins_amdgcn_dl_insts_err(
     global float *fOut, global int *siOut, global uint *uiOut,
-    half2 v2hA, half2 v2hB, float fC,
-    short2 v2ssA, short2 v2ssB, int siA, int siB, int siC,
-    ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC) {
-  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
-  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
+    global short *sOut, global int *iOut, global half *hOut,
+    half2 v2hA, half2 v2hB, float fC, half hC,
+    short2 v2ssA, short2 v2ssB, short sC, int siA, int siB, int siC,
+    ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC,
+    int A, int B, int C) {
+  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
+  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}

-  siOut[0] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, false); // expected-error {{'__builtin_amdgcn_sdot2' needs target feature dot2-insts}}
-  siOut[1] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, si
[clang] [AMDGPU] Add another SIFoldOperands instance after shrink (PR #67878)
Sisyph wrote:

> I've just tested this on 1 graphics shaders and it seems to make no
> difference at all. I tried gfx900 and gfx1100. Can anyone else from the
> graphics team confirm this?

I can confirm no difference on gfx1102

https://github.com/llvm/llvm-project/pull/67878
[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)
https://github.com/Sisyph commented:

LGTM but please wait for the other reviewers.

https://github.com/llvm/llvm-project/pull/119750
[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)
https://github.com/Sisyph edited https://github.com/llvm/llvm-project/pull/119750
[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)
@@ -1,10 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mtriple=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11-TRUE16 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11-FAKE16 %s

 declare i32 @llvm.amdgcn.alignbyte(i32, i32, i32) #0

-; GCN-LABEL: {{^}}v_alignbyte_b32:
-; GCN: v_alignbyte_b32 {{[vs][0-9]+}}, {{[vs][0-9]+}}, {{[vs][0-9]+}}
 define amdgpu_kernel void @v_alignbyte_b32(ptr addrspace(1) %out, i32 %src1, i32 %src2, i32 %src3) #1 {
+; GCN-LABEL: v_alignbyte_b32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0xb
+; GCN-NEXT:    s_load_dwordx2 s[4:5], s[4:5], 0x9
+; GCN-NEXT:    s_mov_b32 s7, 0xf000
+; GCN-NEXT:    s_mov_b32 s6, -1
+; GCN-NEXT:    s_waitcnt lgkmcnt(0)
+; GCN-NEXT:    v_mov_b32_e32 v0, s1
+; GCN-NEXT:    v_mov_b32_e32 v1, s2
+; GCN-NEXT:    v_alignbyte_b32 v0, s0, v0, v1
+; GCN-NEXT:    buffer_store_dword v0, off, s[4:7], 0
+; GCN-NEXT:    s_endpgm
+;
+; GFX11-TRUE16-LABEL: v_alignbyte_b32:
+; GFX11-TRUE16:       ; %bb.0:
+; GFX11-TRUE16-NEXT:    s_clause 0x1
+; GFX11-TRUE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x2c
+; GFX11-TRUE16-NEXT:    s_load_b64 s[4:5], s[4:5], 0x24
+; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v1, 0
+; GFX11-TRUE16-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v0.l, s2
+; GFX11-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-TRUE16-NEXT:    v_alignbyte_b32 v0, s0, s1, v0.l

Sisyph wrote:

Nit: Can you add another test in this file where s0 and s1 are vgpr arguments instead, so we can see if s2 can be folded into src2 of alignbyte?

https://github.com/llvm/llvm-project/pull/119750
[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)
@@ -3802,6 +3802,26 @@ def : FPMinCanonMaxPat, fmaximum_oneuse>;
 }

+let True16Predicate = UseFakeTrue16Insts in
+def : GCNPat <
+  (i32 (int_amdgcn_alignbyte (i32 (VOP3OpSelMods i32:$src0, i32:$src0_modifiers)),
+                             (i32 (VOP3OpSelMods i32:$src1, i32:$src1_modifiers)),
+                             (i32 (VOP3OpSelMods i32:$src2, i32:$src2_modifiers)))),
+  (V_ALIGNBYTE_B32_fake16_e64 i32:$src0_modifiers, VSrc_b32:$src0,
+                              i32:$src1_modifiers, VSrc_b32:$src1,
+                              i32:$src2_modifiers, VGPR_32:$src2)
+>;
+
+let True16Predicate = UseRealTrue16Insts in

Sisyph wrote:

I would put this pattern in VOP3Instructions.td

https://github.com/llvm/llvm-project/pull/119750
[clang] [llvm] [AMDGPU][True16][MC][CodeGen] true16 for v_alignbyte_b32 (PR #119750)
@@ -3802,6 +3802,26 @@ def : FPMinCanonMaxPat, fmaximum_oneuse>;
 }

+let True16Predicate = UseFakeTrue16Insts in
+def : GCNPat <
+  (i32 (int_amdgcn_alignbyte (i32 (VOP3OpSelMods i32:$src0, i32:$src0_modifiers)),

Sisyph wrote:

Instead of this fake16 pattern, can you put int_amdgcn_alignbyte in the V_ALIGNBYTE_B32_fake16 definition, just like for the NotHasTrue16 one?

https://github.com/llvm/llvm-project/pull/119750
[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)
https://github.com/Sisyph created https://github.com/llvm/llvm-project/pull/138141

Fix the logic in rewriteBuiltinFunctionDecl to work when the builtin has a pointer parameter with an address space and one without a fixed address space. A builtin fitting these criteria was recently added. Change the attribute string to perform type checking on it, so without the sema change compilation would fail with a wrong number of arguments error.

>From 96e94b5662c613fd80f712080751076254a73524 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Sat, 26 Apr 2025 00:20:22 +
Subject: [PATCH 1/3] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic

This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features).

It also plumbs the intrinsic through to clang.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 +
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 4 +
 clang/lib/Sema/SemaAMDGPU.cpp | 1 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 30 +++
 .../builtins-amdgcn-load-to-lds.cl | 60 +
 llvm/docs/ReleaseNotes.md | 8 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 21 ++
 .../AMDGPU/AMDGPUInstructionSelector.cpp | 5 +
 .../AMDGPU/AMDGPULowerBufferFatPointers.cpp | 20 ++
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 2 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 8 +-
 .../AMDGPU/llvm.amdgcn.load.to.lds.gfx950.ll | 75 ++
 .../CodeGen/AMDGPU/llvm.amdgcn.load.to.lds.ll | 220 ++
 .../lower-buffer-fat-pointers-mem-transfer.ll | 18 ++
 mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td | 12 +-
 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | 35 ++-
 .../AMDGPUToROCDL/AMDGPUToROCDL.cpp | 15 +-
 mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp | 21 +-
 .../Conversion/AMDGPUToROCDL/load_lds.mlir | 67 --
 mlir/test/Dialect/LLVMIR/rocdl.mlir | 17 +-
 mlir/test/Target/LLVMIR/rocdl.mlir | 11 +-
 21 files changed, 598 insertions(+), 53 deletions(-)
 create mode 100644 clang/test/CodeGenOpenCL/builtins-amdgcn-load-to-lds.cl
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.to.lds.gfx950.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.to.lds.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 39fef9e4601f8..730fd15913c11 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -257,6 +257,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
+TARGET_BUILTIN(__builtin_amdgcn_load_to_lds, "vv*v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")
 TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")

 //===--===//

diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index ad012d98635ff..a32ef1c2a5a12 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -564,6 +564,10 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy});
     return Builder.CreateCall(F, {Addr});
   }
+  case AMDGPU::BI__builtin_amdgcn_load_to_lds: {
+    return emitBuiltinWithOneOverloadedType<5>(*this, E,
+                                               Intrinsic::amdgcn_load_to_lds);
+  }
   case AMDGPU::BI__builtin_amdgcn_get_fpenv: {
     Function *F = CGM.getIntrinsic(Intrinsic::get_fpenv,
                                    {llvm::Type::getInt64Ty(getLLVMContext())});

diff --git a/clang/lib/Sema/SemaAMDGPU.cpp b/clang/lib/Sema/SemaAMDGPU.cpp
index a6366aceec2a6..e6414a623b929 100644
--- a/clang/lib/Sema/SemaAMDGPU.cpp
+++ b/clang/lib/Sema/SemaAMDGPU.cpp
@@ -36,6 +36,7 @@ bool SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID,

   switch (BuiltinID) {
   case AMDGPU::BI__builtin_amdgcn_raw_ptr_buffer_load_lds:
+  case AMDGPU::BI__builtin_amdgcn_load_to_lds:
   case AMDGPU::BI__builtin_amdgcn_global_load_lds: {
     constexpr const int SizeIdx = 2;
     llvm::APSInt Size;

diff --git a/clang/test/C
[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

As far as I know, there is no existing builtin in tree that covers this behavior. The one in https://github.com/llvm/llvm-project/pull/137425 does. If that builtin doesn't land, I can land this without a test. Or perhaps, is there a way to register a 'unit test' builtin?

https://github.com/llvm/llvm-project/pull/138141
[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

Sorry, the file changes on this PR are not clear. This PR contains https://github.com/llvm/llvm-project/pull/137425 plus the fix on top of it.

https://github.com/llvm/llvm-project/pull/138141
[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

> The patch to SemaExpr looks reasonable to me. I'd suggest that goes in
> separate from the amdgpu intrinsic stuff.
>
> I'd test this by tweaking the code to do the current lowering _and_ the
> proposed and check that they do exactly the same thing on all the existing
> builtins, then drop the current code path, but ymmv in terms of how that
> strategy interacts with our code review system.

Thanks. I will wait for https://github.com/llvm/llvm-project/pull/137425 to be resolved, then rebase or convert this PR as needed.

https://github.com/llvm/llvm-project/pull/138141
[clang] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

> Is there a test that needs to be added here?

With just the change to BuiltinsAMDGPU.def, we will get errors in
  Clang :: CodeGenOpenCL/builtins-amdgcn-gfx950.cl
  Clang :: CodeGenOpenCL/builtins-amdgcn-load-to-lds.cl
  Clang :: SemaOpenCL/builtins-amdgcn-load-to-lds-err.cl
which the patch to Sema fixes. I don't know of a reasonable way to write a test-only target builtin and a use of it, to guard against something like __builtin_amdgcn_load_to_lds being removed and then the Sema code being changed.

https://github.com/llvm/llvm-project/pull/138141
[clang] [Sema] Fix bug in builtin AS override (PR #138141)
https://github.com/Sisyph updated https://github.com/llvm/llvm-project/pull/138141

>From f5cdefe8200d9c9f567d6b4a276a5587e44ac1fa Mon Sep 17 00:00:00 2001
From: Joe Nash
Date: Thu, 1 May 2025 10:00:47 -0400
Subject: [PATCH] [Sema] Fix bug in builtin AS override

Fix the logic in rewriteBuiltinFunctionDecl to work when the builtin has a pointer parameter with an address space and one without a fixed address space. A builtin fitting these criteria was recently added. Change the attribute string to perform type checking on it, so without the sema change compilation would fail with a wrong number of arguments error.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def | 2 +-
 clang/lib/Sema/SemaExpr.cpp | 6 ++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 730fd15913c11..802b4be42419d 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -257,7 +257,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at
 TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
-TARGET_BUILTIN(__builtin_amdgcn_load_to_lds, "vv*v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")
+TARGET_BUILTIN(__builtin_amdgcn_load_to_lds, "vv*v*3IUiIiIUi", "", "vmem-to-lds-load-insts")
 TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "vmem-to-lds-load-insts")

 //===--===//

diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 21bd7315e3dd4..85c7f995233a7 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6395,7 +6395,8 @@ static FunctionDecl *rewriteBuiltinFunctionDecl(Sema *Sema, ASTContext &Context,
       return nullptr;
     Expr *Arg = ArgRes.get();
     QualType ArgType = Arg->getType();
-    if (!ParamType->isPointerType() || ParamType.hasAddressSpace() ||
+    if (!ParamType->isPointerType() ||
+        ParamType->getPointeeType().hasAddressSpace() ||
         !ArgType->isPointerType() ||
         !ArgType->getPointeeType().hasAddressSpace() ||
         isPtrSizeAddressSpace(ArgType->getPointeeType().getAddressSpace())) {
@@ -6404,9 +6405,6 @@ static FunctionDecl *rewriteBuiltinFunctionDecl(Sema *Sema, ASTContext &Context,
     }

     QualType PointeeType = ParamType->getPointeeType();
-    if (PointeeType.hasAddressSpace())
-      continue;
-
     NeedsNewDecl = true;
     LangAS AS = ArgType->getPointeeType().getAddressSpace();

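As a rough illustration of the condition the patch above installs, the following standalone model walks a parameter list and either keeps the declared type or adopts the argument's address space. It is a sketch under assumptions, not clang code: `PtrType`, `keepsDeclaredType`, and `overrideAddressSpaces` are hypothetical names, the `isPtrSizeAddressSpace` check is omitted, and the point it tries to show (every parameter yields exactly one output entry, so the rewritten declaration keeps its arity) is my reading of the commit message rather than something the diff states outright.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the QualType queries used in the patch.
struct PtrType {
  bool IsPointer; // models isPointerType()
  bool HasAS;     // models getPointeeType().hasAddressSpace()
  unsigned AS;    // address space number, meaningful only if HasAS
};

// Models the patched condition: the declared parameter type is kept when
// the parameter is not a pointer, when its pointee already has a fixed
// address space, or when the argument provides no address space to adopt.
inline bool keepsDeclaredType(const PtrType &Param, const PtrType &Arg) {
  return !Param.IsPointer || Param.HasAS || !Arg.IsPointer || !Arg.HasAS;
}

// Builds the rewritten parameter list. Each input parameter produces
// exactly one output entry, so the result never changes length.
inline std::vector<PtrType>
overrideAddressSpaces(const std::vector<PtrType> &Params,
                      const std::vector<PtrType> &Args) {
  std::vector<PtrType> Out;
  for (std::size_t I = 0; I < Params.size() && I < Args.size(); ++I) {
    if (keepsDeclaredType(Params[I], Args[I])) {
      Out.push_back(Params[I]);
      continue;
    }
    // Generic pointer parameter: adopt the argument's address space.
    Out.push_back(PtrType{true, true, Args[I].AS});
  }
  return Out;
}
```

For a signature shaped like `__builtin_amdgcn_load_to_lds` (a generic pointer followed by an address-space-3 pointer) called with an address-space-1 source, the model returns two entries: the first adopts address space 1 and the second keeps address space 3.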
[clang] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

Rebased, PTAL

https://github.com/llvm/llvm-project/pull/138141
[clang] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

It is documented in clang/include/clang/Basic/Builtins.def.
`
//  t -> signature is meaningless, use custom typechecking
`
It essentially disables builtin signature typechecking, though I haven't looked into all the details.

https://github.com/llvm/llvm-project/pull/138141
[clang] [Sema] Fix bug in builtin AS override (PR #138141)
Sisyph wrote:

> Can you also remove all `t`? They don't seem to be necessary here.

I'll put this on my todo list.

https://github.com/llvm/llvm-project/pull/138141
[clang] [Sema] Fix bug in builtin AS override (PR #138141)
https://github.com/Sisyph closed https://github.com/llvm/llvm-project/pull/138141