r357792 - [AMDGPU] rename vi-insts into gfx8-insts
Author: rampitec Date: Fri Apr 5 11:25:00 2019 New Revision: 357792 URL: http://llvm.org/viewvc/llvm-project?rev=357792&view=rev Log: [AMDGPU] rename vi-insts into gfx8-insts Differential Revision: https://reviews.llvm.org/D60293 Modified: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl cfe/trunk/test/SemaOpenCL/builtins-amdgcn-error-vi.cl Modified: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def?rev=357792&r1=357791&r2=357792&view=diff == --- cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def (original) +++ cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def Fri Apr 5 11:25:00 2019 @@ -133,7 +133,7 @@ TARGET_BUILTIN(__builtin_amdgcn_classh, TARGET_BUILTIN(__builtin_amdgcn_s_memrealtime, "LUi", "n", "s-memrealtime") TARGET_BUILTIN(__builtin_amdgcn_mov_dpp, "iiIiIiIiIb", "nc", "dpp") TARGET_BUILTIN(__builtin_amdgcn_update_dpp, "iiiIiIiIiIb", "nc", "dpp") -TARGET_BUILTIN(__builtin_amdgcn_s_dcache_wb, "v", "n", "vi-insts") +TARGET_BUILTIN(__builtin_amdgcn_s_dcache_wb, "v", "n", "gfx8-insts") //===--===// // GFX9+ only builtins. Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=357792&r1=357791&r2=357792&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Fri Apr 5 11:25:00 2019 @@ -150,7 +150,7 @@ bool AMDGPUTargetInfo::initFeatureMap( case GK_GFX803: case GK_GFX802: case GK_GFX801: - Features["vi-insts"] = true; + Features["gfx8-insts"] = true; Features["16-bit-insts"] = true; Features["dpp"] = true; Features["s-memrealtime"] = true; Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl?rev=357792&r1=357791&r2=357792&view=diff == --- cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl Fri Apr 5 11:25:00 2019 @@ -10,9 +10,9 @@ // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s -// GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" -// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" -// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+s-memrealtime,+vi-insts" +// GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime" +// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime" +// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+s-memrealtime" // GFX700: "target-features"="+ci-insts,+fp64-fp16-denormals,-fp32-denormals" // GFX600: "target-features"="+fp64-fp16-denormals,-fp32-denormals" // GFX601: "target-features"="+fp64-fp16-denormals,-fp32-denormals" Modified: cfe/trunk/test/SemaOpenCL/builtins-amdgcn-error-vi.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/SemaOpenCL/builtins-amdgcn-error-vi.cl?rev=357792&r1=357791&r2=357792&view=diff == --- cfe/trunk/test/SemaOpenCL/builtins-amdgcn-error-vi.cl (original) +++ cfe/trunk/test/SemaOpenCL/builtins-amdgcn-error-vi.cl Fri Apr 5 11:25:00 2019 @@ -4,5 +4,5 @@ void test_vi_s_dcache_wb() { - __builtin_amdgcn_s_dcache_wb(); // expected-error {{'__builtin_amdgcn_s_dcache_wb' needs target feature vi-insts}} + __builtin_amdgcn_s_dcache_wb(); // expected-error {{'__builtin_amdgcn_s_dcache_wb' needs target feature gfx8-insts}} } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r353588 - [AMDGPU] Split dot-insts feature
Author: rampitec Date: Fri Feb 8 16:34:41 2019 New Revision: 353588 URL: http://llvm.org/viewvc/llvm-project?rev=353588&view=rev Log: [AMDGPU] Split dot-insts feature Differential Revision: https://reviews.llvm.org/D57972 Modified: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl Modified: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def?rev=353588&r1=353587&r2=353588&view=diff == --- cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def (original) +++ cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def Fri Feb 8 16:34:41 2019 @@ -145,13 +145,13 @@ TARGET_BUILTIN(__builtin_amdgcn_fmed3h, // Deep learning builtins. //===--===// -TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot-insts") -TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot-insts") -TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot-insts") -TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot-insts") -TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot-insts") -TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dot-insts") -TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot2-insts") +TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts") +TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts") +TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts") +TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot2-insts") +TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dot1-insts") +TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot2-insts") //===--===// // Special builtins. Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=353588&r1=353587&r2=353588&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Fri Feb 8 16:34:41 2019 @@ -136,7 +136,8 @@ bool AMDGPUTargetInfo::initFeatureMap( switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) { case GK_GFX906: Features["dl-insts"] = true; - Features["dot-insts"] = true; + Features["dot1-insts"] = true; + Features["dot2-insts"] = true; LLVM_FALLTHROUGH; case GK_GFX909: case GK_GFX904: Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl?rev=353588&r1=353587&r2=353588&view=diff == --- cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl Fri Feb 8 16:34:41 2019 @@ -11,7 +11,7 @@ // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s // GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" -// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" +// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" // GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+s-memrealtime,+vi-insts" // GFX700: "target-features"="+ci-insts,+fp64-fp16-denormals,-fp32-denormals" // GFX600: "target-features"="+fp64-fp16-denormals,-fp32-denormals" Modified: cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl?rev=353588&r1=353587&r2=353588&view=diff == --- cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl Fri Feb 8 16:34:41 2019 @@ -12,24 +12,24 @@ kernel void builtins_amdgcn_dl_insts_err half2 v2hA, half2 v2hB, float fC, short2 v2ssA, short2 v2ssB, int siA, int siB, int siC, ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC) { - fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target
r363341 - [AMDGPU] gfx1010 wave32 clang support
Author: rampitec Date: Thu Jun 13 16:47:59 2019 New Revision: 363341 URL: http://llvm.org/viewvc/llvm-project?rev=363341&view=rev Log: [AMDGPU] gfx1010 wave32 clang support Differential Revision: https://reviews.llvm.org/D63209 Modified: cfe/trunk/docs/ClangCommandLineReference.rst cfe/trunk/include/clang/Driver/Options.td cfe/trunk/lib/CodeGen/CGBuiltin.cpp cfe/trunk/lib/Driver/ToolChains/AMDGPU.cpp cfe/trunk/lib/Driver/ToolChains/HIP.cpp cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn.cl cfe/trunk/test/Driver/amdgpu-features.c Modified: cfe/trunk/docs/ClangCommandLineReference.rst URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/ClangCommandLineReference.rst?rev=363341&r1=363340&r2=363341&view=diff == --- cfe/trunk/docs/ClangCommandLineReference.rst (original) +++ cfe/trunk/docs/ClangCommandLineReference.rst Thu Jun 13 16:47:59 2019 @@ -2401,6 +2401,10 @@ AMDGPU CU wavefront execution mode is used if enabled and WGP wavefront execution mode is used if disabled (AMDGPU only) +.. option:: -mwavefrontsize64, -mno-wavefrontsize64 + +Wavefront size 64 is used if enabled and wavefront size 32 if disabled (AMDGPU only) + .. option:: -mxnack, -mno-xnack Enable XNACK (AMDGPU only) Modified: cfe/trunk/include/clang/Driver/Options.td URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Options.td?rev=363341&r1=363340&r2=363341&view=diff == --- cfe/trunk/include/clang/Driver/Options.td (original) +++ cfe/trunk/include/clang/Driver/Options.td Thu Jun 13 16:47:59 2019 @@ -2216,6 +2216,11 @@ def mcumode : Flag<["-"], "mcumode">, Gr def mno_cumode : Flag<["-"], "mno-cumode">, Group, HelpText<"WGP wavefront execution mode is used (AMDGPU only)">; +def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">, + Group, HelpText<"Wavefront size 64 is used">; +def mno_wavefrontsize64 : Flag<["-"], "mno-wavefrontsize64">, + Group, HelpText<"Wavefront size 32 is used">; + def faltivec : Flag<["-"], "faltivec">, Group, Flags<[DriverOption]>; def fno_altivec : Flag<["-"], "fno-altivec">, Group, Flags<[DriverOption]>; def maltivec : Flag<["-"], "maltivec">, Group; Modified: cfe/trunk/lib/CodeGen/CGBuiltin.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGBuiltin.cpp?rev=363341&r1=363340&r2=363341&view=diff == --- cfe/trunk/lib/CodeGen/CGBuiltin.cpp (original) +++ cfe/trunk/lib/CodeGen/CGBuiltin.cpp Thu Jun 13 16:47:59 2019 @@ -12736,11 +12736,27 @@ Value *CodeGenFunction::EmitAMDGPUBuilti case AMDGPU::BI__builtin_amdgcn_uicmp: case AMDGPU::BI__builtin_amdgcn_uicmpl: case AMDGPU::BI__builtin_amdgcn_sicmp: - case AMDGPU::BI__builtin_amdgcn_sicmpl: -return emitTernaryBuiltin(*this, E, Intrinsic::amdgcn_icmp); + case AMDGPU::BI__builtin_amdgcn_sicmpl: { +llvm::Value *Src0 = EmitScalarExpr(E->getArg(0)); +llvm::Value *Src1 = EmitScalarExpr(E->getArg(1)); +llvm::Value *Src2 = EmitScalarExpr(E->getArg(2)); + +// FIXME-GFX10: How should 32 bit mask be handled? +Value *F = CGM.getIntrinsic(Intrinsic::amdgcn_icmp, + { Builder.getInt64Ty(), Src0->getType() }); +return Builder.CreateCall(F, { Src0, Src1, Src2 }); + } case AMDGPU::BI__builtin_amdgcn_fcmp: - case AMDGPU::BI__builtin_amdgcn_fcmpf: -return emitTernaryBuiltin(*this, E, Intrinsic::amdgcn_fcmp); + case AMDGPU::BI__builtin_amdgcn_fcmpf: { +llvm::Value *Src0 = EmitScalarExpr(E->getArg(0)); +llvm::Value *Src1 = EmitScalarExpr(E->getArg(1)); +llvm::Value *Src2 = EmitScalarExpr(E->getArg(2)); + +// FIXME-GFX10: How should 32 bit mask be handled? +Value *F = CGM.getIntrinsic(Intrinsic::amdgcn_fcmp, + { Builder.getInt64Ty(), Src0->getType() }); +return Builder.CreateCall(F, { Src0, Src1, Src2 }); + } case AMDGPU::BI__builtin_amdgcn_class: case AMDGPU::BI__builtin_amdgcn_classf: case AMDGPU::BI__builtin_amdgcn_classh: Modified: cfe/trunk/lib/Driver/ToolChains/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChains/AMDGPU.cpp?rev=363341&r1=363340&r2=363341&view=diff == --- cfe/trunk/lib/Driver/ToolChains/AMDGPU.cpp (original) +++ cfe/trunk/lib/Driver/ToolChains/AMDGPU.cpp Thu Jun 13 16:47:59 2019 @@ -41,6 +41,17 @@ void amdgpu::getAMDGPUTargetFeatures(con if (const Arg *dAbi = Args.getLastArg(options::OPT_mamdgpu_debugger_abi)) D.Diag(diag::err_drv_clang_unsupported) << dAbi->getAsString(Args); + if (Args.getLastArg(options::OPT_mwavefrontsize64)) { +Features.push_back("-wavefrontsize16"); +Features.push_back("-wavefrontsize32"); +Features.push_back("+wavefrontsize64"); + } + if (Args.getLastArg(options::OPT_mno_wavefrontsize64)) { +Features.push_back
r363345 - [AMDGPU] gfx1011/gfx1012 clang support
Author: rampitec Date: Thu Jun 13 17:33:59 2019 New Revision: 363345 URL: http://llvm.org/viewvc/llvm-project?rev=363345&view=rev Log: [AMDGPU] gfx1011/gfx1012 clang support Differential Revision: https://reviews.llvm.org/D63308 Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl cfe/trunk/test/Driver/amdgpu-macros.cl cfe/trunk/test/Driver/amdgpu-mcpu.cl Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=363345&r1=363344&r2=363345&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Thu Jun 13 17:33:59 2019 @@ -135,6 +135,13 @@ bool AMDGPUTargetInfo::initFeatureMap( CPU = "gfx600"; switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) { +case GK_GFX1012: +case GK_GFX1011: + Features["dot1-insts"] = true; + Features["dot2-insts"] = true; + Features["dot5-insts"] = true; + Features["dot6-insts"] = true; + LLVM_FALLTHROUGH; case GK_GFX1010: Features["dl-insts"] = true; Features["16-bit-insts"] = true; Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl?rev=363345&r1=363344&r2=363345&view=diff == --- cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl Thu Jun 13 17:33:59 2019 @@ -6,6 +6,8 @@ // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1010 %s +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1011 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1011 %s +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1012 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1012 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s @@ -14,6 +16,8 @@ // GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX1010: "target-features"="+16-bit-insts,+dl-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx9-insts,+s-memrealtime" +// GFX1011: "target-features"="+16-bit-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx9-insts,+s-memrealtime" +// GFX1012: "target-features"="+16-bit-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx9-insts,+s-memrealtime" // GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+s-memrealtime" // GFX700: "target-features"="+ci-insts,+fp64-fp16-denormals,-fp32-denormals" // GFX600: "target-features"="+fp64-fp16-denormals,-fp32-denormals" Modified: cfe/trunk/test/Driver/amdgpu-macros.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/amdgpu-macros.cl?rev=363345&r1=363344&r2=363345&view=diff == --- cfe/trunk/test/Driver/amdgpu-macros.cl (original) +++ cfe/trunk/test/Driver/amdgpu-macros.cl Thu Jun 13 17:33:59 2019 @@ -177,6 +177,8 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx906 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX906 %s // RUN: %clang -E -dM -target amdgcn -mcpu=gfx909 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX909 %s // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1010 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX1010 %s +// RUN: %clang -E -dM -target amdgcn -mcpu=gfx1011 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX1011 %s +// RUN: %clang -E -dM -target amdgcn -mcpu=gfx1012 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX1012 %s // GFX600-DAG: #define FP_FAST_FMA 1 // GFX601-DAG: #define FP_FAST_FMA 1 @@ -195,6 +197,8 @@ // GFX906-DAG: #define FP_FAST_FMA 1 // GFX909-DAG: #define FP_FAST_FMA 1 // GFX1010-DAG: #define FP_FAST_FMA 1 +// GFX1011-DAG: #define FP_FAST_FMA 1 +// GFX1012-DAG: #define FP_FAST_FMA 1 // GFX600-DAG: #define FP_FAST_FMAF 1 // GFX601-NOT: #d
r368917 - [AMDGPU] Do not assume a default GCN target
Author: rampitec Date: Wed Aug 14 13:55:15 2019 New Revision: 368917 URL: http://llvm.org/viewvc/llvm-project?rev=368917&view=rev Log: [AMDGPU] Do not assume a default GCN target Differential Revision: https://reviews.llvm.org/D66246 Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/test/Driver/amdgpu-mcpu.cl Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=368917&r1=368916&r2=368917&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Wed Aug 14 13:55:15 2019 @@ -131,9 +131,6 @@ bool AMDGPUTargetInfo::initFeatureMap( // XXX - What does the member GPU mean if device name string passed here? if (isAMDGCN(getTriple())) { -if (CPU.empty()) - CPU = "gfx600"; - switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) { case GK_GFX1012: case GK_GFX1011: @@ -189,7 +186,7 @@ bool AMDGPUTargetInfo::initFeatureMap( case GK_GFX600: break; case GK_NONE: - return false; + break; default: llvm_unreachable("Unhandled GPU!"); } Modified: cfe/trunk/test/Driver/amdgpu-mcpu.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/amdgpu-mcpu.cl?rev=368917&r1=368916&r2=368917&view=diff == --- cfe/trunk/test/Driver/amdgpu-mcpu.cl (original) +++ cfe/trunk/test/Driver/amdgpu-mcpu.cl Wed Aug 14 13:55:15 2019 @@ -52,6 +52,7 @@ // AMDGCN-based processors. // +// RUN: %clang -### -target amdgcn %s 2>&1 | FileCheck --check-prefix=GCNDEFAULT %s // RUN: %clang -### -target amdgcn -mcpu=gfx600 %s 2>&1 | FileCheck --check-prefix=GFX600 %s // RUN: %clang -### -target amdgcn -mcpu=tahiti %s 2>&1 | FileCheck --check-prefix=TAHITI %s // RUN: %clang -### -target amdgcn -mcpu=gfx601 %s 2>&1 | FileCheck --check-prefix=GFX601 %s @@ -90,6 +91,7 @@ // RUN: %clang -### -target amdgcn -mcpu=gfx1011 %s 2>&1 | FileCheck --check-prefix=GFX1011 %s // RUN: %clang -### -target amdgcn -mcpu=gfx1012 %s 2>&1 | FileCheck --check-prefix=GFX1012 %s +// GCNDEFAULT-NOT: -target-cpu // GFX600:"-target-cpu" "gfx600" // TAHITI:"-target-cpu" "tahiti" // GFX601:"-target-cpu" "gfx601" ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r365528 - [AMDGPU] gfx908 clang target
Author: rampitec Date: Tue Jul 9 11:19:00 2019 New Revision: 365528 URL: http://llvm.org/viewvc/llvm-project?rev=365528&view=rev Log: [AMDGPU] gfx908 clang target Differential Revision: https://reviews.llvm.org/D64430 Modified: cfe/trunk/include/clang/Basic/Cuda.h cfe/trunk/lib/Basic/Cuda.cpp cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/lib/Basic/Targets/NVPTX.cpp cfe/trunk/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl cfe/trunk/test/Driver/amdgpu-macros.cl cfe/trunk/test/Driver/amdgpu-mcpu.cl cfe/trunk/test/Driver/cuda-bad-arch.cu Modified: cfe/trunk/include/clang/Basic/Cuda.h URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Cuda.h?rev=365528&r1=365527&r2=365528&view=diff == --- cfe/trunk/include/clang/Basic/Cuda.h (original) +++ cfe/trunk/include/clang/Basic/Cuda.h Tue Jul 9 11:19:00 2019 @@ -64,6 +64,7 @@ enum class CudaArch { GFX902, GFX904, GFX906, + GFX908, GFX909, LAST, }; Modified: cfe/trunk/lib/Basic/Cuda.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Cuda.cpp?rev=365528&r1=365527&r2=365528&view=diff == --- cfe/trunk/lib/Basic/Cuda.cpp (original) +++ cfe/trunk/lib/Basic/Cuda.cpp Tue Jul 9 11:19:00 2019 @@ -109,6 +109,8 @@ const char *CudaArchToString(CudaArch A) return "gfx904"; case CudaArch::GFX906: // TBA return "gfx906"; + case CudaArch::GFX908: // TBA +return "gfx908"; case CudaArch::GFX909: // TBA return "gfx909"; } @@ -147,6 +149,7 @@ CudaArch StringToCudaArch(llvm::StringRe .Case("gfx902", CudaArch::GFX902) .Case("gfx904", CudaArch::GFX904) .Case("gfx906", CudaArch::GFX906) + .Case("gfx908", CudaArch::GFX908) .Case("gfx909", CudaArch::GFX909) .Default(CudaArch::UNKNOWN); } @@ -259,6 +262,7 @@ CudaVirtualArch VirtualArchForCudaArch(C case CudaArch::GFX902: case CudaArch::GFX904: case CudaArch::GFX906: + case CudaArch::GFX908: case CudaArch::GFX909: return CudaVirtualArch::COMPUTE_AMDGCN; } @@ -306,6 +310,7 @@ CudaVersion MinVersionForCudaArch(CudaAr case CudaArch::GFX902: case CudaArch::GFX904: case CudaArch::GFX906: + case CudaArch::GFX908: case CudaArch::GFX909: return CudaVersion::CUDA_70; } Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=365528&r1=365527&r2=365528&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Tue Jul 9 11:19:00 2019 @@ -152,6 +152,12 @@ bool AMDGPUTargetInfo::initFeatureMap( Features["gfx10-insts"] = true; Features["s-memrealtime"] = true; break; +case GK_GFX908: + Features["dot3-insts"] = true; + Features["dot4-insts"] = true; + Features["dot5-insts"] = true; + Features["dot6-insts"] = true; + LLVM_FALLTHROUGH; case GK_GFX906: Features["dl-insts"] = true; Features["dot1-insts"] = true; Modified: cfe/trunk/lib/Basic/Targets/NVPTX.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/NVPTX.cpp?rev=365528&r1=365527&r2=365528&view=diff == --- cfe/trunk/lib/Basic/Targets/NVPTX.cpp (original) +++ cfe/trunk/lib/Basic/Targets/NVPTX.cpp Tue Jul 9 11:19:00 2019 @@ -191,6 +191,7 @@ void NVPTXTargetInfo::getTargetDefines(c case CudaArch::GFX902: case CudaArch::GFX904: case CudaArch::GFX906: + case CudaArch::GFX908: case CudaArch::GFX909: case CudaArch::LAST: break; Modified: cfe/trunk/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp?rev=365528&r1=365527&r2=365528&view=diff == --- cfe/trunk/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp (original) +++ cfe/trunk/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp Tue Jul 9 11:19:00 2019 @@ -4928,6 +4928,7 @@ void CGOpenMPRuntimeNVPTX::checkArchForU case CudaArch::GFX902: case CudaArch::GFX904: case CudaArch::GFX906: + case CudaArch::GFX908: case CudaArch::GFX909: case CudaArch::UNKNOWN: break; @@ -4982,6 +4983,7 @@ static std::pair get case CudaArch::GFX902: case CudaArch::GFX904: case CudaArch::GFX906: + case CudaArch::GFX908: case CudaArch::GFX909: case CudaArch::UNKNOWN: break; Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl?rev=365528&r1=365527&r2=365528&view=diff
r360634 - [AMDGPU] gfx1010 clang target
Author: rampitec Date: Mon May 13 16:15:59 2019 New Revision: 360634 URL: http://llvm.org/viewvc/llvm-project?rev=360634&view=rev Log: [AMDGPU] gfx1010 clang target Differential Revision: https://reviews.llvm.org/D61875 Modified: cfe/trunk/docs/ClangCommandLineReference.rst cfe/trunk/include/clang/Driver/Options.td cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/lib/Basic/Targets/AMDGPU.h cfe/trunk/lib/Driver/ToolChains/HIP.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl cfe/trunk/test/Driver/amdgpu-features.c cfe/trunk/test/Driver/amdgpu-macros.cl cfe/trunk/test/Driver/amdgpu-mcpu.cl Modified: cfe/trunk/docs/ClangCommandLineReference.rst URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/ClangCommandLineReference.rst?rev=360634&r1=360633&r2=360634&view=diff == --- cfe/trunk/docs/ClangCommandLineReference.rst (original) +++ cfe/trunk/docs/ClangCommandLineReference.rst Mon May 13 16:15:59 2019 @@ -2396,6 +2396,11 @@ Generate code which only uses the genera AMDGPU -- +.. option:: -mcumode, -mno-cumode + +CU wavefront execution mode is used if enabled and WGP wavefront execution mode +is used if disabled (AMDGPU only) + .. option:: -mxnack, -mno-xnack Enable XNACK (AMDGPU only) Modified: cfe/trunk/include/clang/Driver/Options.td URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Options.td?rev=360634&r1=360633&r2=360634&view=diff == --- cfe/trunk/include/clang/Driver/Options.td (original) +++ cfe/trunk/include/clang/Driver/Options.td Mon May 13 16:15:59 2019 @@ -2202,6 +2202,11 @@ def msram_ecc : Flag<["-"], "msram-ecc"> def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group, HelpText<"Disable SRAM ECC (AMDGPU only)">; +def mcumode : Flag<["-"], "mcumode">, Group, + HelpText<"CU wavefront execution mode is used (AMDGPU only)">; +def mno_cumode : Flag<["-"], "mno-cumode">, Group, + HelpText<"WGP wavefront execution mode is used (AMDGPU only)">; + def faltivec : Flag<["-"], "faltivec">, Group, Flags<[DriverOption]>; def fno_altivec : Flag<["-"], "fno-altivec">, Group, Flags<[DriverOption]>; def maltivec : Flag<["-"], "maltivec">, Group; Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=360634&r1=360633&r2=360634&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Mon May 13 16:15:59 2019 @@ -135,6 +135,14 @@ bool AMDGPUTargetInfo::initFeatureMap( CPU = "gfx600"; switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) { +case GK_GFX1010: + Features["dl-insts"] = true; + Features["16-bit-insts"] = true; + Features["dpp"] = true; + Features["gfx9-insts"] = true; + Features["gfx10-insts"] = true; + Features["s-memrealtime"] = true; + break; case GK_GFX906: Features["dl-insts"] = true; Features["dot1-insts"] = true; Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.h URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.h?rev=360634&r1=360633&r2=360634&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.h (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.h Mon May 13 16:15:59 2019 @@ -41,7 +41,6 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTarg llvm::AMDGPU::GPUKind GPUKind; unsigned GPUFeatures; - bool hasFP64() const { return getTriple().getArch() == llvm::Triple::amdgcn || !!(GPUFeatures & llvm::AMDGPU::FEATURE_FP64); Modified: cfe/trunk/lib/Driver/ToolChains/HIP.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChains/HIP.cpp?rev=360634&r1=360633&r2=360634&view=diff == --- cfe/trunk/lib/Driver/ToolChains/HIP.cpp (original) +++ cfe/trunk/lib/Driver/ToolChains/HIP.cpp Mon May 13 16:15:59 2019 @@ -307,8 +307,8 @@ void HIPToolChain::addClangTargetOptions if (BCLibs.empty()) { // Get the bc lib file name for ISA version. For example, // gfx803 => oclc_isa_version_803.amdgcn.bc. -std::string ISAVerBC = -"oclc_isa_version_" + GpuArch.drop_front(3).str() + ".amdgcn.bc"; +std::string GFXVersion = GpuArch.drop_front(3).str(); +std::string ISAVerBC = "oclc_isa_version_" + GFXVersion + ".amdgcn.bc"; llvm::StringRef FlushDenormalControlBC; if (DriverArgs.hasArg(options::OPT_fcuda_flush_denormals_to_zero)) Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl?rev=360634&r1=360633&r2=360634&view=diff =
r350794 - [AMDGPU] Separate feature dot-insts
Author: rampitec Date: Wed Jan 9 19:25:47 2019 New Revision: 350794 URL: http://llvm.org/viewvc/llvm-project?rev=350794&view=rev Log: [AMDGPU] Separate feature dot-insts Differential Revision: https://reviews.llvm.org/D56525 Modified: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl Modified: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def?rev=350794&r1=350793&r2=350794&view=diff == --- cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def (original) +++ cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def Wed Jan 9 19:25:47 2019 @@ -135,13 +135,13 @@ TARGET_BUILTIN(__builtin_amdgcn_fmed3h, // Deep learning builtins. //===--===// -TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dl-insts") -TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dl-insts") -TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dl-insts") -TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dl-insts") -TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dl-insts") -TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dl-insts") -TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dl-insts") +TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dot-insts") +TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot-insts") //===--===// // Special builtins. Modified: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Basic/Targets/AMDGPU.cpp?rev=350794&r1=350793&r2=350794&view=diff == --- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp (original) +++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp Wed Jan 9 19:25:47 2019 @@ -137,6 +137,7 @@ bool AMDGPUTargetInfo::initFeatureMap( switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) { case GK_GFX906: Features["dl-insts"] = true; + Features["dot-insts"] = true; LLVM_FALLTHROUGH; case GK_GFX909: case GK_GFX904: Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl?rev=350794&r1=350793&r2=350794&view=diff == --- cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl Wed Jan 9 19:25:47 2019 @@ -11,7 +11,7 @@ // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s // GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" -// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" +// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx9-insts,+s-memrealtime,+vi-insts" // GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+s-memrealtime,+vi-insts" // GFX700: "target-features"="+ci-insts,+fp64-fp16-denormals,-fp32-denormals" // GFX600: "target-features"="+fp64-fp16-denormals,-fp32-denormals" Modified: cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl?rev=350794&r1=350793&r2=350794&view=diff == --- cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl Wed Jan 9 19:25:47 2019 @@ -12,24 +12,24 @@ kernel void builtins_amdgcn_dl_insts_err half2 v2hA, half2 v2hB, float fC, short2 v2ssA, short2 v2ssB, int siA, int siB, int siC, ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC) { - fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dl-insts}} - fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true); // expected-error {{'__buil
[clang] 58de24c - [AMDGPU] Sorted targets in amdgpu-features.cl. NFC.
Author: Stanislav Mekhanoshin Date: 2020-06-12T11:57:40-07:00 New Revision: 58de24ce6cb413afea1470ec183f3fc5d9ca6817 URL: https://github.com/llvm/llvm-project/commit/58de24ce6cb413afea1470ec183f3fc5d9ca6817 DIFF: https://github.com/llvm/llvm-project/commit/58de24ce6cb413afea1470ec183f3fc5d9ca6817.diff LOG: [AMDGPU] Sorted targets in amdgpu-features.cl. NFC. Added: Modified: clang/test/CodeGenOpenCL/amdgpu-features.cl Removed: diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index 7529a4d4abb1..344d4bca44c9 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -3,26 +3,26 @@ // Check that appropriate features are defined for every supported AMDGPU // "-target" and "-mcpu" options. +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX908 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1010 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1011 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1011 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1012 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1012 %s -// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s -// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s -// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s -// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s +// GFX600-NOT: "target-features" +// GFX601-NOT: "target-features" +// GFX700: "target-features"="+ci-insts,+flat-address-space" +// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime" // GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime" // GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" -// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime" -// GFX700: "target-features"="+ci-insts,+flat-address-space" -// GFX600-NOT: "target-features" -// GFX601-NOT: "target-features" kernel void test() {} ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] ea7d0e2 - [AMDGPU] gfx1031 target
Author: Stanislav Mekhanoshin Date: 2020-08-05T12:36:26-07:00 New Revision: ea7d0e2996ec6b72a08dbef26dadf217458ab382 URL: https://github.com/llvm/llvm-project/commit/ea7d0e2996ec6b72a08dbef26dadf217458ab382 DIFF: https://github.com/llvm/llvm-project/commit/ea7d0e2996ec6b72a08dbef26dadf217458ab382.diff LOG: [AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337 Added: Modified: clang/include/clang/Basic/Cuda.h clang/lib/Basic/Targets/AMDGPU.cpp clang/lib/Basic/Targets/NVPTX.cpp clang/test/CodeGenOpenCL/amdgpu-features.cl clang/test/Driver/amdgpu-macros.cl clang/test/Driver/amdgpu-mcpu.cl llvm/docs/AMDGPUUsage.rst llvm/include/llvm/BinaryFormat/ELF.h llvm/include/llvm/Support/TargetParser.h llvm/lib/ObjectYAML/ELFYAML.cpp llvm/lib/Support/TargetParser.cpp llvm/lib/Target/AMDGPU/GCNProcessors.td llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmad.s32.mir llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.csub.ll llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll llvm/test/CodeGen/AMDGPU/idot8s.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.csub.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot4.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot8.ll llvm/test/MC/AMDGPU/gfx1030_err.s llvm/test/MC/AMDGPU/gfx1030_new.s llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt llvm/tools/llvm-readobj/ELFDumper.cpp Removed: diff --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h index 1716325a99312..19301e825bcfd 100644 --- a/clang/include/clang/Basic/Cuda.h +++ b/clang/include/clang/Basic/Cuda.h @@ -75,6 +75,7 @@ enum class CudaArch { GFX1011, GFX1012, GFX1030, + GFX1031, LAST, }; diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index e147045110a99..57351c7557082 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -174,6 +174,7 @@ bool AMDGPUTargetInfo::initFeatureMap( // XXX - What does the member GPU mean if device name string passed here? if (isAMDGCN(getTriple())) { switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) { +case GK_GFX1031: case GK_GFX1030: Features["ci-insts"] = true; Features["dot1-insts"] = true; diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp index 18c3c83703310..ef61b8f78946c 100644 --- a/clang/lib/Basic/Targets/NVPTX.cpp +++ b/clang/lib/Basic/Targets/NVPTX.cpp @@ -201,6 +201,7 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts, case CudaArch::GFX1011: case CudaArch::GFX1012: case CudaArch::GFX1030: + case CudaArch::GFX1031: case CudaArch::LAST: break; case CudaArch::UNKNOWN: diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index 4c26163237980..93357a48eb89a 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -14,6 +14,7 @@ // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1011 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1011 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1012 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1012 %s // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1030 %s +// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1031 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1031 %s // GFX600-NOT: "target-features" // GFX601-NOT: "target-features" @@ -26,5 +27,6 @@ // GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" // GFX1030: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" +// GFX1031: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime" kernel void test() {} diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 24d2fead28d5e..ae46e85f94794 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -181,6 +181,7 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1011 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,GFX1011 %s
[clang] 105608a - [AMDGPU] Added missing gfx1031 cases to CGOpenMPRuntimeGPU.cpp
Author: Stanislav Mekhanoshin Date: 2020-08-05T12:39:03-07:00 New Revision: 105608a4c2821ca8f8340104614c1176ed1ed82d URL: https://github.com/llvm/llvm-project/commit/105608a4c2821ca8f8340104614c1176ed1ed82d DIFF: https://github.com/llvm/llvm-project/commit/105608a4c2821ca8f8340104614c1176ed1ed82d.diff LOG: [AMDGPU] Added missing gfx1031 cases to CGOpenMPRuntimeGPU.cpp Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 452eb15eb8d1..9440758a85b6 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -5014,6 +5014,7 @@ void CGOpenMPRuntimeGPU::processRequiresDirective( case CudaArch::GFX1011: case CudaArch::GFX1012: case CudaArch::GFX1030: + case CudaArch::GFX1031: case CudaArch::UNKNOWN: break; case CudaArch::LAST: @@ -5074,6 +5075,7 @@ static std::pair getSMsBlocksPerSM(CodeGenModule &CGM) { case CudaArch::GFX1011: case CudaArch::GFX1012: case CudaArch::GFX1030: + case CudaArch::GFX1031: case CudaArch::UNKNOWN: break; case CudaArch::LAST: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)
@@ -292,13 +292,17 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts") rampitec wrote: Need to add negative test for the last operand to always be a constant integer. We do it every time 'I' modifier is used. https://github.com/llvm/llvm-project/pull/70669 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/70669 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [mlir] [libcxx] [llvm] [libc] [flang] [openmp] [clang] [lldb] GlobalISel: Guide return in llvm::getIConstantSplatVal (PR #71989)
rampitec wrote: Any tests? https://github.com/llvm/llvm-project/pull/71989 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [libcxxabi] [libcxx] [lld] [flang] [llvm] [clang-tools-extra] [lldb] [clang] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)
@@ -959,6 +967,32 @@ def : GCNPat < } } // let OtherPredicates = [HasShaderCyclesRegister] +def SIMM24bitPtr : ImmLeaf (Imm);}] +>; + +multiclass SMPrefetchPat { + def : GCNPat < +(smrd_prefetch (SMRDImm i64:$sbase, i32:$offset), timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type) $sbase, $offset, (i32 SGPR_NULL), (i8 0)) + >; + + def : GCNPat < +(smrd_prefetch (i64 SReg_64:$sbase), timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type) $sbase, 0, (i32 SGPR_NULL), (i8 0)) + >; + + def : GCNPat < +(prefetch SIMM24bitPtr:$offset, timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type#"_PC_REL") (as_i32timm $offset), (i32 SGPR_NULL), (i8 0)) + > { +let AddedComplexity = 10; + } rampitec wrote: Prefetch on an absolute address is practically useless. https://github.com/llvm/llvm-project/pull/74576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [libcxx] [lldb] [flang] [clang] [compiler-rt] [libcxxabi] [lld] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)
@@ -959,6 +967,32 @@ def : GCNPat < } } // let OtherPredicates = [HasShaderCyclesRegister] +def SIMM24bitPtr : ImmLeaf (Imm);}] +>; + +multiclass SMPrefetchPat { + def : GCNPat < +(smrd_prefetch (SMRDImm i64:$sbase, i32:$offset), timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type) $sbase, $offset, (i32 SGPR_NULL), (i8 0)) + >; + + def : GCNPat < +(smrd_prefetch (i64 SReg_64:$sbase), timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type) $sbase, 0, (i32 SGPR_NULL), (i8 0)) + >; + + def : GCNPat < +(prefetch SIMM24bitPtr:$offset, timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type#"_PC_REL") (as_i32timm $offset), (i32 SGPR_NULL), (i8 0)) + > { +let AddedComplexity = 10; + } rampitec wrote: So you want a target intrinsic? https://github.com/llvm/llvm-project/pull/74576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [libcxx] [lldb] [libcxxabi] [clang-tools-extra] [lld] [llvm] [clang] [flang] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)
@@ -959,6 +967,32 @@ def : GCNPat < } } // let OtherPredicates = [HasShaderCyclesRegister] +def SIMM24bitPtr : ImmLeaf (Imm);}] +>; + +multiclass SMPrefetchPat { + def : GCNPat < +(smrd_prefetch (SMRDImm i64:$sbase, i32:$offset), timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type) $sbase, $offset, (i32 SGPR_NULL), (i8 0)) + >; + + def : GCNPat < +(smrd_prefetch (i64 SReg_64:$sbase), timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type) $sbase, 0, (i32 SGPR_NULL), (i8 0)) + >; + + def : GCNPat < +(prefetch SIMM24bitPtr:$offset, timm, timm, (i32 cache_type)), +(!cast("S_PREFETCH_"#type#"_PC_REL") (as_i32timm $offset), (i32 SGPR_NULL), (i8 0)) + > { +let AddedComplexity = 10; + } rampitec wrote: I do not think we need to use PC_REL form to prefetch on a function's address. The instruction can take full 64-bit address, so one can just use this address. My understanding that PC_REL form can be useful if you expect something like a huge loop or a local branch and want to prefetch something like 1K from the PC. I am not sure though how useful this can be at a high language level or even in IR. https://github.com/llvm/llvm-project/pull/74576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lldb] [lld] [llvm] [libc] [libcxx] [flang] [compiler-rt] [clang-tools-extra] [clang] [GlobalISel] Add G_PREFETCH (PR #74863)
@@ -1209,6 +1209,15 @@ def G_FENCE : GenericInstruction { let hasSideEffects = true; } +// Generic opcode equivalent to the llvm.prefetch intrinsic. +def G_PREFETCH : GenericInstruction { + let OutOperandList = (outs); + let InOperandList = (ins ptype0:$address, i32imm:$rw, i32imm:$locality, i32imm:$cachetype); + let hasSideEffects = true; + let mayLoad = true; + let mayStore = true; rampitec wrote: > should probably just be hasSideEffects. mayLoad/mayStore imply it needs a > memory operand and is an ordered memory reference when it doesn't have one I could argue this is not a memory operation at all as it shall have no visible effects other than access speed, although practically it has ordering. You certainly do not want a prefetch to be moved past the loads which it was supposed to prefetch. I.e. in my view use of both mayLoad and mayStore is justified. Although we need to make sure it is not considered an aliased store or load from the AA point of view. https://github.com/llvm/llvm-project/pull/74863 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/7] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[llvm] [libc] [flang] [openmp] [clang] [mlir] [libcxx] [lldb] [clang-tools-extra] GlobalISel: Guard return in llvm::getIConstantSplatVal (PR #71989)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/71989 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] make v32i16/v32f16 legal (PR #70484)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/70484 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (PR #70395)
https://github.com/rampitec closed https://github.com/llvm/llvm-project/pull/70395 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (PR #70395)
https://github.com/rampitec closed https://github.com/llvm/llvm-project/pull/70395 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)
https://github.com/rampitec commented: Also needed negative tests that gfx11-insts feature is required (using gfx1030 target for example) and for the immediate arguments. See for example builtins-amdgcn-gfx11-err.cl and builtins-amdgcn-fp-atomics-gfx11-err.cl. https://github.com/llvm/llvm-project/pull/70669 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libcxx] [flang] [clang] [compiler-rt] [lld] [clang-tools-extra] [llvm] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/7] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[libcxx] [flang] [clang] [compiler-rt] [lld] [clang-tools-extra] [llvm] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: Ping https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [llvm] [libcxx] [compiler-rt] [lld] [clang-tools-extra] [libc] [clang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/7] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[lld] [compiler-rt] [clang] [flang] [lldb] [libc] [libcxx] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: To make it easier I am splitting the patch. I have pre-comitted the test, and there is a part which fixes lack of wait on GFX10 : https://github.com/llvm/llvm-project/pull/75245 https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [compiler-rt] [clang] [flang] [lldb] [libc] [libcxx] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: Another part is improving memoperand info: https://github.com/llvm/llvm-project/pull/75247. This is NFCI just by itself. https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [compiler-rt] [clang] [flang] [lldb] [libc] [libcxx] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: Yet another part to fix disjoint memory checks with LDS DMA: https://github.com/llvm/llvm-project/pull/75249 https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [clang-tools-extra] [compiler-rt] [lldb] [flang] [llvm] [libcxx] [libc] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/7] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[clang] [clang-tools-extra] [compiler-rt] [flang] [llvm] [libcxx] [libc] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/75249 >From 82606c4447e8aa8edde90ed420f1c48707967695 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Tue, 12 Dec 2023 13:45:47 -0800 Subject: [PATCH] [AMDGPU] Fix lack of LDS DMA check in the AA handling SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset checks, but does not account for LDS DMA instructions. Added these checks. Without it code falls through and returns true which is wrong. As a result mayAlias would always return false for LDS DMA and a regular LDS instruction or 2 LDS DMA instructions. At the moment this is NFCI because we do not use this AA in a context which may touch LDS DMA instructions. This is also unreacheable now because of the ordered memory ref checks just above in the function and LDS DMA is marked as volatile. This volatile marking is removed in PR #75247, therefore I'd submit this check before #75247. --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++-- llvm/lib/Target/AMDGPU/SIInstrInfo.h | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index d4e4526795f3b3..c485eb299d52a3 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, // underlying address space, even if it was lowered to a different one, // e.g. private accesses lowered to use MUBUF instructions on a scratch // buffer. - if (isDS(MIa)) { -if (isDS(MIb)) + if (isDS(MIa) || isLDSDMA(MIa)) { +if (isDS(MIb) || isLDSDMA(MIb)) return checkInstOffsetsDoNotOverlap(MIa, MIb); return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb); diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index e794d8cf7cc220..97800bda775cda 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -546,6 +546,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo { return get(Opcode).TSFlags & SIInstrFlags::DS; } + static bool isLDSDMA(const MachineInstr &MI) { +return isVALU(MI) && (isMUBUF(MI) || isFLAT(MI)); + } + + bool isLDSDMA(uint16_t Opcode) { +return isVALU(Opcode) && (isMUBUF(Opcode) || isFLAT(Opcode)); + } + static bool isGWS(const MachineInstr &MI) { return MI.getDesc().TSFlags & SIInstrFlags::GWS; } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/75249 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [clang-tools-extra] [compiler-rt] [libcxx] [llvm] [libc] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
@@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, // underlying address space, even if it was lowered to a different one, // e.g. private accesses lowered to use MUBUF instructions on a scratch // buffer. - if (isDS(MIa)) { -if (isDS(MIb)) + if (isDS(MIa) || isLDSDMA(MIa)) { +if (isDS(MIb) || isLDSDMA(MIb)) return checkInstOffsetsDoNotOverlap(MIa, MIb); rampitec wrote: It does, even though it just bails. It goes down to getMemOperandsWithOffsetWidth and there it bails on the LDS DMA: ``` 449│ // Get appropriate operand, and compute width accordingly. 450│ DataOpIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdst); 451│ if (DataOpIdx == -1) 452│ DataOpIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdata); 453│ if (DataOpIdx == -1) // LDS DMA 454│ return false; ``` In principle these offsets are analyzable. This is a typical store memop: ``` (dereferenceable store (s32) into `ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.buffer_load_lds_dword_2_ar rays.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.buffer_load_lds_dword_2_arrays.lds, i32 0, i32 1) ``` But is you want I can bail right here. https://github.com/llvm/llvm-project/pull/75249 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/75249 >From 82606c4447e8aa8edde90ed420f1c48707967695 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Tue, 12 Dec 2023 13:45:47 -0800 Subject: [PATCH 1/2] [AMDGPU] Fix lack of LDS DMA check in the AA handling SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset checks, but does not account for LDS DMA instructions. Added these checks. Without it code falls through and returns true which is wrong. As a result mayAlias would always return false for LDS DMA and a regular LDS instruction or 2 LDS DMA instructions. At the moment this is NFCI because we do not use this AA in a context which may touch LDS DMA instructions. This is also unreacheable now because of the ordered memory ref checks just above in the function and LDS DMA is marked as volatile. This volatile marking is removed in PR #75247, therefore I'd submit this check before #75247. --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++-- llvm/lib/Target/AMDGPU/SIInstrInfo.h | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index d4e4526795f3b3..c485eb299d52a3 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, // underlying address space, even if it was lowered to a different one, // e.g. private accesses lowered to use MUBUF instructions on a scratch // buffer. - if (isDS(MIa)) { -if (isDS(MIb)) + if (isDS(MIa) || isLDSDMA(MIa)) { +if (isDS(MIb) || isLDSDMA(MIb)) return checkInstOffsetsDoNotOverlap(MIa, MIb); return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb); diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index e794d8cf7cc220..97800bda775cda 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -546,6 +546,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo { return get(Opcode).TSFlags & SIInstrFlags::DS; } + static bool isLDSDMA(const MachineInstr &MI) { +return isVALU(MI) && (isMUBUF(MI) || isFLAT(MI)); + } + + bool isLDSDMA(uint16_t Opcode) { +return isVALU(Opcode) && (isMUBUF(Opcode) || isFLAT(Opcode)); + } + static bool isGWS(const MachineInstr &MI) { return MI.getDesc().TSFlags & SIInstrFlags::GWS; } >From d8d9f3aab2d2fff2911a99d096685e78faf3d917 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Wed, 13 Dec 2023 11:42:10 -0800 Subject: [PATCH 2/2] Bail early in areMemAccessesTriviallyDisjoint --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index 57eaefd41b2622..31669764144530 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -3651,6 +3651,9 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, if (MIa.hasOrderedMemoryRef() || MIb.hasOrderedMemoryRef()) return false; + if (isLDSDMA(MIa) || isLDSDMA(MIb)) +return false; + // TODO: Should we check the address space from the MachineMemOperand? That // would allow us to distinguish objects we know don't alias based on the // underlying address space, even if it was lowered to a different one, ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [flang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [lldb] [clang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/8] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[clang-tools-extra] [lldb] [llvm] [libc] [flang] [lld] [compiler-rt] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/9] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/75249 >From 82606c4447e8aa8edde90ed420f1c48707967695 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Tue, 12 Dec 2023 13:45:47 -0800 Subject: [PATCH 1/3] [AMDGPU] Fix lack of LDS DMA check in the AA handling SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset checks, but does not account for LDS DMA instructions. Added these checks. Without it code falls through and returns true which is wrong. As a result mayAlias would always return false for LDS DMA and a regular LDS instruction or 2 LDS DMA instructions. At the moment this is NFCI because we do not use this AA in a context which may touch LDS DMA instructions. This is also unreacheable now because of the ordered memory ref checks just above in the function and LDS DMA is marked as volatile. This volatile marking is removed in PR #75247, therefore I'd submit this check before #75247. --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++-- llvm/lib/Target/AMDGPU/SIInstrInfo.h | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index d4e4526795f3b3..c485eb299d52a3 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, // underlying address space, even if it was lowered to a different one, // e.g. private accesses lowered to use MUBUF instructions on a scratch // buffer. - if (isDS(MIa)) { -if (isDS(MIb)) + if (isDS(MIa) || isLDSDMA(MIa)) { +if (isDS(MIb) || isLDSDMA(MIb)) return checkInstOffsetsDoNotOverlap(MIa, MIb); return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb); diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index e794d8cf7cc220..97800bda775cda 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -546,6 +546,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo { return get(Opcode).TSFlags & SIInstrFlags::DS; } + static bool isLDSDMA(const MachineInstr &MI) { +return isVALU(MI) && (isMUBUF(MI) || isFLAT(MI)); + } + + bool isLDSDMA(uint16_t Opcode) { +return isVALU(Opcode) && (isMUBUF(Opcode) || isFLAT(Opcode)); + } + static bool isGWS(const MachineInstr &MI) { return MI.getDesc().TSFlags & SIInstrFlags::GWS; } >From d8d9f3aab2d2fff2911a99d096685e78faf3d917 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Wed, 13 Dec 2023 11:42:10 -0800 Subject: [PATCH 2/3] Bail early in areMemAccessesTriviallyDisjoint --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index 57eaefd41b2622..31669764144530 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -3651,6 +3651,9 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, if (MIa.hasOrderedMemoryRef() || MIb.hasOrderedMemoryRef()) return false; + if (isLDSDMA(MIa) || isLDSDMA(MIb)) +return false; + // TODO: Should we check the address space from the MachineMemOperand? That // would allow us to distinguish objects we know don't alias based on the // underlying address space, even if it was lowered to a different one, >From 609be418b81f6ce8c9b323f60636af01f862a994 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Wed, 13 Dec 2023 11:45:50 -0800 Subject: [PATCH 3/3] Remove old code --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index 31669764144530..d05d3c6996261f 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -3659,8 +3659,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, // underlying address space, even if it was lowered to a different one, // e.g. private accesses lowered to use MUBUF instructions on a scratch // buffer. - if (isDS(MIa) || isLDSDMA(MIa)) { -if (isDS(MIb) || isLDSDMA(MIb)) + if (isDS(MIa)) { +if (isDS(MIb)) return checkInstOffsetsDoNotOverlap(MIa, MIb); return !isFLAT(MIb) || isSegmentSpecificFLAT(MIb); ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [libc] [flang] [compiler-rt] [libcxx] [clang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
@@ -3656,8 +3656,8 @@ bool SIInstrInfo::areMemAccessesTriviallyDisjoint(const MachineInstr &MIa, // underlying address space, even if it was lowered to a different one, // e.g. private accesses lowered to use MUBUF instructions on a scratch // buffer. - if (isDS(MIa)) { -if (isDS(MIb)) + if (isDS(MIa) || isLDSDMA(MIa)) { +if (isDS(MIb) || isLDSDMA(MIb)) return checkInstOffsetsDoNotOverlap(MIa, MIb); rampitec wrote: Just bail early. https://github.com/llvm/llvm-project/pull/75249 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [compiler-rt] [clang-tools-extra] [clang] [llvm] [flang] [libcxx] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
rampitec wrote: Ping. This one seems obvious to me. https://github.com/llvm/llvm-project/pull/75249 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [llvm] [libcxx] [clang-tools-extra] [clang] [compiler-rt] [flang] [AMDGPU] Fix lack of LDS DMA check in the AA handling (PR #75249)
https://github.com/rampitec closed https://github.com/llvm/llvm-project/pull/75249 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [clang-tools-extra] [libcxx] [libc] [clang] [llvm] [flang] [AMDGPU] Produce better memoperand for LDS DMA (PR #75247)
https://github.com/rampitec closed https://github.com/llvm/llvm-project/pull/75247 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [libcxx] [lldb] [clang-tools-extra] [libc] [compiler-rt] [flang] [lld] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 1/9] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots f
[clang] [libcxx] [compiler-rt] [lldb] [libc] [llvm] [lld] [flang] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: All split off parts were merged and this patch is merged with main. Only waitcount insertion pass changes remained here. https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [libcxx] [flang] [libc] [lldb] [lld] [clang] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: > How does this work in a case like this? > > ``` > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) > @lds.3, i32 4, i32 0, i32 0, i32 0, i32 0) > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) > %ptr, i32 4, i32 0, i32 0, i32 0, i32 0) > %val.3 = load float, ptr addrspace(3) @lds.3, align 4 > ``` > > i.e. > > * store to known lds address `@lds.3` (this will use slot 0 and another > slot e.g. slot 3?) > > * store to unknown lds address (this will use slot 0?) > > * load from known lds address `@lds.3` (this will use slot 3?) It does not know the pointer, so it uses default slot 0 and waits till 0. I have to tell anyone interested here: before I even wrote this code it didn't know of the dependency and did not wait for anything at all. Everyone was happy. https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-tools-extra] [compiler-rt] [llvm] [libcxx] [lldb] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: > Test case: > > ``` > @lds.0 = internal addrspace(3) global [64 x float] poison, align 16 > @lds.1 = internal addrspace(3) global [64 x float] poison, align 16 > > declare void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr > addrspace(3) nocapture, i32 %size, i32 %voffset, i32 %soffset, i32 %offset, > i32 %aux) > > define amdgpu_kernel void @f(<4 x i32> %rsrc, i32 %i1, i32 %i2, ptr > addrspace(1) %out, ptr addrspace(3) %ptr) { > main_body: > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr > addrspace(3) @lds.0, i32 4, i32 0, i32 0, i32 0, i32 0) > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr > addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0) > %gep.0 = getelementptr float, ptr addrspace(3) @lds.0, i32 %i1 > %gep.1 = getelementptr float, ptr addrspace(3) @lds.1, i32 %i2 > %val.0 = load volatile float, ptr addrspace(3) %gep.0, align 4 > %val.1 = load volatile float, ptr addrspace(3) %gep.1, align 4 > %out.gep.1 = getelementptr float, ptr addrspace(1) %out, i32 1 > store float %val.0, ptr addrspace(1) %out > store float %val.1, ptr addrspace(1) %out.gep.1 > ret void > } > ``` > > Generates: > > ``` > s_load_dwordx8 s[4:11], s[0:1], 0x24 > s_load_dword s2, s[0:1], 0x44 > s_mov_b32 m0, 0 > v_mov_b32_e32 v2, 0 > s_waitcnt lgkmcnt(0) > buffer_load_dword off, s[4:7], 0 lds > s_mov_b32 m0, s2 > s_lshl_b32 s0, s8, 2 > buffer_load_dword off, s[4:7], 0 lds > s_lshl_b32 s1, s9, 2 > v_mov_b32_e32 v0, s0 > v_mov_b32_e32 v1, s1 > s_waitcnt vmcnt(1) > ds_read_b32 v0, v0 > s_waitcnt vmcnt(0) > ds_read_b32 v1, v1 offset:256 > s_waitcnt lgkmcnt(0) > global_store_dwordx2 v2, v[0:1], s[10:11] > s_endpgm > ``` > > The `s_waitcnt vmcnt(1)` seems incorrect, because the second > buffer-load-to-lds might clobber `@lds.0`. This is still correct, pointer argument cannot alias module global. A pointer argument to a kernel is an LDS external requested by the host side, and host cannot see module LDS. https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-tools-extra] [compiler-rt] [llvm] [libcxx] [lldb] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: > This is still correct, pointer argument cannot alias module global. A pointer > argument to a kernel is an LDS external requested by the host side, and host > cannot see module LDS. I.e. that is really the point of the patch: if we are able to definitively identify an LDS object targeted by both load and store we only wait on that store or stores. And the only way to definitively identify the object at this stage is via alias.scope info which we are generating ourselves during module LDS lowering. https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [libc] [lldb] [lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/74537 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 01/10] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots
[compiler-rt] [llvm] [libc] [libcxx] [lldb] [clang] [lld] [clang-tools-extra] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: > > This is still correct, pointer argument cannot alias module global. A > > pointer argument to a kernel is an LDS external requested by the host side, > > and host cannot see module LDS. > > I.e. that is really the point of the patch: if we are able to definitively > identify an LDS object targeted by both load and store we only wait on that > store or stores. And the only way to definitively identify the object at this > stage is via alias.scope info which we are generating ourselves during module > LDS lowering. I have added a check for the presence of alias scope info just in case we get a rogue AA. The testcase with a pointer argument still produces correct code with vmcnt(1). https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/75974 LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a pseudo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias scope info. Fixes: SWDEV-433427 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 01/11] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/Mac
[lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: Actually since I am only using alias scope I can avoid all alias analysis altogether and only compare alias scope. This does not need an analysis pass, calls to mayAlias, and in general simpler code. You can see an alternative PR if you like it more: https://github.com/llvm/llvm-project/pull/75974 https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/75974 >From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Mon, 4 Dec 2023 16:11:53 -0800 Subject: [PATCH 01/12] [AMDGPU] Use alias info to relax waitcounts for LDS DMA LDA DMA loads increase VMCNT and a load from the LDS stored must wait on this counter to only read memory after it is written. Wait count insertion pass does not track memory dependencies, it tracks register dependencies. To model the LDS dependency a psuedo register is used in the scoreboard, acting like if LDS DMA writes it and LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 16 +- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 73 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 4 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h| 8 + llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll| 2 + 6 files changed, 241 insertions(+), 16 deletions(-) create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a7f4d63229b7ef..2e079404b087fa 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOStore | MachineMemOperand::MODereferenceable; - // XXX - Should this be volatile without known ordering? - Info.flags |= MachineMemOperand::MOVolatile; - switch (IntrID) { default: +// XXX - Should this be volatile without known ordering? +Info.flags |= MachineMemOperand::MOVolatile; break; case Intrinsic::amdgcn_raw_buffer_load_lds: case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: @@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: { unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); +Info.ptrVal = CI.getArgOperand(1); return true; } } @@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, Info.opc = ISD::INTRINSIC_VOID; unsigned Width = cast(CI.getArgOperand(2))->getZExtValue(); Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8); -Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore | - MachineMemOperand::MOVolatile; +Info.ptrVal = CI.getArgOperand(1); +Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore; return true; } case Intrinsic::amdgcn_ds_bvh_stack_rtn: { @@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); MachinePointerInfo StorePtrI = LoadPtrI; -StorePtrI.V = nullptr; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); +LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & @@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo(); LoadPtrI.Offset = Op->getConstantOperandVal(5); MachinePointerInfo StorePtrI = LoadPtrI; +LoadPtrI.V = UndefValue::get( +PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS)); LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS; StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS; auto F = LoadMMO->getFlags() & diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index ede4841b8a5fd7..50ad22130e939e 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/Sequence.h" +#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachinePostDominators.h" #include "llvm/InitializePasses.h" @@ -121,8 +122,13 @@ enum RegisterMapping { SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets. AGPR_OFFSET = 256, // Maximum programmable ArchVGPRs across all targets. SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets. - NUM_EXTRA_VGPRS = 1,// A reserved slot for DS. - EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes. + NUM_EXTRA_VGPRS = 9,// Reserved slots
[clang] [lldb] [flang] [llvm] [libc] [libcxx] [lld] [clang-tools-extra] [compiler-rt] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)
rampitec wrote: One thing to note: this alias.scope I am creating myself in the module LDS lowering, so I do exactly know what to expect. And then since there is this module LDS lowering even if any alias scope would be created before (which never happens, much less for an intrinsic call) it is already lost. It is lost along with the memory objects deleted by the lowering. That is the whole point of creating alias.scope metadata during the lowering: we are putting all module LDS into a single structure, so no AA will ever disambiguate it w/o alias scope info. In this situation I am the sole creator of the metadata, instructions carrying it, memory object accessed, and the consumer of this metadata. At -O0 there will be no LDS lowering, but there will be no AA either. I do not see how to exploit it on practice. One other thing to note here: there is also !noalias metadata generated in the very same place. I do not care about this because I am really searching for a store into this memory, which is a scope. When I was writing code to generate this metadata I kept in mind exactly a scenario similar to this. https://github.com/llvm/llvm-project/pull/75974 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lldb] [lld] [flang] [clang-tools-extra] [libcxx] [llvm] [libc] [compiler-rt] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)
rampitec wrote: This is the place I am creating it: https://reviews.llvm.org/D108315 https://github.com/llvm/llvm-project/pull/75974 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang-tools-extra] [AMDGPU] Fix folding of v2i16/v2f16 splat imms (PR #72709)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/72709 >From 423a0d1d4640680c5db3382ca0652fe85051ad8d Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Fri, 17 Nov 2023 10:52:13 -0800 Subject: [PATCH] [AMDGPU] Fix folding of v2i16/v2f16 splat imms We can use inline constants with packed 16-bit operands, but these should use op_sel. Currently splat of inlinable constants is considered legal, which is not really true if we fail to fold it with op_sel and drop the high half. It may be legal as a literal but not as inline constant, but then usual literal checks must be performed. This patch makes these splat literals illegal but adds additional logic to the operand folding to keep current folds. This logic is somewhat heavy though. This has fixed two bugs: constant bus violation in the fdot2 test and invalid selection of inline constant 1 without op_sel in the udot2 test. --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 135 +++--- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 15 +- .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 10 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 3 + .../AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll | 29 ++-- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.udot2.ll | 4 +- 6 files changed, 128 insertions(+), 68 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index 0ec0370e21dfc16..709de612d81d4a1 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -80,6 +80,10 @@ class SIFoldOperands : public MachineFunctionPass { bool updateOperand(FoldCandidate &Fold) const; + bool canUseImmWithOpSel(FoldCandidate &Fold) const; + + bool tryFoldImmWithOpSel(FoldCandidate &Fold) const; + bool tryAddToFoldList(SmallVectorImpl &FoldList, MachineInstr *MI, unsigned OpNo, MachineOperand *OpToFold) const; @@ -196,60 +200,85 @@ FunctionPass *llvm::createSIFoldOperandsPass() { return new SIFoldOperands(); } -bool SIFoldOperands::updateOperand(FoldCandidate &Fold) const { +bool SIFoldOperands::canUseImmWithOpSel(FoldCandidate &Fold) const { MachineInstr *MI = Fold.UseMI; MachineOperand &Old = MI->getOperand(Fold.UseOpNo); - assert(Old.isReg()); + const uint64_t TSFlags = MI->getDesc().TSFlags; + assert(Old.isReg() && Fold.isImm()); - const uint64_t TSFlags = MI->getDesc().TSFlags; - if (Fold.isImm()) { -if (TSFlags & SIInstrFlags::IsPacked && !(TSFlags & SIInstrFlags::IsMAI) && -(!ST->hasDOTOpSelHazard() || !(TSFlags & SIInstrFlags::IsDOT)) && -AMDGPU::isFoldableLiteralV216(Fold.ImmToFold, - ST->hasInv2PiInlineImm())) { - // Set op_sel/op_sel_hi on this operand or bail out if op_sel is - // already set. - unsigned Opcode = MI->getOpcode(); - int OpNo = MI->getOperandNo(&Old); - int ModIdx = -1; - if (OpNo == AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::src0)) -ModIdx = AMDGPU::OpName::src0_modifiers; - else if (OpNo == AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::src1)) -ModIdx = AMDGPU::OpName::src1_modifiers; - else if (OpNo == AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::src2)) -ModIdx = AMDGPU::OpName::src2_modifiers; - assert(ModIdx != -1); - ModIdx = AMDGPU::getNamedOperandIdx(Opcode, ModIdx); - MachineOperand &Mod = MI->getOperand(ModIdx); - unsigned Val = Mod.getImm(); - if (!(Val & SISrcMods::OP_SEL_0) && (Val & SISrcMods::OP_SEL_1)) { -// Only apply the following transformation if that operand requires -// a packed immediate. -switch (TII->get(Opcode).operands()[OpNo].OperandType) { -case AMDGPU::OPERAND_REG_IMM_V2FP16: -case AMDGPU::OPERAND_REG_IMM_V2INT16: -case AMDGPU::OPERAND_REG_INLINE_C_V2FP16: -case AMDGPU::OPERAND_REG_INLINE_C_V2INT16: - // If upper part is all zero we do not need op_sel_hi. - if (!isUInt<16>(Fold.ImmToFold)) { -if (!(Fold.ImmToFold & 0x)) { - Mod.setImm(Mod.getImm() | SISrcMods::OP_SEL_0); - Mod.setImm(Mod.getImm() & ~SISrcMods::OP_SEL_1); - Old.ChangeToImmediate((Fold.ImmToFold >> 16) & 0x); - return true; -} -Mod.setImm(Mod.getImm() & ~SISrcMods::OP_SEL_1); -Old.ChangeToImmediate(Fold.ImmToFold & 0x); -return true; - } - break; -default: - break; -} - } -} + if (!(TSFlags & SIInstrFlags::IsPacked) || (TSFlags & SIInstrFlags::IsMAI) || + (ST->hasDOTOpSelHazard() && (TSFlags & SIInstrFlags::IsDOT)) || + isUInt<16>(Fold.ImmToFold) || + !AMDGPU::isFoldableLiteralV216(Fold.ImmToFold, ST->hasInv2PiInlineImm())) +return false; + + unsigned Opcode = MI->getOpcode(); + int OpNo = MI-
[clang] [llvm] [clang-tools-extra] [AMDGPU] Fix folding of v2i16/v2f16 splat imms (PR #72709)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/72709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [clang-tools-extra] [AMDGPU] Fix folding of v2i16/v2f16 splat imms (PR #72709)
https://github.com/rampitec closed https://github.com/llvm/llvm-project/pull/72709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang-tools-extra] [AMDGPU] Fix folding of v2i16/v2f16 splat imms (PR #72709)
rampitec wrote: After some digging I believe with this bug fixed we are fine now. Since we are passing all bf16 inputs as i16 we can only inline small integers, and inline integer 1 shall be the same as using 1 in an input register I believe. Although we are missing a potential optimization, say we could fold 'i16 0x3f80' as inline constant 1.0, and a pair of these as 1.0 with opsel should we know this is really a bf16 operand. https://github.com/llvm/llvm-project/pull/72709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [clang] [flang] [clang-tools-extra] [llvm] [lldb] [libc] [compiler-rt] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: Ping https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libcxx] [flang] [libc] [clang-tools-extra] [lldb] [lld] [compiler-rt] [clang] [llvm] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)
rampitec wrote: Ping https://github.com/llvm/llvm-project/pull/75974 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII, setRegScore(RegNo, T, CurrScore); } } -if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) { - setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore); +if (Inst.mayStore() && +(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) { + // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS + // written can be accessed. A load from LDS to VMEM does not need a wait. + unsigned Slot = 0; + for (const auto *MemOp : Inst.memoperands()) { +if (!MemOp->isStore() || +MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS) + continue; +// Comparing just AA info does not guarantee memoperands are equal rampitec wrote: > PseudoSourceValue::mayAlias is supposed to report aliasing to possible IR > values. It looks like it's layered weirdly, and expects you to go through > MachineInstr::mayAlias. MachineInstr::mayAlias ought to be using the AA tags, > it shouldn't be a fundamental limitation This is all PSV::mayAlias() does: ``` bool PseudoSourceValue::mayAlias(const MachineFrameInfo *) const { return !(isGOT() || isConstantPool() || isJumpTable()); } ``` No very useful. Then even to get to the AA tags check MI:mayAlias() shall go through all IR values' checks first. https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII, setRegScore(RegNo, T, CurrScore); } } -if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) { - setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore); +if (Inst.mayStore() && +(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) { + // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS + // written can be accessed. A load from LDS to VMEM does not need a wait. + unsigned Slot = 0; + for (const auto *MemOp : Inst.memoperands()) { +if (!MemOp->isStore() || +MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS) + continue; +// Comparing just AA info does not guarantee memoperands are equal rampitec wrote: > It looks to me like it does use it if you pass UseTBAA=true. Not sure why > this would be a parameter in the first place I am passing it, but to get to that check it shall first go through all Value and offset checks. Using AA is the last thing it does: https://llvm.org/doxygen/MachineInstr_8cpp_source.html#l01285 https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII, setRegScore(RegNo, T, CurrScore); } } -if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) { - setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore); +if (Inst.mayStore() && +(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) { + // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS + // written can be accessed. A load from LDS to VMEM does not need a wait. + unsigned Slot = 0; + for (const auto *MemOp : Inst.memoperands()) { +if (!MemOp->isStore() || +MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS) + continue; +// Comparing just AA info does not guarantee memoperands are equal rampitec wrote: > The values don't need to be identical, that's the point of the AA query. > BasicAA will parse through the offsets I also think that values don't need to be identical. But that is what MI:mayAlias() does *before* it checks AA: https://llvm.org/doxygen/MachineInstr_8cpp_source.html#l01285 https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Add global_load_tr for GFX12 (PR #77772)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/2 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 27439a7 - [AMDGPU] New gfx940 mfma instructions
Author: Stanislav Mekhanoshin Date: 2022-03-24T12:12:52-07:00 New Revision: 27439a764230e5eb54568b2fc053a20c9005970f URL: https://github.com/llvm/llvm-project/commit/27439a764230e5eb54568b2fc053a20c9005970f DIFF: https://github.com/llvm/llvm-project/commit/27439a764230e5eb54568b2fc053a20c9005970f.diff LOG: [AMDGPU] New gfx940 mfma instructions Differential Revision: https://reviews.llvm.org/D122044 Added: clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl llvm/include/llvm/IR/IntrinsicsAMDGPU.td llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td llvm/lib/Target/AMDGPU/SIInstrInfo.td llvm/lib/Target/AMDGPU/SISchedule.td llvm/lib/Target/AMDGPU/VOP3PInstructions.td llvm/test/MC/AMDGPU/mai-gfx940.s llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index d2e60f85b9feb..3870b1cca6caa 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -305,5 +305,10 @@ TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x16bf16_1k, "V4fV4sV4sV4fIiIiIi", TARGET_BUILTIN(__builtin_amdgcn_mfma_f64_16x16x4f64, "V4dddV4dIiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_mfma_f64_4x4x4f64, "IiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_16x16x32_i8, "V4iWiWiV4iIiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_32x32x16_i8, "V16iWiWiV16iIiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x8_xf32, "V4fV2fV2fV4fIiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x4_xf32, "V16fV2fV2fV16fIiIiIi", "nc", "mai-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index 19ac40fe41605..fc29faf9ad1c5 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -1,9 +1,11 @@ // REQUIRES: amdgpu-registered-target // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx908 -DMFMA_GFX908_TESTS -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-GFX908 // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx90a -DMFMA_GFX90A_TESTS -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-GFX90A +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx940 -DMFMA_GFX940_TESTS -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-GFX940 #pragma OPENCL EXTENSION cl_khr_fp64:enable +typedef float v2f __attribute__((ext_vector_type(2))); typedef float v4f __attribute__((ext_vector_type(4))); typedef float v16f __attribute__((ext_vector_type(16))); typedef float v32f __attribute__((ext_vector_type(32))); @@ -216,3 +218,33 @@ void test_mfma_f64_4x4x4f64(global double* out, double a, double b, double c) } #endif // MFMA_GFX90A_TESTS + +#ifdef MFMA_GFX940_TESTS +// CHECK-GFX940-LABEL: @test_mfma_i32_16x16x32_i8 +// CHECK-GFX940: call <4 x i32> @llvm.amdgcn.mfma.i32.16x16x32.i8(i64 %a, i64 %b, <4 x i32> %c, i32 0, i32 0, i32 0) +void test_mfma_i32_16x16x32_i8(global v4i* out, long a, long b, v4i c) +{ + *out = __builtin_amdgcn_mfma_i32_16x16x32_i8(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_i32_32x32x16_i8 +// CHECK-GFX940: call <16 x i32> @llvm.amdgcn.mfma.i32.32x32x16.i8(i64 %a, i64 %b, <16 x i32> %c, i32 0, i32 0, i32 0) +void test_mfma_i32_32x32x16_i8(global v16i* out, long a, long b, v16i c) +{ + *out = __builtin_amdgcn_mfma_i32_32x32x16_i8(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_f32_16x16x8_xf32 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.mfma.f32.16x16x8.xf32(<2 x float> %a, <2 x float> %b, <4 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_16x16x8_xf32(global v4f* out, v2f a, v2f b, v4f c) +{ + *out = __builtin_amdgcn_mfma_f32_16x16x8_xf32(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_f32_32x32x4_xf32 +// CHECK-GFX940: call <16 x float> @llvm.amdgcn.mfma.f32.32x32x4.xf32(<2 x float> %a, <2 x float> %b, <16 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_32x32x4_xf32(global v16f* out, v2f a, v2f b, v16f c) +{ + *out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, 0, 0, 0); +} +#endif // MFMA_GFX940_TESTS diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl new file mode 100644 index 0..9e50a1
[clang] 6e3e14f - [AMDGPU] Support gfx940 smfmac instructions
Author: Stanislav Mekhanoshin Date: 2022-03-24T12:40:42-07:00 New Revision: 6e3e14f600afa1fa64a699df97c8bbac6d0f8b5a URL: https://github.com/llvm/llvm-project/commit/6e3e14f600afa1fa64a699df97c8bbac6d0f8b5a DIFF: https://github.com/llvm/llvm-project/commit/6e3e14f600afa1fa64a699df97c8bbac6d0f8b5a.diff LOG: [AMDGPU] Support gfx940 smfmac instructions Differential Revision: https://reviews.llvm.org/D122191 Added: Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl llvm/include/llvm/IR/IntrinsicsAMDGPU.td llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.td llvm/lib/Target/AMDGPU/SIRegisterInfo.td llvm/lib/Target/AMDGPU/SISchedule.td llvm/lib/Target/AMDGPU/VOP3Instructions.td llvm/lib/Target/AMDGPU/VOP3PInstructions.td llvm/lib/Target/AMDGPU/VOPInstructions.td llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll llvm/test/MC/AMDGPU/mai-gfx940.s llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 3870b1cca6caa..afcfa07f6df13 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -309,6 +309,12 @@ TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_16x16x32_i8, "V4iWiWiV4iIiIiIi", "nc", TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_32x32x16_i8, "V16iWiWiV16iIiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x8_xf32, "V4fV2fV2fV4fIiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x4_xf32, "V16fV2fV2fV16fIiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x32_f16, "V4fV4hV8hV4fiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x16_f16, "V16fV4hV8hV16fiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x32_bf16, "V4fV4sV8sV4fiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x16_bf16, "V16fV4sV8sV16fiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_i32_16x16x64_i8, "V4iV2iV4iV4iiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_i32_32x32x32_i8, "V16iV2iV4iV16iiIiIi", "nc", "mai-insts") #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index fc29faf9ad1c5..8e3cc7e382e90 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -10,13 +10,16 @@ typedef float v4f __attribute__((ext_vector_type(4))); typedef float v16f __attribute__((ext_vector_type(16))); typedef float v32f __attribute__((ext_vector_type(32))); typedef half v4h __attribute__((ext_vector_type(4))); +typedef half v8h __attribute__((ext_vector_type(8))); typedef half v16h __attribute__((ext_vector_type(16))); typedef half v32h __attribute__((ext_vector_type(32))); +typedef intv2i __attribute__((ext_vector_type(2))); typedef intv4i __attribute__((ext_vector_type(4))); typedef intv16i __attribute__((ext_vector_type(16))); typedef intv32i __attribute__((ext_vector_type(32))); typedef short v2s __attribute__((ext_vector_type(2))); typedef short v4s __attribute__((ext_vector_type(4))); +typedef short v8s __attribute__((ext_vector_type(8))); typedef short v16s __attribute__((ext_vector_type(16))); typedef short v32s __attribute__((ext_vector_type(32))); typedef double v4d __attribute__((ext_vector_type(4))); @@ -247,4 +250,46 @@ void test_mfma_f32_32x32x4_xf32(global v16f* out, v2f a, v2f b, v16f c) { *out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, 0, 0, 0); } + +// CHECK-GFX940-LABEL: @test_smfmac_f32_16x16x32_f16 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x32.f16(<4 x half> %a, <8 x half> %b, <4 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_16x16x32_f16(global v4f* out, v4h a, v8h b, v4f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x32_f16(a, b, c, idx, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_smfmac_f32_32x32x16_f16 +// CHECK-GFX940: call <16 x float>
[clang] b0aa194 - [AMDGPU] Promote recursive loads from kernel argument to constant
Author: Stanislav Mekhanoshin Date: 2022-02-17T11:07:03-08:00 New Revision: b0aa1946dfe1d204e49b8238c4960f64a68f31d5 URL: https://github.com/llvm/llvm-project/commit/b0aa1946dfe1d204e49b8238c4960f64a68f31d5 DIFF: https://github.com/llvm/llvm-project/commit/b0aa1946dfe1d204e49b8238c4960f64a68f31d5.diff LOG: [AMDGPU] Promote recursive loads from kernel argument to constant Not clobbered pointer load chains are promoted to global now. That is possible to promote these loads itself into constant address space. Loaded pointers still need to point to global because we need to be able to store into that pointer and because an actual load from it may occur after a clobber. Differential Revision: https://reviews.llvm.org/D119886 Added: Modified: clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll Removed: diff --git a/clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu b/clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu index d483803005074..01e0d3db46127 100644 --- a/clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu +++ b/clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu @@ -18,7 +18,7 @@ // COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel1Pi(i32 addrspace(1)*{{.*}} %x.coerce) // CHECK: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]* // CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]* -// OPT: [[VAL:%.*]] = load i32, i32 addrspace(1)* %x.coerce, align 4 +// OPT: [[VAL:%.*]] = load i32, i32 addrspace(4)* %x.coerce.const, align 4 // OPT: [[INC:%.*]] = add nsw i32 [[VAL]], 1 // OPT: store i32 [[INC]], i32 addrspace(1)* %x.coerce, align 4 // OPT: ret void @@ -30,7 +30,7 @@ __global__ void kernel1(int *x) { // COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel2Ri(i32 addrspace(1)*{{.*}} nonnull align 4 dereferenceable(4) %x.coerce) // CHECK: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]* // CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]* -// OPT: [[VAL:%.*]] = load i32, i32 addrspace(1)* %x.coerce, align 4 +// OPT: [[VAL:%.*]] = load i32, i32 addrspace(4)* %x.coerce.const, align 4 // OPT: [[INC:%.*]] = add nsw i32 [[VAL]], 1 // OPT: store i32 [[INC]], i32 addrspace(1)* %x.coerce, align 4 // OPT: ret void @@ -68,7 +68,8 @@ struct S { // OPT: [[R1:%.*]] = getelementptr inbounds %struct.S, %struct.S addrspace(4)* %0, i64 0, i32 1 // OPT: [[P1:%.*]] = load float*, float* addrspace(4)* [[R1]], align 8 // OPT: [[G1:%.*]] ={{.*}} addrspacecast float* [[P1]] to float addrspace(1)* -// OPT: [[V0:%.*]] = load i32, i32 addrspace(1)* [[G0]], align 4 +// OPT: [[G2:%.*]] ={{.*}} addrspacecast i32* [[P0]] to i32 addrspace(4)* +// OPT: [[V0:%.*]] = load i32, i32 addrspace(4)* [[G2]], align 4 // OPT: [[INC:%.*]] = add nsw i32 [[V0]], 1 // OPT: store i32 [[INC]], i32 addrspace(1)* [[G0]], align 4 // OPT: [[V1:%.*]] = load float, float addrspace(1)* [[G1]], align 4 @@ -103,7 +104,8 @@ struct T { // OPT: [[R1:%.*]] = getelementptr inbounds %struct.T, %struct.T addrspace(4)* %0, i64 0, i32 0, i64 1 // OPT: [[P1:%.*]] = load float*, float* addrspace(4)* [[R1]], align 8 // OPT: [[G1:%.*]] ={{.*}} addrspacecast float* [[P1]] to float addrspace(1)* -// OPT: [[V0:%.*]] = load float, float addrspace(1)* [[G0]], align 4 +// OPT: [[G2:%.*]] ={{.*}} addrspacecast float* [[P0]] to float addrspace(4)* +// OPT: [[V0:%.*]] = load float, float addrspace(4)* [[G2]], align 4 // OPT: [[ADD0:%.*]] = fadd contract float [[V0]], 1.00e+00 // OPT: store float [[ADD0]], float addrspace(1)* [[G0]], align 4 // OPT: [[V1:%.*]] = load float, float addrspace(1)* [[G1]], align 4 @@ -130,7 +132,7 @@ struct SS { // COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel82SS(float addrspace(1)*{{.*}} %a.coerce) // CHECK: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]* // CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]* -// OPT: [[VAL:%.*]] = load float, float addrspace(1)* %a.coerce, align 4 +// OPT: [[VAL:%.*]] = load float, float addrspace(4)* %a.coerce.const, align 4 // OPT: [[INC:%.*]] = fadd contract float [[VAL]], 3.00e+00 // OPT: store float [[INC]], float addrspace(1)* %a.coerce, align 4 // OPT: ret void diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp index b9b48290dd277..65ad8b2aeacd3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp @@ -42,6 +42,8 @@ class AMDGPUPromoteKernelArguments : public FunctionPass { bool promotePointer(Value *Ptr); + bool promoteLoad(LoadInst *LI); + public: static char ID; @@ -
[clang] 9fa5a6b - [AMDGPU] Support for gfx940 fp8 conversions
Author: Stanislav Mekhanoshin Date: 2022-07-18T11:48:43-07:00 New Revision: 9fa5a6b7e8a292ec91b844a622836d2990ef5796 URL: https://github.com/llvm/llvm-project/commit/9fa5a6b7e8a292ec91b844a622836d2990ef5796 DIFF: https://github.com/llvm/llvm-project/commit/9fa5a6b7e8a292ec91b844a622836d2990ef5796.diff LOG: [AMDGPU] Support for gfx940 fp8 conversions Differential Revision: https://reviews.llvm.org/D129902 Added: clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/lib/Basic/Targets/AMDGPU.cpp clang/test/CodeGenOpenCL/amdgpu-features.cl llvm/include/llvm/IR/IntrinsicsAMDGPU.td llvm/lib/Target/AMDGPU/AMDGPU.td llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp llvm/lib/Target/AMDGPU/GCNSubtarget.h llvm/lib/Target/AMDGPU/SIInstrInfo.td llvm/lib/Target/AMDGPU/VOP1Instructions.td llvm/lib/Target/AMDGPU/VOP3Instructions.td llvm/test/MC/AMDGPU/gfx940_asm_features.s llvm/test/MC/AMDGPU/gfx940_err.s llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 68bcf546d177c..e9f25d783e596 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -346,5 +346,14 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x16_bf16, "V16fV4sV8sV16fiIiIi", TARGET_BUILTIN(__builtin_amdgcn_smfmac_i32_16x16x64_i8, "V4iV2iV4iV4iiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_i32_32x32x32_i8, "V16iV2iV4iV16iiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_f32_bf8, "fiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_f32_fp8, "fiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f32_bf8, "V2fiIb", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f32_fp8, "V2fiIb", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_bf8_f32, "iffiIb", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 50256d8e210c9..80f2601b0a245 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -250,6 +250,7 @@ bool AMDGPUTargetInfo::initFeatureMap( break; case GK_GFX940: Features["gfx940-insts"] = true; + Features["fp8-insts"] = true; LLVM_FALLTHROUGH; case GK_GFX90A: Features["gfx90a-insts"] = true; diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index cb3a3eff01f70..ff288e530d17f 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -64,7 +64,7 @@ // GFX909: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" // GFX90A: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst" // GFX90C: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" -// GFX940: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst" +// GFX940: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst" // GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" // GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" // GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl new file mode 100644
[clang] 2695f0a - [AMDGPU] Support for gfx940 fp8 mfma
Author: Stanislav Mekhanoshin Date: 2022-07-18T11:49:56-07:00 New Revision: 2695f0a688e9d26fcb0f3a4b686a2783f2eb145c URL: https://github.com/llvm/llvm-project/commit/2695f0a688e9d26fcb0f3a4b686a2783f2eb145c DIFF: https://github.com/llvm/llvm-project/commit/2695f0a688e9d26fcb0f3a4b686a2783f2eb145c.diff LOG: [AMDGPU] Support for gfx940 fp8 mfma Differential Revision: https://reviews.llvm.org/D129906 Added: Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl llvm/include/llvm/IR/IntrinsicsAMDGPU.td llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td llvm/lib/Target/AMDGPU/SIInstrInfo.td llvm/lib/Target/AMDGPU/VOP3PInstructions.td llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll llvm/test/MC/AMDGPU/mai-gfx940.s llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e9f25d783e596..e992e22ca527a 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -339,6 +339,14 @@ TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_16x16x32_i8, "V4iWiWiV4iIiIiIi", "nc", TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_32x32x16_i8, "V16iWiWiV16iIiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x8_xf32, "V4fV2fV2fV4fIiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x4_xf32, "V16fV2fV2fV16fIiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x32_bf8_bf8, "V4fWiWiV4fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x32_bf8_fp8, "V4fWiWiV4fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x32_fp8_bf8, "V4fWiWiV4fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x32_fp8_fp8, "V4fWiWiV4fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x16_bf8_bf8, "V16fWiWiV16fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x16_bf8_fp8, "V16fWiWiV16fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x16_fp8_bf8, "V16fWiWiV16fIiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x16_fp8_fp8, "V16fWiWiV16fIiIiIi", "nc", "fp8-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x32_f16, "V4fV4hV8hV4fiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x16_f16, "V16fV4hV8hV16fiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x32_bf16, "V4fV4sV8sV4fiIiIi", "nc", "mai-insts") diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index 8e3cc7e382e90..192bb1062381d 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -251,6 +251,62 @@ void test_mfma_f32_32x32x4_xf32(global v16f* out, v2f a, v2f b, v16f c) *out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, 0, 0, 0); } +// CHECK-GFX940-LABEL: @test_mfma_f32_16x16x32_bf8_bf8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.bf8.bf8(i64 %a, i64 %b, <4 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_16x16x32_bf8_bf8(global v4f* out, long a, long b, v4f c) +{ + *out = __builtin_amdgcn_mfma_f32_16x16x32_bf8_bf8(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_f32_16x16x32_bf8_fp8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.bf8.fp8(i64 %a, i64 %b, <4 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_16x16x32_bf8_fp8(global v4f* out, long a, long b, v4f c) +{ + *out = __builtin_amdgcn_mfma_f32_16x16x32_bf8_fp8(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_f32_16x16x32_fp8_bf8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.bf8(i64 %a, i64 %b, <4 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_16x16x32_fp8_bf8(global v4f* out, long a, long b, v4f c) +{ + *out = __builtin_amdgcn_mfma_f32_16x16x32_fp8_bf8(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_f32_16x16x32_fp8_fp8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %a, i64 %b, <4 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_16x16x32_fp8_fp8(global v4f* out, long a, long b, v4f c) +{ + *out = __builtin_amdgcn_mfma_f32_16x16x32_fp8_fp8(a, b, c, 0, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_mfma_f32_32x32x16_bf8_bf8 +// CHECK-GFX940: call <16 x float> @llvm.amdgcn.mfma.f32.32x32x16.bf8.bf8(i64 %a, i64 %b, <16 x float> %c, i32 0, i32 0, i32 0) +void test_mfma_f32_32x32x16_bf8_bf8(global v16f* out, long a, long b, v16f c) +{ + *out = __builtin_amdgcn_mfma_
[clang] 523a99c - [AMDGPU] Support for gfx940 fp8 smfmac
Author: Stanislav Mekhanoshin Date: 2022-07-18T12:12:41-07:00 New Revision: 523a99c0eb0331680905e9ef6fbdd114f4ee7a47 URL: https://github.com/llvm/llvm-project/commit/523a99c0eb0331680905e9ef6fbdd114f4ee7a47 DIFF: https://github.com/llvm/llvm-project/commit/523a99c0eb0331680905e9ef6fbdd114f4ee7a47.diff LOG: [AMDGPU] Support for gfx940 fp8 smfmac Differential Revision: https://reviews.llvm.org/D129908 Added: Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl llvm/include/llvm/IR/IntrinsicsAMDGPU.td llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td llvm/lib/Target/AMDGPU/SIInstrInfo.td llvm/lib/Target/AMDGPU/VOP3PInstructions.td llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll llvm/test/MC/AMDGPU/mai-gfx940.s llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e992e22ca527..cdf5f5a85418 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -353,6 +353,14 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x32_bf16, "V4fV4sV8sV4fiIiIi", " TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x16_bf16, "V16fV4sV8sV16fiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_i32_16x16x64_i8, "V4iV2iV4iV4iiIiIi", "nc", "mai-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_i32_32x32x32_i8, "V16iV2iV4iV16iiIiIi", "nc", "mai-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x64_bf8_bf8, "V4fV2iV4iV4fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x64_bf8_fp8, "V4fV2iV4iV4fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x64_fp8_bf8, "V4fV2iV4iV4fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x64_fp8_fp8, "V4fV2iV4iV4fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x32_bf8_bf8, "V16fV2iV4iV16fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x32_bf8_fp8, "V16fV2iV4iV16fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x32_fp8_bf8, "V16fV2iV4iV16fiIiIi", "nc", "fp8-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x32_fp8_fp8, "V16fV2iV4iV16fiIiIi", "nc", "fp8-insts") TARGET_BUILTIN(__builtin_amdgcn_cvt_f32_bf8, "fiIi", "nc", "fp8-insts") TARGET_BUILTIN(__builtin_amdgcn_cvt_f32_fp8, "fiIi", "nc", "fp8-insts") diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index 192bb1062381..1819ff0a6177 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -348,4 +348,60 @@ void test_smfmac_i32_32x32x32_i8(global v16i* out, v2i a, v4i b, v16i c, int idx { *out = __builtin_amdgcn_smfmac_i32_32x32x32_i8(a, b, c, idx, 0, 0); } + +// CHECK-GFX940-LABEL: @test_smfmac_f32_16x16x64_bf8_bf8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x64.bf8.bf8(<2 x i32> %a, <4 x i32> %b, <4 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_16x16x64_bf8_bf8(global v4f* out, v2i a, v4i b, v4f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x64_bf8_bf8(a, b, c, idx, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_smfmac_f32_16x16x64_bf8_fp8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x64.bf8.fp8(<2 x i32> %a, <4 x i32> %b, <4 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_16x16x64_bf8_fp8(global v4f* out, v2i a, v4i b, v4f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x64_bf8_fp8(a, b, c, idx, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_smfmac_f32_16x16x64_fp8_bf8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x64.fp8.bf8(<2 x i32> %a, <4 x i32> %b, <4 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_16x16x64_fp8_bf8(global v4f* out, v2i a, v4i b, v4f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x64_fp8_bf8(a, b, c, idx, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_smfmac_f32_16x16x64_fp8_fp8 +// CHECK-GFX940: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x64.fp8.fp8(<2 x i32> %a, <4 x i32> %b, <4 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_16x16x64_fp8_fp8(global v4f* out, v2i a, v4i b, v4f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x64_fp8_fp8(a, b, c, idx, 0, 0); +} + +// CHECK-GFX940-LABEL: @test_smfmac_f32_32x32x32_bf8_bf8 +// CHECK-GFX940: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x32.bf8.bf8(<2 x i32> %a, <4 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfm
[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)
https://github.com/rampitec commented: Do you want to rename intrinsics as well? Because now intrinsic names do not match builtin names. https://github.com/llvm/llvm-project/pull/86202 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)
@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts") -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", "gfx12-insts,wavefrontsize32") -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", "gfx12-insts,wavefrontsize32") -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", "gfx12-insts,wavefrontsize32") - -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", "gfx12-insts,wavefrontsize64") -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", "gfx12-insts,wavefrontsize64") -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", "gfx12-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", "gfx12-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", "gfx12-insts,wavefrontsize32") + +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", "gfx12-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", "gfx12-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", "gfx12-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", "gfx12-insts,wavefrontsize64") rampitec wrote: There should not be legacy yet. https://github.com/llvm/llvm-project/pull/86202 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)
rampitec wrote: > > Do you want to rename intrinsics as well? Because now intrinsic names do > > not match builtin names. > > Do we have to match builtins with intrinsics? Renaming intrinsics here means > we will have to duplicate the intrinsics. Is that because of the mangling? https://github.com/llvm/llvm-project/pull/86202 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)
rampitec wrote: > I don't think intrinsics are meant for users. Builtins are the user-facing > front. :-) Depending on who you consider an user. Are folks writing MLIR generators users? https://github.com/llvm/llvm-project/pull/86202 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (PR #86313)
rampitec wrote: > global_load_re_b64 Type global_load_re_b64. https://github.com/llvm/llvm-project/pull/86313 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -0,0 +1,8 @@ +// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s +// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -show-encoding %s | FileCheck %s + +v_dot2_bf16_bf16 v5, v1, v2, 100.0 +// CHECK: v_dot2_bf16_bf16 v5, v1, v2, 0x42c8 ; encoding: [0x05,0x00,0x67,0xd6,0x01,0x05,0xfe,0x03,0xc8,0x42,0x00,0x00] + +v_dot2_bf16_bf16 v5, v1, v2, 1.0 +// v_dot2_bf16_bf16 v5, v1, v2, 0x3f80 ; encoding: [0x05,0x00,0x67,0xd6,0x01,0x05,0xfe,0x03,0x80,0x3f,0x00,0x00] rampitec wrote: FYI: this shall be inline literal. I.e: 0xd6672005 0x03ca0501 https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -1562,8 +1562,9 @@ bool IRTranslator::translateBitCast(const User &U, bool IRTranslator::translateCast(unsigned Opcode, const User &U, MachineIRBuilder &MIRBuilder) { - if (U.getType()->getScalarType()->isBFloatTy() || - U.getOperand(0)->getType()->getScalarType()->isBFloatTy()) + if (Opcode != TargetOpcode::G_BITCAST && rampitec wrote: This is actually an orthogonal problem. Global ISel is completely broken for bf16 and whatever the outcome of the supporting bf16 in codegen is we just need to be ready some gisel tests will fail and will need to be disabled. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -521,8 +521,11 @@ void AMDGPUInstPrinter::printImmediateV216(uint32_t Imm, uint8_t OpType, if (printImmediateFloat32(Imm, STI, O)) return; break; + case AMDGPU::OPERAND_REG_IMM_V2BF16: case AMDGPU::OPERAND_REG_IMM_V2FP16: + case AMDGPU::OPERAND_REG_INLINE_C_V2BF16: case AMDGPU::OPERAND_REG_INLINE_C_V2FP16: + case AMDGPU::OPERAND_REG_INLINE_AC_V2BF16: rampitec wrote: It does not seem right, and there are no tests for v2bf16 added. I am not sure though we have instructions which can accept this type of operand. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -79,17 +79,17 @@ define amdgpu_ps void @test_llvm_amdgcn_fdot2_bf16_bf16_sis( ; GFX11: ; %bb.0: ; %entry ; GFX11-NEXT:v_mov_b32_e32 v2, s1 ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x10001, v2 +; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x3f803f80, v2 rampitec wrote: This shall be encoded as inline immediate 1.0. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -4181,13 +4181,20 @@ bool SIInstrInfo::isInlineConstant(const MachineOperand &MO, case AMDGPU::OPERAND_REG_INLINE_C_V2INT16: case AMDGPU::OPERAND_REG_INLINE_AC_V2INT16: return AMDGPU::isInlinableLiteralV2I16(Imm); + case AMDGPU::OPERAND_REG_IMM_V2BF16: rampitec wrote: It does not seem isInlinableLiteralV2F16() can handle bf16. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -79,17 +79,17 @@ define amdgpu_ps void @test_llvm_amdgcn_fdot2_bf16_bf16_sis( ; GFX11: ; %bb.0: ; %entry ; GFX11-NEXT:v_mov_b32_e32 v2, s1 ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x10001, v2 +; GFX11-NEXT:v_dot2_bf16_bf16 v2, s0, 0x3f803f80, v2 rampitec wrote: Well, this is unrelated to the patch itself. We can use inline 1.0 here, but then we must use op_sel_hi to produce it in the high half. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -0,0 +1,8 @@ +// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s rampitec wrote: You also need a disasm test for this. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -1,8 +1,7 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11,SDAG-GFX11 -; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11,GISEL-GFX11 rampitec wrote: Change 'RUN' with 'XUN' and add a comment instead. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -488,6 +488,49 @@ static bool printImmediateFloat16(uint32_t Imm, const MCSubtargetInfo &STI, return true; } +static bool printImmediateBFloat16(uint32_t Imm, const MCSubtargetInfo &STI, + raw_ostream &O) { + if (Imm == 0x3F80) +O << "1.0"; + else if (Imm == 0xBF80) +O << "-1.0"; + else if (Imm == 0x3F00) +O << "0.5"; + else if (Imm == 0xBF00) +O << "-0.5"; + else if (Imm == 0x4000) +O << "2.0"; + else if (Imm == 0xC000) +O << "-2.0"; + else if (Imm == 0x4080) +O << "4.0"; + else if (Imm == 0xC080) +O << "-4.0"; + else if (Imm == 0x3E22 && STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm)) +O << "0.15915494"; + else +return false; + + return true; +} + +void AMDGPUInstPrinter::printImmediateBF16(uint32_t Imm, + const MCSubtargetInfo &STI, + raw_ostream &O) { + int16_t SImm = static_cast(Imm); + if (isInlinableIntLiteral(SImm)) { +O << SImm; +return; + } + + uint16_t HImm = static_cast(Imm); + if (printImmediateBFloat16(HImm, STI, O)) +return; + + uint64_t Imm16 = static_cast(Imm); rampitec wrote: It's the same as HImm above. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -4185,9 +4185,17 @@ bool SIInstrInfo::isInlineConstant(const MachineOperand &MO, case AMDGPU::OPERAND_REG_INLINE_C_V2FP16: case AMDGPU::OPERAND_REG_INLINE_AC_V2FP16: return AMDGPU::isInlinableLiteralV2F16(Imm); + case AMDGPU::OPERAND_REG_IMM_V2BF16: + case AMDGPU::OPERAND_REG_INLINE_C_V2BF16: + case AMDGPU::OPERAND_REG_INLINE_AC_V2BF16: +return AMDGPU::isInlinableLiteralV2BF16(Imm); + case AMDGPU::OPERAND_REG_IMM_BF16: case AMDGPU::OPERAND_REG_IMM_FP16: + case AMDGPU::OPERAND_REG_IMM_BF16_DEFERRED: case AMDGPU::OPERAND_REG_IMM_FP16_DEFERRED: + case AMDGPU::OPERAND_REG_INLINE_C_BF16: case AMDGPU::OPERAND_REG_INLINE_C_FP16: + case AMDGPU::OPERAND_REG_INLINE_AC_BF16: rampitec wrote: It seems isInlinableLiteral16() cannot handle bf16? https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -2819,11 +2819,11 @@ def int_amdgcn_fdot2_f16_f16 : def int_amdgcn_fdot2_bf16_bf16 : ClangBuiltin<"__builtin_amdgcn_fdot2_bf16_bf16">, DefaultAttrsIntrinsic< -[llvm_i16_ty], // %r +[llvm_bfloat_ty], // %r rampitec wrote: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl fails. You need to insert casts to bf16 while lowering it to make it working. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [RFC][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -4185,9 +4185,17 @@ bool SIInstrInfo::isInlineConstant(const MachineOperand &MO, case AMDGPU::OPERAND_REG_INLINE_C_V2FP16: case AMDGPU::OPERAND_REG_INLINE_AC_V2FP16: return AMDGPU::isInlinableLiteralV2F16(Imm); + case AMDGPU::OPERAND_REG_IMM_V2BF16: + case AMDGPU::OPERAND_REG_INLINE_C_V2BF16: + case AMDGPU::OPERAND_REG_INLINE_AC_V2BF16: +return AMDGPU::isInlinableLiteralV2BF16(Imm); + case AMDGPU::OPERAND_REG_IMM_BF16: case AMDGPU::OPERAND_REG_IMM_FP16: + case AMDGPU::OPERAND_REG_IMM_BF16_DEFERRED: case AMDGPU::OPERAND_REG_IMM_FP16_DEFERRED: + case AMDGPU::OPERAND_REG_INLINE_C_BF16: case AMDGPU::OPERAND_REG_INLINE_C_FP16: + case AMDGPU::OPERAND_REG_INLINE_AC_BF16: rampitec wrote: But right in this place you know the actual format. So you can split F16 and BF16 code and call different functions. https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] AMDGPU: Define a feature for v_dot4_f32_* instructions (PR #84248)
https://github.com/rampitec approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/84248 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)
@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo pattern=[]> : SOPK_Pseudo < pattern>; def S_SETREG_B32 : S_SETREG_B32_Pseudo < - [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> { + [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> { rampitec wrote: This just reverts my patch https://github.com/llvm/llvm-project/pull/77997 and reintroduces the original problem. https://github.com/llvm/llvm-project/pull/83906 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)
@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo pattern=[]> : SOPK_Pseudo < pattern>; def S_SETREG_B32 : S_SETREG_B32_Pseudo < - [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> { + [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> { rampitec wrote: It is not expected to be negative, the original problem was that we used to force users to use negative constants. Now we can accept something like 0xf000 instead of a negative value. https://github.com/llvm/llvm-project/pull/83906 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)
@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo pattern=[]> : SOPK_Pseudo < pattern>; def S_SETREG_B32 : S_SETREG_B32_Pseudo < - [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> { + [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> { rampitec wrote: If it is sign extended, it should work. https://github.com/llvm/llvm-project/pull/83906 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [clang-tools-extra] [AMDGPU] GlobalISel for f8 conversions (PR #80503)
https://github.com/rampitec updated https://github.com/llvm/llvm-project/pull/80503 >From b07f5866aa8acf881fbdb15450ecda4dfc8a68e8 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Fri, 2 Feb 2024 14:28:00 -0800 Subject: [PATCH 1/2] [AMDGPU] Fixed byte_sel of v_cvt_f32_bf8/v_cvt_f32_fp8 Opsel bits are swapped. Actual byte select table: Byte OPSEL 0 0 1 2 2 1 3 3 --- llvm/lib/Target/AMDGPU/VOP1Instructions.td | 6 ++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll | 4 ++-- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll | 8 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/VOP1Instructions.td b/llvm/lib/Target/AMDGPU/VOP1Instructions.td index 920c220fb2c65..58b67b21e274b 100644 --- a/llvm/lib/Target/AMDGPU/VOP1Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP1Instructions.td @@ -668,10 +668,8 @@ class Cvt_F32_F8_Pat_OpSel index, VOP1_Pseudo inst_e32, VOP3_Pseudo inst_e64> : GCNPat< (f32 (node i32:$src, index)), !if (index, - (inst_e64 !if(index{0}, - !if(index{1}, !or(SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), - SRCMODS.OP_SEL_0), - !if(index{1}, SRCMODS.OP_SEL_1, 0)), + (inst_e64 !or(!if(index{0}, SRCMODS.OP_SEL_1, 0), + !if(index{1}, SRCMODS.OP_SEL_0, 0)), $src, 0), (inst_e32 $src)) >; diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll index f49fec60892cd..e21d61036375a 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll @@ -16,7 +16,7 @@ define amdgpu_cs float @test_cvt_f32_bf8_byte1(i32 %a) { ; GFX12: ; %bb.0: ; GFX12-NEXT:v_mov_b32_dpp v0, v0 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf bound_ctrl:1 ; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0] +; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1] ; GFX12-NEXT:; return to shader part epilog %tmp0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %a, i32 228, i32 15, i32 15, i1 1) %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %tmp0, i32 1) @@ -28,7 +28,7 @@ define amdgpu_cs float @test_cvt_f32_bf8_byte2(i32 %a) { ; GFX12: ; %bb.0: ; GFX12-NEXT:v_mov_b32_dpp v0, v0 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf bound_ctrl:1 ; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1] +; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0] ; GFX12-NEXT:; return to shader part epilog %tmp0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %a, i32 228, i32 15, i32 15, i1 1) %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %tmp0, i32 2) diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll index 17b1fcf865e94..f915fa8e6cd1c 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll @@ -45,7 +45,7 @@ define float @test_cvt_f32_bf8_byte1(i32 %a) { ; GFX12-NEXT:s_wait_samplecnt 0x0 ; GFX12-NEXT:s_wait_bvhcnt 0x0 ; GFX12-NEXT:s_wait_kmcnt 0x0 -; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0] +; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1] ; GFX12-NEXT:s_setpc_b64 s[30:31] %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %a, i32 1) ret float %ret @@ -65,7 +65,7 @@ define float @test_cvt_f32_bf8_byte2(i32 %a) { ; GFX12-NEXT:s_wait_samplecnt 0x0 ; GFX12-NEXT:s_wait_bvhcnt 0x0 ; GFX12-NEXT:s_wait_kmcnt 0x0 -; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1] +; GFX12-NEXT:v_cvt_f32_bf8_e64 v0, v0 op_sel:[1,0] ; GFX12-NEXT:s_setpc_b64 s[30:31] %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %a, i32 2) ret float %ret @@ -125,7 +125,7 @@ define float @test_cvt_f32_fp8_byte1(i32 %a) { ; GFX12-NEXT:s_wait_samplecnt 0x0 ; GFX12-NEXT:s_wait_bvhcnt 0x0 ; GFX12-NEXT:s_wait_kmcnt 0x0 -; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[1,0] +; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[0,1] ; GFX12-NEXT:s_setpc_b64 s[30:31] %ret = tail call float @llvm.amdgcn.cvt.f32.fp8(i32 %a, i32 1) ret float %ret @@ -145,7 +145,7 @@ define float @test_cvt_f32_fp8_byte2(i32 %a) { ; GFX12-NEXT:s_wait_samplecnt 0x0 ; GFX12-NEXT:s_wait_bvhcnt 0x0 ; GFX12-NEXT:s_wait_kmcnt 0x0 -; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[0,1] +; GFX12-NEXT:v_cvt_f32_fp8_e64 v0, v0 op_sel:[1,0] ; GFX12-NEXT:s_setpc_b64 s[30:31] %ret = tail call float @llvm.amdgcn.cvt.f32.fp8(i32 %a, i32 2) ret float %ret >From 5f211ec3068988ab397d7234e2fc5a61e074bee8 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Fri, 2 Feb 2024 14:35:59 -0800 Subject: [PATCH 2/2] [AMDGPU] GlobalISel for f8 conversions --- llvm/l
[clang-tools-extra] [llvm] [clang] [AMDGPU] GlobalISel for f8 conversions (PR #80503)
https://github.com/rampitec closed https://github.com/llvm/llvm-project/pull/80503 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)
@@ -832,6 +832,13 @@ void test_atomic_inc_dec(local uint *lptr, global uint *gptr, uint val) { res = __builtin_amdgcn_atomic_dec32((volatile global uint*)gptr, val, __ATOMIC_SEQ_CST, ""); } +// CHECK-LABEL test_wavefrontsize( +unsigned test_wavefrontsize() { rampitec wrote: Missing check for the test. https://github.com/llvm/llvm-project/pull/80741 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)
https://github.com/rampitec edited https://github.com/llvm/llvm-project/pull/80741 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/80741 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)
@@ -832,6 +832,13 @@ void test_atomic_inc_dec(local uint *lptr, global uint *gptr, uint val) { res = __builtin_amdgcn_atomic_dec32((volatile global uint*)gptr, val, __ATOMIC_SEQ_CST, ""); } +// CHECK-LABEL test_wavefrontsize( +unsigned test_wavefrontsize() { rampitec wrote: Ugh, it's inside the body. Unusual, but test above is he same. https://github.com/llvm/llvm-project/pull/80741 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libcxx] [flang] [mlir] [llvm] [compiler-rt] [clang-tools-extra] [openmp] [libc] [lldb] [lld] [clang] AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (PR #79218)
https://github.com/rampitec approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/79218 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [flang] [clang] [clang-tools-extra] [compiler-rt] [libc] [lldb] [lld] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
rampitec wrote: Ping https://github.com/llvm/llvm-project/pull/74537 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits