[llvm-branch-commits] [llvm] AMDGPU: Fix constrain register logic for physregs (PR #161794)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

Stack (top to bottom):
* **#161795**
* **#161794** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161794)
* **#161793**
* **#161792**
* **#161790**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161794
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix trying to constrain physical registers in spill handling (PR #161793)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

Stack (top to bottom):
* **#161795**
* **#161794**
* **#161793** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161793)
* **#161792**
* **#161790**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161793
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)
arsenm wrote:
> Same here: drop mir tests which test what tablegen has generated.

Same as before: the MIR test changes are not related to this PR, and we cannot simply drop them.
https://github.com/llvm/llvm-project/pull/161762
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)
https://github.com/cdevadas approved this pull request. https://github.com/llvm/llvm-project/pull/161795 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
cdevadas wrote:
Shouldn't SCC_CLASS depend on the wavesize?
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
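For context on the question above: before this change, the cross-copy class for SCC came from getWaveMaskRegClass(), which does depend on the wave size; after it, the result is wave-size independent. A toy Python sketch of the two lookups (class names are illustrative strings, not the real TableGen classes):

```python
# Toy sketch of SIRegisterInfo::getCrossCopyRegClass before and after
# the change above. Class names are illustrative strings only.

def cross_copy_class_old(rc, wave64):
    # Old behavior: SCC copies went through the wave mask class,
    # whose width depends on the wave size.
    if rc == "SCC_CLASS":
        return "SReg_64" if wave64 else "SReg_32"
    return rc

def cross_copy_class_new(rc):
    # New behavior: SCC is always copied through a 32-bit SGPR class,
    # independent of wave size.
    return "SReg_32" if rc == "SCC_CLASS" else rc

print(cross_copy_class_old("SCC_CLASS", wave64=True))  # SReg_64
print(cross_copy_class_new("SCC_CLASS"))               # SReg_32
```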
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers] Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
llvmbot wrote: @RKSimon What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #161816)
https://github.com/easyonaadit created
https://github.com/llvm/llvm-project/pull/161816
None
>From 6cd1510d1ca606a5d08e4bbdb3c77d17d93447d8 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Tue, 30 Sep 2025 11:37:42 +0530
Subject: [PATCH] [AMDGPU] Add builtins for wave reduction intrinsics
---
clang/include/clang/Basic/BuiltinsAMDGPU.def | 4
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 4
2 files changed, 8 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fda16e42d2c6b..ebc0ac35f42d9 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -402,6 +402,10 @@ BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_f32, "ffZi", "nc")
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 07cf08c54985a..a242a73e4a822 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -301,18 +301,22 @@ static Intrinsic::ID
getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
llvm_unreachable("Unknown BuiltinID for wave reduction");
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_f32:
return Intrinsic::amdgcn_wave_reduce_add;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_f32:
return Intrinsic::amdgcn_wave_reduce_sub;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_f32:
return Intrinsic::amdgcn_wave_reduce_min;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
return Intrinsic::amdgcn_wave_reduce_umin;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_f32:
return Intrinsic::amdgcn_wave_reduce_max;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
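The patch above only wires the new builtins to the existing generic wave-reduce intrinsics. As a rough sketch of the semantics those intrinsics expose (a toy lane model in Python, not the actual GPU execution model; the operator table is an assumption for illustration):

```python
import operator
from functools import reduce

# Toy lane model: every active lane receives the reduction of the
# value across all active lanes. The op table below is illustrative.
REDUCTIONS = {
    "add": operator.add,
    "min": min,
    "max": max,
}

def wave_reduce(op, lane_values):
    result = reduce(REDUCTIONS[op], lane_values)
    return [result] * len(lane_values)  # result is uniform across the wave

lanes = [1.5, -2.0, 4.0, 0.5]
print(wave_reduce("add", lanes))  # every lane sees 4.0
print(wave_reduce("min", lanes))  # every lane sees -2.0
```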
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers] Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/phoebewang approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [PowerPC] Implement paddis (PR #161572)
https://github.com/lei137 updated
https://github.com/llvm/llvm-project/pull/161572
>From 012b638031fb72d36525234115f9d7b87d8c98e3 Mon Sep 17 00:00:00 2001
From: Lei Huang
Date: Tue, 30 Sep 2025 18:09:31 +
Subject: [PATCH 1/4] [PowerPC] Implement paddis
---
.../Target/PowerPC/AsmParser/PPCAsmParser.cpp | 4 ++
.../PowerPC/MCTargetDesc/PPCAsmBackend.cpp| 9
.../PowerPC/MCTargetDesc/PPCFixupKinds.h | 6 +++
.../PowerPC/MCTargetDesc/PPCInstPrinter.cpp | 12 +
.../PowerPC/MCTargetDesc/PPCInstPrinter.h | 2 +
.../PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp | 1 +
llvm/lib/Target/PowerPC/PPCInstrFuture.td | 44 +++
llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 19
.../PowerPC/ppc-encoding-ISAFuture.txt| 6 +++
.../PowerPC/ppc64le-encoding-ISAFuture.txt| 6 +++
llvm/test/MC/PowerPC/ppc-encoding-ISAFuture.s | 8
11 files changed, 117 insertions(+)
diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 561a9c51b9cc2..b07f95018ca90 100644
--- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -365,6 +365,10 @@ struct PPCOperand : public MCParsedAsmOperand {
bool isS16ImmX4() const { return isExtImm<16>(/*Signed*/ true, 4); }
bool isS16ImmX16() const { return isExtImm<16>(/*Signed*/ true, 16); }
bool isS17Imm() const { return isExtImm<17>(/*Signed*/ true, 1); }
+  bool isS32Imm() const {
+    // TODO: Is ContextImmediate needed?
+    return Kind == Expression || isSImm<32>();
+  }
bool isS34Imm() const {
// Once the PC-Rel ABI is finalized, evaluate whether a 34-bit
// ContextImmediate is needed.
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 04b886ae74993..558351b515a2e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -47,6 +47,9 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t
Value) {
case PPC::fixup_ppc_half16ds:
case PPC::fixup_ppc_half16dq:
return Value & 0xfffc;
+  case PPC::fixup_ppc_pcrel32:
+  case PPC::fixup_ppc_imm32:
+    return Value & 0xffffffff;
   case PPC::fixup_ppc_pcrel34:
   case PPC::fixup_ppc_imm34:
     return Value & 0x3ffffffff;
@@ -71,6 +74,8 @@ static unsigned getFixupKindNumBytes(unsigned Kind) {
case PPC::fixup_ppc_br24abs:
case PPC::fixup_ppc_br24_notoc:
return 4;
+ case PPC::fixup_ppc_pcrel32:
+ case PPC::fixup_ppc_imm32:
case PPC::fixup_ppc_pcrel34:
case PPC::fixup_ppc_imm34:
case FK_Data_8:
@@ -154,6 +159,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind
Kind) const {
{"fixup_ppc_brcond14abs", 16, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 0, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
@@ -166,6 +173,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind
Kind) const {
{"fixup_ppc_brcond14abs", 2, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 2, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
index df0c666f5b113..4164b697649cd 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
@@ -40,6 +40,12 @@ enum Fixups {
/// instrs like 'std'.
fixup_ppc_half16ds,
+ // A 32-bit fixup corresponding to PC-relative paddis.
+ fixup_ppc_pcrel32,
+
+ // A 32-bit fixup corresponding to Non-PC-relative paddis.
+ fixup_ppc_imm32,
+
// A 34-bit fixup corresponding to PC-relative paddi.
fixup_ppc_pcrel34,
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
index b27bc3bd49315..e2afb9378cbf0 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
@@ -430,6 +430,18 @@ void PPCInstPrinter::printS16ImmOperand(const MCInst *MI,
unsigned OpNo,
printOperand(MI, OpNo, STI, O);
}
+void PPCInstPrinter::printS32ImmOperand(const MCInst *MI, unsigned OpNo,
+                                        const MCSubtargetInfo &STI,
+                                        raw_ostream &O) {
+  if (MI->getOperand(OpNo).isImm()) {
+    long long Value = MI->getOperand(OpNo).getImm();
+    assert(isInt<32>(Value) && "Invalid s32imm argument!");
+    O << (long long)Value;
+  } else
+    printOperand(MI
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
easyonaadit wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

Stack (top to bottom):
* **#161816**
* **#161815** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161815)
* **#161814**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161815
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop trying to constrain register class of post-RA-pseudos (PR #161792)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
This is trying to constrain the register class of a physical register,
which makes no sense.
---
Full diff: https://github.com/llvm/llvm-project/pull/161792.diff
1 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (-2)
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index fe6b8b96cbd57..cda8069936af2 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2112,8 +2112,6 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI)
const {
   case AMDGPU::SI_RESTORE_S32_FROM_VGPR:
     MI.setDesc(get(AMDGPU::V_READLANE_B32));
-    MI.getMF()->getRegInfo().constrainRegClass(MI.getOperand(0).getReg(),
-                                               &AMDGPU::SReg_32_XM0RegClass);
     break;
case AMDGPU::AV_MOV_B32_IMM_PSEUDO: {
Register Dst = MI.getOperand(0).getReg();
```
https://github.com/llvm/llvm-project/pull/161792
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
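The rationale ("constraining the register class of a physical register makes no sense") is easiest to see in a toy model: constrainRegClass narrows a virtual register's class, i.e. the set of physical registers it may be assigned, while a physical register is already one concrete register. A minimal Python sketch (register and class names illustrative, not the real AMDGPU definitions):

```python
# A register class modeled as the set of physical registers it allows.
SReg_32 = {"s0", "s1", "s2", "m0"}
SReg_32_XM0 = {"s0", "s1", "s2"}  # SReg_32 without m0

def constrain_reg_class(current, required):
    """Narrow a virtual register's class to the common subset;
    None if the classes are incompatible (mirrors a null return)."""
    common = current & required
    return common if common else None

# Virtual register: narrowing to a common subclass is meaningful.
print(constrain_reg_class(SReg_32, SReg_32_XM0))  # the narrowed class

# A physical register is already a single register such as "s0";
# there is no class to narrow, which is why the deleted call made no sense.
```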
[llvm-branch-commits] [clang] [llvm] [HLSL] GetDimensions methods for buffer resources (PR #161929)
https://github.com/hekota updated
https://github.com/llvm/llvm-project/pull/161929
>From e50918910a0ce590228c6ecacd4ff2a578da6f58 Mon Sep 17 00:00:00 2001
From: Helena Kotas
Date: Fri, 3 Oct 2025 17:33:19 -0700
Subject: [PATCH] [HLSL] GetDimensions methods for buffer resources
Adds GetDimensions methods on all supported buffer resources.
---
clang/include/clang/Basic/Builtins.td | 12 +++
clang/lib/CodeGen/CGHLSLBuiltins.cpp | 61 ++
clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.cpp | 81 ++-
clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.h | 2 +
clang/lib/Sema/HLSLExternalSemaSource.cpp | 11 +++
clang/lib/Sema/SemaHLSL.cpp | 18 +
.../test/AST/HLSL/ByteAddressBuffers-AST.hlsl | 14
.../test/AST/HLSL/StructuredBuffers-AST.hlsl | 22 +
clang/test/AST/HLSL/TypedBuffers-AST.hlsl | 14
.../resources/ByteAddressBuffers-methods.hlsl | 47 +++
.../StructuredBuffers-methods-lib.hlsl| 53 +++-
.../StructuredBuffers-methods-ps.hlsl | 37 +
.../resources/TypedBuffers-methods.hlsl | 34
llvm/include/llvm/IR/IntrinsicsDirectX.td | 4 +
14 files changed, 406 insertions(+), 4 deletions(-)
create mode 100644
clang/test/CodeGenHLSL/resources/ByteAddressBuffers-methods.hlsl
diff --git a/clang/include/clang/Basic/Builtins.td
b/clang/include/clang/Basic/Builtins.td
index 468121f7d20ab..0b1587be51217 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4951,6 +4951,18 @@ def HLSLResourceNonUniformIndex :
LangBuiltin<"HLSL_LANG"> {
let Prototype = "uint32_t(uint32_t)";
}
+def HLSLResourceGetDimensions : LangBuiltin<"HLSL_LANG"> {
+ let Spellings = ["__builtin_hlsl_buffer_getdimensions"];
+ let Attributes = [NoThrow];
+ let Prototype = "void(...)";
+}
+
+def HLSLResourceGetStride : LangBuiltin<"HLSL_LANG"> {
+ let Spellings = ["__builtin_hlsl_buffer_getstride"];
+ let Attributes = [NoThrow];
+ let Prototype = "void(...)";
+}
+
def HLSLAll : LangBuiltin<"HLSL_LANG"> {
let Spellings = ["__builtin_hlsl_all"];
let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 6c0fc8d7f07be..373153e01c128 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -160,6 +160,58 @@ static Value *handleHlslSplitdouble(const CallExpr *E,
CodeGenFunction *CGF) {
return LastInst;
}
+static Value *emitDXILGetDimensions(CodeGenFunction *CGF, Value *Handle,
+                                    Value *MipLevel, LValue *OutArg0,
+                                    LValue *OutArg1 = nullptr,
+                                    LValue *OutArg2 = nullptr,
+                                    LValue *OutArg3 = nullptr) {
+  assert(OutArg0 && "first output argument is required");
+
+  llvm::Type *I32 = CGF->Int32Ty;
+  StructType *RetTy = llvm::StructType::get(I32, I32, I32, I32);
+
+  CallInst *CI = CGF->Builder.CreateIntrinsic(
+      RetTy, llvm::Intrinsic::dx_resource_getdimensions,
+      ArrayRef{Handle, MipLevel});
+
+  Value *LastInst = nullptr;
+  unsigned OutArgIndex = 0;
+  for (LValue *OutArg : {OutArg0, OutArg1, OutArg2, OutArg3}) {
+    if (OutArg) {
+      Value *OutArgVal = CGF->Builder.CreateExtractValue(CI, OutArgIndex);
+      LastInst = CGF->Builder.CreateStore(OutArgVal, OutArg->getAddress());
+    }
+    ++OutArgIndex;
+  }
+  assert(LastInst && "no output argument stored?");
+  return LastInst;
+}
+
+static Value *emitBufferGetDimensions(CodeGenFunction *CGF, Value *Handle,
+                                      LValue &Dim) {
+  // Generate the call to get the buffer dimension.
+  switch (CGF->CGM.getTarget().getTriple().getArch()) {
+  case llvm::Triple::dxil:
+    return emitDXILGetDimensions(CGF, Handle, PoisonValue::get(CGF->Int32Ty),
+                                 &Dim);
+    break;
+  case llvm::Triple::spirv:
+    llvm_unreachable("SPIR-V GetDimensions codegen not implemented yet.");
+  default:
+    llvm_unreachable("GetDimensions not supported by target architecture");
+  }
+}
+
+static Value *emitBufferStride(CodeGenFunction *CGF, const Expr *HandleExpr,
+                               LValue &Stride) {
+  // Figure out the stride of the buffer elements from the handle type.
+  auto *HandleTy =
+      cast(HandleExpr->getType().getTypePtr());
+  QualType ElementTy = HandleTy->getContainedType();
+  Value *StrideValue = CGF->getTypeSize(ElementTy);
+  return CGF->Builder.CreateStore(StrideValue, Stride.getAddress());
+}
+
// Return dot product intrinsic that corresponds to the QT scalar type
static Intrinsic::ID getDotProductIntrinsic(CGHLSLRuntime &RT, QualType QT) {
if (QT->isFloatingType())
@@ -359,6 +411,15 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned
BuiltinID,
RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
ArrayRef{IndexOp});
[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/161762
>From 53a6a5b9e3adcabc51e7eff0a21642f33859b946 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Fri, 3 Oct 2025 10:21:10 +0900
Subject: [PATCH] AMDGPU: Remove LDS_DIRECT_CLASS register class
This is a singleton register class which is a bad idea,
and not actually used.
---
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 20 +-
.../GlobalISel/irtranslator-inline-asm.ll | 2 +-
.../coalesce-copy-to-agpr-to-av-registers.mir | 232 +-
...class-vgpr-mfma-to-av-with-load-source.mir | 12 +-
llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll | 24 +-
...al-regcopy-and-spill-missed-at-regalloc.ll | 16 +-
...lloc-failure-overlapping-insert-assert.mir | 12 +-
.../rewrite-vgpr-mfma-to-agpr-copy-from.mir | 4 +-
...gpr-mfma-to-agpr-subreg-insert-extract.mir | 12 +-
...te-vgpr-mfma-to-agpr-subreg-src2-chain.mir | 32 +--
.../CodeGen/AMDGPU/spill-vector-superclass.ll | 2 +-
.../Inputs/amdgpu_isel.ll.expected| 4 +-
12 files changed, 183 insertions(+), 189 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index f98e31229b246..82fc2400a3754 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -761,12 +761,6 @@ def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU",
Reg128Types.types, 32,
let BaseClassOrder = 1;
}
-def LDS_DIRECT_CLASS : RegisterClass<"AMDGPU", [i32], 32,
-                                     (add LDS_DIRECT)> {
-  let isAllocatable = 0;
-  let CopyCost = -1;
-}
-
let GeneratePressureSet = 0, HasSGPR = 1 in {
// Subset of SReg_32 without M0 for SMRD instructions and alike.
// See comments in SIInstructions.td for more info.
@@ -829,7 +823,7 @@ def SGPR_NULL256 : SIReg<"null">;
let GeneratePressureSet = 0 in {
def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
- (add SReg_32, LDS_DIRECT_CLASS)> {
+ (add SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasSGPR = 1;
let Size = 32;
@@ -968,7 +962,7 @@ defm "" : SRegClass<32, Reg1024Types.types, SGPR_1024Regs>;
}
def VRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
- (add VGPR_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let Size = 32;
@@ -1083,21 +1077,21 @@ def VReg_1 : SIRegisterClass<"AMDGPU", [i1], 32, (add)> {
}
def VS_16 : SIRegisterClass<"AMDGPU", Reg16Types.types, 16,
- (add VGPR_16, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_16, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let Size = 16;
}
def VS_16_Lo128 : SIRegisterClass<"AMDGPU", Reg16Types.types, 16,
- (add VGPR_16_Lo128, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_16_Lo128, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let Size = 16;
}
def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16,
v2bf16], 32,
- (add VGPR_32, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasSGPR = 1;
@@ -1105,7 +1099,7 @@ def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16,
f16, bf16, v2i16, v2f16, v
}
def VS_32_Lo128 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
- (add VGPR_32_Lo128, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32_Lo128, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasSGPR = 1;
@@ -1113,7 +1107,7 @@ def VS_32_Lo128 : SIRegisterClass<"AMDGPU", [i32, f32,
i16, f16, bf16, v2i16, v2
}
def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
-                                  (add VGPR_32_Lo256, SReg_32, LDS_DIRECT_CLASS)> {
+                                  (add VGPR_32_Lo256, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasSGPR = 1;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
index a54dc9dda16e0..e5cd0710359ac 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
@@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 {
define double @test_multiple_register_outputs_mixed() #0 {
; CHECK-LABEL: name: test_multiple_register_outputs_mixed
; CHECK: bb.1 (%ir-block.0):
- ; CHECK-NEXT: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /*
attdialect */, 1835018 /* regdef:VGPR_32 */, def %8, 3473418 /* regdef:VReg_64
*/, def %9
+
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
arsenm wrote:
> Are there any codegen changes?

I don't think so. There could maybe be DAG scheduling changes, but I haven't found any.
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) (PR #161937)
llvmbot wrote: @arsenm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/161937 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) (PR #161937)
llvmbot wrote:
@llvm/pr-subscribers-backend-sparc
Author: None (llvmbot)
Changes
Backport 2e1fab93467ec8c37a236ae6e059300ebaa0c986
Requested by: @brad0
---
Full diff: https://github.com/llvm/llvm-project/pull/161937.diff
2 Files Affected:
- (modified) llvm/lib/Target/Sparc/DelaySlotFiller.cpp (+2-2)
- (modified) llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll (+25)
```diff
diff --git a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
index 6c19049a001cf..024030d196ee3 100644
--- a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
+++ b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
@@ -206,8 +206,8 @@ Filler::findDelayInstr(MachineBasicBlock &MBB,
if (!done)
--I;
-    // skip debug instruction
-    if (I->isDebugInstr())
+    // Skip meta instructions.
+    if (I->isMetaInstruction())
       continue;
if (I->hasUnmodeledSideEffects() || I->isInlineAsm() || I->isPosition() ||
diff --git a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
index 9ccd4f1c0ac9a..767ef7eb510e6 100644
--- a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
+++ b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
@@ -184,4 +184,29 @@ entry:
ret i32 %2
}
+define i32 @test_generic_inst(i32 %arg) #0 {
+;CHECK-LABEL: test_generic_inst:
+;CHECK: ! fake_use: {{.*}}
+;CHECK: bne {{.*}}
+;CHECK-NEXT: nop
+ %bar1 = call i32 @bar(i32 %arg)
+ %even = and i32 %bar1, 1
+ %cmp = icmp eq i32 %even, 0
+ ; This shouldn't get reordered into a delay slot
+ call void (...) @llvm.fake.use(i32 %arg)
+ br i1 %cmp, label %true, label %false
+true:
+ %bar2 = call i32 @bar(i32 %bar1)
+ br label %cont
+
+false:
+ %inc = add nsw i32 %bar1, 1
+ br label %cont
+
+cont:
+ %ret = phi i32 [ %bar2, %true ], [ %inc, %false ]
+ ret i32 %ret
+}
+
+declare void @llvm.fake.use(...)
attributes #0 = { nounwind "disable-tail-calls"="true" }
```
https://github.com/llvm/llvm-project/pull/161937
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
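The one-line fix above generalizes "skip debug instructions" to "skip all meta instructions" in the backward scan for a delay-slot candidate. A simplified Python sketch of that scan (toy instruction model, not the real SPARC pass):

```python
from dataclasses import dataclass

@dataclass
class Inst:
    name: str
    is_meta: bool = False          # fake uses, debug values, ...
    has_side_effects: bool = False

def find_delay_inst(block):
    # Scan backwards from the terminating branch (last entry), as the
    # pass does, looking for an instruction to hoist into the delay slot.
    for inst in reversed(block[:-1]):
        if inst.is_meta:
            continue  # the fix: skip *all* meta instructions
        if inst.has_side_effects:
            return None  # give up; a nop fills the slot instead
        return inst
    return None

block = [Inst("add"), Inst("fake_use", is_meta=True), Inst("bne")]
print(find_delay_inst(block).name)  # add, not fake_use
```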
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
https://github.com/arsenm created
https://github.com/llvm/llvm-project/pull/161801
SCC should be copied to a 32-bit SGPR. Using a wave mask doesn't make
sense.
>From 04a448a4f8da85d879ffb61618a824bd2ab5a62a Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Fri, 3 Oct 2025 15:53:00 +0900
Subject: [PATCH] AMDGPU: Stop using the wavemask register class for SCC cross
class copies
SCC should be copied to a 32-bit SGPR. Using a wave mask doesn't make
sense.
---
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 21735f91f4ad7..ba29dd4ae61d4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
}
static unsigned getNumSubRegsForSpillOp(const MachineInstr &MI,
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
@@ -10,7 +10,7 @@ typedef __typeof(sizeof(int)) size_t;
void *malloc(size_t size);
// CHECK-LABEL: @test_malloc(
-// CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 0)
+// CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 2689373973731826898){{.*}} !alloc_token [[META_INT:![0-9]+]]
zmodem wrote:
nit: Use a regex instead of hard-coding the token here and in
alloc-token-nonlibcalls.c?
https://github.com/llvm/llvm-project/pull/156841
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
SCC should be copied to a 32-bit SGPR. Using a wave mask doesn't make
sense.
---
Full diff: https://github.com/llvm/llvm-project/pull/161801.diff
1 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+1-3)
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 21735f91f4ad7..ba29dd4ae61d4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
}
static unsigned getNumSubRegsForSpillOp(const MachineInstr &MI,
``
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
@@ -1353,6 +1354,92 @@ void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB,
QualType AllocType) {
CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
}
+/// Infer type from a simple sizeof expression.
+static QualType inferTypeFromSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) {
+    if (UET->getKind() == UETT_SizeOf) {
+      if (UET->isArgumentType())
+        return UET->getArgumentTypeInfo()->getType();
+      else
+        return UET->getArgumentExpr()->getType();
+    }
+  }
+  return QualType();
+}
+
+/// Infer type from an arithmetic expression involving a sizeof.
+static QualType inferTypeFromArithSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  // The argument is a lone sizeof expression.
+  if (QualType T = inferTypeFromSizeofExpr(Arg); !T.isNull())
+    return T;
+  if (const auto *BO = dyn_cast<BinaryOperator>(Arg)) {
+    // Argument is an arithmetic expression. Cover common arithmetic patterns
+    // involving sizeof.
+    switch (BO->getOpcode()) {
+    case BO_Add:
+    case BO_Div:
+    case BO_Mul:
+    case BO_Shl:
+    case BO_Shr:
+    case BO_Sub:
+      if (QualType T = inferTypeFromArithSizeofExpr(BO->getLHS()); !T.isNull())
+        return T;
+      if (QualType T = inferTypeFromArithSizeofExpr(BO->getRHS()); !T.isNull())
+        return T;
+      break;
+    default:
+      break;
+    }
+  }
+  return QualType();
+}
+
+/// If the expression E is a reference to a variable, infer the type from a
+/// variable's initializer if it contains a sizeof. Beware, this is a heuristic
+/// and ignores if a variable is later reassigned.
zmodem wrote:
The added examples are a great improvement. Thanks!
https://github.com/llvm/llvm-project/pull/156841
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)
https://github.com/zmodem approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/156839 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix trying to constrain physical registers in spill handling (PR #161793)
https://github.com/cdevadas approved this pull request. https://github.com/llvm/llvm-project/pull/161793 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #161816)
https://github.com/easyonaadit updated
https://github.com/llvm/llvm-project/pull/161816
>From feaf31184bc20619448c87f3c38a1bbc3ec21719 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Tue, 30 Sep 2025 11:37:42 +0530
Subject: [PATCH] [AMDGPU] Add builtins for wave reduction intrinsics
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def | 4 ++++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  | 4 ++++
 2 files changed, 8 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fda16e42d2c6b..ebc0ac35f42d9 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -402,6 +402,10 @@ BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi",
"nc")
BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_f32, "ffZi", "nc")
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 07cf08c54985a..a242a73e4a822 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -301,18 +301,22 @@ static Intrinsic::ID
getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
llvm_unreachable("Unknown BuiltinID for wave reduction");
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_f32:
return Intrinsic::amdgcn_wave_reduce_add;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_f32:
return Intrinsic::amdgcn_wave_reduce_sub;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_f32:
return Intrinsic::amdgcn_wave_reduce_min;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
return Intrinsic::amdgcn_wave_reduce_umin;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_f32:
return Intrinsic::amdgcn_wave_reduce_max;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)
https://github.com/zmodem approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/156840 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
https://github.com/easyonaadit updated
https://github.com/llvm/llvm-project/pull/161815
>From 2e8024c70b755a3b309ec8b2965333e61a91af69 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Mon, 29 Sep 2025 18:58:10 +0530
Subject: [PATCH] [AMDGPU] Add wave reduce intrinsics for float types - 2
Supported Ops: `fadd`, `fsub`
---
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 40 +-
llvm/lib/Target/AMDGPU/SIInstructions.td | 2 +
.../CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll | 949 +
.../CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll | 967 ++
4 files changed, 1955 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 4d3c3de879bb6..bb58c97402527 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -5330,11 +5330,13 @@ static uint32_t getIdentityValueFor32BitWaveReduction(unsigned Opc) {
   case AMDGPU::S_MAX_U32:
     return std::numeric_limits<uint32_t>::min();
   case AMDGPU::S_MAX_I32:
+  case AMDGPU::V_SUB_F32_e64: // +0.0
     return std::numeric_limits<int32_t>::min();
   case AMDGPU::S_ADD_I32:
   case AMDGPU::S_SUB_I32:
   case AMDGPU::S_OR_B32:
   case AMDGPU::S_XOR_B32:
+  case AMDGPU::V_ADD_F32_e64: // -0.0
     return std::numeric_limits<uint32_t>::min();
   case AMDGPU::S_AND_B32:
     return std::numeric_limits<uint32_t>::max();
@@ -5382,11 +5384,13 @@ static bool is32bitWaveReduceOperation(unsigned Opc) {
Opc == AMDGPU::S_ADD_I32 || Opc == AMDGPU::S_SUB_I32 ||
Opc == AMDGPU::S_AND_B32 || Opc == AMDGPU::S_OR_B32 ||
Opc == AMDGPU::S_XOR_B32 || Opc == AMDGPU::V_MIN_F32_e64 ||
- Opc == AMDGPU::V_MAX_F32_e64;
+ Opc == AMDGPU::V_MAX_F32_e64 || Opc == AMDGPU::V_ADD_F32_e64 ||
+ Opc == AMDGPU::V_SUB_F32_e64;
}
static bool isFloatingPointWaveReduceOperation(unsigned Opc) {
- return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64;
+ return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64 ||
+ Opc == AMDGPU::V_ADD_F32_e64 || Opc == AMDGPU::V_SUB_F32_e64;
}
static MachineBasicBlock *lowerWaveReduce(MachineInstr &MI,
@@ -5433,8 +5437,10 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
case AMDGPU::S_XOR_B64:
case AMDGPU::S_ADD_I32:
case AMDGPU::S_ADD_U64_PSEUDO:
+case AMDGPU::V_ADD_F32_e64:
case AMDGPU::S_SUB_I32:
-case AMDGPU::S_SUB_U64_PSEUDO: {
+case AMDGPU::S_SUB_U64_PSEUDO:
+case AMDGPU::V_SUB_F32_e64: {
const TargetRegisterClass *WaveMaskRegClass = TRI->getWaveMaskRegClass();
const TargetRegisterClass *DstRegClass = MRI.getRegClass(DstReg);
Register ExecMask = MRI.createVirtualRegister(WaveMaskRegClass);
@@ -5589,6 +5595,30 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
.addImm(AMDGPU::sub1);
break;
}
+ case AMDGPU::V_ADD_F32_e64:
+ case AMDGPU::V_SUB_F32_e64: {
+Register ActiveLanesVreg =
+MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+Register DstVreg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+// Get number of active lanes as a float val.
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_CVT_F32_I32_e64),
+ActiveLanesVreg)
+.addReg(NewAccumulator->getOperand(0).getReg())
+.addImm(0) // clamp
+.addImm(0); // output-modifier
+
+// Take negation of input for SUB reduction
+unsigned srcMod = Opc == AMDGPU::V_SUB_F32_e64 ? 1 : 0;
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_MUL_F32_e64), DstVreg)
+.addImm(srcMod) // src0 modifier
+.addReg(SrcReg)
+.addImm(0) // src1 modifier
+.addReg(ActiveLanesVreg)
+.addImm(0) // clamp
+.addImm(0); // output-mod
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
+.addReg(DstVreg);
+ }
}
RetBB = &BB;
}
@@ -5833,10 +5863,14 @@
SITargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_ADD_I32);
case AMDGPU::WAVE_REDUCE_ADD_PSEUDO_U64:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_ADD_U64_PSEUDO);
+ case AMDGPU::WAVE_REDUCE_ADD_PSEUDO_F32:
+return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::V_ADD_F32_e64);
case AMDGPU::WAVE_REDUCE_SUB_PSEUDO_I32:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_SUB_I32);
case AMDGPU::WAVE_REDUCE_SUB_PSEUDO_U64:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_SUB_U64_PSEUDO);
+ case AMDGPU::WAVE_REDUCE_SUB_PSEUDO_F32:
+return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::V_SUB_F32_e64);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B32:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_AND_B32);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B64:
diff --git a/llvm/lib/Target/AMDGPU/SIInstruc
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #161816)
easyonaadit wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests
* **#161816** (this PR, view in Graphite)
* **#161815**
* **#161814**
* `main`
This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161816
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: None (llvmbot)
Changes
Backport 3c5c82d09c691a83fec5d09df2f6a308a789ead1
Requested by: @mstorsjo
---
Full diff: https://github.com/llvm/llvm-project/pull/161836.diff
1 Files Affected:
- (modified) clang/lib/Headers/avx10_2bf16intrin.h (+18-18)
```diff
diff --git a/clang/lib/Headers/avx10_2bf16intrin.h
b/clang/lib/Headers/avx10_2bf16intrin.h
index 66797ae00fe4f..0ca5380829391 100644
--- a/clang/lib/Headers/avx10_2bf16intrin.h
+++ b/clang/lib/Headers/avx10_2bf16intrin.h
@@ -519,34 +519,34 @@ _mm_maskz_min_pbh(__mmask8 __U, __m128bh __A, __m128bh
__B) {
(__mmask8)__U, (__v8bf)_mm_min_pbh(__A, __B), (__v8bf)_mm_setzero_pbh());
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16eq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16eq((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16lt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16lt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16le((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16le((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16gt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16gt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16ge((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16ge((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh A,
-__m128bh B) {
- return __builtin_ia32_vcomisbf16neq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh __A,
+__m128bh __B) {
+ return __builtin_ia32_vcomisbf16neq((__v8bf)__A, (__v8bf)__B);
}
#define _mm256_cmp_pbh_mask(__A, __B, __P)
\
```
https://github.com/llvm/llvm-project/pull/161836
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)
@@ -83,8 +83,6 @@ constrainRegClass(MachineRegisterInfo &MRI, Register Reg,
const TargetRegisterClass *MachineRegisterInfo::constrainRegClass(
Register Reg, const TargetRegisterClass *RC, unsigned MinNumRegs) {
- if (Reg.isPhysical())
cdevadas wrote:
What if someone unknowingly uses this function post-RA?
https://github.com/llvm/llvm-project/pull/161795
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
arsenm wrote:
No. These have nothing to do with each other. To extract a value into an
allocatable register, a 32-bit SGPR is the natural choice
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/161762 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
llvmbot wrote:
@llvm/pr-subscribers-backend-x86
Author: None (llvmbot)
Changes
Backport 3c5c82d09c691a83fec5d09df2f6a308a789ead1
Requested by: @mstorsjo
---
Full diff: https://github.com/llvm/llvm-project/pull/161836.diff
1 Files Affected:
- (modified) clang/lib/Headers/avx10_2bf16intrin.h (+18-18)
```diff
diff --git a/clang/lib/Headers/avx10_2bf16intrin.h
b/clang/lib/Headers/avx10_2bf16intrin.h
index 66797ae00fe4f..0ca5380829391 100644
--- a/clang/lib/Headers/avx10_2bf16intrin.h
+++ b/clang/lib/Headers/avx10_2bf16intrin.h
@@ -519,34 +519,34 @@ _mm_maskz_min_pbh(__mmask8 __U, __m128bh __A, __m128bh
__B) {
(__mmask8)__U, (__v8bf)_mm_min_pbh(__A, __B), (__v8bf)_mm_setzero_pbh());
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16eq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16eq((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16lt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16lt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16le((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16le((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16gt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16gt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16ge((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16ge((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh A,
-__m128bh B) {
- return __builtin_ia32_vcomisbf16neq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh __A,
+__m128bh __B) {
+ return __builtin_ia32_vcomisbf16neq((__v8bf)__A, (__v8bf)__B);
}
#define _mm256_cmp_pbh_mask(__A, __B, __P)
\
```
https://github.com/llvm/llvm-project/pull/161836
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/161836
Backport 3c5c82d09c691a83fec5d09df2f6a308a789ead1
Requested by: @mstorsjo
>From 290ad0a527af6e28eb53782226af41b682be721c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?=
Date: Fri, 3 Oct 2025 15:25:24 +0300
Subject: [PATCH] =?UTF-8?q?[clang]=20[Headers]=C2=A0Don't=20use=20unreserv?=
=?UTF-8?q?ed=20names=20in=20avx10=5F2bf16intrin.h=20(#161824)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This can cause breakage with user code that does "#define A ...".
This fixes issue https://github.com/llvm/llvm-project/issues/161808.
(cherry picked from commit 3c5c82d09c691a83fec5d09df2f6a308a789ead1)
---
clang/lib/Headers/avx10_2bf16intrin.h | 36 +--
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/clang/lib/Headers/avx10_2bf16intrin.h
b/clang/lib/Headers/avx10_2bf16intrin.h
index 66797ae00fe4f..0ca5380829391 100644
--- a/clang/lib/Headers/avx10_2bf16intrin.h
+++ b/clang/lib/Headers/avx10_2bf16intrin.h
@@ -519,34 +519,34 @@ _mm_maskz_min_pbh(__mmask8 __U, __m128bh __A, __m128bh
__B) {
(__mmask8)__U, (__v8bf)_mm_min_pbh(__A, __B), (__v8bf)_mm_setzero_pbh());
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16eq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16eq((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16lt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16lt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16le((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16le((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16gt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16gt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16ge((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16ge((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh A,
-__m128bh B) {
- return __builtin_ia32_vcomisbf16neq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh __A,
+__m128bh __B) {
+ return __builtin_ia32_vcomisbf16neq((__v8bf)__A, (__v8bf)__B);
}
#define _mm256_cmp_pbh_mask(__A, __B, __P)
\
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)
@@ -83,8 +83,6 @@ constrainRegClass(MachineRegisterInfo &MRI, Register Reg,
const TargetRegisterClass *MachineRegisterInfo::constrainRegClass(
Register Reg, const TargetRegisterClass *RC, unsigned MinNumRegs) {
- if (Reg.isPhysical())
arsenm wrote:
It will assert
https://github.com/llvm/llvm-project/pull/161795
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
github-actions[bot] wrote:
:warning: C/C++ code formatter, clang-format found issues in your code.
:warning:
You can test this locally with the following command:
```bash
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
```
:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:
View the diff from clang-format here.
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 042380383..8b8a10964 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -5396,7 +5396,7 @@ static bool is32bitWaveReduceOperation(unsigned Opc) {
}
static bool isFloatingPointWaveReduceOperation(unsigned Opc) {
- return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64 ||
+ return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64 ||
Opc == AMDGPU::V_ADD_F32_e64 || Opc == AMDGPU::V_SUB_F32_e64;
}
@@ -5446,7 +5446,7 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
case AMDGPU::S_ADD_U64_PSEUDO:
case AMDGPU::V_ADD_F32_e64:
case AMDGPU::S_SUB_I32:
-case AMDGPU::S_SUB_U64_PSEUDO:
+case AMDGPU::S_SUB_U64_PSEUDO:
case AMDGPU::V_SUB_F32_e64: {
const TargetRegisterClass *WaveMaskRegClass = TRI->getWaveMaskRegClass();
const TargetRegisterClass *DstRegClass = MRI.getRegClass(DstReg);
@@ -5604,33 +5604,38 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
}
case AMDGPU::V_ADD_F32_e64:
case AMDGPU::V_SUB_F32_e64: {
- /// for FPop: #activebits: int, src: float.
- /// convert int to float, and then mul. there is only V_MUL_F32, so copy
to vgpr.
- ///
/home/aalokdes/dockerx/work/llvm-trunk/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fadd.s32.mir
- /// ig: 1(01) -> negation, 2(10) -> abs, 3(11) -> abs and neg
- // V_CVT_F32_I32_e64
- // get #active lanes in vgpr
- Register ActiveLanesVreg =
MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
- Register DstVreg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
- BuildMI(BB, MI, DL, TII->get(AMDGPU::V_CVT_F32_I32_e64), ActiveLanesVreg)
+/// for FPop: #activebits: int, src: float.
+/// convert int to float, and then mul. there is only V_MUL_F32, so
copy
+/// to vgpr.
+///
/home/aalokdes/dockerx/work/llvm-trunk/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fadd.s32.mir
+/// ig: 1(01) -> negation, 2(10) -> abs, 3(11) -> abs and neg
+// V_CVT_F32_I32_e64
+// get #active lanes in vgpr
+Register ActiveLanesVreg =
+MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+Register DstVreg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_CVT_F32_I32_e64),
+ActiveLanesVreg)
// .addReg(SrcReg)
.addReg(NewAccumulator->getOperand(0).getReg())
-.addImm(0) // clamp
+.addImm(0) // clamp
.addImm(0); // output-modifier
- // Multiply numactivelanes * src
- // Take negation of input for SUB reduction
- unsigned srcMod = Opc == AMDGPU::V_SUB_F32_e64 ? 1 : 0; // check this to
make sure i am taking negation
- BuildMI(BB, MI, DL, TII->get(AMDGPU::V_MUL_F32_e64), DstVreg)
+// Multiply numactivelanes * src
+// Take negation of input for SUB reduction
+unsigned srcMod =
+Opc == AMDGPU::V_SUB_F32_e64
+? 1
+: 0; // check this to make sure i am taking negation
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_MUL_F32_e64), DstVreg)
.addImm(srcMod) // src0 modifier
.addReg(SrcReg)
.addImm(0) // src1 modifier
.addReg(ActiveLanesVreg)
-.addImm(0) // clamp
+.addImm(0) // clamp
.addImm(0); // output-mod
- BuildMI(BB, MI, DL,
- TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
- .addReg(DstVreg);
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
+.addReg(DstVreg);
}
}
RetBB = &BB;
```
https://github.com/llvm/llvm-project/pull/161815
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Profile propagation for `indirectbr` (PR #161747)
https://github.com/mtrofin ready_for_review https://github.com/llvm/llvm-project/pull/161747 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Profile propagation for `indirectbr` (PR #161747)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/161747 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [OpenMP][OMPIRBuilder] Support parallel in Generic kernels (PR #150926)
https://github.com/skatrak updated
https://github.com/llvm/llvm-project/pull/150926
>From 9533653e89c7d9abf065a62c7c880cc012886be4 Mon Sep 17 00:00:00 2001
From: Sergio Afonso
Date: Fri, 4 Jul 2025 16:32:03 +0100
Subject: [PATCH 1/2] [OpenMP][OMPIRBuilder] Support parallel in Generic
kernels
This patch introduces codegen logic to produce a wrapper function argument for
the `__kmpc_parallel_51` DeviceRTL function needed to handle arguments passed
using device shared memory in Generic mode.
---
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 100 --
.../LLVMIR/omptarget-parallel-llvm.mlir | 25 -
2 files changed, 116 insertions(+), 9 deletions(-)
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index e0b7378c34f77..9e43784412d53 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1426,6 +1426,86 @@ Error OpenMPIRBuilder::emitCancelationCheckImpl(
return Error::success();
}
+// Create wrapper function used to gather the outlined function's argument
+// structure from a shared buffer and to forward them to it when running in
+// Generic mode.
+//
+// The outlined function is expected to receive 2 integer arguments followed by
+// an optional pointer argument to an argument structure holding the rest.
+static Function *createTargetParallelWrapper(OpenMPIRBuilder *OMPIRBuilder,
+ Function &OutlinedFn) {
+ size_t NumArgs = OutlinedFn.arg_size();
+ assert((NumArgs == 2 || NumArgs == 3) &&
+ "expected a 2-3 argument parallel outlined function");
+ bool UseArgStruct = NumArgs == 3;
+
+ IRBuilder<> &Builder = OMPIRBuilder->Builder;
+ IRBuilder<>::InsertPointGuard IPG(Builder);
+ auto *FnTy = FunctionType::get(Builder.getVoidTy(),
+ {Builder.getInt16Ty(), Builder.getInt32Ty()},
+ /*isVarArg=*/false);
+ auto *WrapperFn =
+ Function::Create(FnTy, GlobalValue::InternalLinkage,
+ OutlinedFn.getName() + ".wrapper", OMPIRBuilder->M);
+
+ WrapperFn->addParamAttr(0, Attribute::NoUndef);
+ WrapperFn->addParamAttr(0, Attribute::ZExt);
+ WrapperFn->addParamAttr(1, Attribute::NoUndef);
+
+ BasicBlock *EntryBB =
+ BasicBlock::Create(OMPIRBuilder->M.getContext(), "entry", WrapperFn);
+ Builder.SetInsertPoint(EntryBB);
+
+ // Allocation.
+ Value *AddrAlloca = Builder.CreateAlloca(Builder.getInt32Ty(),
+ /*ArraySize=*/nullptr, "addr");
+ AddrAlloca = Builder.CreatePointerBitCastOrAddrSpaceCast(
+ AddrAlloca, Builder.getPtrTy(/*AddrSpace=*/0),
+ AddrAlloca->getName() + ".ascast");
+
+ Value *ZeroAlloca = Builder.CreateAlloca(Builder.getInt32Ty(),
+ /*ArraySize=*/nullptr, "zero");
+ ZeroAlloca = Builder.CreatePointerBitCastOrAddrSpaceCast(
+ ZeroAlloca, Builder.getPtrTy(/*AddrSpace=*/0),
+ ZeroAlloca->getName() + ".ascast");
+
+ Value *ArgsAlloca = nullptr;
+ if (UseArgStruct) {
+ArgsAlloca = Builder.CreateAlloca(Builder.getPtrTy(),
+ /*ArraySize=*/nullptr, "global_args");
+ArgsAlloca = Builder.CreatePointerBitCastOrAddrSpaceCast(
+ArgsAlloca, Builder.getPtrTy(/*AddrSpace=*/0),
+ArgsAlloca->getName() + ".ascast");
+ }
+
+ // Initialization.
+ Builder.CreateStore(WrapperFn->getArg(1), AddrAlloca);
+ Builder.CreateStore(Builder.getInt32(0), ZeroAlloca);
+ if (UseArgStruct) {
+Builder.CreateCall(
+OMPIRBuilder->getOrCreateRuntimeFunctionPtr(
+llvm::omp::RuntimeFunction::OMPRTL___kmpc_get_shared_variables),
+{ArgsAlloca});
+ }
+
+  SmallVector<Value *> Args{AddrAlloca, ZeroAlloca};
+
+ // Load structArg from global_args.
+ if (UseArgStruct) {
+Value *StructArg = Builder.CreateLoad(Builder.getPtrTy(), ArgsAlloca);
+StructArg = Builder.CreateInBoundsGEP(Builder.getPtrTy(), StructArg,
+ {Builder.getInt64(0)});
+StructArg = Builder.CreateLoad(Builder.getPtrTy(), StructArg, "structArg");
+Args.push_back(StructArg);
+ }
+
+ // Call the outlined function holding the parallel body.
+ Builder.CreateCall(&OutlinedFn, Args);
+ Builder.CreateRetVoid();
+
+ return WrapperFn;
+}
+
// Callback used to create OpenMP runtime calls to support
// omp parallel clause for the device.
// We need to use this callback to replace call to the OutlinedFn in OuterFn
@@ -1435,6 +1515,10 @@ static void targetParallelCallback(
BasicBlock *OuterAllocaBB, Value *Ident, Value *IfCondition,
Value *NumThreads, Instruction *PrivTID, AllocaInst *PrivTIDAddr,
Value *ThreadID, const SmallVector &ToBeDeleted) {
+ assert(OutlinedFn.arg_size() >= 2 &&
+ "Expected at least tid and bounded tid as arguments");
+ unsigned NumCapturedVars = OutlinedFn.arg_size() - /* tid & bounded tid */ 2;
+
//
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
github-actions[bot] wrote:
:warning: C/C++ code formatter, clang-format found issues in your code.
:warning:
You can test this locally with the following command:
```bash
git-clang-format --diff origin/main HEAD --extensions cpp,h -- llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
```
:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:
View the diff from clang-format here.
```diff
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 530477e6d..c164d32f8 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,7 +312,7 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
-static bool isGenericKernel(Function &Fn){
+static bool isGenericKernel(Function &Fn) {
std::optional ExecMode =
getTargetKernelExecMode(Fn);
return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
```
https://github.com/llvm/llvm-project/pull/161864
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)
bcardosolopes wrote: > I made a change tn the function: `performAddrSpaceCast` and I opted for > getting rid of the `LangAS` parameters for both source and destination Sounds legit, thanks! https://github.com/llvm/llvm-project/pull/161212 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [OpenMPOpt] Make parallel regions reachable from new DeviceRTL loop functions (PR #150927)
skatrak wrote: Moving to draft because I've noticed this doesn't currently work whenever there are different calls to new DeviceRTL loop functions. The state machine rewrite optimization of OpenMPOpt causes the following code to not run properly, whereas the same code without the `unused_problematic` subroutine in it (or compiled with `-mllvm -openmp-opt-disable-state-machine-rewrite`) works: ```f90 ! flang -fopenmp -fopenmp-version=52 --offload-arch=gfx1030 test.f90 && OMP_TARGET_OFFLOAD=MANDATORY ./a.out subroutine test_subroutine(counter) implicit none integer, intent(out) :: counter integer :: i1, i2, n1, n2 n1 = 100 n2 = 50 counter = 0 !$omp target teams distribute reduction(+:counter) do i1=1, n1 !$omp parallel do reduction(+:counter) do i2=1, n2 counter = counter + 1 end do end do end subroutine program main implicit none integer :: counter call test_subroutine(counter) ! Should print: 5000 print '(I0)', counter end program subroutine foo(i) integer, intent(inout) :: i end subroutine ! The presence of this unreachable function in the compilation unit causes ! the result of `test_subroutine` to be incorrect. Removing the `distribute` ! OpenMP directive avoids the problem. subroutine unused_problematic() implicit none integer :: i !$omp target teams !$omp distribute do i=1, 100 call foo(i) end do !$omp end target teams end subroutine ``` https://github.com/llvm/llvm-project/pull/150927 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
llvmbot wrote:
@llvm/pr-subscribers-mlir-openmp
Author: Sergio Afonso (skatrak)
Changes
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing in standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
---
Patch is 26.11 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/161864.diff
7 Files Affected:
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+2-2)
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+15-13)
- (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+38-17)
- (modified)
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+84-38)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+4-4)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir (+6-1)
- (added) offload/test/offloading/fortran/target-generic-outlined-loops.f90
(+109)
``diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d8e5f8cf5a45e..410912ba375a3 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -3292,8 +3292,8 @@ class OpenMPIRBuilder {
ArrayRef DeallocIPs)>;
using TargetGenArgAccessorsCallbackTy = function_ref;
+ Argument &Arg, Value *Input, Value *&RetVal, InsertPointTy AllocIP,
+ InsertPointTy CodeGenIP, ArrayRef DeallocIPs)>;
/// Generator for '#omp target'
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index a18db939b5876..530477e6d2f6d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,6 +312,12 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
+static bool isGenericKernel(Function &Fn){
+ std::optional ExecMode =
+ getTargetKernelExecMode(Fn);
+ return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -1535,11 +1541,9 @@ static void targetParallelCallback(
IfCondition ? Builder.CreateSExtOrTrunc(IfCondition, OMPIRBuilder->Int32)
: Builder.getInt32(1);
- // If this is not a Generic kernel, we can skip generating the wrapper.
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
+ // If this is a Generic kernel, we can generate the wrapper.
Value *WrapperFn;
- if (ExecMode && (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC))
+ if (isGenericKernel(*OuterFn))
WrapperFn = createTargetParallelWrapper(OMPIRBuilder, OutlinedFn);
else
WrapperFn = Constant::getNullValue(PtrTy);
@@ -1812,13 +1816,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::createParallel(
auto OI = [&]() -> std::unique_ptr {
if (Config.isTargetDevice()) {
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
-
- // If OuterFn is not a Generic kernel, skip custom allocation. This
causes
- // the CodeExtractor to follow its default behavior. Otherwise, we need
to
- // use device shared memory to allocate argument structures.
- if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC)
+ // If OuterFn is a Generic kernel, we need to use device shared memory to
+ // allocate argument structures. Otherwise, we use stack allocations as
+ // usual.
+ if (isGenericKernel(*OuterFn))
return std::make_unique(*this);
}
return std::make_unique();
@@ -7806,8 +7807,9 @@ static Expected createOutlinedFunction(
Argument &Arg = std::get<1>(InArg);
Value *InputCopy = nullptr;
-llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
-ArgAccessorFuncCB(Arg, Input, InputCopy, AllocaIP, Builder.saveIP());
+llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP = ArgAccessorFuncCB(
+Arg, Input, InputCopy, AllocaIP, Builder.saveIP(),
+OpenMPIRBuilder::InsertPointTy(ExitBB, ExitBB->begin()));
if (!AfterIP)
return AfterIP.takeError();
Builder.restoreIP(*AfterIP);
diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
index 1e5b8145d5cdc..d231a778a8a97 100644
--- a/llvm/unittests/Frontend/OpenM
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
llvmbot wrote:
@llvm/pr-subscribers-mlir-llvm
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
https://github.com/skatrak created
https://github.com/llvm/llvm-project/pull/161864
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing in standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
>From 2fb029503af54575f46ea2c8304772a5b8097638 Mon Sep 17 00:00:00 2001
From: Sergio Afonso
Date: Tue, 16 Sep 2025 14:18:39 +0100
Subject: [PATCH] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing in standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
---
.../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 +-
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 28 ++--
.../Frontend/OpenMPIRBuilderTest.cpp | 55 +---
.../OpenMP/OpenMPToLLVMIRTranslation.cpp | 122 --
.../LLVMIR/omptarget-parallel-llvm.mlir | 8 +-
.../LLVMIR/omptarget-parallel-wsloop.mlir | 7 +-
.../fortran/target-generic-outlined-loops.f90 | 109
7 files changed, 258 insertions(+), 75 deletions(-)
create mode 100644
offload/test/offloading/fortran/target-generic-outlined-loops.f90
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d8e5f8cf5a45e..410912ba375a3 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -3292,8 +3292,8 @@ class OpenMPIRBuilder {
ArrayRef DeallocIPs)>;
using TargetGenArgAccessorsCallbackTy = function_ref;
+ Argument &Arg, Value *Input, Value *&RetVal, InsertPointTy AllocIP,
+ InsertPointTy CodeGenIP, ArrayRef DeallocIPs)>;
/// Generator for '#omp target'
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index a18db939b5876..530477e6d2f6d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,6 +312,12 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
+static bool isGenericKernel(Function &Fn){
+ std::optional ExecMode =
+ getTargetKernelExecMode(Fn);
+ return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -1535,11 +1541,9 @@ static void targetParallelCallback(
IfCondition ? Builder.CreateSExtOrTrunc(IfCondition, OMPIRBuilder->Int32)
: Builder.getInt32(1);
- // If this is not a Generic kernel, we can skip generating the wrapper.
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
+ // If this is a Generic kernel, we can generate the wrapper.
Value *WrapperFn;
- if (ExecMode && (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC))
+ if (isGenericKernel(*OuterFn))
WrapperFn = createTargetParallelWrapper(OMPIRBuilder, OutlinedFn);
else
WrapperFn = Constant::getNullValue(PtrTy);
@@ -1812,13 +1816,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::createParallel(
auto OI = [&]() -> std::unique_ptr {
if (Config.isTargetDevice()) {
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
-
- // If OuterFn is not a Generic kernel, skip custom allocation. This
causes
- // the CodeExtractor to follow its default behavior. Otherwise, we need
to
- // use device shared memory to allocate argument structures.
- if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC)
+ // If OuterFn is a Generic kernel, we need to use device shared memory to
+ // allocate argument structures. Otherwise, we use stack allocations as
+ // usual.
+ if (isGenericKerne
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Refactor omp.target_allocmem to allow reuse, NFC (PR #161861)
skatrak wrote: PR stack: - #150922 - #150923 - #150924 - #150925 - #150926 - #150927 - #154752 - #161861 โ๏ธ - #161862 - #161863 - #161864 https://github.com/llvm/llvm-project/pull/161861 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
llvmbot wrote:
@llvm/pr-subscribers-offload
Author: Sergio Afonso (skatrak)
Changes
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
---
Patch is 26.11 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/161864.diff
7 Files Affected:
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+2-2)
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+15-13)
- (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+38-17)
- (modified)
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+84-38)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+4-4)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir (+6-1)
- (added) offload/test/offloading/fortran/target-generic-outlined-loops.f90
(+109)
``diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d8e5f8cf5a45e..410912ba375a3 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -3292,8 +3292,8 @@ class OpenMPIRBuilder {
ArrayRef DeallocIPs)>;
using TargetGenArgAccessorsCallbackTy = function_ref;
+ Argument &Arg, Value *Input, Value *&RetVal, InsertPointTy AllocIP,
+ InsertPointTy CodeGenIP, ArrayRef DeallocIPs)>;
/// Generator for '#omp target'
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index a18db939b5876..530477e6d2f6d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,6 +312,12 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
+static bool isGenericKernel(Function &Fn){
+ std::optional ExecMode =
+ getTargetKernelExecMode(Fn);
+ return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -1535,11 +1541,9 @@ static void targetParallelCallback(
IfCondition ? Builder.CreateSExtOrTrunc(IfCondition, OMPIRBuilder->Int32)
: Builder.getInt32(1);
- // If this is not a Generic kernel, we can skip generating the wrapper.
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
+ // If this is a Generic kernel, we can generate the wrapper.
Value *WrapperFn;
- if (ExecMode && (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC))
+ if (isGenericKernel(*OuterFn))
WrapperFn = createTargetParallelWrapper(OMPIRBuilder, OutlinedFn);
else
WrapperFn = Constant::getNullValue(PtrTy);
@@ -1812,13 +1816,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::createParallel(
auto OI = [&]() -> std::unique_ptr {
if (Config.isTargetDevice()) {
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
-
- // If OuterFn is not a Generic kernel, skip custom allocation. This
causes
- // the CodeExtractor to follow its default behavior. Otherwise, we need
to
- // use device shared memory to allocate argument structures.
- if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC)
+ // If OuterFn is a Generic kernel, we need to use device shared memory to
+ // allocate argument structures. Otherwise, we use stack allocations as
+ // usual.
+ if (isGenericKernel(*OuterFn))
return std::make_unique(*this);
}
return std::make_unique();
@@ -7806,8 +7807,9 @@ static Expected createOutlinedFunction(
Argument &Arg = std::get<1>(InArg);
Value *InputCopy = nullptr;
-llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
-ArgAccessorFuncCB(Arg, Input, InputCopy, AllocaIP, Builder.saveIP());
+llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP = ArgAccessorFuncCB(
+Arg, Input, InputCopy, AllocaIP, Builder.saveIP(),
+OpenMPIRBuilder::InsertPointTy(ExitBB, ExitBB->begin()));
if (!AfterIP)
return AfterIP.takeError();
Builder.restoreIP(*AfterIP);
diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
index 1e5b8145d5cdc..d231a778a8a97 100644
--- a/llvm/unittests/Frontend/OpenMPIRB
[llvm-branch-commits] [clang] [llvm] [openmp] [OpenMP] Taskgraph Clang 'record and replay' frontend support (PR #159774)
jtb20 wrote: This rebased version mentions partial taskgraph support in `clang/docs/ReleaseNotes.rst`. There is already a table entry in `clang/docs/OpenMPSupport.rst` noting the in-progress status of record-and-replay taskgraph support. https://github.com/llvm/llvm-project/pull/159774 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/161739
>From 4979ae9e8486f51e124fe94471fec97ff93698c8 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Wed, 1 Oct 2025 17:08:48 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Handle branch weights in
`simplifySwitchLookup`
---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 25 +++
.../SimplifyCFG/X86/switch_to_lookup_table.ll | 13 +++---
.../Transforms/SimplifyCFG/rangereduce.ll | 24 +++---
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 63f4b2e030b69..5aff662bc3586 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7227,6 +7227,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Mod.getContext(), "switch.lookup", CommonDest->getParent(), CommonDest);
BranchInst *RangeCheckBranch = nullptr;
+ BranchInst *CondBranch = nullptr;
Builder.SetInsertPoint(SI);
const bool GeneratingCoveredLookupTable = (MaxTableSize == TableSize);
@@ -7241,6 +7242,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
TableIndex, ConstantInt::get(MinCaseVal->getType(), TableSize));
RangeCheckBranch =
Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
+CondBranch = RangeCheckBranch;
if (DTU)
Updates.push_back({DominatorTree::Insert, BB, LookupBB});
}
@@ -7279,7 +7281,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Value *Shifted = Builder.CreateLShr(TableMask, MaskIndex,
"switch.shifted");
Value *LoBit = Builder.CreateTrunc(
Shifted, Type::getInt1Ty(Mod.getContext()), "switch.lobit");
-Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
+CondBranch = Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
if (DTU) {
Updates.push_back({DominatorTree::Insert, MaskBB, LookupBB});
Updates.push_back({DominatorTree::Insert, MaskBB, SI->getDefaultDest()});
@@ -7319,19 +7321,32 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
if (DTU)
Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
+ SmallVector BranchWeights;
+ const bool HasBranchWeights = CondBranch && !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+ uint64_t ToLookupWeight = 0;
+ uint64_t ToDefaultWeight = 0;
+
// Remove the switch.
SmallPtrSet RemovedSuccessors;
- for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+ for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+ if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
continue;
+}
Succ->removePredecessor(BB);
if (DTU && RemovedSuccessors.insert(Succ).second)
Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+ ToLookupWeight += BranchWeights[I];
}
SI->eraseFromParent();
-
+ if (HasBranchWeights)
+setFittedBranchWeights(*CondBranch, {ToLookupWeight, ToDefaultWeight},
+ /*IsExpected=*/false);
if (DTU)
DTU->applyUpdates(Updates);
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
index f9e79cabac51d..bee6b375ea11a 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
@@ -1565,14 +1565,14 @@ end:
; lookup (since i3 can only hold values in the range of explicit
; values) and simultaneously trying to generate a branch to deal with
; the fact that we have holes in the range.
-define i32 @covered_switch_with_bit_tests(i3) {
+define i32 @covered_switch_with_bit_tests(i3) !prof !0 {
; CHECK-LABEL: @covered_switch_with_bit_tests(
; CHECK-NEXT: entry:
; CHECK-NEXT:[[SWITCH_TABLEIDX:%.*]] = sub i3 [[TMP0:%.*]], -4
; CHECK-NEXT:[[SWITCH_MASKINDEX:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i8
; CHECK-NEXT:[[SWITCH_SHIFTED:%.*]] = lshr i8 -61, [[SWITCH_MASKINDEX]]
; CHECK-NEXT:[[SWITCH_LOBIT:%.*]] = trunc i8 [[SWITCH_SHIFTED]] to i1
-; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]]
+; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK: switch.lookup:
; CHECK-NEXT:[[TMP1:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i64
; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [8 x i32], ptr @switch.table.covered_switch_with_bit_tests, i64 0, i64 [[TMP1]]
@@ -1588,7 +1588,7 @@ entry:
i3 -4, label
[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (#155829) (PR #161766)
https://github.com/efriedma-quic approved this pull request. LGTM. This is an ABI fix which is important for the active SPARC developers, and it very obviously only affects SPARC targets. https://github.com/llvm/llvm-project/pull/161766 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) (PR #161937)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/161937
Backport 2e1fab93467ec8c37a236ae6e059300ebaa0c986
Requested by: @brad0
>From 4cb5d998e732df6a28462288ffae97d3bd16 Mon Sep 17 00:00:00 2001
From: Koakuma
Date: Fri, 3 Oct 2025 19:25:08 +0700
Subject: [PATCH] [SPARC] Prevent meta instructions from being inserted into
delay slots (#16)
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into
delay slots, as they don't correspond to real machine instructions.
This should fix crashes when compiling with, for example, `clang -Og`.
(cherry picked from commit 2e1fab93467ec8c37a236ae6e059300ebaa0c986)
---
llvm/lib/Target/Sparc/DelaySlotFiller.cpp | 4 +--
.../CodeGen/SPARC/2011-01-19-DelaySlot.ll | 25 +++
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
index 6c19049a001cf..024030d196ee3 100644
--- a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
+++ b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
@@ -206,8 +206,8 @@ Filler::findDelayInstr(MachineBasicBlock &MBB,
if (!done)
--I;
-// skip debug instruction
-if (I->isDebugInstr())
+// Skip meta instructions.
+if (I->isMetaInstruction())
continue;
if (I->hasUnmodeledSideEffects() || I->isInlineAsm() || I->isPosition() ||
diff --git a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
index 9ccd4f1c0ac9a..767ef7eb510e6 100644
--- a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
+++ b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
@@ -184,4 +184,29 @@ entry:
ret i32 %2
}
+define i32 @test_generic_inst(i32 %arg) #0 {
+;CHECK-LABEL: test_generic_inst:
+;CHECK: ! fake_use: {{.*}}
+;CHECK: bne {{.*}}
+;CHECK-NEXT: nop
+ %bar1 = call i32 @bar(i32 %arg)
+ %even = and i32 %bar1, 1
+ %cmp = icmp eq i32 %even, 0
+ ; This shouldn't get reordered into a delay slot
+ call void (...) @llvm.fake.use(i32 %arg)
+ br i1 %cmp, label %true, label %false
+true:
+ %bar2 = call i32 @bar(i32 %bar1)
+ br label %cont
+
+false:
+ %inc = add nsw i32 %bar1, 1
+ br label %cont
+
+cont:
+ %ret = phi i32 [ %bar2, %true ], [ %inc, %false ]
+ ret i32 %ret
+}
+
+declare void @llvm.fake.use(...)
attributes #0 = { nounwind "disable-tail-calls"="true" }
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck] Update exclusion list to reflect fixes (PR #161943)
https://github.com/mtrofin ready_for_review https://github.com/llvm/llvm-project/pull/161943 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/161739
>From d1ddd8929f07ddbbcaea73ee99d788a6cd623110 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Wed, 1 Oct 2025 17:08:48 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Handle branch weights in
`simplifySwitchLookup`
---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 25 +++
.../SimplifyCFG/X86/switch_to_lookup_table.ll | 13 +++---
.../Transforms/SimplifyCFG/rangereduce.ll | 24 +++---
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 084d9d87c4778..48055ad6ea7e4 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7227,6 +7227,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Mod.getContext(), "switch.lookup", CommonDest->getParent(), CommonDest);
BranchInst *RangeCheckBranch = nullptr;
+ BranchInst *CondBranch = nullptr;
Builder.SetInsertPoint(SI);
const bool GeneratingCoveredLookupTable = (MaxTableSize == TableSize);
@@ -7241,6 +7242,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
TableIndex, ConstantInt::get(MinCaseVal->getType(), TableSize));
RangeCheckBranch =
Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
+CondBranch = RangeCheckBranch;
if (DTU)
Updates.push_back({DominatorTree::Insert, BB, LookupBB});
}
@@ -7279,7 +7281,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Value *Shifted = Builder.CreateLShr(TableMask, MaskIndex,
"switch.shifted");
Value *LoBit = Builder.CreateTrunc(
Shifted, Type::getInt1Ty(Mod.getContext()), "switch.lobit");
-Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
+CondBranch = Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
if (DTU) {
Updates.push_back({DominatorTree::Insert, MaskBB, LookupBB});
Updates.push_back({DominatorTree::Insert, MaskBB, SI->getDefaultDest()});
@@ -7319,19 +7321,32 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
if (DTU)
Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
+ SmallVector BranchWeights;
+ const bool HasBranchWeights = CondBranch && !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+ uint64_t ToLookupWeight = 0;
+ uint64_t ToDefaultWeight = 0;
+
// Remove the switch.
SmallPtrSet RemovedSuccessors;
- for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+ for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+ if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
continue;
+}
Succ->removePredecessor(BB);
if (DTU && RemovedSuccessors.insert(Succ).second)
Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+ ToLookupWeight += BranchWeights[I];
}
SI->eraseFromParent();
-
+ if (HasBranchWeights)
+setFittedBranchWeights(*CondBranch, {ToLookupWeight, ToDefaultWeight},
+ /*IsExpected=*/false);
if (DTU)
DTU->applyUpdates(Updates);
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
index f9e79cabac51d..bee6b375ea11a 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
@@ -1565,14 +1565,14 @@ end:
; lookup (since i3 can only hold values in the range of explicit
; values) and simultaneously trying to generate a branch to deal with
; the fact that we have holes in the range.
-define i32 @covered_switch_with_bit_tests(i3) {
+define i32 @covered_switch_with_bit_tests(i3) !prof !0 {
; CHECK-LABEL: @covered_switch_with_bit_tests(
; CHECK-NEXT: entry:
; CHECK-NEXT:[[SWITCH_TABLEIDX:%.*]] = sub i3 [[TMP0:%.*]], -4
; CHECK-NEXT:[[SWITCH_MASKINDEX:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i8
; CHECK-NEXT:[[SWITCH_SHIFTED:%.*]] = lshr i8 -61, [[SWITCH_MASKINDEX]]
; CHECK-NEXT:[[SWITCH_LOBIT:%.*]] = trunc i8 [[SWITCH_SHIFTED]] to i1
-; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]]
+; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK: switch.lookup:
; CHECK-NEXT:[[TMP1:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i64
; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [8 x i32], ptr @switch.table.covered_switch_with_bit_tests, i64 0, i64 [[TMP1]]
@@ -1588,7 +1588,7 @@ entry:
i3 -4, label
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Profile propagation for `indirectbr` (PR #161747)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/161747
>From a55a7e2f13a6606f9660fdacc3287f741d3f2ac2 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Thu, 2 Oct 2025 15:56:16 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Profile propagation for `indirectbr`
---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 39 +--
.../test/Transforms/SimplifyCFG/indirectbr.ll | 32 +++
2 files changed, 51 insertions(+), 20 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 48055ad6ea7e4..0b7d3b2fb2d7b 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -4895,9 +4895,8 @@ bool SimplifyCFGOpt::simplifyTerminatorOnSelect(Instruction *OldTerm,
// We found both of the successors we were looking for.
// Create a conditional branch sharing the condition of the select.
BranchInst *NewBI = Builder.CreateCondBr(Cond, TrueBB, FalseBB);
- if (TrueWeight != FalseWeight)
-setBranchWeights(*NewBI, {TrueWeight, FalseWeight},
- /*IsExpected=*/false, /*ElideAllZero=*/true);
+ setBranchWeights(*NewBI, {TrueWeight, FalseWeight},
+ /*IsExpected=*/false, /*ElideAllZero=*/true);
}
} else if (KeepEdge1 && (KeepEdge2 || TrueBB == FalseBB)) {
// Neither of the selected blocks were successors, so this
@@ -4982,9 +4981,15 @@ bool SimplifyCFGOpt::simplifyIndirectBrOnSelect(IndirectBrInst *IBI,
BasicBlock *TrueBB = TBA->getBasicBlock();
BasicBlock *FalseBB = FBA->getBasicBlock();
+ // The select's profile becomes the profile of the conditional branch that
+ // replaces the indirect branch.
+ SmallVector SelectBranchWeights(2);
+ if (!ProfcheckDisableMetadataFixes)
+extractBranchWeights(*SI, SelectBranchWeights);
// Perform the actual simplification.
- return simplifyTerminatorOnSelect(IBI, SI->getCondition(), TrueBB, FalseBB, 0,
-0);
+ return simplifyTerminatorOnSelect(IBI, SI->getCondition(), TrueBB, FalseBB,
+SelectBranchWeights[0],
+SelectBranchWeights[1]);
}
/// This is called when we find an icmp instruction
@@ -7877,20 +7882,29 @@ bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
bool SimplifyCFGOpt::simplifyIndirectBr(IndirectBrInst *IBI) {
BasicBlock *BB = IBI->getParent();
bool Changed = false;
-
+ SmallVector BranchWeights;
+ const bool HasBranchWeights = !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*IBI, BranchWeights);
+ SmallVector NewBranchWeights;
// Eliminate redundant destinations.
SmallPtrSet Succs;
SmallSetVector RemovedSuccs;
- for (unsigned i = 0, e = IBI->getNumDestinations(); i != e; ++i) {
-BasicBlock *Dest = IBI->getDestination(i);
+ for (unsigned I = 0, E = IBI->getNumDestinations(); I != E; ++I) {
+BasicBlock *Dest = IBI->getDestination(I);
if (!Dest->hasAddressTaken() || !Succs.insert(Dest).second) {
if (!Dest->hasAddressTaken())
RemovedSuccs.insert(Dest);
Dest->removePredecessor(BB);
- IBI->removeDestination(i);
- --i;
- --e;
+ IBI->removeDestination(I);
+ --I;
+ --E;
Changed = true;
+ if (HasBranchWeights && BranchWeights[I] != 0) {
+LLVM_DEBUG(dbgs() << "Elided indirectbr edge with non-zero profile. "
+ "This is unexpected\n");
+ }
+} else if (HasBranchWeights) {
+ NewBranchWeights.push_back(BranchWeights[I]);
}
}
@@ -7915,7 +7929,8 @@ bool SimplifyCFGOpt::simplifyIndirectBr(IndirectBrInst *IBI) {
eraseTerminatorAndDCECond(IBI);
return true;
}
-
+ if (HasBranchWeights)
+setBranchWeights(*IBI, NewBranchWeights, /*IsExpected=*/false);
if (SelectInst *SI = dyn_cast(IBI->getAddress())) {
if (simplifyIndirectBrOnSelect(IBI, SI))
return requestResimplify();
diff --git a/llvm/test/Transforms/SimplifyCFG/indirectbr.ll b/llvm/test/Transforms/SimplifyCFG/indirectbr.ll
index 87d8b399494ce..3127b2c643f74 100644
--- a/llvm/test/Transforms/SimplifyCFG/indirectbr.ll
+++ b/llvm/test/Transforms/SimplifyCFG/indirectbr.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
; RUN: opt -S -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s
; SimplifyCFG should eliminate redundant indirectbr edges.
@@ -8,7 +8,11 @@ declare void @A()
declare void @B(i32)
declare void @C()
-define void @indbrtest0(ptr %P, ptr %Q) {
+;.
+; CHECK: @anchor = constant [13 x ptr] [ptr blockaddress(@indbrtest3, %L1),
ptr blockaddress(@indbrtest3, %L2), ptr inttoptr (i32 1 to ptr), ptr
blockaddress(@indbrtest4,
[llvm-branch-commits] [llvm] [PowerPC] Implement paddis (PR #161572)
https://github.com/lei137 updated
https://github.com/llvm/llvm-project/pull/161572
>From 012b638031fb72d36525234115f9d7b87d8c98e3 Mon Sep 17 00:00:00 2001
From: Lei Huang
Date: Tue, 30 Sep 2025 18:09:31 +
Subject: [PATCH 1/5] [PowerPC] Implement paddis
---
.../Target/PowerPC/AsmParser/PPCAsmParser.cpp | 4 ++
.../PowerPC/MCTargetDesc/PPCAsmBackend.cpp| 9
.../PowerPC/MCTargetDesc/PPCFixupKinds.h | 6 +++
.../PowerPC/MCTargetDesc/PPCInstPrinter.cpp | 12 +
.../PowerPC/MCTargetDesc/PPCInstPrinter.h | 2 +
.../PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp | 1 +
llvm/lib/Target/PowerPC/PPCInstrFuture.td | 44 +++
llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 19
.../PowerPC/ppc-encoding-ISAFuture.txt| 6 +++
.../PowerPC/ppc64le-encoding-ISAFuture.txt| 6 +++
llvm/test/MC/PowerPC/ppc-encoding-ISAFuture.s | 8
11 files changed, 117 insertions(+)
diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 561a9c51b9cc2..b07f95018ca90 100644
--- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -365,6 +365,10 @@ struct PPCOperand : public MCParsedAsmOperand {
bool isS16ImmX4() const { return isExtImm<16>(/*Signed*/ true, 4); }
bool isS16ImmX16() const { return isExtImm<16>(/*Signed*/ true, 16); }
bool isS17Imm() const { return isExtImm<17>(/*Signed*/ true, 1); }
+ bool isS32Imm() const {
+// TODO: Is ContextImmediate needed?
+return Kind == Expression || isSImm<32>();
+ }
bool isS34Imm() const {
// Once the PC-Rel ABI is finalized, evaluate whether a 34-bit
// ContextImmediate is needed.
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 04b886ae74993..558351b515a2e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -47,6 +47,9 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t Value) {
case PPC::fixup_ppc_half16ds:
case PPC::fixup_ppc_half16dq:
return Value & 0xfffc;
+ case PPC::fixup_ppc_pcrel32:
+ case PPC::fixup_ppc_imm32:
+return Value & 0xffffffff;
case PPC::fixup_ppc_pcrel34:
case PPC::fixup_ppc_imm34:
return Value & 0x3ffffffff;
@@ -71,6 +74,8 @@ static unsigned getFixupKindNumBytes(unsigned Kind) {
case PPC::fixup_ppc_br24abs:
case PPC::fixup_ppc_br24_notoc:
return 4;
+ case PPC::fixup_ppc_pcrel32:
+ case PPC::fixup_ppc_imm32:
case PPC::fixup_ppc_pcrel34:
case PPC::fixup_ppc_imm34:
case FK_Data_8:
@@ -154,6 +159,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind Kind) const {
{"fixup_ppc_brcond14abs", 16, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 0, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
@@ -166,6 +173,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind Kind) const {
{"fixup_ppc_brcond14abs", 2, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 2, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
index df0c666f5b113..4164b697649cd 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
@@ -40,6 +40,12 @@ enum Fixups {
/// instrs like 'std'.
fixup_ppc_half16ds,
+ // A 32-bit fixup corresponding to PC-relative paddis.
+ fixup_ppc_pcrel32,
+
+ // A 32-bit fixup corresponding to Non-PC-relative paddis.
+ fixup_ppc_imm32,
+
// A 34-bit fixup corresponding to PC-relative paddi.
fixup_ppc_pcrel34,
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
index b27bc3bd49315..e2afb9378cbf0 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
@@ -430,6 +430,18 @@ void PPCInstPrinter::printS16ImmOperand(const MCInst *MI, unsigned OpNo,
printOperand(MI, OpNo, STI, O);
}
+void PPCInstPrinter::printS32ImmOperand(const MCInst *MI, unsigned OpNo,
+const MCSubtargetInfo &STI,
+raw_ostream &O) {
+ if (MI->getOperand(OpNo).isImm()) {
+long long Value = MI->getOperand(OpNo).getImm();
+assert(isInt<32>(Value) && "Invalid s32imm argument!");
+O << (long long)Value;
+ }
+ else
+printOperand(MI
