[llvm-branch-commits] [llvm] AMDGPU: Fix constrain register logic for physregs (PR #161794)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

Stack (top to bottom):
* **#161795**
* **#161794** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161794)
* **#161793**
* **#161792**
* **#161790**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161794
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix trying to constrain physical registers in spill handling (PR #161793)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

Stack (top to bottom):
* **#161795**
* **#161794**
* **#161793** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161793)
* **#161792**
* **#161790**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161793
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)
arsenm wrote:
> Same here: drop mir tests which test what tablegen has generated.

Same as before: the MIR test changes are not related to this PR, and we cannot simply drop them.
https://github.com/llvm/llvm-project/pull/161762
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)
https://github.com/cdevadas approved this pull request. https://github.com/llvm/llvm-project/pull/161795 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
cdevadas wrote:
Shouldn't SCC_CLASS depend on the wavesize?
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
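For context on the question above: before this change, the cross-copy class for SCC came from getWaveMaskRegClass(), which does depend on the wave size; after it, the result is wave-size independent. A toy Python sketch of the two lookups (class names are illustrative strings, not the real TableGen classes):

```python
# Toy sketch of SIRegisterInfo::getCrossCopyRegClass before and after
# the change above. Class names are illustrative strings only.

def cross_copy_class_old(rc, wave64):
    # Old behavior: SCC copies went through the wave mask class,
    # whose width depends on the wave size.
    if rc == "SCC_CLASS":
        return "SReg_64" if wave64 else "SReg_32"
    return rc

def cross_copy_class_new(rc):
    # New behavior: SCC is always copied through a 32-bit SGPR class,
    # independent of wave size.
    return "SReg_32" if rc == "SCC_CLASS" else rc

print(cross_copy_class_old("SCC_CLASS", wave64=True))  # SReg_64
print(cross_copy_class_new("SCC_CLASS"))               # SReg_32
```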
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers] Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
llvmbot wrote: @RKSimon What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #161816)
https://github.com/easyonaadit created
https://github.com/llvm/llvm-project/pull/161816
None
>From 6cd1510d1ca606a5d08e4bbdb3c77d17d93447d8 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Tue, 30 Sep 2025 11:37:42 +0530
Subject: [PATCH] [AMDGPU] Add builtins for wave reduction intrinsics
---
clang/include/clang/Basic/BuiltinsAMDGPU.def | 4
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 4
2 files changed, 8 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fda16e42d2c6b..ebc0ac35f42d9 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -402,6 +402,10 @@ BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_f32, "ffZi", "nc")
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 07cf08c54985a..a242a73e4a822 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -301,18 +301,22 @@ static Intrinsic::ID
getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
llvm_unreachable("Unknown BuiltinID for wave reduction");
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_f32:
return Intrinsic::amdgcn_wave_reduce_add;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_f32:
return Intrinsic::amdgcn_wave_reduce_sub;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_f32:
return Intrinsic::amdgcn_wave_reduce_min;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
return Intrinsic::amdgcn_wave_reduce_umin;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_f32:
return Intrinsic::amdgcn_wave_reduce_max;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
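The patch above only wires the new builtins to the existing generic wave-reduce intrinsics. As a rough sketch of the semantics those intrinsics expose (a toy lane model in Python, not the actual GPU execution model; the operator table is an assumption for illustration):

```python
import operator
from functools import reduce

# Toy lane model: every active lane receives the reduction of the
# value across all active lanes. The op table below is illustrative.
REDUCTIONS = {
    "add": operator.add,
    "min": min,
    "max": max,
}

def wave_reduce(op, lane_values):
    result = reduce(REDUCTIONS[op], lane_values)
    return [result] * len(lane_values)  # result is uniform across the wave

lanes = [1.5, -2.0, 4.0, 0.5]
print(wave_reduce("add", lanes))  # every lane sees 4.0
print(wave_reduce("min", lanes))  # every lane sees -2.0
```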
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers] Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/phoebewang approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [PowerPC] Implement paddis (PR #161572)
https://github.com/lei137 updated
https://github.com/llvm/llvm-project/pull/161572
>From 012b638031fb72d36525234115f9d7b87d8c98e3 Mon Sep 17 00:00:00 2001
From: Lei Huang
Date: Tue, 30 Sep 2025 18:09:31 +
Subject: [PATCH 1/4] [PowerPC] Implement paddis
---
.../Target/PowerPC/AsmParser/PPCAsmParser.cpp | 4 ++
.../PowerPC/MCTargetDesc/PPCAsmBackend.cpp| 9
.../PowerPC/MCTargetDesc/PPCFixupKinds.h | 6 +++
.../PowerPC/MCTargetDesc/PPCInstPrinter.cpp | 12 +
.../PowerPC/MCTargetDesc/PPCInstPrinter.h | 2 +
.../PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp | 1 +
llvm/lib/Target/PowerPC/PPCInstrFuture.td | 44 +++
llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 19
.../PowerPC/ppc-encoding-ISAFuture.txt| 6 +++
.../PowerPC/ppc64le-encoding-ISAFuture.txt| 6 +++
llvm/test/MC/PowerPC/ppc-encoding-ISAFuture.s | 8
11 files changed, 117 insertions(+)
diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 561a9c51b9cc2..b07f95018ca90 100644
--- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -365,6 +365,10 @@ struct PPCOperand : public MCParsedAsmOperand {
bool isS16ImmX4() const { return isExtImm<16>(/*Signed*/ true, 4); }
bool isS16ImmX16() const { return isExtImm<16>(/*Signed*/ true, 16); }
bool isS17Imm() const { return isExtImm<17>(/*Signed*/ true, 1); }
+  bool isS32Imm() const {
+    // TODO: Is ContextImmediate needed?
+    return Kind == Expression || isSImm<32>();
+  }
bool isS34Imm() const {
// Once the PC-Rel ABI is finalized, evaluate whether a 34-bit
// ContextImmediate is needed.
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 04b886ae74993..558351b515a2e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -47,6 +47,9 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t
Value) {
case PPC::fixup_ppc_half16ds:
case PPC::fixup_ppc_half16dq:
return Value & 0xfffc;
+  case PPC::fixup_ppc_pcrel32:
+  case PPC::fixup_ppc_imm32:
+    return Value & 0xffffffff;
   case PPC::fixup_ppc_pcrel34:
   case PPC::fixup_ppc_imm34:
     return Value & 0x3ffffffff;
@@ -71,6 +74,8 @@ static unsigned getFixupKindNumBytes(unsigned Kind) {
case PPC::fixup_ppc_br24abs:
case PPC::fixup_ppc_br24_notoc:
return 4;
+ case PPC::fixup_ppc_pcrel32:
+ case PPC::fixup_ppc_imm32:
case PPC::fixup_ppc_pcrel34:
case PPC::fixup_ppc_imm34:
case FK_Data_8:
@@ -154,6 +159,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind
Kind) const {
{"fixup_ppc_brcond14abs", 16, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 0, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
@@ -166,6 +173,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind
Kind) const {
{"fixup_ppc_brcond14abs", 2, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 2, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
index df0c666f5b113..4164b697649cd 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
@@ -40,6 +40,12 @@ enum Fixups {
/// instrs like 'std'.
fixup_ppc_half16ds,
+ // A 32-bit fixup corresponding to PC-relative paddis.
+ fixup_ppc_pcrel32,
+
+ // A 32-bit fixup corresponding to Non-PC-relative paddis.
+ fixup_ppc_imm32,
+
// A 34-bit fixup corresponding to PC-relative paddi.
fixup_ppc_pcrel34,
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
index b27bc3bd49315..e2afb9378cbf0 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
@@ -430,6 +430,18 @@ void PPCInstPrinter::printS16ImmOperand(const MCInst *MI,
unsigned OpNo,
printOperand(MI, OpNo, STI, O);
}
+void PPCInstPrinter::printS32ImmOperand(const MCInst *MI, unsigned OpNo,
+                                        const MCSubtargetInfo &STI,
+                                        raw_ostream &O) {
+  if (MI->getOperand(OpNo).isImm()) {
+    long long Value = MI->getOperand(OpNo).getImm();
+    assert(isInt<32>(Value) && "Invalid s32imm argument!");
+    O << (long long)Value;
+  } else
+    printOperand(MI
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
easyonaadit wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

Stack (top to bottom):
* **#161816**
* **#161815** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161815)
* **#161814**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161815
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop trying to constrain register class of post-RA-pseudos (PR #161792)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
This is trying to constrain the register class of a physical register,
which makes no sense.
---
Full diff: https://github.com/llvm/llvm-project/pull/161792.diff
1 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (-2)
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index fe6b8b96cbd57..cda8069936af2 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2112,8 +2112,6 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI)
const {
   case AMDGPU::SI_RESTORE_S32_FROM_VGPR:
     MI.setDesc(get(AMDGPU::V_READLANE_B32));
-    MI.getMF()->getRegInfo().constrainRegClass(MI.getOperand(0).getReg(),
-                                               &AMDGPU::SReg_32_XM0RegClass);
     break;
case AMDGPU::AV_MOV_B32_IMM_PSEUDO: {
Register Dst = MI.getOperand(0).getReg();
```
https://github.com/llvm/llvm-project/pull/161792
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
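The rationale ("constraining the register class of a physical register makes no sense") is easiest to see in a toy model: constrainRegClass narrows a virtual register's class, i.e. the set of physical registers it may be assigned, while a physical register is already one concrete register. A minimal Python sketch (register and class names illustrative, not the real AMDGPU definitions):

```python
# A register class modeled as the set of physical registers it allows.
SReg_32 = {"s0", "s1", "s2", "m0"}
SReg_32_XM0 = {"s0", "s1", "s2"}  # SReg_32 without m0

def constrain_reg_class(current, required):
    """Narrow a virtual register's class to the common subset;
    None if the classes are incompatible (mirrors a null return)."""
    common = current & required
    return common if common else None

# Virtual register: narrowing to a common subclass is meaningful.
print(constrain_reg_class(SReg_32, SReg_32_XM0))  # the narrowed class

# A physical register is already a single register such as "s0";
# there is no class to narrow, which is why the deleted call made no sense.
```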
[llvm-branch-commits] [clang] [llvm] [HLSL] GetDimensions methods for buffer resources (PR #161929)
https://github.com/hekota updated
https://github.com/llvm/llvm-project/pull/161929
>From e50918910a0ce590228c6ecacd4ff2a578da6f58 Mon Sep 17 00:00:00 2001
From: Helena Kotas
Date: Fri, 3 Oct 2025 17:33:19 -0700
Subject: [PATCH] [HLSL] GetDimensions methods for buffer resources
Adds GetDimensions methods on all supported buffer resources.
---
clang/include/clang/Basic/Builtins.td | 12 +++
clang/lib/CodeGen/CGHLSLBuiltins.cpp | 61 ++
clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.cpp | 81 ++-
clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.h | 2 +
clang/lib/Sema/HLSLExternalSemaSource.cpp | 11 +++
clang/lib/Sema/SemaHLSL.cpp | 18 +
.../test/AST/HLSL/ByteAddressBuffers-AST.hlsl | 14
.../test/AST/HLSL/StructuredBuffers-AST.hlsl | 22 +
clang/test/AST/HLSL/TypedBuffers-AST.hlsl | 14
.../resources/ByteAddressBuffers-methods.hlsl | 47 +++
.../StructuredBuffers-methods-lib.hlsl| 53 +++-
.../StructuredBuffers-methods-ps.hlsl | 37 +
.../resources/TypedBuffers-methods.hlsl | 34
llvm/include/llvm/IR/IntrinsicsDirectX.td | 4 +
14 files changed, 406 insertions(+), 4 deletions(-)
create mode 100644
clang/test/CodeGenHLSL/resources/ByteAddressBuffers-methods.hlsl
diff --git a/clang/include/clang/Basic/Builtins.td
b/clang/include/clang/Basic/Builtins.td
index 468121f7d20ab..0b1587be51217 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4951,6 +4951,18 @@ def HLSLResourceNonUniformIndex :
LangBuiltin<"HLSL_LANG"> {
let Prototype = "uint32_t(uint32_t)";
}
+def HLSLResourceGetDimensions : LangBuiltin<"HLSL_LANG"> {
+ let Spellings = ["__builtin_hlsl_buffer_getdimensions"];
+ let Attributes = [NoThrow];
+ let Prototype = "void(...)";
+}
+
+def HLSLResourceGetStride : LangBuiltin<"HLSL_LANG"> {
+ let Spellings = ["__builtin_hlsl_buffer_getstride"];
+ let Attributes = [NoThrow];
+ let Prototype = "void(...)";
+}
+
def HLSLAll : LangBuiltin<"HLSL_LANG"> {
let Spellings = ["__builtin_hlsl_all"];
let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 6c0fc8d7f07be..373153e01c128 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -160,6 +160,58 @@ static Value *handleHlslSplitdouble(const CallExpr *E,
CodeGenFunction *CGF) {
return LastInst;
}
+static Value *emitDXILGetDimensions(CodeGenFunction *CGF, Value *Handle,
+                                    Value *MipLevel, LValue *OutArg0,
+                                    LValue *OutArg1 = nullptr,
+                                    LValue *OutArg2 = nullptr,
+                                    LValue *OutArg3 = nullptr) {
+  assert(OutArg0 && "first output argument is required");
+
+  llvm::Type *I32 = CGF->Int32Ty;
+  StructType *RetTy = llvm::StructType::get(I32, I32, I32, I32);
+
+  CallInst *CI = CGF->Builder.CreateIntrinsic(
+      RetTy, llvm::Intrinsic::dx_resource_getdimensions,
+      ArrayRef{Handle, MipLevel});
+
+  Value *LastInst = nullptr;
+  unsigned OutArgIndex = 0;
+  for (LValue *OutArg : {OutArg0, OutArg1, OutArg2, OutArg3}) {
+    if (OutArg) {
+      Value *OutArgVal = CGF->Builder.CreateExtractValue(CI, OutArgIndex);
+      LastInst = CGF->Builder.CreateStore(OutArgVal, OutArg->getAddress());
+    }
+    ++OutArgIndex;
+  }
+  assert(LastInst && "no output argument stored?");
+  return LastInst;
+}
+
+static Value *emitBufferGetDimensions(CodeGenFunction *CGF, Value *Handle,
+                                      LValue &Dim) {
+  // Generate the call to get the buffer dimension.
+  switch (CGF->CGM.getTarget().getTriple().getArch()) {
+  case llvm::Triple::dxil:
+    return emitDXILGetDimensions(CGF, Handle, PoisonValue::get(CGF->Int32Ty),
+                                 &Dim);
+    break;
+  case llvm::Triple::spirv:
+    llvm_unreachable("SPIR-V GetDimensions codegen not implemented yet.");
+  default:
+    llvm_unreachable("GetDimensions not supported by target architecture");
+  }
+}
+
+static Value *emitBufferStride(CodeGenFunction *CGF, const Expr *HandleExpr,
+                               LValue &Stride) {
+  // Figure out the stride of the buffer elements from the handle type.
+  auto *HandleTy =
+      cast(HandleExpr->getType().getTypePtr());
+  QualType ElementTy = HandleTy->getContainedType();
+  Value *StrideValue = CGF->getTypeSize(ElementTy);
+  return CGF->Builder.CreateStore(StrideValue, Stride.getAddress());
+}
+
// Return dot product intrinsic that corresponds to the QT scalar type
static Intrinsic::ID getDotProductIntrinsic(CGHLSLRuntime &RT, QualType QT) {
if (QT->isFloatingType())
@@ -359,6 +411,15 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned
BuiltinID,
RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
ArrayRef{IndexOp});
[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/161762
>From 53a6a5b9e3adcabc51e7eff0a21642f33859b946 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Fri, 3 Oct 2025 10:21:10 +0900
Subject: [PATCH] AMDGPU: Remove LDS_DIRECT_CLASS register class
This is a singleton register class which is a bad idea,
and not actually used.
---
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 20 +-
.../GlobalISel/irtranslator-inline-asm.ll | 2 +-
.../coalesce-copy-to-agpr-to-av-registers.mir | 232 +-
...class-vgpr-mfma-to-av-with-load-source.mir | 12 +-
llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll | 24 +-
...al-regcopy-and-spill-missed-at-regalloc.ll | 16 +-
...lloc-failure-overlapping-insert-assert.mir | 12 +-
.../rewrite-vgpr-mfma-to-agpr-copy-from.mir | 4 +-
...gpr-mfma-to-agpr-subreg-insert-extract.mir | 12 +-
...te-vgpr-mfma-to-agpr-subreg-src2-chain.mir | 32 +--
.../CodeGen/AMDGPU/spill-vector-superclass.ll | 2 +-
.../Inputs/amdgpu_isel.ll.expected| 4 +-
12 files changed, 183 insertions(+), 189 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index f98e31229b246..82fc2400a3754 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -761,12 +761,6 @@ def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU",
Reg128Types.types, 32,
let BaseClassOrder = 1;
}
-def LDS_DIRECT_CLASS : RegisterClass<"AMDGPU", [i32], 32,
-                                     (add LDS_DIRECT)> {
-  let isAllocatable = 0;
-  let CopyCost = -1;
-}
-
let GeneratePressureSet = 0, HasSGPR = 1 in {
// Subset of SReg_32 without M0 for SMRD instructions and alike.
// See comments in SIInstructions.td for more info.
@@ -829,7 +823,7 @@ def SGPR_NULL256 : SIReg<"null">;
let GeneratePressureSet = 0 in {
def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
- (add SReg_32, LDS_DIRECT_CLASS)> {
+ (add SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasSGPR = 1;
let Size = 32;
@@ -968,7 +962,7 @@ defm "" : SRegClass<32, Reg1024Types.types, SGPR_1024Regs>;
}
def VRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
- (add VGPR_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let Size = 32;
@@ -1083,21 +1077,21 @@ def VReg_1 : SIRegisterClass<"AMDGPU", [i1], 32, (add)> {
}
def VS_16 : SIRegisterClass<"AMDGPU", Reg16Types.types, 16,
- (add VGPR_16, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_16, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let Size = 16;
}
def VS_16_Lo128 : SIRegisterClass<"AMDGPU", Reg16Types.types, 16,
- (add VGPR_16_Lo128, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_16_Lo128, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let Size = 16;
}
def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16,
v2bf16], 32,
- (add VGPR_32, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasSGPR = 1;
@@ -1105,7 +1099,7 @@ def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16,
f16, bf16, v2i16, v2f16, v
}
def VS_32_Lo128 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
- (add VGPR_32_Lo128, SReg_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32_Lo128, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasSGPR = 1;
@@ -1113,7 +1107,7 @@ def VS_32_Lo128 : SIRegisterClass<"AMDGPU", [i32, f32,
i16, f16, bf16, v2i16, v2
}
def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16,
v2f16, v2bf16], 32,
-                                  (add VGPR_32_Lo256, SReg_32, LDS_DIRECT_CLASS)> {
+                                  (add VGPR_32_Lo256, SReg_32, LDS_DIRECT)> {
let isAllocatable = 0;
let HasVGPR = 1;
let HasSGPR = 1;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
index a54dc9dda16e0..e5cd0710359ac 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
@@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 {
define double @test_multiple_register_outputs_mixed() #0 {
; CHECK-LABEL: name: test_multiple_register_outputs_mixed
; CHECK: bb.1 (%ir-block.0):
- ; CHECK-NEXT: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /*
attdialect */, 1835018 /* regdef:VGPR_32 */, def %8, 3473418 /* regdef:VReg_64
*/, def %9
+
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
arsenm wrote:
> Are there any codegen changes?

I don't think so. There could maybe be DAG scheduling changes, but I haven't found any.
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) (PR #161937)
llvmbot wrote: @arsenm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/161937 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) (PR #161937)
llvmbot wrote:
@llvm/pr-subscribers-backend-sparc
Author: None (llvmbot)
Changes
Backport 2e1fab93467ec8c37a236ae6e059300ebaa0c986
Requested by: @brad0
---
Full diff: https://github.com/llvm/llvm-project/pull/161937.diff
2 Files Affected:
- (modified) llvm/lib/Target/Sparc/DelaySlotFiller.cpp (+2-2)
- (modified) llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll (+25)
```diff
diff --git a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
index 6c19049a001cf..024030d196ee3 100644
--- a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
+++ b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
@@ -206,8 +206,8 @@ Filler::findDelayInstr(MachineBasicBlock &MBB,
if (!done)
--I;
-    // skip debug instruction
-    if (I->isDebugInstr())
+    // Skip meta instructions.
+    if (I->isMetaInstruction())
       continue;
if (I->hasUnmodeledSideEffects() || I->isInlineAsm() || I->isPosition() ||
diff --git a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
index 9ccd4f1c0ac9a..767ef7eb510e6 100644
--- a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
+++ b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
@@ -184,4 +184,29 @@ entry:
ret i32 %2
}
+define i32 @test_generic_inst(i32 %arg) #0 {
+;CHECK-LABEL: test_generic_inst:
+;CHECK: ! fake_use: {{.*}}
+;CHECK: bne {{.*}}
+;CHECK-NEXT: nop
+ %bar1 = call i32 @bar(i32 %arg)
+ %even = and i32 %bar1, 1
+ %cmp = icmp eq i32 %even, 0
+ ; This shouldn't get reordered into a delay slot
+ call void (...) @llvm.fake.use(i32 %arg)
+ br i1 %cmp, label %true, label %false
+true:
+ %bar2 = call i32 @bar(i32 %bar1)
+ br label %cont
+
+false:
+ %inc = add nsw i32 %bar1, 1
+ br label %cont
+
+cont:
+ %ret = phi i32 [ %bar2, %true ], [ %inc, %false ]
+ ret i32 %ret
+}
+
+declare void @llvm.fake.use(...)
attributes #0 = { nounwind "disable-tail-calls"="true" }
```
https://github.com/llvm/llvm-project/pull/161937
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
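The one-line fix above generalizes "skip debug instructions" to "skip all meta instructions" in the backward scan for a delay-slot candidate. A simplified Python sketch of that scan (toy instruction model, not the real SPARC pass):

```python
from dataclasses import dataclass

@dataclass
class Inst:
    name: str
    is_meta: bool = False          # fake uses, debug values, ...
    has_side_effects: bool = False

def find_delay_inst(block):
    # Scan backwards from the terminating branch (last entry), as the
    # pass does, looking for an instruction to hoist into the delay slot.
    for inst in reversed(block[:-1]):
        if inst.is_meta:
            continue  # the fix: skip *all* meta instructions
        if inst.has_side_effects:
            return None  # give up; a nop fills the slot instead
        return inst
    return None

block = [Inst("add"), Inst("fake_use", is_meta=True), Inst("bne")]
print(find_delay_inst(block).name)  # add, not fake_use
```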
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
https://github.com/arsenm created
https://github.com/llvm/llvm-project/pull/161801
SCC should be copied to a 32-bit SGPR. Using a wave mask doesn't make
sense.
>From 04a448a4f8da85d879ffb61618a824bd2ab5a62a Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Fri, 3 Oct 2025 15:53:00 +0900
Subject: [PATCH] AMDGPU: Stop using the wavemask register class for SCC cross
class copies
SCC should be copied to a 32-bit SGPR. Using a wave mask doesn't make
sense.
---
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 21735f91f4ad7..ba29dd4ae61d4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
}
static unsigned getNumSubRegsForSpillOp(const MachineInstr &MI,
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
@@ -10,7 +10,7 @@ typedef __typeof(sizeof(int)) size_t;
void *malloc(size_t size);
// CHECK-LABEL: @test_malloc(
-// CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 0)
+// CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 2689373973731826898){{.*}} !alloc_token [[META_INT:![0-9]+]]
zmodem wrote:
nit: Use a regex instead of hard-coding the token here and in
alloc-token-nonlibcalls.c?
https://github.com/llvm/llvm-project/pull/156841
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
SCC should be copied to a 32-bit SGPR. Using a wave mask doesn't make
sense.
---
Full diff: https://github.com/llvm/llvm-project/pull/161801.diff
1 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+1-3)
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 21735f91f4ad7..ba29dd4ae61d4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
}
static unsigned getNumSubRegsForSpillOp(const MachineInstr &MI,
``
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
@@ -1353,6 +1354,92 @@ void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB,
QualType AllocType) {
CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
}
+/// Infer type from a simple sizeof expression.
+static QualType inferTypeFromSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) {
+    if (UET->getKind() == UETT_SizeOf) {
+      if (UET->isArgumentType())
+        return UET->getArgumentTypeInfo()->getType();
+      else
+        return UET->getArgumentExpr()->getType();
+    }
+  }
+  return QualType();
+}
+
+/// Infer type from an arithmetic expression involving a sizeof.
+static QualType inferTypeFromArithSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  // The argument is a lone sizeof expression.
+  if (QualType T = inferTypeFromSizeofExpr(Arg); !T.isNull())
+    return T;
+  if (const auto *BO = dyn_cast<BinaryOperator>(Arg)) {
+    // Argument is an arithmetic expression. Cover common arithmetic patterns
+    // involving sizeof.
+    switch (BO->getOpcode()) {
+    case BO_Add:
+    case BO_Div:
+    case BO_Mul:
+    case BO_Shl:
+    case BO_Shr:
+    case BO_Sub:
+      if (QualType T = inferTypeFromArithSizeofExpr(BO->getLHS()); !T.isNull())
+        return T;
+      if (QualType T = inferTypeFromArithSizeofExpr(BO->getRHS()); !T.isNull())
+        return T;
+      break;
+    default:
+      break;
+    }
+  }
+  return QualType();
+}
+
+/// If the expression E is a reference to a variable, infer the type from a
+/// variable's initializer if it contains a sizeof. Beware, this is a heuristic
+/// and ignores if a variable is later reassigned.
zmodem wrote:
The added examples are a great improvement. Thanks!
https://github.com/llvm/llvm-project/pull/156841
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)
https://github.com/zmodem approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/156839 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix trying to constrain physical registers in spill handling (PR #161793)
https://github.com/cdevadas approved this pull request. https://github.com/llvm/llvm-project/pull/161793 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #161816)
https://github.com/easyonaadit updated
https://github.com/llvm/llvm-project/pull/161816
>From feaf31184bc20619448c87f3c38a1bbc3ec21719 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Tue, 30 Sep 2025 11:37:42 +0530
Subject: [PATCH] [AMDGPU] Add builtins for wave reduction intrinsics
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def | 4 ++++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  | 4 ++++
 2 files changed, 8 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fda16e42d2c6b..ebc0ac35f42d9 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -402,6 +402,10 @@ BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi",
"nc")
BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_f32, "ffZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_f32, "ffZi", "nc")
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 07cf08c54985a..a242a73e4a822 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -301,18 +301,22 @@ static Intrinsic::ID
getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
llvm_unreachable("Unknown BuiltinID for wave reduction");
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_f32:
return Intrinsic::amdgcn_wave_reduce_add;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_f32:
return Intrinsic::amdgcn_wave_reduce_sub;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_f32:
return Intrinsic::amdgcn_wave_reduce_min;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
return Intrinsic::amdgcn_wave_reduce_umin;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_f32:
return Intrinsic::amdgcn_wave_reduce_max;
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)
https://github.com/zmodem approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/156840 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
https://github.com/easyonaadit updated
https://github.com/llvm/llvm-project/pull/161815
>From 2e8024c70b755a3b309ec8b2965333e61a91af69 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Mon, 29 Sep 2025 18:58:10 +0530
Subject: [PATCH] [AMDGPU] Add wave reduce intrinsics for float types - 2
Supported Ops: `fadd`, `fsub`
---
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 40 +-
llvm/lib/Target/AMDGPU/SIInstructions.td | 2 +
.../CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll | 949 +
.../CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll | 967 ++
4 files changed, 1955 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 4d3c3de879bb6..bb58c97402527 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -5330,11 +5330,13 @@ static uint32_t getIdentityValueFor32BitWaveReduction(unsigned Opc) {
   case AMDGPU::S_MAX_U32:
     return std::numeric_limits<uint32_t>::min();
   case AMDGPU::S_MAX_I32:
+  case AMDGPU::V_SUB_F32_e64: // +0.0
     return std::numeric_limits<int32_t>::min();
   case AMDGPU::S_ADD_I32:
   case AMDGPU::S_SUB_I32:
   case AMDGPU::S_OR_B32:
   case AMDGPU::S_XOR_B32:
+  case AMDGPU::V_ADD_F32_e64: // -0.0
     return std::numeric_limits<uint32_t>::min();
   case AMDGPU::S_AND_B32:
     return std::numeric_limits<uint32_t>::max();
@@ -5382,11 +5384,13 @@ static bool is32bitWaveReduceOperation(unsigned Opc) {
Opc == AMDGPU::S_ADD_I32 || Opc == AMDGPU::S_SUB_I32 ||
Opc == AMDGPU::S_AND_B32 || Opc == AMDGPU::S_OR_B32 ||
Opc == AMDGPU::S_XOR_B32 || Opc == AMDGPU::V_MIN_F32_e64 ||
- Opc == AMDGPU::V_MAX_F32_e64;
+ Opc == AMDGPU::V_MAX_F32_e64 || Opc == AMDGPU::V_ADD_F32_e64 ||
+ Opc == AMDGPU::V_SUB_F32_e64;
}
static bool isFloatingPointWaveReduceOperation(unsigned Opc) {
- return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64;
+ return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64 ||
+ Opc == AMDGPU::V_ADD_F32_e64 || Opc == AMDGPU::V_SUB_F32_e64;
}
static MachineBasicBlock *lowerWaveReduce(MachineInstr &MI,
@@ -5433,8 +5437,10 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
case AMDGPU::S_XOR_B64:
case AMDGPU::S_ADD_I32:
case AMDGPU::S_ADD_U64_PSEUDO:
+case AMDGPU::V_ADD_F32_e64:
case AMDGPU::S_SUB_I32:
-case AMDGPU::S_SUB_U64_PSEUDO: {
+case AMDGPU::S_SUB_U64_PSEUDO:
+case AMDGPU::V_SUB_F32_e64: {
const TargetRegisterClass *WaveMaskRegClass = TRI->getWaveMaskRegClass();
const TargetRegisterClass *DstRegClass = MRI.getRegClass(DstReg);
Register ExecMask = MRI.createVirtualRegister(WaveMaskRegClass);
@@ -5589,6 +5595,30 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
.addImm(AMDGPU::sub1);
break;
}
+ case AMDGPU::V_ADD_F32_e64:
+ case AMDGPU::V_SUB_F32_e64: {
+Register ActiveLanesVreg =
+MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+Register DstVreg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+// Get number of active lanes as a float val.
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_CVT_F32_I32_e64),
+ActiveLanesVreg)
+.addReg(NewAccumulator->getOperand(0).getReg())
+.addImm(0) // clamp
+.addImm(0); // output-modifier
+
+// Take negation of input for SUB reduction
+unsigned srcMod = Opc == AMDGPU::V_SUB_F32_e64 ? 1 : 0;
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_MUL_F32_e64), DstVreg)
+.addImm(srcMod) // src0 modifier
+.addReg(SrcReg)
+.addImm(0) // src1 modifier
+.addReg(ActiveLanesVreg)
+.addImm(0) // clamp
+.addImm(0); // output-mod
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
+.addReg(DstVreg);
+ }
}
RetBB = &BB;
}
@@ -5833,10 +5863,14 @@
SITargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_ADD_I32);
case AMDGPU::WAVE_REDUCE_ADD_PSEUDO_U64:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_ADD_U64_PSEUDO);
+ case AMDGPU::WAVE_REDUCE_ADD_PSEUDO_F32:
+return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::V_ADD_F32_e64);
case AMDGPU::WAVE_REDUCE_SUB_PSEUDO_I32:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_SUB_I32);
case AMDGPU::WAVE_REDUCE_SUB_PSEUDO_U64:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_SUB_U64_PSEUDO);
+ case AMDGPU::WAVE_REDUCE_SUB_PSEUDO_F32:
+return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::V_SUB_F32_e64);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B32:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_AND_B32);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B64:
diff --git a/llvm/lib/Target/AMDGPU/SIInstruc
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #161816)
easyonaadit wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests
* **#161816** (this PR, view in Graphite)
* **#161815**
* **#161814**
* `main`
This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/161816
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/161836 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: None (llvmbot)
Changes
Backport 3c5c82d09c691a83fec5d09df2f6a308a789ead1
Requested by: @mstorsjo
---
Full diff: https://github.com/llvm/llvm-project/pull/161836.diff
1 Files Affected:
- (modified) clang/lib/Headers/avx10_2bf16intrin.h (+18-18)
```diff
diff --git a/clang/lib/Headers/avx10_2bf16intrin.h
b/clang/lib/Headers/avx10_2bf16intrin.h
index 66797ae00fe4f..0ca5380829391 100644
--- a/clang/lib/Headers/avx10_2bf16intrin.h
+++ b/clang/lib/Headers/avx10_2bf16intrin.h
@@ -519,34 +519,34 @@ _mm_maskz_min_pbh(__mmask8 __U, __m128bh __A, __m128bh
__B) {
(__mmask8)__U, (__v8bf)_mm_min_pbh(__A, __B), (__v8bf)_mm_setzero_pbh());
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16eq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16eq((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16lt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16lt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16le((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16le((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16gt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16gt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16ge((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16ge((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh A,
-__m128bh B) {
- return __builtin_ia32_vcomisbf16neq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh __A,
+__m128bh __B) {
+ return __builtin_ia32_vcomisbf16neq((__v8bf)__A, (__v8bf)__B);
}
#define _mm256_cmp_pbh_mask(__A, __B, __P)
\
```
https://github.com/llvm/llvm-project/pull/161836
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)
@@ -83,8 +83,6 @@ constrainRegClass(MachineRegisterInfo &MRI, Register Reg,
const TargetRegisterClass *MachineRegisterInfo::constrainRegClass(
Register Reg, const TargetRegisterClass *RC, unsigned MinNumRegs) {
- if (Reg.isPhysical())
cdevadas wrote:
What if someone unknowingly uses this function post-RA?
https://github.com/llvm/llvm-project/pull/161795
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using the wavemask register class for SCC cross class copies (PR #161801)
@@ -1118,9 +1118,7 @@ SIRegisterInfo::getPointerRegClass(unsigned Kind) const {
const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
-  if (RC == &AMDGPU::SCC_CLASSRegClass)
-    return getWaveMaskRegClass();
-  return RC;
+  return RC == &AMDGPU::SCC_CLASSRegClass ? &AMDGPU::SReg_32RegClass : RC;
arsenm wrote:
No. These have nothing to do with each other. To extract a value into an
allocatable register, a 32-bit SGPR is the natural choice
https://github.com/llvm/llvm-project/pull/161801
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/161762 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
llvmbot wrote:
@llvm/pr-subscribers-backend-x86
Author: None (llvmbot)
Changes
Backport 3c5c82d09c691a83fec5d09df2f6a308a789ead1
Requested by: @mstorsjo
---
Full diff: https://github.com/llvm/llvm-project/pull/161836.diff
1 Files Affected:
- (modified) clang/lib/Headers/avx10_2bf16intrin.h (+18-18)
```diff
diff --git a/clang/lib/Headers/avx10_2bf16intrin.h
b/clang/lib/Headers/avx10_2bf16intrin.h
index 66797ae00fe4f..0ca5380829391 100644
--- a/clang/lib/Headers/avx10_2bf16intrin.h
+++ b/clang/lib/Headers/avx10_2bf16intrin.h
@@ -519,34 +519,34 @@ _mm_maskz_min_pbh(__mmask8 __U, __m128bh __A, __m128bh
__B) {
(__mmask8)__U, (__v8bf)_mm_min_pbh(__A, __B), (__v8bf)_mm_setzero_pbh());
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16eq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16eq((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16lt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16lt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16le((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16le((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16gt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16gt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16ge((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16ge((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh A,
-__m128bh B) {
- return __builtin_ia32_vcomisbf16neq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh __A,
+__m128bh __B) {
+ return __builtin_ia32_vcomisbf16neq((__v8bf)__A, (__v8bf)__B);
}
#define _mm256_cmp_pbh_mask(__A, __B, __P)
\
```
https://github.com/llvm/llvm-project/pull/161836
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang] [Headers]ย Don't use unreserved names in avx10_2bf16intrin.h (#161824) (PR #161836)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/161836
Backport 3c5c82d09c691a83fec5d09df2f6a308a789ead1
Requested by: @mstorsjo
>From 290ad0a527af6e28eb53782226af41b682be721c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?=
Date: Fri, 3 Oct 2025 15:25:24 +0300
Subject: [PATCH] =?UTF-8?q?[clang]=20[Headers]=C2=A0Don't=20use=20unreserv?=
=?UTF-8?q?ed=20names=20in=20avx10=5F2bf16intrin.h=20(#161824)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This can cause breakage with user code that does "#define A ...".
This fixes issue https://github.com/llvm/llvm-project/issues/161808.
(cherry picked from commit 3c5c82d09c691a83fec5d09df2f6a308a789ead1)
---
clang/lib/Headers/avx10_2bf16intrin.h | 36 +--
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/clang/lib/Headers/avx10_2bf16intrin.h
b/clang/lib/Headers/avx10_2bf16intrin.h
index 66797ae00fe4f..0ca5380829391 100644
--- a/clang/lib/Headers/avx10_2bf16intrin.h
+++ b/clang/lib/Headers/avx10_2bf16intrin.h
@@ -519,34 +519,34 @@ _mm_maskz_min_pbh(__mmask8 __U, __m128bh __A, __m128bh
__B) {
(__mmask8)__U, (__v8bf)_mm_min_pbh(__A, __B), (__v8bf)_mm_setzero_pbh());
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16eq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comieq_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16eq((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16lt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comilt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16lt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16le((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comile_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16le((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16gt((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comigt_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16gt((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh A,
- __m128bh B) {
- return __builtin_ia32_vcomisbf16ge((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comige_sbh(__m128bh __A,
+ __m128bh __B) {
+ return __builtin_ia32_vcomisbf16ge((__v8bf)__A, (__v8bf)__B);
}
-static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh A,
-__m128bh B) {
- return __builtin_ia32_vcomisbf16neq((__v8bf)A, (__v8bf)B);
+static __inline__ int __DEFAULT_FN_ATTRS128 _mm_comineq_sbh(__m128bh __A,
+__m128bh __B) {
+ return __builtin_ia32_vcomisbf16neq((__v8bf)__A, (__v8bf)__B);
}
#define _mm256_cmp_pbh_mask(__A, __B, __P)
\
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)
@@ -83,8 +83,6 @@ constrainRegClass(MachineRegisterInfo &MRI, Register Reg,
const TargetRegisterClass *MachineRegisterInfo::constrainRegClass(
Register Reg, const TargetRegisterClass *RC, unsigned MinNumRegs) {
- if (Reg.isPhysical())
arsenm wrote:
It will assert
https://github.com/llvm/llvm-project/pull/161795
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
github-actions[bot] wrote:
:warning: C/C++ code formatter, clang-format found issues in your code.
:warning:
You can test this locally with the following command:
```bash
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
```
:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:
View the diff from clang-format here.
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 042380383..8b8a10964 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -5396,7 +5396,7 @@ static bool is32bitWaveReduceOperation(unsigned Opc) {
}
static bool isFloatingPointWaveReduceOperation(unsigned Opc) {
- return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64 ||
+ return Opc == AMDGPU::V_MIN_F32_e64 || Opc == AMDGPU::V_MAX_F32_e64 ||
Opc == AMDGPU::V_ADD_F32_e64 || Opc == AMDGPU::V_SUB_F32_e64;
}
@@ -5446,7 +5446,7 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
case AMDGPU::S_ADD_U64_PSEUDO:
case AMDGPU::V_ADD_F32_e64:
case AMDGPU::S_SUB_I32:
-case AMDGPU::S_SUB_U64_PSEUDO:
+case AMDGPU::S_SUB_U64_PSEUDO:
case AMDGPU::V_SUB_F32_e64: {
const TargetRegisterClass *WaveMaskRegClass = TRI->getWaveMaskRegClass();
const TargetRegisterClass *DstRegClass = MRI.getRegClass(DstReg);
@@ -5604,33 +5604,38 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr
&MI,
}
case AMDGPU::V_ADD_F32_e64:
case AMDGPU::V_SUB_F32_e64: {
- /// for FPop: #activebits: int, src: float.
- /// convert int to float, and then mul. there is only V_MUL_F32, so copy
to vgpr.
- ///
/home/aalokdes/dockerx/work/llvm-trunk/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fadd.s32.mir
- /// ig: 1(01) -> negation, 2(10) -> abs, 3(11) -> abs and neg
- // V_CVT_F32_I32_e64
- // get #active lanes in vgpr
- Register ActiveLanesVreg =
MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
- Register DstVreg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
- BuildMI(BB, MI, DL, TII->get(AMDGPU::V_CVT_F32_I32_e64), ActiveLanesVreg)
+/// for FPop: #activebits: int, src: float.
+/// convert int to float, and then mul. there is only V_MUL_F32, so
copy
+/// to vgpr.
+///
/home/aalokdes/dockerx/work/llvm-trunk/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fadd.s32.mir
+/// ig: 1(01) -> negation, 2(10) -> abs, 3(11) -> abs and neg
+// V_CVT_F32_I32_e64
+// get #active lanes in vgpr
+Register ActiveLanesVreg =
+MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+Register DstVreg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_CVT_F32_I32_e64),
+ActiveLanesVreg)
// .addReg(SrcReg)
.addReg(NewAccumulator->getOperand(0).getReg())
-.addImm(0) // clamp
+.addImm(0) // clamp
.addImm(0); // output-modifier
- // Multiply numactivelanes * src
- // Take negation of input for SUB reduction
- unsigned srcMod = Opc == AMDGPU::V_SUB_F32_e64 ? 1 : 0; // check this to
make sure i am taking negation
- BuildMI(BB, MI, DL, TII->get(AMDGPU::V_MUL_F32_e64), DstVreg)
+// Multiply numactivelanes * src
+// Take negation of input for SUB reduction
+unsigned srcMod =
+Opc == AMDGPU::V_SUB_F32_e64
+? 1
+: 0; // check this to make sure i am taking negation
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_MUL_F32_e64), DstVreg)
.addImm(srcMod) // src0 modifier
.addReg(SrcReg)
.addImm(0) // src1 modifier
.addReg(ActiveLanesVreg)
-.addImm(0) // clamp
+.addImm(0) // clamp
.addImm(0); // output-mod
- BuildMI(BB, MI, DL,
- TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
- .addReg(DstVreg);
+BuildMI(BB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg)
+.addReg(DstVreg);
}
}
RetBB = &BB;
```
https://github.com/llvm/llvm-project/pull/161815
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Profile propagation for `indirectbr` (PR #161747)
https://github.com/mtrofin ready_for_review https://github.com/llvm/llvm-project/pull/161747 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Profile propagation for `indirectbr` (PR #161747)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/161747 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [OpenMP][OMPIRBuilder] Support parallel in Generic kernels (PR #150926)
https://github.com/skatrak updated
https://github.com/llvm/llvm-project/pull/150926
>From 9533653e89c7d9abf065a62c7c880cc012886be4 Mon Sep 17 00:00:00 2001
From: Sergio Afonso
Date: Fri, 4 Jul 2025 16:32:03 +0100
Subject: [PATCH 1/2] [OpenMP][OMPIRBuilder] Support parallel in Generic
kernels
This patch introduces codegen logic to produce a wrapper function argument for
the `__kmpc_parallel_51` DeviceRTL function needed to handle arguments passed
using device shared memory in Generic mode.
---
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 100 --
.../LLVMIR/omptarget-parallel-llvm.mlir | 25 -
2 files changed, 116 insertions(+), 9 deletions(-)
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index e0b7378c34f77..9e43784412d53 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1426,6 +1426,86 @@ Error OpenMPIRBuilder::emitCancelationCheckImpl(
return Error::success();
}
+// Create wrapper function used to gather the outlined function's argument
+// structure from a shared buffer and to forward them to it when running in
+// Generic mode.
+//
+// The outlined function is expected to receive 2 integer arguments followed by
+// an optional pointer argument to an argument structure holding the rest.
+static Function *createTargetParallelWrapper(OpenMPIRBuilder *OMPIRBuilder,
+ Function &OutlinedFn) {
+ size_t NumArgs = OutlinedFn.arg_size();
+ assert((NumArgs == 2 || NumArgs == 3) &&
+ "expected a 2-3 argument parallel outlined function");
+ bool UseArgStruct = NumArgs == 3;
+
+ IRBuilder<> &Builder = OMPIRBuilder->Builder;
+ IRBuilder<>::InsertPointGuard IPG(Builder);
+ auto *FnTy = FunctionType::get(Builder.getVoidTy(),
+ {Builder.getInt16Ty(), Builder.getInt32Ty()},
+ /*isVarArg=*/false);
+ auto *WrapperFn =
+ Function::Create(FnTy, GlobalValue::InternalLinkage,
+ OutlinedFn.getName() + ".wrapper", OMPIRBuilder->M);
+
+ WrapperFn->addParamAttr(0, Attribute::NoUndef);
+ WrapperFn->addParamAttr(0, Attribute::ZExt);
+ WrapperFn->addParamAttr(1, Attribute::NoUndef);
+
+ BasicBlock *EntryBB =
+ BasicBlock::Create(OMPIRBuilder->M.getContext(), "entry", WrapperFn);
+ Builder.SetInsertPoint(EntryBB);
+
+ // Allocation.
+ Value *AddrAlloca = Builder.CreateAlloca(Builder.getInt32Ty(),
+ /*ArraySize=*/nullptr, "addr");
+ AddrAlloca = Builder.CreatePointerBitCastOrAddrSpaceCast(
+ AddrAlloca, Builder.getPtrTy(/*AddrSpace=*/0),
+ AddrAlloca->getName() + ".ascast");
+
+ Value *ZeroAlloca = Builder.CreateAlloca(Builder.getInt32Ty(),
+ /*ArraySize=*/nullptr, "zero");
+ ZeroAlloca = Builder.CreatePointerBitCastOrAddrSpaceCast(
+ ZeroAlloca, Builder.getPtrTy(/*AddrSpace=*/0),
+ ZeroAlloca->getName() + ".ascast");
+
+ Value *ArgsAlloca = nullptr;
+ if (UseArgStruct) {
+ArgsAlloca = Builder.CreateAlloca(Builder.getPtrTy(),
+ /*ArraySize=*/nullptr, "global_args");
+ArgsAlloca = Builder.CreatePointerBitCastOrAddrSpaceCast(
+ArgsAlloca, Builder.getPtrTy(/*AddrSpace=*/0),
+ArgsAlloca->getName() + ".ascast");
+ }
+
+ // Initialization.
+ Builder.CreateStore(WrapperFn->getArg(1), AddrAlloca);
+ Builder.CreateStore(Builder.getInt32(0), ZeroAlloca);
+ if (UseArgStruct) {
+Builder.CreateCall(
+OMPIRBuilder->getOrCreateRuntimeFunctionPtr(
+llvm::omp::RuntimeFunction::OMPRTL___kmpc_get_shared_variables),
+{ArgsAlloca});
+ }
+
+  SmallVector<Value *> Args{AddrAlloca, ZeroAlloca};
+
+ // Load structArg from global_args.
+ if (UseArgStruct) {
+Value *StructArg = Builder.CreateLoad(Builder.getPtrTy(), ArgsAlloca);
+StructArg = Builder.CreateInBoundsGEP(Builder.getPtrTy(), StructArg,
+ {Builder.getInt64(0)});
+StructArg = Builder.CreateLoad(Builder.getPtrTy(), StructArg, "structArg");
+Args.push_back(StructArg);
+ }
+
+ // Call the outlined function holding the parallel body.
+ Builder.CreateCall(&OutlinedFn, Args);
+ Builder.CreateRetVoid();
+
+ return WrapperFn;
+}
+
// Callback used to create OpenMP runtime calls to support
// omp parallel clause for the device.
// We need to use this callback to replace call to the OutlinedFn in OuterFn
@@ -1435,6 +1515,10 @@ static void targetParallelCallback(
BasicBlock *OuterAllocaBB, Value *Ident, Value *IfCondition,
Value *NumThreads, Instruction *PrivTID, AllocaInst *PrivTIDAddr,
Value *ThreadID, const SmallVector &ToBeDeleted) {
+ assert(OutlinedFn.arg_size() >= 2 &&
+ "Expected at least tid and bounded tid as arguments");
+ unsigned NumCapturedVars = OutlinedFn.arg_size() - /* tid & bounded tid */ 2;
+
//
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
github-actions[bot] wrote:
:warning: C/C++ code formatter, clang-format found issues in your code.
:warning:
You can test this locally with the following command:
```bash
git-clang-format --diff origin/main HEAD --extensions cpp,h -- llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
```
:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:
View the diff from clang-format here.
```diff
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 530477e6d..c164d32f8 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,7 +312,7 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
-static bool isGenericKernel(Function &Fn){
+static bool isGenericKernel(Function &Fn) {
std::optional ExecMode =
getTargetKernelExecMode(Fn);
return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
```
https://github.com/llvm/llvm-project/pull/161864
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)
bcardosolopes wrote: > I made a change tn the function: `performAddrSpaceCast` and I opted for > getting rid of the `LangAS` parameters for both source and destination Sounds legit, thanks! https://github.com/llvm/llvm-project/pull/161212 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [OpenMPOpt] Make parallel regions reachable from new DeviceRTL loop functions (PR #150927)
skatrak wrote: Moving to draft because I've noticed this doesn't currently work whenever there are different calls to new DeviceRTL loop functions. The state machine rewrite optimization of OpenMPOpt causes the following code to not run properly, whereas the same code without the `unused_problematic` subroutine in it (or compiled with `-mllvm -openmp-opt-disable-state-machine-rewrite`) works: ```f90 ! flang -fopenmp -fopenmp-version=52 --offload-arch=gfx1030 test.f90 && OMP_TARGET_OFFLOAD=MANDATORY ./a.out subroutine test_subroutine(counter) implicit none integer, intent(out) :: counter integer :: i1, i2, n1, n2 n1 = 100 n2 = 50 counter = 0 !$omp target teams distribute reduction(+:counter) do i1=1, n1 !$omp parallel do reduction(+:counter) do i2=1, n2 counter = counter + 1 end do end do end subroutine program main implicit none integer :: counter call test_subroutine(counter) ! Should print: 5000 print '(I0)', counter end program subroutine foo(i) integer, intent(inout) :: i end subroutine ! The presence of this unreachable function in the compilation unit causes ! the result of `test_subroutine` to be incorrect. Removing the `distribute` ! OpenMP directive avoids the problem. subroutine unused_problematic() implicit none integer :: i !$omp target teams !$omp distribute do i=1, 100 call foo(i) end do !$omp end target teams end subroutine ``` https://github.com/llvm/llvm-project/pull/150927 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
llvmbot wrote:
@llvm/pr-subscribers-mlir-openmp
Author: Sergio Afonso (skatrak)
Changes
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing in standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
---
Patch is 26.11 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/161864.diff
7 Files Affected:
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+2-2)
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+15-13)
- (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+38-17)
- (modified)
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+84-38)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+4-4)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir (+6-1)
- (added) offload/test/offloading/fortran/target-generic-outlined-loops.f90
(+109)
``diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d8e5f8cf5a45e..410912ba375a3 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -3292,8 +3292,8 @@ class OpenMPIRBuilder {
ArrayRef DeallocIPs)>;
using TargetGenArgAccessorsCallbackTy = function_ref;
+ Argument &Arg, Value *Input, Value *&RetVal, InsertPointTy AllocIP,
+ InsertPointTy CodeGenIP, ArrayRef DeallocIPs)>;
/// Generator for '#omp target'
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index a18db939b5876..530477e6d2f6d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,6 +312,12 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
+static bool isGenericKernel(Function &Fn){
+ std::optional ExecMode =
+ getTargetKernelExecMode(Fn);
+ return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -1535,11 +1541,9 @@ static void targetParallelCallback(
IfCondition ? Builder.CreateSExtOrTrunc(IfCondition, OMPIRBuilder->Int32)
: Builder.getInt32(1);
- // If this is not a Generic kernel, we can skip generating the wrapper.
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
+ // If this is a Generic kernel, we can generate the wrapper.
Value *WrapperFn;
- if (ExecMode && (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC))
+ if (isGenericKernel(*OuterFn))
WrapperFn = createTargetParallelWrapper(OMPIRBuilder, OutlinedFn);
else
WrapperFn = Constant::getNullValue(PtrTy);
@@ -1812,13 +1816,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::createParallel(
auto OI = [&]() -> std::unique_ptr {
if (Config.isTargetDevice()) {
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
-
- // If OuterFn is not a Generic kernel, skip custom allocation. This
causes
- // the CodeExtractor to follow its default behavior. Otherwise, we need
to
- // use device shared memory to allocate argument structures.
- if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC)
+ // If OuterFn is a Generic kernel, we need to use device shared memory to
+ // allocate argument structures. Otherwise, we use stack allocations as
+ // usual.
+ if (isGenericKernel(*OuterFn))
return std::make_unique(*this);
}
return std::make_unique();
@@ -7806,8 +7807,9 @@ static Expected createOutlinedFunction(
Argument &Arg = std::get<1>(InArg);
Value *InputCopy = nullptr;
-llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
-ArgAccessorFuncCB(Arg, Input, InputCopy, AllocaIP, Builder.saveIP());
+llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP = ArgAccessorFuncCB(
+Arg, Input, InputCopy, AllocaIP, Builder.saveIP(),
+OpenMPIRBuilder::InsertPointTy(ExitBB, ExitBB->begin()));
if (!AfterIP)
return AfterIP.takeError();
Builder.restoreIP(*AfterIP);
diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
index 1e5b8145d5cdc..d231a778a8a97 100644
--- a/llvm/unittests/Frontend/OpenM
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
llvmbot wrote:
@llvm/pr-subscribers-mlir-llvm
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
https://github.com/skatrak created
https://github.com/llvm/llvm-project/pull/161864
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing in standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
>From 2fb029503af54575f46ea2c8304772a5b8097638 Mon Sep 17 00:00:00 2001
From: Sergio Afonso
Date: Tue, 16 Sep 2025 14:18:39 +0100
Subject: [PATCH] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing in standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
---
.../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 +-
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 28 ++--
.../Frontend/OpenMPIRBuilderTest.cpp | 55 +---
.../OpenMP/OpenMPToLLVMIRTranslation.cpp | 122 --
.../LLVMIR/omptarget-parallel-llvm.mlir | 8 +-
.../LLVMIR/omptarget-parallel-wsloop.mlir | 7 +-
.../fortran/target-generic-outlined-loops.f90 | 109
7 files changed, 258 insertions(+), 75 deletions(-)
create mode 100644
offload/test/offloading/fortran/target-generic-outlined-loops.f90
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d8e5f8cf5a45e..410912ba375a3 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -3292,8 +3292,8 @@ class OpenMPIRBuilder {
ArrayRef DeallocIPs)>;
using TargetGenArgAccessorsCallbackTy = function_ref;
+ Argument &Arg, Value *Input, Value *&RetVal, InsertPointTy AllocIP,
+ InsertPointTy CodeGenIP, ArrayRef DeallocIPs)>;
/// Generator for '#omp target'
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index a18db939b5876..530477e6d2f6d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,6 +312,12 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
+static bool isGenericKernel(Function &Fn){
+ std::optional ExecMode =
+ getTargetKernelExecMode(Fn);
+ return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -1535,11 +1541,9 @@ static void targetParallelCallback(
IfCondition ? Builder.CreateSExtOrTrunc(IfCondition, OMPIRBuilder->Int32)
: Builder.getInt32(1);
- // If this is not a Generic kernel, we can skip generating the wrapper.
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
+ // If this is a Generic kernel, we can generate the wrapper.
Value *WrapperFn;
- if (ExecMode && (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC))
+ if (isGenericKernel(*OuterFn))
WrapperFn = createTargetParallelWrapper(OMPIRBuilder, OutlinedFn);
else
WrapperFn = Constant::getNullValue(PtrTy);
@@ -1812,13 +1816,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::createParallel(
auto OI = [&]() -> std::unique_ptr {
if (Config.isTargetDevice()) {
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
-
- // If OuterFn is not a Generic kernel, skip custom allocation. This
causes
- // the CodeExtractor to follow its default behavior. Otherwise, we need
to
- // use device shared memory to allocate argument structures.
- if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC)
+ // If OuterFn is a Generic kernel, we need to use device shared memory to
+ // allocate argument structures. Otherwise, we use stack allocations as
+ // usual.
+ if (isGenericKerne
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Refactor omp.target_allocmem to allow reuse, NFC (PR #161861)
skatrak wrote: PR stack: - #150922 - #150923 - #150924 - #150925 - #150926 - #150927 - #154752 - #161861 โ๏ธ - #161862 - #161863 - #161864 https://github.com/llvm/llvm-project/pull/161861 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks (PR #161864)
llvmbot wrote:
@llvm/pr-subscribers-offload
Author: Sergio Afonso (skatrak)
Changes
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
---
Patch is 26.11 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/161864.diff
7 Files Affected:
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+2-2)
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+15-13)
- (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+38-17)
- (modified)
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+84-38)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+4-4)
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir (+6-1)
- (added) offload/test/offloading/fortran/target-generic-outlined-loops.f90
(+109)
``diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d8e5f8cf5a45e..410912ba375a3 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -3292,8 +3292,8 @@ class OpenMPIRBuilder {
ArrayRef DeallocIPs)>;
using TargetGenArgAccessorsCallbackTy = function_ref;
+ Argument &Arg, Value *Input, Value *&RetVal, InsertPointTy AllocIP,
+ InsertPointTy CodeGenIP, ArrayRef DeallocIPs)>;
/// Generator for '#omp target'
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index a18db939b5876..530477e6d2f6d 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -312,6 +312,12 @@ getTargetKernelExecMode(Function &Kernel) {
return static_cast(KernelMode->getZExtValue());
}
+static bool isGenericKernel(Function &Fn){
+ std::optional ExecMode =
+ getTargetKernelExecMode(Fn);
+ return !ExecMode || (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC);
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -1535,11 +1541,9 @@ static void targetParallelCallback(
IfCondition ? Builder.CreateSExtOrTrunc(IfCondition, OMPIRBuilder->Int32)
: Builder.getInt32(1);
- // If this is not a Generic kernel, we can skip generating the wrapper.
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
+ // If this is a Generic kernel, we can generate the wrapper.
Value *WrapperFn;
- if (ExecMode && (*ExecMode & OMP_TGT_EXEC_MODE_GENERIC))
+ if (isGenericKernel(*OuterFn))
WrapperFn = createTargetParallelWrapper(OMPIRBuilder, OutlinedFn);
else
WrapperFn = Constant::getNullValue(PtrTy);
@@ -1812,13 +1816,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::createParallel(
auto OI = [&]() -> std::unique_ptr {
if (Config.isTargetDevice()) {
- std::optional ExecMode =
- getTargetKernelExecMode(*OuterFn);
-
- // If OuterFn is not a Generic kernel, skip custom allocation. This
causes
- // the CodeExtractor to follow its default behavior. Otherwise, we need
to
- // use device shared memory to allocate argument structures.
- if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC)
+ // If OuterFn is a Generic kernel, we need to use device shared memory to
+ // allocate argument structures. Otherwise, we use stack allocations as
+ // usual.
+ if (isGenericKernel(*OuterFn))
return std::make_unique(*this);
}
return std::make_unique();
@@ -7806,8 +7807,9 @@ static Expected createOutlinedFunction(
Argument &Arg = std::get<1>(InArg);
Value *InputCopy = nullptr;
-llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
-ArgAccessorFuncCB(Arg, Input, InputCopy, AllocaIP, Builder.saveIP());
+llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP = ArgAccessorFuncCB(
+Arg, Input, InputCopy, AllocaIP, Builder.saveIP(),
+OpenMPIRBuilder::InsertPointTy(ExitBB, ExitBB->begin()));
if (!AfterIP)
return AfterIP.takeError();
Builder.restoreIP(*AfterIP);
diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
index 1e5b8145d5cdc..d231a778a8a97 100644
--- a/llvm/unittests/Frontend/OpenMPIRB
[llvm-branch-commits] [clang] [llvm] [openmp] [OpenMP] Taskgraph Clang 'record and replay' frontend support (PR #159774)
jtb20 wrote: This rebased version mentions partial taskgraph support in `clang/docs/ReleaseNotes.rst`. There is already a table entry in `clang/docs/OpenMPSupport.rst` noting the in-progress status of record-and-replay taskgraph support. https://github.com/llvm/llvm-project/pull/159774 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/161739
>From 4979ae9e8486f51e124fe94471fec97ff93698c8 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Wed, 1 Oct 2025 17:08:48 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Handle branch weights in
`simplifySwitchLookup`
---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 25 +++
.../SimplifyCFG/X86/switch_to_lookup_table.ll | 13 +++---
.../Transforms/SimplifyCFG/rangereduce.ll | 24 +++---
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 63f4b2e030b69..5aff662bc3586 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7227,6 +7227,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Mod.getContext(), "switch.lookup", CommonDest->getParent(), CommonDest);
BranchInst *RangeCheckBranch = nullptr;
+ BranchInst *CondBranch = nullptr;
Builder.SetInsertPoint(SI);
const bool GeneratingCoveredLookupTable = (MaxTableSize == TableSize);
@@ -7241,6 +7242,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
TableIndex, ConstantInt::get(MinCaseVal->getType(), TableSize));
RangeCheckBranch =
Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
+CondBranch = RangeCheckBranch;
if (DTU)
Updates.push_back({DominatorTree::Insert, BB, LookupBB});
}
@@ -7279,7 +7281,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Value *Shifted = Builder.CreateLShr(TableMask, MaskIndex,
"switch.shifted");
Value *LoBit = Builder.CreateTrunc(
Shifted, Type::getInt1Ty(Mod.getContext()), "switch.lobit");
-Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
+CondBranch = Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
if (DTU) {
Updates.push_back({DominatorTree::Insert, MaskBB, LookupBB});
Updates.push_back({DominatorTree::Insert, MaskBB, SI->getDefaultDest()});
@@ -7319,19 +7321,32 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
if (DTU)
Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
+ SmallVector BranchWeights;
+ const bool HasBranchWeights = CondBranch && !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+ uint64_t ToLookupWeight = 0;
+ uint64_t ToDefaultWeight = 0;
+
// Remove the switch.
SmallPtrSet RemovedSuccessors;
- for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+ for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+ if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
continue;
+}
Succ->removePredecessor(BB);
if (DTU && RemovedSuccessors.insert(Succ).second)
Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+ ToLookupWeight += BranchWeights[I];
}
SI->eraseFromParent();
-
+ if (HasBranchWeights)
+setFittedBranchWeights(*CondBranch, {ToLookupWeight, ToDefaultWeight},
+ /*IsExpected=*/false);
if (DTU)
DTU->applyUpdates(Updates);
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
index f9e79cabac51d..bee6b375ea11a 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
@@ -1565,14 +1565,14 @@ end:
; lookup (since i3 can only hold values in the range of explicit
; values) and simultaneously trying to generate a branch to deal with
; the fact that we have holes in the range.
-define i32 @covered_switch_with_bit_tests(i3) {
+define i32 @covered_switch_with_bit_tests(i3) !prof !0 {
; CHECK-LABEL: @covered_switch_with_bit_tests(
; CHECK-NEXT: entry:
; CHECK-NEXT:[[SWITCH_TABLEIDX:%.*]] = sub i3 [[TMP0:%.*]], -4
; CHECK-NEXT:[[SWITCH_MASKINDEX:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i8
; CHECK-NEXT:[[SWITCH_SHIFTED:%.*]] = lshr i8 -61, [[SWITCH_MASKINDEX]]
; CHECK-NEXT:[[SWITCH_LOBIT:%.*]] = trunc i8 [[SWITCH_SHIFTED]] to i1
-; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]]
+; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK: switch.lookup:
; CHECK-NEXT:[[TMP1:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i64
; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [8 x i32], ptr @switch.table.covered_switch_with_bit_tests, i64 0, i64 [[TMP1]]
@@ -1588,7 +1588,7 @@ entry:
i3 -4, label
[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (#155829) (PR #161766)
https://github.com/efriedma-quic approved this pull request. LGTM. This is an ABI fix which is important for the active SPARC developers, and it very obviously only affects SPARC targets. https://github.com/llvm/llvm-project/pull/161766 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) (PR #161937)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/161937
Backport 2e1fab93467ec8c37a236ae6e059300ebaa0c986
Requested by: @brad0
>From 4cb5d998e732df6a28462288ffae97d3bd16 Mon Sep 17 00:00:00 2001
From: Koakuma
Date: Fri, 3 Oct 2025 19:25:08 +0700
Subject: [PATCH] [SPARC] Prevent meta instructions from being inserted into
delay slots (#16)
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into
delay slots, as they don't correspond to real machine instructions.
This should fix crashes when compiling with, for example, `clang -Og`.
(cherry picked from commit 2e1fab93467ec8c37a236ae6e059300ebaa0c986)
---
llvm/lib/Target/Sparc/DelaySlotFiller.cpp | 4 +--
.../CodeGen/SPARC/2011-01-19-DelaySlot.ll | 25 +++
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
index 6c19049a001cf..024030d196ee3 100644
--- a/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
+++ b/llvm/lib/Target/Sparc/DelaySlotFiller.cpp
@@ -206,8 +206,8 @@ Filler::findDelayInstr(MachineBasicBlock &MBB,
if (!done)
--I;
-// skip debug instruction
-if (I->isDebugInstr())
+// Skip meta instructions.
+if (I->isMetaInstruction())
continue;
if (I->hasUnmodeledSideEffects() || I->isInlineAsm() || I->isPosition() ||
diff --git a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
index 9ccd4f1c0ac9a..767ef7eb510e6 100644
--- a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
+++ b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
@@ -184,4 +184,29 @@ entry:
ret i32 %2
}
+define i32 @test_generic_inst(i32 %arg) #0 {
+;CHECK-LABEL: test_generic_inst:
+;CHECK: ! fake_use: {{.*}}
+;CHECK: bne {{.*}}
+;CHECK-NEXT: nop
+ %bar1 = call i32 @bar(i32 %arg)
+ %even = and i32 %bar1, 1
+ %cmp = icmp eq i32 %even, 0
+ ; This shouldn't get reordered into a delay slot
+ call void (...) @llvm.fake.use(i32 %arg)
+ br i1 %cmp, label %true, label %false
+true:
+ %bar2 = call i32 @bar(i32 %bar1)
+ br label %cont
+
+false:
+ %inc = add nsw i32 %bar1, 1
+ br label %cont
+
+cont:
+ %ret = phi i32 [ %bar2, %true ], [ %inc, %false ]
+ ret i32 %ret
+}
+
+declare void @llvm.fake.use(...)
attributes #0 = { nounwind "disable-tail-calls"="true" }
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck] Update exclusion list to reflect fixes (PR #161943)
https://github.com/mtrofin ready_for_review https://github.com/llvm/llvm-project/pull/161943 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/161739
>From d1ddd8929f07ddbbcaea73ee99d788a6cd623110 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Wed, 1 Oct 2025 17:08:48 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Handle branch weights in
`simplifySwitchLookup`
---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 25 +++
.../SimplifyCFG/X86/switch_to_lookup_table.ll | 13 +++---
.../Transforms/SimplifyCFG/rangereduce.ll | 24 +++---
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 084d9d87c4778..48055ad6ea7e4 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7227,6 +7227,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Mod.getContext(), "switch.lookup", CommonDest->getParent(), CommonDest);
BranchInst *RangeCheckBranch = nullptr;
+ BranchInst *CondBranch = nullptr;
Builder.SetInsertPoint(SI);
const bool GeneratingCoveredLookupTable = (MaxTableSize == TableSize);
@@ -7241,6 +7242,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
TableIndex, ConstantInt::get(MinCaseVal->getType(), TableSize));
RangeCheckBranch =
Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
+CondBranch = RangeCheckBranch;
if (DTU)
Updates.push_back({DominatorTree::Insert, BB, LookupBB});
}
@@ -7279,7 +7281,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
Value *Shifted = Builder.CreateLShr(TableMask, MaskIndex,
"switch.shifted");
Value *LoBit = Builder.CreateTrunc(
Shifted, Type::getInt1Ty(Mod.getContext()), "switch.lobit");
-Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
+CondBranch = Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
if (DTU) {
Updates.push_back({DominatorTree::Insert, MaskBB, LookupBB});
Updates.push_back({DominatorTree::Insert, MaskBB, SI->getDefaultDest()});
@@ -7319,19 +7321,32 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
if (DTU)
Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
+ SmallVector BranchWeights;
+ const bool HasBranchWeights = CondBranch && !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+ uint64_t ToLookupWeight = 0;
+ uint64_t ToDefaultWeight = 0;
+
// Remove the switch.
SmallPtrSet RemovedSuccessors;
- for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+ for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+ if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
continue;
+}
Succ->removePredecessor(BB);
if (DTU && RemovedSuccessors.insert(Succ).second)
Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+ ToLookupWeight += BranchWeights[I];
}
SI->eraseFromParent();
-
+ if (HasBranchWeights)
+setFittedBranchWeights(*CondBranch, {ToLookupWeight, ToDefaultWeight},
+ /*IsExpected=*/false);
if (DTU)
DTU->applyUpdates(Updates);
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
index f9e79cabac51d..bee6b375ea11a 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
@@ -1565,14 +1565,14 @@ end:
; lookup (since i3 can only hold values in the range of explicit
; values) and simultaneously trying to generate a branch to deal with
; the fact that we have holes in the range.
-define i32 @covered_switch_with_bit_tests(i3) {
+define i32 @covered_switch_with_bit_tests(i3) !prof !0 {
; CHECK-LABEL: @covered_switch_with_bit_tests(
; CHECK-NEXT: entry:
; CHECK-NEXT:[[SWITCH_TABLEIDX:%.*]] = sub i3 [[TMP0:%.*]], -4
; CHECK-NEXT:[[SWITCH_MASKINDEX:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i8
; CHECK-NEXT:[[SWITCH_SHIFTED:%.*]] = lshr i8 -61, [[SWITCH_MASKINDEX]]
; CHECK-NEXT:[[SWITCH_LOBIT:%.*]] = trunc i8 [[SWITCH_SHIFTED]] to i1
-; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]]
+; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label [[L6:%.*]], !prof [[PROF1:![0-9]+]]
; CHECK: switch.lookup:
; CHECK-NEXT:[[TMP1:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i64
; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [8 x i32], ptr @switch.table.covered_switch_with_bit_tests, i64 0, i64 [[TMP1]]
@@ -1588,7 +1588,7 @@ entry:
i3 -4, label
[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Profile propagation for `indirectbr` (PR #161747)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/161747
>From a55a7e2f13a6606f9660fdacc3287f741d3f2ac2 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Thu, 2 Oct 2025 15:56:16 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Profile propagation for `indirectbr`
---
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 39 +--
.../test/Transforms/SimplifyCFG/indirectbr.ll | 32 +++
2 files changed, 51 insertions(+), 20 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 48055ad6ea7e4..0b7d3b2fb2d7b 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -4895,9 +4895,8 @@ bool SimplifyCFGOpt::simplifyTerminatorOnSelect(Instruction *OldTerm,
// We found both of the successors we were looking for.
// Create a conditional branch sharing the condition of the select.
BranchInst *NewBI = Builder.CreateCondBr(Cond, TrueBB, FalseBB);
- if (TrueWeight != FalseWeight)
-setBranchWeights(*NewBI, {TrueWeight, FalseWeight},
- /*IsExpected=*/false, /*ElideAllZero=*/true);
+ setBranchWeights(*NewBI, {TrueWeight, FalseWeight},
+ /*IsExpected=*/false, /*ElideAllZero=*/true);
}
} else if (KeepEdge1 && (KeepEdge2 || TrueBB == FalseBB)) {
// Neither of the selected blocks were successors, so this
@@ -4982,9 +4981,15 @@ bool SimplifyCFGOpt::simplifyIndirectBrOnSelect(IndirectBrInst *IBI,
BasicBlock *TrueBB = TBA->getBasicBlock();
BasicBlock *FalseBB = FBA->getBasicBlock();
+ // The select's profile becomes the profile of the conditional branch that
+ // replaces the indirect branch.
+ SmallVector SelectBranchWeights(2);
+ if (!ProfcheckDisableMetadataFixes)
+extractBranchWeights(*SI, SelectBranchWeights);
// Perform the actual simplification.
- return simplifyTerminatorOnSelect(IBI, SI->getCondition(), TrueBB, FalseBB, 0,
-0);
+ return simplifyTerminatorOnSelect(IBI, SI->getCondition(), TrueBB, FalseBB,
+SelectBranchWeights[0],
+SelectBranchWeights[1]);
}
/// This is called when we find an icmp instruction
@@ -7877,20 +7882,29 @@ bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
bool SimplifyCFGOpt::simplifyIndirectBr(IndirectBrInst *IBI) {
BasicBlock *BB = IBI->getParent();
bool Changed = false;
-
+ SmallVector BranchWeights;
+ const bool HasBranchWeights = !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*IBI, BranchWeights);
+ SmallVector NewBranchWeights;
// Eliminate redundant destinations.
SmallPtrSet Succs;
SmallSetVector RemovedSuccs;
- for (unsigned i = 0, e = IBI->getNumDestinations(); i != e; ++i) {
-BasicBlock *Dest = IBI->getDestination(i);
+ for (unsigned I = 0, E = IBI->getNumDestinations(); I != E; ++I) {
+BasicBlock *Dest = IBI->getDestination(I);
if (!Dest->hasAddressTaken() || !Succs.insert(Dest).second) {
if (!Dest->hasAddressTaken())
RemovedSuccs.insert(Dest);
Dest->removePredecessor(BB);
- IBI->removeDestination(i);
- --i;
- --e;
+ IBI->removeDestination(I);
+ --I;
+ --E;
Changed = true;
+ if (HasBranchWeights && BranchWeights[I] != 0) {
+LLVM_DEBUG(dbgs() << "Elided indirectbr edge with non-zero profile. "
+ "This is unexpected\n");
+ }
+} else if (HasBranchWeights) {
+ NewBranchWeights.push_back(BranchWeights[I]);
}
}
@@ -7915,7 +7929,8 @@ bool SimplifyCFGOpt::simplifyIndirectBr(IndirectBrInst *IBI) {
eraseTerminatorAndDCECond(IBI);
return true;
}
-
+ if (HasBranchWeights)
+setBranchWeights(*IBI, NewBranchWeights, /*IsExpected=*/false);
if (SelectInst *SI = dyn_cast(IBI->getAddress())) {
if (simplifyIndirectBrOnSelect(IBI, SI))
return requestResimplify();
diff --git a/llvm/test/Transforms/SimplifyCFG/indirectbr.ll b/llvm/test/Transforms/SimplifyCFG/indirectbr.ll
index 87d8b399494ce..3127b2c643f74 100644
--- a/llvm/test/Transforms/SimplifyCFG/indirectbr.ll
+++ b/llvm/test/Transforms/SimplifyCFG/indirectbr.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
; RUN: opt -S -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s
; SimplifyCFG should eliminate redundant indirectbr edges.
@@ -8,7 +8,11 @@ declare void @A()
declare void @B(i32)
declare void @C()
-define void @indbrtest0(ptr %P, ptr %Q) {
+;.
+; CHECK: @anchor = constant [13 x ptr] [ptr blockaddress(@indbrtest3, %L1),
ptr blockaddress(@indbrtest3, %L2), ptr inttoptr (i32 1 to ptr), ptr
blockaddress(@indbrtest4,
[llvm-branch-commits] [llvm] [PowerPC] Implement paddis (PR #161572)
https://github.com/lei137 updated
https://github.com/llvm/llvm-project/pull/161572
>From 012b638031fb72d36525234115f9d7b87d8c98e3 Mon Sep 17 00:00:00 2001
From: Lei Huang
Date: Tue, 30 Sep 2025 18:09:31 +
Subject: [PATCH 1/5] [PowerPC] Implement paddis
---
.../Target/PowerPC/AsmParser/PPCAsmParser.cpp | 4 ++
.../PowerPC/MCTargetDesc/PPCAsmBackend.cpp| 9
.../PowerPC/MCTargetDesc/PPCFixupKinds.h | 6 +++
.../PowerPC/MCTargetDesc/PPCInstPrinter.cpp | 12 +
.../PowerPC/MCTargetDesc/PPCInstPrinter.h | 2 +
.../PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp | 1 +
llvm/lib/Target/PowerPC/PPCInstrFuture.td | 44 +++
llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 19
.../PowerPC/ppc-encoding-ISAFuture.txt| 6 +++
.../PowerPC/ppc64le-encoding-ISAFuture.txt| 6 +++
llvm/test/MC/PowerPC/ppc-encoding-ISAFuture.s | 8
11 files changed, 117 insertions(+)
diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 561a9c51b9cc2..b07f95018ca90 100644
--- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -365,6 +365,10 @@ struct PPCOperand : public MCParsedAsmOperand {
bool isS16ImmX4() const { return isExtImm<16>(/*Signed*/ true, 4); }
bool isS16ImmX16() const { return isExtImm<16>(/*Signed*/ true, 16); }
bool isS17Imm() const { return isExtImm<17>(/*Signed*/ true, 1); }
+ bool isS32Imm() const {
+// TODO: Is ContextImmediate needed?
+return Kind == Expression || isSImm<32>();
+ }
bool isS34Imm() const {
// Once the PC-Rel ABI is finalized, evaluate whether a 34-bit
// ContextImmediate is needed.
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 04b886ae74993..558351b515a2e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -47,6 +47,9 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t Value) {
case PPC::fixup_ppc_half16ds:
case PPC::fixup_ppc_half16dq:
return Value & 0xfffc;
+ case PPC::fixup_ppc_pcrel32:
+ case PPC::fixup_ppc_imm32:
+return Value & 0xffffffff;
case PPC::fixup_ppc_pcrel34:
case PPC::fixup_ppc_imm34:
return Value & 0x3ffffffff;
@@ -71,6 +74,8 @@ static unsigned getFixupKindNumBytes(unsigned Kind) {
case PPC::fixup_ppc_br24abs:
case PPC::fixup_ppc_br24_notoc:
return 4;
+ case PPC::fixup_ppc_pcrel32:
+ case PPC::fixup_ppc_imm32:
case PPC::fixup_ppc_pcrel34:
case PPC::fixup_ppc_imm34:
case FK_Data_8:
@@ -154,6 +159,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind Kind) const {
{"fixup_ppc_brcond14abs", 16, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 0, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
@@ -166,6 +173,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind Kind) const {
{"fixup_ppc_brcond14abs", 2, 14, 0},
{"fixup_ppc_half16", 0, 16, 0},
{"fixup_ppc_half16ds", 2, 14, 0},
+ {"fixup_ppc_pcrel32", 0, 32, 0},
+ {"fixup_ppc_imm32", 0, 32, 0},
{"fixup_ppc_pcrel34", 0, 34, 0},
{"fixup_ppc_imm34", 0, 34, 0},
{"fixup_ppc_nofixup", 0, 0, 0}};
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
index df0c666f5b113..4164b697649cd 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
@@ -40,6 +40,12 @@ enum Fixups {
/// instrs like 'std'.
fixup_ppc_half16ds,
+ // A 32-bit fixup corresponding to PC-relative paddis.
+ fixup_ppc_pcrel32,
+
+ // A 32-bit fixup corresponding to Non-PC-relative paddis.
+ fixup_ppc_imm32,
+
// A 34-bit fixup corresponding to PC-relative paddi.
fixup_ppc_pcrel34,
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
index b27bc3bd49315..e2afb9378cbf0 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
@@ -430,6 +430,18 @@ void PPCInstPrinter::printS16ImmOperand(const MCInst *MI, unsigned OpNo,
printOperand(MI, OpNo, STI, O);
}
+void PPCInstPrinter::printS32ImmOperand(const MCInst *MI, unsigned OpNo,
+const MCSubtargetInfo &STI,
+raw_ostream &O) {
+ if (MI->getOperand(OpNo).isImm()) {
+long long Value = MI->getOperand(OpNo).getImm();
+assert(isInt<32>(Value) && "Invalid s32imm argument!");
+O << (long long)Value;
+ }
+ else
+printOperand(MI
