[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)

2025-10-02 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/138884

>From f568ed034b0d9d91654f842653cd7260fe6d773d Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 6 May 2025 11:31:03 +0300
Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump
 tables

As part of PAuth hardening, the AArch64 LLVM backend can use a special
BR_JumpTable pseudo (enabled by the -faarch64-jump-table-hardening
Clang option) which is expanded in the AsmPrinter into a contiguous
sequence without unsafe instructions in the middle.

This commit adds another target-specific callback to MCPlusBuilder
to make it possible to inhibit false positives for known-safe jump
table dispatch sequences. Without special handling, the branch
instruction is likely to be reported as a non-protected call (as its
destination is not produced by an auth instruction, PC-relative address
materialization, etc.) and possibly as a tail call performed with an
unsafe link register (as the detection of whether the branch instruction
is a tail call is a heuristic).

For now, only the specific instruction sequence used by the AArch64
LLVM backend is matched.
---
 bolt/include/bolt/Core/MCInstUtils.h  |   9 +
 bolt/include/bolt/Core/MCPlusBuilder.h|  14 +
 bolt/lib/Core/MCInstUtils.cpp |  20 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp|  10 +
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   |  73 ++
 .../AArch64/gs-pauth-jump-table.s | 703 ++
 6 files changed, 829 insertions(+)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s

diff --git a/bolt/include/bolt/Core/MCInstUtils.h 
b/bolt/include/bolt/Core/MCInstUtils.h
index 291e31e0e0fdf..a240ca07bd02c 100644
--- a/bolt/include/bolt/Core/MCInstUtils.h
+++ b/bolt/include/bolt/Core/MCInstUtils.h
@@ -101,6 +101,15 @@ class MCInstReference {
   /// this function may be called from multithreaded code.
   uint64_t computeAddress(const MCCodeEmitter *Emitter = nullptr) const;
 
+  /// Returns the only preceding instruction, or std::nullopt if multiple or no
+  /// predecessors are possible.
+  ///
+  /// If CFG information is available, basic block boundary can be crossed,
+  /// provided there is exactly one predecessor. If CFG is not available, the
+  /// preceding instruction in the offset order is returned, unless this is the
+  /// first instruction of the function.
+  std::optional<MCInstReference> getSinglePredecessor();
+
   raw_ostream &print(raw_ostream &OS) const;
 
 private:
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 5b711b0e27bab..8c191b113afbc 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -15,6 +15,7 @@
 #define BOLT_CORE_MCPLUSBUILDER_H
 
 #include "bolt/Core/BinaryBasicBlock.h"
+#include "bolt/Core/MCInstUtils.h"
 #include "bolt/Core/MCPlus.h"
 #include "bolt/Core/Relocation.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -718,6 +719,19 @@ class MCPlusBuilder {
 return std::nullopt;
   }
 
+  /// Tests if BranchInst corresponds to an instruction sequence which is known
+  /// to be a safe dispatch via jump table.
+  ///
+  /// The target can decide which instruction sequences to consider "safe" from
+  /// the Pointer Authentication point of view, such as any jump table dispatch
+  /// sequence without function calls inside, any sequence which is contiguous,
+  /// or only some specific well-known sequences.
+  virtual bool
+  isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const {
+    llvm_unreachable("not implemented");
+    return false;
+  }
+
   virtual bool isTerminator(const MCInst &Inst) const;
 
   virtual bool isNoop(const MCInst &Inst) const {
diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp
index f505bf73c64eb..f07616cdb86b9 100644
--- a/bolt/lib/Core/MCInstUtils.cpp
+++ b/bolt/lib/Core/MCInstUtils.cpp
@@ -84,3 +84,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const {
   OS << ">";
   return OS;
 }
+
+std::optional<MCInstReference> MCInstReference::getSinglePredecessor() {
+  if (const RefInBB *Ref = tryGetRefInBB()) {
+    if (Ref->Index != 0)
+      return MCInstReference(*Ref->BB, Ref->Index - 1);
+
+    if (Ref->BB->pred_size() != 1)
+      return std::nullopt;
+
+    BinaryBasicBlock &PredBB = **Ref->BB->pred_begin();
+    assert(!PredBB.empty() && "Empty basic blocks are not supported yet");
+    return MCInstReference(PredBB, *PredBB.rbegin());
+  }
+
+  const RefInBF &Ref = getRefInBF();
+  if (Ref.It == Ref.BF->instrs().begin())
+    return std::nullopt;
+
+  return MCInstReference(*Ref.BF, std::prev(Ref.It));
+}
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index cfe4b6ba785e4..af453a5aa6871 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1364,6 +1364,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, 
const BinaryFunction

[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6
+//
+// Test optimization pipelines do not interfere with AllocToken lowering, and 
we
+// pass on function attributes correctly.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O0 %s
+// RUN: %clang_cc1 -O1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O1 %s
+// RUN: %clang_cc1 -O2 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O2 %s

zmodem wrote:

Shouldn't the expectations be the same for all optimization levels actually?

If you do need to differentiate some expectations by level, you could do  
`--check-prefixes=CHECK,CHECK-O0`, `--check-prefixes=CHECK,CHECK-O1` etc. and 
then use `CHECK` for the shared expectations, and `CHECK-O` for the 
level-specific ones. But I'm not sure that's needed here.

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6
+//
+// Test optimization pipelines do not interfere with AllocToken lowering, and 
we
+// pass on function attributes correctly.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O0 %s
+// RUN: %clang_cc1 -O1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O1 %s
+// RUN: %clang_cc1 -O2 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O2 %s
+
+typedef __typeof(sizeof(int)) size_t;
+
+void *malloc(size_t size);
+
+// CHECK-O0-LABEL: define dso_local ptr @test_malloc(
+// CHECK-O0-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-O0-NEXT:  [[ENTRY:.*:]]
+// CHECK-O0-NEXT:[[TMP0:%.*]] = call ptr @__alloc_token_malloc(i64 noundef 
4, i64 0) #[[ATTR3:[0-9]+]]

zmodem wrote:

Here's another check for an attribute, but it's not clear what attribute that 
is, or if it's important.

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,52 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6

zmodem wrote:

Same here, I'd suggest writing the checks by hand. Only the call instructions 
are interesting (especially their metadata), so I think the test should just 
focus on those.

Also for alloc-token.cpp.

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6
+//
+// Test optimization pipelines do not interfere with AllocToken lowering, and 
we
+// pass on function attributes correctly.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O0 %s
+// RUN: %clang_cc1 -O1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O1 %s
+// RUN: %clang_cc1 -O2 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O2 %s
+
+typedef __typeof(sizeof(int)) size_t;
+
+void *malloc(size_t size);
+
+// CHECK-O0-LABEL: define dso_local ptr @test_malloc(
+// CHECK-O0-SAME: ) #[[ATTR0:[0-9]+]] {

zmodem wrote:

Is this matching an attribute that's actually relevant for the test? If so, 
there should be a check for what the attribute is. If not, it would be better 
not to check for it.

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/21.x: [LLDB][ProcessWindows] Set exit status on instance rather than going through all targets (#159308) (PR #161541)

2025-10-02 Thread David Spickett via llvm-branch-commits

DavidSpickett wrote:

* This removes a significant lag when quitting lldb on Windows.
* The code change is basically "do the same thing with fewer steps", very low 
risk. I was able to review by just looking at the code paths.
* We've not had any instability on Windows on Arm, or reports of instability 
elsewhere since this landed.

So I am in favour of backporting this, assuming the release managers are
willing to accept performance changes like this one.

https://github.com/llvm/llvm-project/pull/161541
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Marco Elver via llvm-branch-commits


@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6
+//
+// Test optimization pipelines do not interfere with AllocToken lowering, and 
we
+// pass on function attributes correctly.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O0 %s
+// RUN: %clang_cc1 -O1 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O1 %s
+// RUN: %clang_cc1 -O2 -fsanitize=alloc-token -triple x86_64-linux-gnu 
-emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O2 %s

melver wrote:

This was for the auto-generated tests. Switching back to hand-written tests, so 
this will be unified.

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][SME] Reshuffle emit[prologue|epilogue]() for splitSVEObjects (NFCI) (PR #161217)

2025-10-02 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/161217
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156839

>From b3653330c2c39ebaa094670f11afb0f9d36b9de2 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Thu, 4 Sep 2025 12:07:26 +0200
Subject: [PATCH] fixup! Insert AllocToken into index.rst

Created using spr 1.3.8-beta.1
---
 clang/docs/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index be654af57f890..aa2b3a73dc11b 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -40,6 +40,7 @@ Using Clang as a Compiler
SanitizerCoverage
SanitizerStats
SanitizerSpecialCaseList
+   AllocToken
BoundsSafety
BoundsSafetyAdoptionGuide
BoundsSafetyImplPlans

___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Marco Elver via llvm-branch-commits


@@ -0,0 +1,52 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6

melver wrote:

Switched back to hand-written tests.

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:

@androm3da What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/161692
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/161692

Backport daa4e57ccf38ff6ac22243e98a035c87b9f9f3ae

Requested by: @androm3da

>From 220bac16a417e97bf97fdcb34855e28b2e6dfdf7 Mon Sep 17 00:00:00 2001
From: Ikhlas Ajbar 
Date: Thu, 2 Oct 2025 09:43:24 -0500
Subject: [PATCH] [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity
 check (#161618)

Check for a valid offset for unaligned vector store V6_vS32Ub_npred_ai.
isValidOffset() is updated to evaluate offset of this instruction.
Fixes #160647

(cherry picked from commit daa4e57ccf38ff6ac22243e98a035c87b9f9f3ae)
---
 llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp  |  1 +
 .../CodeGen/Hexagon/unaligned-vec-store.ll| 23 +++
 2 files changed, 24 insertions(+)
 create mode 100644 llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll

diff --git a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp 
b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
index 64bc5ca134c86..35863f790eae4 100644
--- a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
@@ -2803,6 +2803,7 @@ bool HexagonInstrInfo::isValidOffset(unsigned Opcode, int 
Offset,
   case Hexagon::V6_vL32b_nt_cur_npred_ai:
   case Hexagon::V6_vL32b_nt_tmp_pred_ai:
   case Hexagon::V6_vL32b_nt_tmp_npred_ai:
+  case Hexagon::V6_vS32Ub_npred_ai:
   case Hexagon::V6_vgathermh_pseudo:
   case Hexagon::V6_vgathermw_pseudo:
   case Hexagon::V6_vgathermhw_pseudo:
diff --git a/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll 
b/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll
new file mode 100644
index 0..267e365243711
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll
@@ -0,0 +1,23 @@
+; RUN: llc -march=hexagon -mcpu=hexagonv68 -mattr=+hvxv68,+hvx-length128B < %s 
| FileCheck %s
+; REQUIRES: asserts
+
+; Check that the test does not assert when unaligned vector store 
V6_vS32Ub_npred_ai is generated.
+; CHECK: if (!p{{[0-3]}}) vmemu
+
+target triple = "hexagon-unknown-unknown-elf"
+
+define fastcc void @test(i1 %cmp.i.i) {
+entry:
+  %call.i.i.i172 = load ptr, ptr null, align 4
+  %add.ptr = getelementptr i8, ptr %call.i.i.i172, i32 1
+  store <32 x i32> zeroinitializer, ptr %add.ptr, align 128
+  %add.ptr4.i4 = getelementptr i8, ptr %call.i.i.i172, i32 129
+  br i1 %cmp.i.i, label %common.ret, label %if.end.i.i
+
+common.ret:   ; preds = %if.end.i.i, %entry
+  ret void
+
+if.end.i.i:   ; preds = %entry
+  store <32 x i32> zeroinitializer, ptr %add.ptr4.i4, align 1
+  br label %common.ret
+}

___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-hexagon

Author: None (llvmbot)


Changes

Backport daa4e57ccf38ff6ac22243e98a035c87b9f9f3ae

Requested by: @androm3da

---
Full diff: https://github.com/llvm/llvm-project/pull/161692.diff


2 Files Affected:

- (modified) llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp (+1) 
- (added) llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll (+23) 


```diff
diff --git a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp 
b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
index 64bc5ca134c86..35863f790eae4 100644
--- a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
@@ -2803,6 +2803,7 @@ bool HexagonInstrInfo::isValidOffset(unsigned Opcode, int 
Offset,
   case Hexagon::V6_vL32b_nt_cur_npred_ai:
   case Hexagon::V6_vL32b_nt_tmp_pred_ai:
   case Hexagon::V6_vL32b_nt_tmp_npred_ai:
+  case Hexagon::V6_vS32Ub_npred_ai:
   case Hexagon::V6_vgathermh_pseudo:
   case Hexagon::V6_vgathermw_pseudo:
   case Hexagon::V6_vgathermhw_pseudo:
diff --git a/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll 
b/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll
new file mode 100644
index 0..267e365243711
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll
@@ -0,0 +1,23 @@
+; RUN: llc -march=hexagon -mcpu=hexagonv68 -mattr=+hvxv68,+hvx-length128B < %s 
| FileCheck %s
+; REQUIRES: asserts
+
+; Check that the test does not assert when unaligned vector store 
V6_vS32Ub_npred_ai is generated.
+; CHECK: if (!p{{[0-3]}}) vmemu
+
+target triple = "hexagon-unknown-unknown-elf"
+
+define fastcc void @test(i1 %cmp.i.i) {
+entry:
+  %call.i.i.i172 = load ptr, ptr null, align 4
+  %add.ptr = getelementptr i8, ptr %call.i.i.i172, i32 1
+  store <32 x i32> zeroinitializer, ptr %add.ptr, align 128
+  %add.ptr4.i4 = getelementptr i8, ptr %call.i.i.i172, i32 129
+  br i1 %cmp.i.i, label %common.ret, label %if.end.i.i
+
+common.ret:   ; preds = %if.end.i.i, %entry
+  ret void
+
+if.end.i.i:   ; preds = %entry
+  store <32 x i32> zeroinitializer, ptr %add.ptr4.i4, align 1
+  br label %common.ret
+}

```




https://github.com/llvm/llvm-project/pull/161692
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread Brian Cain via llvm-branch-commits

https://github.com/androm3da approved this pull request.


https://github.com/llvm/llvm-project/pull/161692
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/161692
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits




[llvm-branch-commits] [llvm] [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (PR #161638)

2025-10-02 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161638
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#161638** 👈 (this PR) https://app.graphite.dev/github/pr/llvm/llvm-project/161638
* **#161637** https://app.graphite.dev/github/pr/llvm/llvm-project/161637
* `main`

This stack of pull requests is managed by Graphite: https://graphite.dev
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/161638
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (PR #161638)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes

They were previously optimized to not emit any waitcnt, which is technically 
correct because there is no reordering of operations at workgroup scope in CU 
mode for GFX10+.

This breaks transitivity however, for example if we have the following sequence 
of events in one thread:

- some stores
- store atomic release syncscope("workgroup")
- barrier

then another thread follows with

- barrier
- load atomic acquire
- store atomic release syncscope("agent")

It does not work because, while the other thread sees the stores, it cannot 
release them at the wider scope. Our release fences aren't strong enough to 
"wait" on stores from other waves.

We also cannot strengthen our release fences any further to allow for releasing 
other wave's stores because only GFX12 can do that with `global_wb`. GFX10-11 
do not have the writeback instruction.
It'd also add yet another level of complexity to code sequences, with both 
acquire/release having CU-mode only alternatives.
Lastly, acq/rel are always used together. The price for synchronization has to 
be paid either at the acq, or the rel. Strengthening the releases would just 
make the memory model more complex but wouldn't help performance.

So the choice here is to streamline the code sequences by making CU and WGP
mode emit identical code for release (or stronger) atomic ordering.

This also removes the `vm_vsrc(0)` wait before barriers. Now that the release 
fence in CU mode is strong enough, it is no longer needed.

Supersedes #160501
Solves SC1-6454

---

Patch is 428.16 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/161638.diff


16 Files Affected:

- (modified) llvm/docs/AMDGPUUsage.rst (+3-58) 
- (modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+14-37) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll (+24-6) 
- (modified) llvm/test/CodeGen/AMDGPU/lds-dma-workgroup-release.ll (-1) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-barriers.ll (+8-12) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll 
(+48) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll (+48-9) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll (+8-3) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll 
(+601-185) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll 
(+8-3) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll 
(+540-92) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll (+240-90) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-cluster.ll 
(+240-90) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll 
(+240-90) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll (+8-3) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll 
(+240-90) 


``diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 74b7604fda56d..cba86b3d5447e 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -13229,9 +13229,6 @@ table 
:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
  store atomic release  - workgroup- global   1. s_waitcnt 
lgkmcnt(0) &
   - generic vmcnt(0) & vscnt(0)
 
-   - If CU wavefront 
execution
- mode, omit 
vmcnt(0) and
- vscnt(0).
- If OpenCL, omit
  lgkmcnt(0).
- Could be split 
into
@@ -13277,8 +13274,6 @@ table 
:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
  2. 
buffer/global/flat_store
  store atomic release  - workgroup- local1. s_waitcnt vmcnt(0) 
& vscnt(0)
 
-   - If CU wavefront 
execution
- mode, omit.
- If OpenCL, omit.
- Could be split 
into
  separate s_waitcnt
@@ -13366,9 +13361,6 @@ table 
:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
  atomicrmwrelease  - workgroup- global   1. s_waitcnt 
lgkmcnt(0) &
   - generic vmcnt(0) & vscnt(0)
 
-   - 

[llvm-branch-commits] [llvm] [AArch64][SME] Support split ZPR and PPR area allocation (PR #142392)

2025-10-02 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

Thanks @MacDue, LGTM!

https://github.com/llvm/llvm-project/pull/142392
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-10-02 Thread Marco Elver via llvm-branch-commits


@@ -1353,6 +1354,92 @@ void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, 
QualType AllocType) {
   CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
 }
 
+/// Infer type from a simple sizeof expression.
+static QualType inferTypeFromSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *UET = dyn_cast(Arg)) {
+if (UET->getKind() == UETT_SizeOf) {
+  if (UET->isArgumentType())
+return UET->getArgumentTypeInfo()->getType();
+  else
+return UET->getArgumentExpr()->getType();
+}
+  }
+  return QualType();
+}
+
+/// Infer type from an arithmetic expression involving a sizeof.
+static QualType inferTypeFromArithSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  // The argument is a lone sizeof expression.
+  if (QualType T = inferTypeFromSizeofExpr(Arg); !T.isNull())
+return T;
+  if (const auto *BO = dyn_cast(Arg)) {
+// Argument is an arithmetic expression. Cover common arithmetic patterns
+// involving sizeof.
+switch (BO->getOpcode()) {
+case BO_Add:
+case BO_Div:
+case BO_Mul:
+case BO_Shl:
+case BO_Shr:
+case BO_Sub:
+  if (QualType T = inferTypeFromArithSizeofExpr(BO->getLHS()); !T.isNull())
+return T;
+  if (QualType T = inferTypeFromArithSizeofExpr(BO->getRHS()); !T.isNull())
+return T;
+  break;
+default:
+  break;
+}
+  }
+  return QualType();
+}
+
+/// If the expression E is a reference to a variable, infer the type from a
+/// variable's initializer if it contains a sizeof. Beware, this is a heuristic
+/// and ignores if a variable is later reassigned.
+static QualType inferTypeFromVarInitSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *DRE = dyn_cast(Arg)) {
+if (const auto *VD = dyn_cast(DRE->getDecl())) {
+  if (const Expr *Init = VD->getInit())
+return inferTypeFromArithSizeofExpr(Init);
+}
+  }
+  return QualType();
+}
+
+/// Deduces the allocated type by checking if the allocation call's result
+/// is immediately used in a cast expression.
+static QualType inferTypeFromCastExpr(const CallExpr *CallE,

melver wrote:

You mean for inferTypeFromCastExpr specifically, or in general for all?

General: I think some of the recursive visitors (RecursiveASTVisitor or also 
EvaluatedExprVisitor) could help, but they could only give us a differently 
shaped inferPossibleTypeFromArithSizeofExpr(). They also don't make early 
return and control over the order of visits that easy (unless we override the 
Visit function, which defeats the point). And as for efficiency, as I wrote 
elsewhere, here we can do more targeted visiting of relevant AST nodes and the 
overall algorithm is much more lightweight.

For the "infer from cast" problem specifically, I used to have this 
implementation:
```
static QualType inferTypeFromCastExpr(const CallExpr *E, ASTContext &Ctx) {
  DynTypedNodeList Parents = Ctx.getParentMapContext().getParents(*E);
  if (Parents.empty())
    return QualType();
  // We only care about the most direct parent for this heuristic.
  if (const auto *CE = Parents[0].get<CastExpr>()) {
    QualType PtrType = CE->getType();
    if (PtrType->isPointerType())
      return PtrType->getPointeeType();
  }
  return QualType();
}
```
But the `getParents` implementation uses RecursiveASTVisitor underneath, which 
turned out to be really slow and also ended up crashing on some code using 
coroutines. So instead, the current inferTypeFromCastExpr() implementation is 
just O(1) by keeping the innermost CurCast saved during normal AST walk.

https://github.com/llvm/llvm-project/pull/156841
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits

https://github.com/RiverDave edited 
https://github.com/llvm/llvm-project/pull/161212
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [mlir][omp] Improve canonloop/iv naming (PR #159773)

2025-10-02 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

> The reduction and privatisation declaration operations have multiple regions.

They are
"[IsolatedFromAbove](https://mlir.llvm.org/docs/Traits/#isolatedfromabove)"
though, meaning each individually has its own separate namespace.

This also means that I had previously misunderstood the "IsolatedFromAbove"
trait, which I thought was about the operation itself, not its region argument.
I found out by trying to create a naming conflict of variable names in
different region arguments. They do not conflict, so there is no reason to
append an `_r` suffix. I did not find an operation with multiple region
arguments in either the LLVMIR or the OpenMP dialect. There is one in FIR:
`fir.if`. I added a test in flang as well to use it.

https://github.com/llvm/llvm-project/pull/159773


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5

zmodem wrote:

For the record, I'm not against auto-generated tests in general. But for these 
tests, the auto-generated versions looked a lot like verbose "golden tests", 
whereas what we want to test is very specific: do the calls have a certain 
piece of metadata on them.

melver's proposal sgtm.

https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/161795

It's nonsensical to call this function on a physical register.

>From 2833da98821c20e651468e63ef834acfa66cac88 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 3 Oct 2025 14:00:55 +0900
Subject: [PATCH] CodeGen: Stop checking for physregs in constrainRegClass

It's nonsensical to call this function on a physical register.
---
 llvm/lib/CodeGen/MachineRegisterInfo.cpp | 2 --
 1 file changed, 2 deletions(-)

diff --git a/llvm/lib/CodeGen/MachineRegisterInfo.cpp 
b/llvm/lib/CodeGen/MachineRegisterInfo.cpp
index abb3f3e612000..ae284f3ae2929 100644
--- a/llvm/lib/CodeGen/MachineRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/MachineRegisterInfo.cpp
@@ -83,8 +83,6 @@ constrainRegClass(MachineRegisterInfo &MRI, Register Reg,
 
 const TargetRegisterClass *MachineRegisterInfo::constrainRegClass(
 Register Reg, const TargetRegisterClass *RC, unsigned MinNumRegs) {
-  if (Reg.isPhysical())
-return nullptr;
   return ::constrainRegClass(*this, Reg, getRegClass(Reg), RC, MinNumRegs);
 }
 



[llvm-branch-commits] [llvm] [LoongArch] Custom legalize vector_shuffle to `xvinsve0.{w/d}` when possible (PR #161156)

2025-10-02 Thread via llvm-branch-commits

zhaoqi5 wrote:

Same as https://github.com/llvm/llvm-project/pull/160857 which has been closed 
because of my silly mistake.

https://github.com/llvm/llvm-project/pull/161156


[llvm-branch-commits] [llvm] [llvm][mustache] Use single pass when tokenizing (PR #159196)

2025-10-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/159196

>From 709be29237e8bab16b3d7f4703cc4127ca7f59fc Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Mon, 15 Sep 2025 23:27:50 -0700
Subject: [PATCH] [llvm][mustache] Use single pass when tokenizing

The old implementation used many string searches over the same portions
of the strings. This version sacrifices some API niceness for perf wins.

  Metric        | Baseline | Single-Pass | Change
  ------------- | -------- | ----------- | ------
  Time (ms)     | 36.09    | 35.78       | -0.86%
  Cycles        | 35.3M    | 35.0M       | -0.79%
  Instructions  | 86.7M    | 85.8M       | -1.03%
  Branch Misses | 116K     | 114K        | -1.91%
  Cache Misses  | 244K     | 232K        | -4.98%
---
 llvm/lib/Support/Mustache.cpp | 186 +-
 1 file changed, 73 insertions(+), 113 deletions(-)

diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index 30ced31bd7c43..0053a425b758d 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -368,141 +368,101 @@ static const char *jsonKindToString(json::Value::Kind 
K) {
   llvm_unreachable("Unknown json::Value::Kind");
 }
 
-static Tag findNextTag(StringRef Template, size_t StartPos, StringRef Open,
-   StringRef Close) {
-  const StringLiteral TripleOpen("{{{");
-  const StringLiteral TripleClose("}}}");
-
-  size_t NormalOpenPos = Template.find(Open, StartPos);
-  size_t TripleOpenPos = Template.find(TripleOpen, StartPos);
-
-  Tag Result;
-
-  // Determine which tag comes first.
-  if (TripleOpenPos != StringRef::npos &&
-  (NormalOpenPos == StringRef::npos || TripleOpenPos <= NormalOpenPos)) {
-// Found a triple mustache tag.
-size_t EndPos =
-Template.find(TripleClose, TripleOpenPos + TripleOpen.size());
-if (EndPos == StringRef::npos)
-  return Result; // No closing tag found.
-
-Result.TagKind = Tag::Kind::Triple;
-Result.StartPosition = TripleOpenPos;
-size_t ContentStart = TripleOpenPos + TripleOpen.size();
-Result.Content = Template.substr(ContentStart, EndPos - ContentStart);
-Result.FullMatch = Template.substr(
-TripleOpenPos, (EndPos + TripleClose.size()) - TripleOpenPos);
-  } else if (NormalOpenPos != StringRef::npos) {
-// Found a normal mustache tag.
-size_t EndPos = Template.find(Close, NormalOpenPos + Open.size());
-if (EndPos == StringRef::npos)
-  return Result; // No closing tag found.
-
-Result.TagKind = Tag::Kind::Normal;
-Result.StartPosition = NormalOpenPos;
-size_t ContentStart = NormalOpenPos + Open.size();
-Result.Content = Template.substr(ContentStart, EndPos - ContentStart);
-Result.FullMatch =
-Template.substr(NormalOpenPos, (EndPos + Close.size()) - 
NormalOpenPos);
-  }
-
-  return Result;
-}
-
-static std::optional<std::pair<StringRef, StringRef>>
-processTag(const Tag &T, SmallVectorImpl<Token> &Tokens, MustacheContext &Ctx) {
-  LLVM_DEBUG(dbgs() << "[Tag] " << T.FullMatch << ", Content: " << T.Content
-<< ", Kind: " << tagKindToString(T.TagKind) << "\n");
-  if (T.TagKind == Tag::Kind::Triple) {
-Tokens.emplace_back(T.FullMatch, Ctx.Saver.save("&" + T.Content), '&', 
Ctx);
-return std::nullopt;
-  }
-  StringRef Interpolated = T.Content;
-  if (!Interpolated.trim().starts_with("=")) {
-char Front = Interpolated.empty() ? ' ' : Interpolated.trim().front();
-Tokens.emplace_back(T.FullMatch, Interpolated, Front, Ctx);
-return std::nullopt;
-  }
-  Tokens.emplace_back(T.FullMatch, Interpolated, '=', Ctx);
-  StringRef DelimSpec = Interpolated.trim();
-  DelimSpec = DelimSpec.drop_front(1);
-  DelimSpec = DelimSpec.take_until([](char C) { return C == '='; });
-  DelimSpec = DelimSpec.trim();
-
-  std::pair<StringRef, StringRef> Ret = DelimSpec.split(' ');
-  LLVM_DEBUG(dbgs() << "[Set Delimiter] NewOpen: " << Ret.first
-<< ", NewClose: " << Ret.second << "\n");
-  return Ret;
-}
-
 // Simple tokenizer that splits the template into tokens.
-// The mustache spec allows {{{ }}} to unescape variables,
-// but we don't support that here. An unescape variable
-// is represented only by {{& variable}}.
static SmallVector<Token> tokenize(StringRef Template, MustacheContext &Ctx) {
   LLVM_DEBUG(dbgs() << "[Tokenize Template] \"" << Template << "\"\n");
  SmallVector<Token> Tokens;
   SmallString<8> Open("{{");
   SmallString<8> Close("}}");
-  size_t Start = 0;
+  size_t Cursor = 0;
+  size_t TextStart = 0;
+
+  const StringLiteral TripleOpen("{{{");
+  const StringLiteral TripleClose("}}}");
 
-  while (Start < Template.size()) {
-LLVM_DEBUG(dbgs() << "[Tokenize Loop] Start=" << Start << ", Open='" << 
Open
-  << "', Close='" << Close << "'\n");
-Tag T = findNextTag(Template, Start, Open, Close);
+  while (Cursor < Template.size()) {
+StringRef TemplateSuffix = Template.substr(Cursor);
+StringRef TagOpen, TagClose;
+Tag::Kind Kind;
+
+ 

[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)

2025-10-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec requested changes to this pull request.

Same here: drop mir tests which test what tablegen has generated.

https://github.com/llvm/llvm-project/pull/161762


[llvm-branch-commits] [llvm] MIRVocabulary changes (PR #161713)

2025-10-02 Thread S. VenkataKeerthy via llvm-branch-commits

svkeerthy wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack [on
> Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/161713).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#161713** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/161713)
* **#161463**
* **#158376**
* **#156952**
* **#155690**
* **#155516**
* **#155323**: 1 other dependent PR ([#155700](https://github.com/llvm/llvm-project/pull/155700))
* **#153094**
* **#153089**
* **#153087**
* **#152613**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev). Learn more about [stacking](https://stacking.dev/).


https://github.com/llvm/llvm-project/pull/161713


[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (PR #161766)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Brad Smith (brad0)


Changes

Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments as 16 bytes.

This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.

(cherry picked from commit 6679e43937a87db3ce59a02f0cfc86951a4881e4)

---
Full diff: https://github.com/llvm/llvm-project/pull/161766.diff


4 Files Affected:

- (modified) clang/lib/CodeGen/Targets/Sparc.cpp (+57-68) 
- (modified) clang/test/CodeGen/sparcv9-abi.c (+56) 
- (modified) llvm/lib/Target/Sparc/SparcISelLowering.cpp (+2-1) 
- (modified) llvm/test/CodeGen/SPARC/64abi.ll (+14-1) 


``diff
diff --git a/clang/lib/CodeGen/Targets/Sparc.cpp 
b/clang/lib/CodeGen/Targets/Sparc.cpp
index 9642196b78c63..0461f121d76c9 100644
--- a/clang/lib/CodeGen/Targets/Sparc.cpp
+++ b/clang/lib/CodeGen/Targets/Sparc.cpp
@@ -8,6 +8,7 @@
 
 #include "ABIInfoImpl.h"
 #include "TargetInfo.h"
+#include <algorithm>
 
 using namespace clang;
 using namespace clang::CodeGen;
@@ -109,7 +110,8 @@ class SparcV9ABIInfo : public ABIInfo {
   SparcV9ABIInfo(CodeGenTypes &CGT) : ABIInfo(CGT) {}
 
 private:
-  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit) const;
+  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit,
+  unsigned &RegOffset) const;
   void computeInfo(CGFunctionInfo &FI) const override;
   RValue EmitVAArg(CodeGenFunction &CGF, Address VAListAddr, QualType Ty,
AggValueSlot Slot) const override;
@@ -222,127 +224,114 @@ class SparcV9ABIInfo : public ABIInfo {
 };
 } // end anonymous namespace
 
-ABIArgInfo
-SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit) const {
+ABIArgInfo SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit,
+unsigned &RegOffset) const {
   if (Ty->isVoidType())
 return ABIArgInfo::getIgnore();
 
-  uint64_t Size = getContext().getTypeSize(Ty);
+  auto &Context = getContext();
+  auto &VMContext = getVMContext();
+
+  uint64_t Size = Context.getTypeSize(Ty);
+  unsigned Alignment = Context.getTypeAlign(Ty);
+  bool NeedPadding = (Alignment > 64) && (RegOffset % 2 != 0);
 
   // Anything too big to fit in registers is passed with an explicit indirect
   // pointer / sret pointer.
-  if (Size > SizeLimit)
+  if (Size > SizeLimit) {
+RegOffset += 1;
 return getNaturalAlignIndirect(
 Ty, /*AddrSpace=*/getDataLayout().getAllocaAddrSpace(),
 /*ByVal=*/false);
+  }
 
   // Treat an enum type as its underlying type.
  if (const EnumType *EnumTy = Ty->getAs<EnumType>())
 Ty = EnumTy->getDecl()->getIntegerType();
 
   // Integer types smaller than a register are extended.
-  if (Size < 64 && Ty->isIntegerType())
+  if (Size < 64 && Ty->isIntegerType()) {
+RegOffset += 1;
 return ABIArgInfo::getExtend(Ty);
+  }
 
  if (const auto *EIT = Ty->getAs<BitIntType>())
-if (EIT->getNumBits() < 64)
+if (EIT->getNumBits() < 64) {
+  RegOffset += 1;
   return ABIArgInfo::getExtend(Ty);
+}
 
   // Other non-aggregates go in registers.
-  if (!isAggregateTypeForABI(Ty))
+  if (!isAggregateTypeForABI(Ty)) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
   // If a C++ object has either a non-trivial copy constructor or a non-trivial
   // destructor, it is passed with an explicit indirect pointer / sret pointer.
-  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI()))
+  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI())) {
+RegOffset += 1;
 return getNaturalAlignIndirect(Ty, getDataLayout().getAllocaAddrSpace(),
RAA == CGCXXABI::RAA_DirectInMemory);
+  }
 
   // This is a small aggregate type that should be passed in registers.
   // Build a coercion type from the LLVM struct type.
  llvm::StructType *StrTy = dyn_cast<llvm::StructType>(CGT.ConvertType(Ty));
-  if (!StrTy)
+  if (!StrTy) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
-  CoerceBuilder CB(getVMContext(), getDataLayout());
+  CoerceBuilder CB(VMContext, getDataLayout());
   CB.addStruct(0, StrTy);
   // All structs, even empty ones, should take up a register argument slot,
   // so pin the minimum struct size to one bit.
   CB.pad(llvm::alignTo(
   std::max(CB.DL.getTypeSizeInBits(StrTy).getKnownMinValue(), uint64_t(1)),
   64));
+  RegOffset += CB.Size / 64;
+
+  // If we're dealing with overaligned structs we may need to add a padding in
+  // the front, to preserve the correct register-memory mapping.
+  //
+  // See SCD 2.4.1, pages 3P-11 and 3P-12.
+  llvm::Type *Padding =
+  NeedPadding ? llvm::Type::getInt64Ty(VMContext) : nullptr;
+  RegOffset += NeedPadding ? 1 : 0;
 
   // Try to use the original type for coercion.
   llvm::Type *CoerceTy = CB.isUsableType(

[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (PR #161766)

2025-10-02 Thread Brad Smith via llvm-branch-commits

https://github.com/brad0 updated 
https://github.com/llvm/llvm-project/pull/161766

>From 9ee4ac8a83592385794978cd15fa094d926cca2c Mon Sep 17 00:00:00 2001
From: Koakuma 
Date: Thu, 2 Oct 2025 22:22:07 -0400
Subject: [PATCH] [clang][SPARC] Pass 16-aligned structs with the correct
 alignment in CC (#155829)

Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments as 16 bytes.

This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.

(cherry picked from commit 6679e43937a87db3ce59a02f0cfc86951a4881e4)
---
 clang/lib/CodeGen/Targets/Sparc.cpp | 125 +---
 clang/test/CodeGen/sparcv9-abi.c|  56 +
 llvm/lib/Target/Sparc/SparcISelLowering.cpp |   3 +-
 llvm/test/CodeGen/SPARC/64abi.ll|  15 ++-
 4 files changed, 129 insertions(+), 70 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/Sparc.cpp 
b/clang/lib/CodeGen/Targets/Sparc.cpp
index 9642196b78c63..0461f121d76c9 100644
--- a/clang/lib/CodeGen/Targets/Sparc.cpp
+++ b/clang/lib/CodeGen/Targets/Sparc.cpp
@@ -8,6 +8,7 @@
 
 #include "ABIInfoImpl.h"
 #include "TargetInfo.h"
+#include <algorithm>
 
 using namespace clang;
 using namespace clang::CodeGen;
@@ -109,7 +110,8 @@ class SparcV9ABIInfo : public ABIInfo {
   SparcV9ABIInfo(CodeGenTypes &CGT) : ABIInfo(CGT) {}
 
 private:
-  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit) const;
+  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit,
+  unsigned &RegOffset) const;
   void computeInfo(CGFunctionInfo &FI) const override;
   RValue EmitVAArg(CodeGenFunction &CGF, Address VAListAddr, QualType Ty,
AggValueSlot Slot) const override;
@@ -222,127 +224,114 @@ class SparcV9ABIInfo : public ABIInfo {
 };
 } // end anonymous namespace
 
-ABIArgInfo
-SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit) const {
+ABIArgInfo SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit,
+unsigned &RegOffset) const {
   if (Ty->isVoidType())
 return ABIArgInfo::getIgnore();
 
-  uint64_t Size = getContext().getTypeSize(Ty);
+  auto &Context = getContext();
+  auto &VMContext = getVMContext();
+
+  uint64_t Size = Context.getTypeSize(Ty);
+  unsigned Alignment = Context.getTypeAlign(Ty);
+  bool NeedPadding = (Alignment > 64) && (RegOffset % 2 != 0);
 
   // Anything too big to fit in registers is passed with an explicit indirect
   // pointer / sret pointer.
-  if (Size > SizeLimit)
+  if (Size > SizeLimit) {
+RegOffset += 1;
 return getNaturalAlignIndirect(
 Ty, /*AddrSpace=*/getDataLayout().getAllocaAddrSpace(),
 /*ByVal=*/false);
+  }
 
   // Treat an enum type as its underlying type.
  if (const EnumType *EnumTy = Ty->getAs<EnumType>())
 Ty = EnumTy->getDecl()->getIntegerType();
 
   // Integer types smaller than a register are extended.
-  if (Size < 64 && Ty->isIntegerType())
+  if (Size < 64 && Ty->isIntegerType()) {
+RegOffset += 1;
 return ABIArgInfo::getExtend(Ty);
+  }
 
  if (const auto *EIT = Ty->getAs<BitIntType>())
-if (EIT->getNumBits() < 64)
+if (EIT->getNumBits() < 64) {
+  RegOffset += 1;
   return ABIArgInfo::getExtend(Ty);
+}
 
   // Other non-aggregates go in registers.
-  if (!isAggregateTypeForABI(Ty))
+  if (!isAggregateTypeForABI(Ty)) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
   // If a C++ object has either a non-trivial copy constructor or a non-trivial
   // destructor, it is passed with an explicit indirect pointer / sret pointer.
-  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI()))
+  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI())) {
+RegOffset += 1;
 return getNaturalAlignIndirect(Ty, getDataLayout().getAllocaAddrSpace(),
RAA == CGCXXABI::RAA_DirectInMemory);
+  }
 
   // This is a small aggregate type that should be passed in registers.
   // Build a coercion type from the LLVM struct type.
  llvm::StructType *StrTy = dyn_cast<llvm::StructType>(CGT.ConvertType(Ty));
-  if (!StrTy)
+  if (!StrTy) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
-  CoerceBuilder CB(getVMContext(), getDataLayout());
+  CoerceBuilder CB(VMContext, getDataLayout());
   CB.addStruct(0, StrTy);
   // All structs, even empty ones, should take up a register argument slot,
   // so pin the minimum struct size to one bit.
   CB.pad(llvm::alignTo(
   std::max(CB.DL.getTypeSizeInBits(StrTy).getKnownMinValue(), uint64_t(1)),
   64));
+  RegOffset += CB.Size / 64;
+
+  // If we're dealing with overaligned structs we may need to add a padding in
+  // the front, to preserve the correct register-memory mapping.
+  //
+  // See SCD 2.4.1, pages 3P-11 and 3P-12.
+  llvm::Type *Padding =
+  NeedPadding ?

[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (#155829) (PR #161766)

2025-10-02 Thread Brad Smith via llvm-branch-commits

https://github.com/brad0 edited https://github.com/llvm/llvm-project/pull/161766


[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits

https://github.com/RiverDave updated 
https://github.com/llvm/llvm-project/pull/161212

>From baaea0b9d214bd5940f4b16909d47f491918c584 Mon Sep 17 00:00:00 2001
From: David Rivera 
Date: Mon, 29 Sep 2025 11:05:44 -0400
Subject: [PATCH 1/4] [CIR] Upstream AddressSpace casting support

---
 .../CIR/Dialect/Builder/CIRBaseBuilder.h  |  9 +++
 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp   | 41 +++
 clang/lib/CIR/CodeGen/CIRGenExpr.cpp  | 19 +-
 clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp| 22 ++
 clang/lib/CIR/CodeGen/CIRGenFunction.h|  4 ++
 clang/lib/CIR/CodeGen/CIRGenModule.cpp| 17 +
 clang/lib/CIR/CodeGen/CIRGenModule.h  |  6 ++
 clang/lib/CIR/CodeGen/CIRGenTypes.cpp |  2 +-
 clang/lib/CIR/CodeGen/TargetInfo.cpp  | 13 
 clang/lib/CIR/CodeGen/TargetInfo.h| 13 
 clang/test/CIR/address-space-conversion.cpp   | 68 +++
 11 files changed, 196 insertions(+), 18 deletions(-)
 create mode 100644 clang/test/CIR/address-space-conversion.cpp

diff --git a/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h 
b/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
index b875fac9b7969..4765934582bb8 100644
--- a/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
+++ b/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
@@ -442,6 +442,15 @@ class CIRBaseBuilderTy : public mlir::OpBuilder {
 return createBitcast(src, getPointerTo(newPointeeTy));
   }
 
+  mlir::Value createAddrSpaceCast(mlir::Location loc, mlir::Value src,
+  mlir::Type newTy) {
+return createCast(loc, cir::CastKind::address_space, src, newTy);
+  }
+
+  mlir::Value createAddrSpaceCast(mlir::Value src, mlir::Type newTy) {
+return createAddrSpaceCast(src.getLoc(), src, newTy);
+  }
+
   
//======//
   // Binary Operators
   
//======//
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp 
b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
index cf17de144f4d9..95e392d860518 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
@@ -58,6 +58,24 @@ static RValue emitBuiltinBitOp(CIRGenFunction &cgf, const 
CallExpr *e,
   return RValue::get(result);
 }
 
+// Initialize the alloca with the given size and alignment according to the 
lang
+// opts. Supporting only the trivial non-initialization for now.
+static void initializeAlloca(CIRGenFunction &CGF,
+ [[maybe_unused]] mlir::Value AllocaAddr,
+ [[maybe_unused]] mlir::Value Size,
+ [[maybe_unused]] CharUnits AlignmentInBytes) {
+
+  switch (CGF.getLangOpts().getTrivialAutoVarInit()) {
+  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
+// Nothing to initialize.
+return;
+  case LangOptions::TrivialAutoVarInitKind::Zero:
+  case LangOptions::TrivialAutoVarInitKind::Pattern:
+assert(false && "unexpected trivial auto var init kind NYI");
+return;
+  }
+}
+
 RValue CIRGenFunction::emitRotate(const CallExpr *e, bool isRotateLeft) {
   mlir::Value input = emitScalarExpr(e->getArg(0));
   mlir::Value amount = emitScalarExpr(e->getArg(1));
@@ -172,21 +190,8 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl 
&gd, unsigned builtinID,
 builder.getUInt8Ty(), "bi_alloca", suitableAlignmentInBytes, size);
 
 // Initialize the allocated buffer if required.
-if (builtinID != Builtin::BI__builtin_alloca_uninitialized) {
-  // Initialize the alloca with the given size and alignment according to
-  // the lang opts. Only the trivial non-initialization is supported for
-  // now.
-
-  switch (getLangOpts().getTrivialAutoVarInit()) {
-  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
-// Nothing to initialize.
-break;
-  case LangOptions::TrivialAutoVarInitKind::Zero:
-  case LangOptions::TrivialAutoVarInitKind::Pattern:
-cgm.errorNYI("trivial auto var init");
-break;
-  }
-}
+if (builtinID != Builtin::BI__builtin_alloca_uninitialized)
+  initializeAlloca(*this, allocaAddr, size, suitableAlignmentInBytes);
 
 // An alloca will always return a pointer to the alloca (stack) address
 // space. This address space need not be the same as the AST / Language
@@ -194,6 +199,12 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl 
&gd, unsigned builtinID,
 // the AST level this is handled within CreateTempAlloca et al., but for 
the
 // builtin / dynamic alloca we have to handle it here.
 assert(!cir::MissingFeatures::addressSpace());
+cir::AddressSpace aas = getCIRAllocaAddressSpace();
+cir::AddressSpace eas = cir::toCIRAddressSpace(
+e->getType()->getPointeeType().getAddressSpace());
+if (eas != aas) {
+  assert(false && "Non-default add

[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits

RiverDave wrote:

I updated this based on the recent feedback on #161028.

I made a change in the function `performAddrSpaceCast`: I opted to get rid of 
the `LangAS` parameters for both source and destination, for a few reasons:

1. They were redundant and not utilized.
2. In 
[OG](https://github.com/llvm/llvm-project/blob/487cdf14f67e95f61a42389bd168b32c00995ea4/clang/lib/CodeGen/TargetInfo.cpp#L149)
 they seem to be used to name SSA values in casts? I do not think that applies 
to CIR/MLIR in any way (correct me if I'm wrong).
3. This change was made not long ago: 
https://github.com/llvm/llvm-project/pull/138866 => it applies only to the 
destination, but as I previously mentioned, I don't see the point in 
replicating OG in that regard.

https://github.com/llvm/llvm-project/pull/161212


[llvm-branch-commits] [llvm] [AMDGPU][SIInsertWaitCnts] De-duplicate code (NFC) (PR #161161)

2025-10-02 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/161161


[llvm-branch-commits] [llvm] [AMDGPU][SIInsertWaitCnts] Remove redundant TII/TRI/MRI arguments (NFC) (PR #161357)

2025-10-02 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/161357

>From 73c43575873aa2bc3dfc051a49bb05fc4fc99ca9 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 29 Sep 2025 12:24:57 +0200
Subject: [PATCH] [AMDGPU][SIInsertWaitCnts] Remove redundant TII/TRI/MRI
 arguments (NFC)

WaitCntBrackets already has a pointer to its SIInsertWaitCnt instance.
With a small change, it can directly access TII/TRI/MRI that way.
This simplifies a lot of call sites, which makes the code easier to follow.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 121 +---
 1 file changed, 54 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 3f9a1f492ace5..76bfce8c0f6f9 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -418,15 +418,14 @@ class WaitcntGeneratorGFX12Plus : public WaitcntGenerator 
{
 class SIInsertWaitcnts {
 public:
   const GCNSubtarget *ST;
+  const SIInstrInfo *TII = nullptr;
+  const SIRegisterInfo *TRI = nullptr;
+  const MachineRegisterInfo *MRI = nullptr;
   InstCounterType SmemAccessCounter;
   InstCounterType MaxCounter;
   const unsigned *WaitEventMaskForInst;
 
 private:
-  const SIInstrInfo *TII = nullptr;
-  const SIRegisterInfo *TRI = nullptr;
-  const MachineRegisterInfo *MRI = nullptr;
-
   DenseMap SLoadAddresses;
   DenseMap PreheadersToFlush;
   MachineLoopInfo *MLI;
@@ -631,8 +630,6 @@ class WaitcntBrackets {
   bool merge(const WaitcntBrackets &Other);
 
   RegInterval getRegInterval(const MachineInstr *MI,
- const MachineRegisterInfo *MRI,
- const SIRegisterInfo *TRI,
  const MachineOperand &Op) const;
 
   bool counterOutOfOrder(InstCounterType T) const;
@@ -650,9 +647,7 @@ class WaitcntBrackets {
   void applyWaitcnt(const AMDGPU::Waitcnt &Wait);
   void applyWaitcnt(InstCounterType T, unsigned Count);
   void applyXcnt(const AMDGPU::Waitcnt &Wait);
-  void updateByEvent(const SIInstrInfo *TII, const SIRegisterInfo *TRI,
- const MachineRegisterInfo *MRI, WaitEventType E,
- MachineInstr &MI);
+  void updateByEvent(WaitEventType E, MachineInstr &MI);
 
   unsigned hasPendingEvent() const { return PendingEvents; }
   unsigned hasPendingEvent(WaitEventType E) const {
@@ -761,10 +756,8 @@ class WaitcntBrackets {
   void setScoreByInterval(RegInterval Interval, InstCounterType CntTy,
   unsigned Score);
 
-  void setScoreByOperand(const MachineInstr *MI, const SIRegisterInfo *TRI,
- const MachineRegisterInfo *MRI,
- const MachineOperand &Op, InstCounterType CntTy,
- unsigned Val);
+  void setScoreByOperand(const MachineInstr *MI, const MachineOperand &Op,
+ InstCounterType CntTy, unsigned Val);
 
   const SIInsertWaitcnts *Context;
 
@@ -821,12 +814,13 @@ class SIInsertWaitcntsLegacy : public MachineFunctionPass 
{
 } // end anonymous namespace
 
 RegInterval WaitcntBrackets::getRegInterval(const MachineInstr *MI,
-const MachineRegisterInfo *MRI,
-const SIRegisterInfo *TRI,
 const MachineOperand &Op) const {
   if (Op.getReg() == AMDGPU::SCC)
 return {SCC, SCC + 1};
 
+  const SIRegisterInfo *TRI = Context->TRI;
+  const MachineRegisterInfo *MRI = Context->MRI;
+
   if (!TRI->isInAllocatableClass(Op.getReg()))
 return {-1, -1};
 
@@ -891,11 +885,9 @@ void WaitcntBrackets::setScoreByInterval(RegInterval 
Interval,
 }
 
 void WaitcntBrackets::setScoreByOperand(const MachineInstr *MI,
-const SIRegisterInfo *TRI,
-const MachineRegisterInfo *MRI,
 const MachineOperand &Op,
                                        InstCounterType CntTy, unsigned Score) {
-  RegInterval Interval = getRegInterval(MI, MRI, TRI, Op);
+  RegInterval Interval = getRegInterval(MI, Op);
   setScoreByInterval(Interval, CntTy, Score);
 }
 
@@ -927,10 +919,7 @@ bool WaitcntBrackets::hasPointSamplePendingVmemTypes(
   return hasOtherPendingVmemTypes(Interval, VMEM_NOSAMPLER);
 }
 
-void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
-const SIRegisterInfo *TRI,
-const MachineRegisterInfo *MRI,
-WaitEventType E, MachineInstr &Inst) {
+void WaitcntBrackets::updateByEvent(WaitEventType E, MachineInstr &Inst) {
   InstCounterType T = eventCounter(Context->WaitEventMaskForInst, E);
 
   unsigned UB = getScoreUB(T);
@@ -943,6 +932,10 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
   PendingEvents |= 1 << E;
   setScoreUB(T, Cur

[llvm-branch-commits] [llvm] [AArch64][SME] Support split ZPR and PPR area allocation (PR #142392)

2025-10-02 Thread Sander de Smalen via llvm-branch-commits


@@ -531,6 +538,10 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
   if (!EnableRedZone)
 return false;
 
+  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
+  if (AFI->hasSplitSVEObjects())
+return false;
+

sdesmalen-arm wrote:

This should already be covered by the `AFI->hasSVEStackSize()` below.

https://github.com/llvm/llvm-project/pull/142392
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits


@@ -172,28 +190,21 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl &gd, unsigned builtinID,
 builder.getUInt8Ty(), "bi_alloca", suitableAlignmentInBytes, size);
 
 // Initialize the allocated buffer if required.
-if (builtinID != Builtin::BI__builtin_alloca_uninitialized) {
-  // Initialize the alloca with the given size and alignment according to
-  // the lang opts. Only the trivial non-initialization is supported for
-  // now.
-
-  switch (getLangOpts().getTrivialAutoVarInit()) {
-  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
-// Nothing to initialize.
-break;
-  case LangOptions::TrivialAutoVarInitKind::Zero:
-  case LangOptions::TrivialAutoVarInitKind::Pattern:
-cgm.errorNYI("trivial auto var init");
-break;
-  }
-}
+if (builtinID != Builtin::BI__builtin_alloca_uninitialized)
+  initializeAlloca(*this, allocaAddr, size, suitableAlignmentInBytes);
 
 // An alloca will always return a pointer to the alloca (stack) address
 // space. This address space need not be the same as the AST / Language
 // default (e.g. in C / C++ auto vars are in the generic address space). At
 // the AST level this is handled within CreateTempAlloca et al., but for the
 // builtin / dynamic alloca we have to handle it here.
 assert(!cir::MissingFeatures::addressSpace());
+cir::AddressSpace aas = getCIRAllocaAddressSpace();
+cir::AddressSpace eas = cir::toCIRAddressSpace(
+e->getType()->getPointeeType().getAddressSpace());
+if (eas != aas) {
+  assert(false && "Non-default address space for alloca NYI");

RiverDave wrote:

I prefer we defer this to a different PR. The reason is that we cannot simply perform an address space cast in a case like this, where the pointee types differ (src differs from allocaDest).
see:

```cpp
//cpp
void test_builtin_alloca_addrspace() {
  // Alloca happens in default address space (0), then we cast to address space 
1
  void *raw_ptr = __builtin_alloca(sizeof(int));
  int __attribute__((address_space(1))) *as1_ptr = 
  (int __attribute__((address_space(1))) *)raw_ptr;
}

//cir
"cir.func"() <{dso_local, function_type = !cir.func<()>, global_visibility = 
#cir, linkage = 0 : i32, sym_name = 
"_Z29test_builtin_alloca_addrspacev"}> ({
%0 = "cir.alloca"() <{alignment = 8 : i64, allocaType = !cir.ptr, 
init, name = "raw_ptr"}> : () -> !cir.ptr> loc(#loc11)
%1 = "cir.const"() <{value = #cir.int<4> : !u64i}> : () -> !u64i loc(#loc12)
%2 = "cir.alloca"(%1) <{alignment = 16 : i64, allocaType = !u8i, name = 
"bi_alloca"}> : (!u64i) -> !cir.ptr loc(#loc13)
%3 = "cir.alloca"() <{alignment = 8 : i64, allocaType = !cir.ptr, init, name = "as1_ptr"}> : () -> 
!cir.ptr> loc(#loc14)
%4 = "cir.cast"(%2) <{kind = 1 : i32}> : (!cir.ptr) -> 
!cir.ptr loc(#loc13)
"cir.store"(%4, %0) <{alignment = 8 : i64}> : (!cir.ptr, 
!cir.ptr>) -> () loc(#loc11)
%5 = "cir.load"(%0) <{alignment = 8 : i64}> : (!cir.ptr>) 
-> !cir.ptr loc(#loc9)
/* CAST IS INVALID HERE =>*/%6 = "cir.cast"(%5) <{kind = 63 : i32}> : 
(!cir.ptr) -> !cir.ptr loc(#loc9)
"cir.store"(%6, %3) <{alignment = 8 : i64}> : (!cir.ptr, !cir.ptr>) 
-> () loc(#loc14)
"cir.return"() : () -> () loc(#loc2)
  }) : () -> () loc(#loc10)
  // OG:
  
  
```

As you can see, we'd hit the verifier (address space casts are valid only as long as both pointees are of the same type). I assume this is not an issue in OG since it moved to opaque/generic pointers.

I assume the solution is to introduce an intermediate bitcast. (I experimented with this on my own and it worked fine; the OG IR generated seemed to be equal.) But again, if you want, we can dive more into that in the future.
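A rough sketch of what that intermediate-bitcast lowering could look like in CIR — hypothetical syntax and type spellings, not output of the current compiler:

```mlir
// Load the generic (addrspace 0) pointer, bitcast it to the destination
// pointee type while still in the source address space, then perform the
// pointee-preserving address_space cast.
%raw = cir.load %0 : !cir.ptr<!cir.ptr<!void>>, !cir.ptr<!void>
%bc  = cir.cast(bitcast, %raw : !cir.ptr<!void>), !cir.ptr<!s32i>
%as1 = cir.cast(address_space, %bc : !cir.ptr<!s32i>), !cir.ptr<!s32i, addrspace(1)>
```

With both cast operands of `address_space` sharing the pointee type `!s32i`, the verifier constraint described above would be satisfied.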

https://github.com/llvm/llvm-project/pull/161212


[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits

https://github.com/RiverDave updated 
https://github.com/llvm/llvm-project/pull/161212

>From baaea0b9d214bd5940f4b16909d47f491918c584 Mon Sep 17 00:00:00 2001
From: David Rivera 
Date: Mon, 29 Sep 2025 11:05:44 -0400
Subject: [PATCH 1/4] [CIR] Upstream AddressSpace casting support

---
 .../CIR/Dialect/Builder/CIRBaseBuilder.h  |  9 +++
 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp   | 41 +++
 clang/lib/CIR/CodeGen/CIRGenExpr.cpp  | 19 +-
 clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp| 22 ++
 clang/lib/CIR/CodeGen/CIRGenFunction.h|  4 ++
 clang/lib/CIR/CodeGen/CIRGenModule.cpp| 17 +
 clang/lib/CIR/CodeGen/CIRGenModule.h  |  6 ++
 clang/lib/CIR/CodeGen/CIRGenTypes.cpp |  2 +-
 clang/lib/CIR/CodeGen/TargetInfo.cpp  | 13 
 clang/lib/CIR/CodeGen/TargetInfo.h| 13 
 clang/test/CIR/address-space-conversion.cpp   | 68 +++
 11 files changed, 196 insertions(+), 18 deletions(-)
 create mode 100644 clang/test/CIR/address-space-conversion.cpp

diff --git a/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h b/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
index b875fac9b7969..4765934582bb8 100644
--- a/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
+++ b/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
@@ -442,6 +442,15 @@ class CIRBaseBuilderTy : public mlir::OpBuilder {
 return createBitcast(src, getPointerTo(newPointeeTy));
   }
 
+  mlir::Value createAddrSpaceCast(mlir::Location loc, mlir::Value src,
+  mlir::Type newTy) {
+return createCast(loc, cir::CastKind::address_space, src, newTy);
+  }
+
+  mlir::Value createAddrSpaceCast(mlir::Value src, mlir::Type newTy) {
+return createAddrSpaceCast(src.getLoc(), src, newTy);
+  }
+
   
//======//
   // Binary Operators
   
//======//
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
index cf17de144f4d9..95e392d860518 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
@@ -58,6 +58,24 @@ static RValue emitBuiltinBitOp(CIRGenFunction &cgf, const CallExpr *e,
   return RValue::get(result);
 }
 
+// Initialize the alloca with the given size and alignment according to the lang
+// opts. Supporting only the trivial non-initialization for now.
+static void initializeAlloca(CIRGenFunction &CGF,
+ [[maybe_unused]] mlir::Value AllocaAddr,
+ [[maybe_unused]] mlir::Value Size,
+ [[maybe_unused]] CharUnits AlignmentInBytes) {
+
+  switch (CGF.getLangOpts().getTrivialAutoVarInit()) {
+  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
+// Nothing to initialize.
+return;
+  case LangOptions::TrivialAutoVarInitKind::Zero:
+  case LangOptions::TrivialAutoVarInitKind::Pattern:
+assert(false && "unexpected trivial auto var init kind NYI");
+return;
+  }
+}
+
 RValue CIRGenFunction::emitRotate(const CallExpr *e, bool isRotateLeft) {
   mlir::Value input = emitScalarExpr(e->getArg(0));
   mlir::Value amount = emitScalarExpr(e->getArg(1));
@@ -172,21 +190,8 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl &gd, unsigned builtinID,
 builder.getUInt8Ty(), "bi_alloca", suitableAlignmentInBytes, size);
 
 // Initialize the allocated buffer if required.
-if (builtinID != Builtin::BI__builtin_alloca_uninitialized) {
-  // Initialize the alloca with the given size and alignment according to
-  // the lang opts. Only the trivial non-initialization is supported for
-  // now.
-
-  switch (getLangOpts().getTrivialAutoVarInit()) {
-  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
-// Nothing to initialize.
-break;
-  case LangOptions::TrivialAutoVarInitKind::Zero:
-  case LangOptions::TrivialAutoVarInitKind::Pattern:
-cgm.errorNYI("trivial auto var init");
-break;
-  }
-}
+if (builtinID != Builtin::BI__builtin_alloca_uninitialized)
+  initializeAlloca(*this, allocaAddr, size, suitableAlignmentInBytes);
 
 // An alloca will always return a pointer to the alloca (stack) address
 // space. This address space need not be the same as the AST / Language
@@ -194,6 +199,12 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl &gd, unsigned builtinID,
 // the AST level this is handled within CreateTempAlloca et al., but for the
 // builtin / dynamic alloca we have to handle it here.
 assert(!cir::MissingFeatures::addressSpace());
+cir::AddressSpace aas = getCIRAllocaAddressSpace();
+cir::AddressSpace eas = cir::toCIRAddressSpace(
+e->getType()->getPointeeType().getAddressSpace());
+if (eas != aas) {
+  assert(false && "Non-default add

[llvm-branch-commits] [llvm] [AMDGPU][MC] Avoid creating lit64() operands unless asked or needed. (PR #161191)

2025-10-02 Thread Ivan Kosarev via llvm-branch-commits

https://github.com/kosarev ready_for_review 
https://github.com/llvm/llvm-project/pull/161191


[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (PR #161766)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Brad Smith (brad0)


Changes

Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments as 16 bytes.

This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.

(cherry picked from commit 6679e43937a87db3ce59a02f0cfc86951a4881e4)

---
Full diff: https://github.com/llvm/llvm-project/pull/161766.diff


4 Files Affected:

- (modified) clang/lib/CodeGen/Targets/Sparc.cpp (+57-68) 
- (modified) clang/test/CodeGen/sparcv9-abi.c (+56) 
- (modified) llvm/lib/Target/Sparc/SparcISelLowering.cpp (+2-1) 
- (modified) llvm/test/CodeGen/SPARC/64abi.ll (+14-1) 


``diff
diff --git a/clang/lib/CodeGen/Targets/Sparc.cpp b/clang/lib/CodeGen/Targets/Sparc.cpp
index 9642196b78c63..0461f121d76c9 100644
--- a/clang/lib/CodeGen/Targets/Sparc.cpp
+++ b/clang/lib/CodeGen/Targets/Sparc.cpp
@@ -8,6 +8,7 @@
 
 #include "ABIInfoImpl.h"
 #include "TargetInfo.h"
+#include <algorithm>
 
 using namespace clang;
 using namespace clang::CodeGen;
@@ -109,7 +110,8 @@ class SparcV9ABIInfo : public ABIInfo {
   SparcV9ABIInfo(CodeGenTypes &CGT) : ABIInfo(CGT) {}
 
 private:
-  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit) const;
+  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit,
+  unsigned &RegOffset) const;
   void computeInfo(CGFunctionInfo &FI) const override;
   RValue EmitVAArg(CodeGenFunction &CGF, Address VAListAddr, QualType Ty,
AggValueSlot Slot) const override;
@@ -222,127 +224,114 @@ class SparcV9ABIInfo : public ABIInfo {
 };
 } // end anonymous namespace
 
-ABIArgInfo
-SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit) const {
+ABIArgInfo SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit,
+unsigned &RegOffset) const {
   if (Ty->isVoidType())
 return ABIArgInfo::getIgnore();
 
-  uint64_t Size = getContext().getTypeSize(Ty);
+  auto &Context = getContext();
+  auto &VMContext = getVMContext();
+
+  uint64_t Size = Context.getTypeSize(Ty);
+  unsigned Alignment = Context.getTypeAlign(Ty);
+  bool NeedPadding = (Alignment > 64) && (RegOffset % 2 != 0);
 
   // Anything too big to fit in registers is passed with an explicit indirect
   // pointer / sret pointer.
-  if (Size > SizeLimit)
+  if (Size > SizeLimit) {
+RegOffset += 1;
 return getNaturalAlignIndirect(
 Ty, /*AddrSpace=*/getDataLayout().getAllocaAddrSpace(),
 /*ByVal=*/false);
+  }
 
   // Treat an enum type as its underlying type.
  if (const EnumType *EnumTy = Ty->getAs<EnumType>())
 Ty = EnumTy->getDecl()->getIntegerType();
 
   // Integer types smaller than a register are extended.
-  if (Size < 64 && Ty->isIntegerType())
+  if (Size < 64 && Ty->isIntegerType()) {
+RegOffset += 1;
 return ABIArgInfo::getExtend(Ty);
+  }
 
  if (const auto *EIT = Ty->getAs<BitIntType>())
-if (EIT->getNumBits() < 64)
+if (EIT->getNumBits() < 64) {
+  RegOffset += 1;
   return ABIArgInfo::getExtend(Ty);
+}
 
   // Other non-aggregates go in registers.
-  if (!isAggregateTypeForABI(Ty))
+  if (!isAggregateTypeForABI(Ty)) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
   // If a C++ object has either a non-trivial copy constructor or a non-trivial
   // destructor, it is passed with an explicit indirect pointer / sret pointer.
-  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI()))
+  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI())) {
+RegOffset += 1;
 return getNaturalAlignIndirect(Ty, getDataLayout().getAllocaAddrSpace(),
RAA == CGCXXABI::RAA_DirectInMemory);
+  }
 
   // This is a small aggregate type that should be passed in registers.
   // Build a coercion type from the LLVM struct type.
  llvm::StructType *StrTy = dyn_cast<llvm::StructType>(CGT.ConvertType(Ty));
-  if (!StrTy)
+  if (!StrTy) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
-  CoerceBuilder CB(getVMContext(), getDataLayout());
+  CoerceBuilder CB(VMContext, getDataLayout());
   CB.addStruct(0, StrTy);
   // All structs, even empty ones, should take up a register argument slot,
   // so pin the minimum struct size to one bit.
   CB.pad(llvm::alignTo(
   std::max(CB.DL.getTypeSizeInBits(StrTy).getKnownMinValue(), uint64_t(1)),
   64));
+  RegOffset += CB.Size / 64;
+
+  // If we're dealing with overaligned structs we may need to add a padding in
+  // the front, to preserve the correct register-memory mapping.
+  //
+  // See SCD 2.4.1, pages 3P-11 and 3P-12.
+  llvm::Type *Padding =
+  NeedPadding ? llvm::Type::getInt64Ty(VMContext) : nullptr;
+  RegOffset += NeedPadding ? 1 : 0;
 
   // Try to use the original type for coercion.
   llvm::Type *CoerceTy = CB.isUsableType(StrTy) ?

[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (PR #161766)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-sparc

Author: Brad Smith (brad0)


Changes

Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments as 16 bytes.

This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.

(cherry picked from commit 6679e43937a87db3ce59a02f0cfc86951a4881e4)

---
Full diff: https://github.com/llvm/llvm-project/pull/161766.diff


4 Files Affected:

- (modified) clang/lib/CodeGen/Targets/Sparc.cpp (+57-68) 
- (modified) clang/test/CodeGen/sparcv9-abi.c (+56) 
- (modified) llvm/lib/Target/Sparc/SparcISelLowering.cpp (+2-1) 
- (modified) llvm/test/CodeGen/SPARC/64abi.ll (+14-1) 


[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/161762

This is a singleton register class which is a bad idea,
and not actually used.

>From 628c706149875d7ca11df6bf3cfb76a7571591b9 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 3 Oct 2025 10:21:10 +0900
Subject: [PATCH] AMDGPU: Remove LDS_DIRECT_CLASS register class

This is a singleton register class which is a bad idea,
and not actually used.
---
 llvm/lib/Target/AMDGPU/SIRegisterInfo.td  |  20 +-
 .../GlobalISel/irtranslator-inline-asm.ll |   2 +-
 .../coalesce-copy-to-agpr-to-av-registers.mir | 232 +-
 ...class-vgpr-mfma-to-av-with-load-source.mir |  12 +-
 ...al-regcopy-and-spill-missed-at-regalloc.ll |  16 +-
 ...lloc-failure-overlapping-insert-assert.mir |  12 +-
 .../rewrite-vgpr-mfma-to-agpr-copy-from.mir   |   4 +-
 ...gpr-mfma-to-agpr-subreg-insert-extract.mir |  12 +-
 ...te-vgpr-mfma-to-agpr-subreg-src2-chain.mir |  32 +--
 .../CodeGen/AMDGPU/spill-vector-superclass.ll |   2 +-
 .../Inputs/amdgpu_isel.ll.expected|   4 +-
 11 files changed, 171 insertions(+), 177 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 500c60bc0f22a..b0305cfb7acb4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -761,12 +761,6 @@ def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32,
   let BaseClassOrder = 1;
 }
 
-def LDS_DIRECT_CLASS : RegisterClass<"AMDGPU", [i32], 32,
-  (add LDS_DIRECT)> {
-  let isAllocatable = 0;
-  let CopyCost = -1;
-}
-
 let GeneratePressureSet = 0, HasSGPR = 1 in {
 // Subset of SReg_32 without M0 for SMRD instructions and alike.
 // See comments in SIInstructions.td for more info.
@@ -829,7 +823,7 @@ def SGPR_NULL256 : SIReg<"null">;
 
 let GeneratePressureSet = 0 in {
 def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, v2bf16], 32,
-  (add SReg_32, LDS_DIRECT_CLASS)> {
+  (add SReg_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasSGPR = 1;
   let Size = 32;
@@ -968,7 +962,7 @@ defm "" : SRegClass<32, Reg1024Types.types, SGPR_1024Regs>;
 }
 
 def VRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, v2bf16], 32,
- (add VGPR_32, LDS_DIRECT_CLASS)> {
+ (add VGPR_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let Size = 32;
@@ -1083,21 +1077,21 @@ def VReg_1 : SIRegisterClass<"AMDGPU", [i1], 32, (add)> {
 }
 
 def VS_16 : SIRegisterClass<"AMDGPU", Reg16Types.types, 16,
-  (add VGPR_16, SReg_32, LDS_DIRECT_CLASS)> {
+  (add VGPR_16, SReg_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let Size = 16;
 }
 
 def VS_16_Lo128 : SIRegisterClass<"AMDGPU", Reg16Types.types, 16,
-  (add VGPR_16_Lo128, SReg_32, LDS_DIRECT_CLASS)> {
+  (add VGPR_16_Lo128, SReg_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let Size = 16;
 }
 
 def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, v2bf16], 32,
-  (add VGPR_32, SReg_32, LDS_DIRECT_CLASS)> {
+  (add VGPR_32, SReg_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
@@ -1105,7 +1099,7 @@ def VS_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, v
 }
 
 def VS_32_Lo128 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, v2bf16], 32,
-  (add VGPR_32_Lo128, SReg_32, LDS_DIRECT_CLASS)> {
+  (add VGPR_32_Lo128, SReg_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
@@ -1113,7 +1107,7 @@ def VS_32_Lo128 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2
 }
 
 def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2f16, v2bf16], 32,
-  (add VGPR_32_Lo256, SReg_32, LDS_DIRECT_CLASS)> {
+  (add VGPR_32_Lo256, SReg_32, LDS_DIRECT)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
index a54dc9dda16e0..e5cd0710359ac 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
@@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 {
 define double @test_multiple_register_outputs_mixed() #0 {
   ; CHECK-LABEL: name: test_multiple_register_outputs_mixed
   ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* 
attdialect */, 1835018 /* regdef:VGPR_32 */, def %8, 3473418 /* regdef

[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

This is a singleton register class which is a bad idea,
and not actually used.

---

Patch is 88.23 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/161762.diff


11 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+7-13) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll 
(+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir 
(+116-116) 
- (modified) 
llvm/test/CodeGen/AMDGPU/inflate-reg-class-vgpr-mfma-to-av-with-load-source.mir 
(+6-6) 
- (modified) 
llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll (+8-8) 
- (modified) 
llvm/test/CodeGen/AMDGPU/regalloc-failure-overlapping-insert-assert.mir (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir 
(+2-2) 
- (modified) 
llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir 
(+6-6) 
- (modified) 
llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir 
(+16-16) 
- (modified) llvm/test/CodeGen/AMDGPU/spill-vector-superclass.ll (+1-1) 
- (modified) 
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected
 (+2-2) 


[llvm-branch-commits] [llvm] AMDGPU: Remove LDS_DIRECT_CLASS register class (PR #161762)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161762
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#161762** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161762)
* **#161758** (https://app.graphite.dev/github/pr/llvm/llvm-project/161758)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/161762


[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (PR #161766)

2025-10-02 Thread Brad Smith via llvm-branch-commits

https://github.com/brad0 created 
https://github.com/llvm/llvm-project/pull/161766

Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments as 16 bytes.

This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.

(cherry picked from commit 6679e43937a87db3ce59a02f0cfc86951a4881e4)

>From f3a79ba5a9c310d908e9fab7f9246b107178c1de Mon Sep 17 00:00:00 2001
From: Koakuma 
Date: Thu, 2 Oct 2025 22:22:07 -0400
Subject: [PATCH] [clang][SPARC] Pass 16-aligned structs with the correct
 alignment in CC

Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments as 16 bytes.

This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.

(cherry picked from commit 6679e43937a87db3ce59a02f0cfc86951a4881e4)
---
 clang/lib/CodeGen/Targets/Sparc.cpp | 125 +---
 clang/test/CodeGen/sparcv9-abi.c|  56 +
 llvm/lib/Target/Sparc/SparcISelLowering.cpp |   3 +-
 llvm/test/CodeGen/SPARC/64abi.ll|  15 ++-
 4 files changed, 129 insertions(+), 70 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/Sparc.cpp 
b/clang/lib/CodeGen/Targets/Sparc.cpp
index 9642196b78c63..0461f121d76c9 100644
--- a/clang/lib/CodeGen/Targets/Sparc.cpp
+++ b/clang/lib/CodeGen/Targets/Sparc.cpp
@@ -8,6 +8,7 @@
 
 #include "ABIInfoImpl.h"
 #include "TargetInfo.h"
+#include 
 
 using namespace clang;
 using namespace clang::CodeGen;
@@ -109,7 +110,8 @@ class SparcV9ABIInfo : public ABIInfo {
   SparcV9ABIInfo(CodeGenTypes &CGT) : ABIInfo(CGT) {}
 
 private:
-  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit) const;
+  ABIArgInfo classifyType(QualType RetTy, unsigned SizeLimit,
+  unsigned &RegOffset) const;
   void computeInfo(CGFunctionInfo &FI) const override;
   RValue EmitVAArg(CodeGenFunction &CGF, Address VAListAddr, QualType Ty,
AggValueSlot Slot) const override;
@@ -222,127 +224,114 @@ class SparcV9ABIInfo : public ABIInfo {
 };
 } // end anonymous namespace
 
-ABIArgInfo
-SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit) const {
+ABIArgInfo SparcV9ABIInfo::classifyType(QualType Ty, unsigned SizeLimit,
+unsigned &RegOffset) const {
   if (Ty->isVoidType())
 return ABIArgInfo::getIgnore();
 
-  uint64_t Size = getContext().getTypeSize(Ty);
+  auto &Context = getContext();
+  auto &VMContext = getVMContext();
+
+  uint64_t Size = Context.getTypeSize(Ty);
+  unsigned Alignment = Context.getTypeAlign(Ty);
+  bool NeedPadding = (Alignment > 64) && (RegOffset % 2 != 0);
 
   // Anything too big to fit in registers is passed with an explicit indirect
   // pointer / sret pointer.
-  if (Size > SizeLimit)
+  if (Size > SizeLimit) {
+RegOffset += 1;
 return getNaturalAlignIndirect(
 Ty, /*AddrSpace=*/getDataLayout().getAllocaAddrSpace(),
 /*ByVal=*/false);
+  }
 
   // Treat an enum type as its underlying type.
  if (const EnumType *EnumTy = Ty->getAs<EnumType>())
 Ty = EnumTy->getDecl()->getIntegerType();
 
   // Integer types smaller than a register are extended.
-  if (Size < 64 && Ty->isIntegerType())
+  if (Size < 64 && Ty->isIntegerType()) {
+RegOffset += 1;
 return ABIArgInfo::getExtend(Ty);
+  }
 
  if (const auto *EIT = Ty->getAs<BitIntType>())
-if (EIT->getNumBits() < 64)
+if (EIT->getNumBits() < 64) {
+  RegOffset += 1;
   return ABIArgInfo::getExtend(Ty);
+}
 
   // Other non-aggregates go in registers.
-  if (!isAggregateTypeForABI(Ty))
+  if (!isAggregateTypeForABI(Ty)) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
   // If a C++ object has either a non-trivial copy constructor or a non-trivial
   // destructor, it is passed with an explicit indirect pointer / sret pointer.
-  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI()))
+  if (CGCXXABI::RecordArgABI RAA = getRecordArgABI(Ty, getCXXABI())) {
+RegOffset += 1;
 return getNaturalAlignIndirect(Ty, getDataLayout().getAllocaAddrSpace(),
RAA == CGCXXABI::RAA_DirectInMemory);
+  }
 
   // This is a small aggregate type that should be passed in registers.
   // Build a coercion type from the LLVM struct type.
  llvm::StructType *StrTy = dyn_cast<llvm::StructType>(CGT.ConvertType(Ty));
-  if (!StrTy)
+  if (!StrTy) {
+RegOffset += Size / 64;
 return ABIArgInfo::getDirect();
+  }
 
-  CoerceBuilder CB(getVMContext(), getDataLayout());
+  CoerceBuilder CB(VMContext, getDataLayout());
   CB.addStruct(0, StrTy);
   // All structs, even empty ones, should take up a register argument slot,
   // so pin the minimum struct size to one bit.
 

[llvm-branch-commits] [clang] [llvm] [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (PR #161766)

2025-10-02 Thread Brad Smith via llvm-branch-commits

https://github.com/brad0 milestoned 
https://github.com/llvm/llvm-project/pull/161766


[llvm-branch-commits] [llvm] [AArch64][SME] Support split ZPR and PPR area allocation (PR #142392)

2025-10-02 Thread Sander de Smalen via llvm-branch-commits


@@ -2277,6 +2351,70 @@ void AArch64FrameLowering::determineStackHazardSlot(
   << StackHazardSize << "\n");
 AFI->setStackHazardSlotIndex(ID);
   }
+
+  // Determine if we should use SplitSVEObjects. This should only be used if
+  // there's a possibility of a stack hazard between PPRs and ZPRs or FPRs.
+  if (SplitSVEObjects) {
+if (!HasPPRCSRs && !HasPPRStackObjects) {
+  LLVM_DEBUG(
+  dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
+  return;
+}
+
+if (!HasFPRCSRs && !HasFPRStackObjects) {
+  LLVM_DEBUG(
+  dbgs()
+  << "Not using SplitSVEObjects as no FPRs or ZPRs are on the 
stack\n");
+  return;
+}
+
+const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+if (MFI.hasVarSizedObjects() || TRI->hasStackRealignment(MF)) {
+  LLVM_DEBUG(dbgs() << "SplitSVEObjects is not supported with variable "
+   "sized objects or realignment\n");
+  return;
+}
+
+if (arePPRsSpilledAsZPR(MF)) {
+  LLVM_DEBUG(dbgs() << "SplitSVEObjects is not supported with "
+   "-aarch64-enable-zpr-predicate-spills");
+  return;
+}
+
+[[maybe_unused]] const AArch64Subtarget &Subtarget =
+MF.getSubtarget<AArch64Subtarget>();
+assert(Subtarget.isSVEorStreamingSVEAvailable() &&
+   "Expected SVE to be available for PPRs");
+
+// With SplitSVEObjects the CS hazard padding is placed between the
+// PPRs and ZPRs. If there are any FPR CS there would be a hazard between
+// them and the CS GRPs. Avoid this by promoting all FPR CS to ZPRs.
+BitVector FPRZRegs(SavedRegs.size());
+for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
+  BitVector::reference RegBit = SavedRegs[Reg];
+  if (!RegBit)
+continue;
+  unsigned SubRegIdx = 0;
+  if (AArch64::FPR64RegClass.contains(Reg))
+SubRegIdx = AArch64::dsub;
+  else if (AArch64::FPR128RegClass.contains(Reg))
+SubRegIdx = AArch64::zsub; // TODO: Is the the right sub-register?

sdesmalen-arm wrote:

It is the correct sub-register. It should be named `qsub`, but unfortunately 
renaming it to `qsub` causes TableGen to run into a latent bug.

https://github.com/llvm/llvm-project/pull/142392


[llvm-branch-commits] [clang] [CIR] Upstream `AddressSpace` conversions support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits

https://github.com/RiverDave deleted 
https://github.com/llvm/llvm-project/pull/161212


[llvm-branch-commits] [flang] [llvm] [mlir] [openmp] [Flang] Add standalone tile support (PR #160298)

2025-10-02 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/160298

>From bfe9c6b642ebc01f113dbf0a574e424e83f7162a Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Tue, 23 Sep 2025 15:33:52 +0200
Subject: [PATCH 1/4] [flang] Add standalone tile support

---
 flang/lib/Lower/OpenMP/ClauseProcessor.cpp|  13 +
 flang/lib/Lower/OpenMP/ClauseProcessor.h  |   2 +
 flang/lib/Lower/OpenMP/OpenMP.cpp | 360 --
 flang/lib/Lower/OpenMP/Utils.cpp  |  23 +-
 flang/lib/Lower/OpenMP/Utils.h|   7 +
 .../lib/Semantics/check-directive-structure.h |   7 +-
 flang/lib/Semantics/check-omp-structure.cpp   |   8 +-
 flang/lib/Semantics/resolve-directives.cpp|  16 +-
 flang/test/Lower/OpenMP/tile01.f90|  58 +++
 flang/test/Lower/OpenMP/tile02.f90|  88 +
 .../loop-transformation-construct02.f90   |   5 +-
 flang/test/Parser/OpenMP/tile-fail.f90|  32 ++
 flang/test/Parser/OpenMP/tile.f90 |  15 +-
 flang/test/Semantics/OpenMP/tile01.f90|  26 ++
 flang/test/Semantics/OpenMP/tile02.f90|  15 +
 flang/test/Semantics/OpenMP/tile03.f90|  15 +
 flang/test/Semantics/OpenMP/tile04.f90|  38 ++
 flang/test/Semantics/OpenMP/tile05.f90|  14 +
 flang/test/Semantics/OpenMP/tile06.f90|  44 +++
 flang/test/Semantics/OpenMP/tile07.f90|  35 ++
 flang/test/Semantics/OpenMP/tile08.f90|  15 +
 llvm/include/llvm/Frontend/OpenMP/OMP.td  |   3 +
 openmp/runtime/test/transform/tile/intfor.f90 |  31 ++
 .../runtime/test/transform/tile/intfor_2d.f90 |  53 +++
 .../transform/tile/intfor_2d_varsizes.F90 |  60 +++
 25 files changed, 841 insertions(+), 142 deletions(-)
 create mode 100644 flang/test/Lower/OpenMP/tile01.f90
 create mode 100644 flang/test/Lower/OpenMP/tile02.f90
 create mode 100644 flang/test/Parser/OpenMP/tile-fail.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile01.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile02.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile03.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile04.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile05.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile06.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile07.f90
 create mode 100644 flang/test/Semantics/OpenMP/tile08.f90
 create mode 100644 openmp/runtime/test/transform/tile/intfor.f90
 create mode 100644 openmp/runtime/test/transform/tile/intfor_2d.f90
 create mode 100644 openmp/runtime/test/transform/tile/intfor_2d_varsizes.F90

diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp 
b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index a96884f5680ba..55eda7e3404c1 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -431,6 +431,19 @@ bool ClauseProcessor::processNumTasks(
   return false;
 }
 
+bool ClauseProcessor::processSizes(StatementContext &stmtCtx,
+   mlir::omp::SizesClauseOps &result) const {
+  if (auto *clause = findUniqueClause<omp::clause::Sizes>()) {
+result.sizes.reserve(clause->v.size());
+for (const ExprTy &vv : clause->v)
+  result.sizes.push_back(fir::getBase(converter.genExprValue(vv, 
stmtCtx)));
+
+return true;
+  }
+
+  return false;
+}
+
 bool ClauseProcessor::processNumTeams(
 lower::StatementContext &stmtCtx,
 mlir::omp::NumTeamsClauseOps &result) const {
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h 
b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 324ea3c1047a5..9e352fa574a97 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -66,6 +66,8 @@ class ClauseProcessor {
   mlir::omp::LoopRelatedClauseOps &loopResult,
   mlir::omp::CollapseClauseOps &collapseResult,
  llvm::SmallVectorImpl<const semantics::Symbol *> &iv) const;
+  bool processSizes(StatementContext &stmtCtx,
+mlir::omp::SizesClauseOps &result) const;
   bool processDevice(lower::StatementContext &stmtCtx,
  mlir::omp::DeviceClauseOps &result) const;
   bool processDeviceType(mlir::omp::DeviceTypeClauseOps &result) const;
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp 
b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 5681be664d450..7812d9fe00be2 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1984,125 +1984,241 @@ genLoopOp(lower::AbstractConverter &converter, 
lower::SymMap &symTable,
   return loopOp;
 }
 
-static mlir::omp::CanonicalLoopOp
-genCanonicalLoopOp(lower::AbstractConverter &converter, lower::SymMap 
&symTable,
-   semantics::SemanticsContext &semaCtx,
-   lower::pft::Evaluation &eval, mlir::Location loc,
-   const ConstructQueue &queue,
-   ConstructQueue::const_iterator item,
-   llvm::ArrayRef ivs,
-   llvm::omp::Directive directive)

[llvm-branch-commits] [llvm] DAG: Remove TargetLowering::checkForPhysRegDependency (PR #161787)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/161787

I have no idea why this was here. The only implementation was AMDGPU,
which was essentially repeating the generic logic but for one specific
case.

>From 2d6ea1f0f59ad06eef7fd4f87522a76e9544bb00 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 3 Oct 2025 13:56:08 +0900
Subject: [PATCH] DAG: Remove TargetLowering::checkForPhysRegDependency

I have no idea why this was here. The only implementation was AMDGPU,
which was essentially repeating the generic logic but for one specific
case.
---
 llvm/include/llvm/CodeGen/TargetLowering.h| 17 -
 .../SelectionDAG/ScheduleDAGSDNodes.cpp   |  7 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 25 ---
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  5 
 4 files changed, 1 insertion(+), 53 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index 7bbad172b2d42..88691b931a8d8 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -4654,23 +4654,6 @@ class LLVM_ABI TargetLowering : public 
TargetLoweringBase {
 return false;
   }
 
-  /// Allows the target to handle physreg-carried dependency
-  /// in target-specific way. Used from the ScheduleDAGSDNodes to decide 
whether
-  /// to add the edge to the dependency graph.
-  /// Def - input: Selection DAG node defininfg physical register
-  /// User - input: Selection DAG node using physical register
-  /// Op - input: Number of User operand
-  /// PhysReg - inout: set to the physical register if the edge is
-  /// necessary, unchanged otherwise
-  /// Cost - inout: physical register copy cost.
-  /// Returns 'true' is the edge is necessary, 'false' otherwise
-  virtual bool checkForPhysRegDependency(SDNode *Def, SDNode *User, unsigned 
Op,
- const TargetRegisterInfo *TRI,
- const TargetInstrInfo *TII,
- MCRegister &PhysReg, int &Cost) const 
{
-return false;
-  }
-
   /// Target-specific combining of register parts into its original value
   virtual SDValue
   joinRegisterPartsIntoValue(SelectionDAG &DAG, const SDLoc &DL,
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 79022295f0abd..4f4fb9c759ad7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -111,15 +111,11 @@ SUnit *ScheduleDAGSDNodes::Clone(SUnit *Old) {
 static void CheckForPhysRegDependency(SDNode *Def, SDNode *User, unsigned Op,
   const TargetRegisterInfo *TRI,
   const TargetInstrInfo *TII,
-  const TargetLowering &TLI,
   MCRegister &PhysReg, int &Cost) {
   if (Op != 2 || User->getOpcode() != ISD::CopyToReg)
 return;
 
  Register Reg = cast<RegisterSDNode>(User->getOperand(1))->getReg();
-  if (TLI.checkForPhysRegDependency(Def, User, Op, TRI, TII, PhysReg, Cost))
-return;
-
   if (Reg.isVirtual())
 return;
 
@@ -490,8 +486,7 @@ void ScheduleDAGSDNodes::AddSchedEdges() {
 MCRegister PhysReg;
 int Cost = 1;
 // Determine if this is a physical register dependency.
-const TargetLowering &TLI = DAG->getTargetLoweringInfo();
-CheckForPhysRegDependency(OpN, N, i, TRI, TII, TLI, PhysReg, Cost);
+CheckForPhysRegDependency(OpN, N, i, TRI, TII, PhysReg, Cost);
 assert((!PhysReg || !isChain) && "Chain dependence via physreg data?");
 // FIXME: See ScheduleDAGSDNodes::EmitCopyFromReg. For now, scheduler
 // emits a copy from the physical register to a virtual register unless
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 79876ff37b97a..e2334577884b7 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -18860,31 +18860,6 @@ SITargetLowering::getTargetMMOFlags(const Instruction 
&I) const {
   return Flags;
 }
 
-bool SITargetLowering::checkForPhysRegDependency(
-SDNode *Def, SDNode *User, unsigned Op, const TargetRegisterInfo *TRI,
-const TargetInstrInfo *TII, MCRegister &PhysReg, int &Cost) const {
-  if (User->getOpcode() != ISD::CopyToReg)
-return false;
-  if (!Def->isMachineOpcode())
-return false;
-  MachineSDNode *MDef = dyn_cast<MachineSDNode>(Def);
-  if (!MDef)
-return false;
-
-  unsigned ResNo = User->getOperand(Op).getResNo();
-  if (User->getOperand(Op)->getValueType(ResNo) != MVT::i1)
-return false;
-  const MCInstrDesc &II = TII->get(MDef->getMachineOpcode());
-  if (II.isCompare() && II.hasImplicitDefOfPhysReg(AMDGPU::SCC)) {
-PhysReg = AMDGPU::SCC;
-const TargetRegisterClass *RC =
-

[llvm-branch-commits] [llvm] DAG: Remove TargetLowering::checkForPhysRegDependency (PR #161787)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/161787
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#161787** 👈 (View in Graphite: 
  https://app.graphite.dev/github/pr/llvm/llvm-project/161787)
* **#161786** 
  (https://app.graphite.dev/github/pr/llvm/llvm-project/161786)
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev). Learn more about stacking: 
https://stacking.dev/


https://github.com/llvm/llvm-project/pull/161787


[llvm-branch-commits] [llvm] DAG: Remove TargetLowering::checkForPhysRegDependency (PR #161787)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/161787


[llvm-branch-commits] [llvm] DAG: Remove TargetLowering::checkForPhysRegDependency (PR #161787)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)


Changes

I have no idea why this was here. The only implementation was AMDGPU,
which was essentially repeating the generic logic but for one specific
case.

---
Full diff: https://github.com/llvm/llvm-project/pull/161787.diff


4 Files Affected:

- (modified) llvm/include/llvm/CodeGen/TargetLowering.h (-17) 
- (modified) llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp (+1-6) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (-25) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.h (-5) 


```diff
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index 7bbad172b2d42..88691b931a8d8 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -4654,23 +4654,6 @@ class LLVM_ABI TargetLowering : public 
TargetLoweringBase {
 return false;
   }
 
-  /// Allows the target to handle physreg-carried dependency
-  /// in target-specific way. Used from the ScheduleDAGSDNodes to decide 
whether
-  /// to add the edge to the dependency graph.
-  /// Def - input: Selection DAG node defininfg physical register
-  /// User - input: Selection DAG node using physical register
-  /// Op - input: Number of User operand
-  /// PhysReg - inout: set to the physical register if the edge is
-  /// necessary, unchanged otherwise
-  /// Cost - inout: physical register copy cost.
-  /// Returns 'true' is the edge is necessary, 'false' otherwise
-  virtual bool checkForPhysRegDependency(SDNode *Def, SDNode *User, unsigned 
Op,
- const TargetRegisterInfo *TRI,
- const TargetInstrInfo *TII,
- MCRegister &PhysReg, int &Cost) const 
{
-return false;
-  }
-
   /// Target-specific combining of register parts into its original value
   virtual SDValue
   joinRegisterPartsIntoValue(SelectionDAG &DAG, const SDLoc &DL,
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 79022295f0abd..4f4fb9c759ad7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -111,15 +111,11 @@ SUnit *ScheduleDAGSDNodes::Clone(SUnit *Old) {
 static void CheckForPhysRegDependency(SDNode *Def, SDNode *User, unsigned Op,
   const TargetRegisterInfo *TRI,
   const TargetInstrInfo *TII,
-  const TargetLowering &TLI,
   MCRegister &PhysReg, int &Cost) {
   if (Op != 2 || User->getOpcode() != ISD::CopyToReg)
 return;
 
  Register Reg = cast<RegisterSDNode>(User->getOperand(1))->getReg();
-  if (TLI.checkForPhysRegDependency(Def, User, Op, TRI, TII, PhysReg, Cost))
-return;
-
   if (Reg.isVirtual())
 return;
 
@@ -490,8 +486,7 @@ void ScheduleDAGSDNodes::AddSchedEdges() {
 MCRegister PhysReg;
 int Cost = 1;
 // Determine if this is a physical register dependency.
-const TargetLowering &TLI = DAG->getTargetLoweringInfo();
-CheckForPhysRegDependency(OpN, N, i, TRI, TII, TLI, PhysReg, Cost);
+CheckForPhysRegDependency(OpN, N, i, TRI, TII, PhysReg, Cost);
 assert((!PhysReg || !isChain) && "Chain dependence via physreg data?");
 // FIXME: See ScheduleDAGSDNodes::EmitCopyFromReg. For now, scheduler
 // emits a copy from the physical register to a virtual register unless
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 79876ff37b97a..e2334577884b7 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -18860,31 +18860,6 @@ SITargetLowering::getTargetMMOFlags(const Instruction 
&I) const {
   return Flags;
 }
 
-bool SITargetLowering::checkForPhysRegDependency(
-SDNode *Def, SDNode *User, unsigned Op, const TargetRegisterInfo *TRI,
-const TargetInstrInfo *TII, MCRegister &PhysReg, int &Cost) const {
-  if (User->getOpcode() != ISD::CopyToReg)
-return false;
-  if (!Def->isMachineOpcode())
-return false;
-  MachineSDNode *MDef = dyn_cast<MachineSDNode>(Def);
-  if (!MDef)
-return false;
-
-  unsigned ResNo = User->getOperand(Op).getResNo();
-  if (User->getOperand(Op)->getValueType(ResNo) != MVT::i1)
-return false;
-  const MCInstrDesc &II = TII->get(MDef->getMachineOpcode());
-  if (II.isCompare() && II.hasImplicitDefOfPhysReg(AMDGPU::SCC)) {
-PhysReg = AMDGPU::SCC;
-const TargetRegisterClass *RC =
-TRI->getMinimalPhysRegClass(PhysReg, Def->getSimpleValueType(ResNo));
-Cost = RC->expensiveOrImpossibleToCopy() ? -1 : RC->getCopyCost();
-return true;
-  }
-  return false;
-}
-
 void SITargetLowering::emitExpandAtomicAddrSpacePredicate(
 Instruction *AI) cons

[llvm-branch-commits] [llvm] AMDGPU: Stop trying to constrain register class of post-RA-pseudos (PR #161792)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/161792

This is trying to constrain the register class of a physical register,
which makes no sense.

>From cf04ae0c7c664d2fba2995eae7ce825fa4badd95 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 3 Oct 2025 14:29:14 +0900
Subject: [PATCH] AMDGPU: Stop trying to constrain register class of
 post-RA-pseudos

This is trying to constrain the register class of a physical register,
which makes no sense.
---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 2 --
 1 file changed, 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index fe6b8b96cbd57..cda8069936af2 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2112,8 +2112,6 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) 
const {
 
   case AMDGPU::SI_RESTORE_S32_FROM_VGPR:
 MI.setDesc(get(AMDGPU::V_READLANE_B32));
-MI.getMF()->getRegInfo().constrainRegClass(MI.getOperand(0).getReg(),
-   &AMDGPU::SReg_32_XM0RegClass);
 break;
   case AMDGPU::AV_MOV_B32_IMM_PSEUDO: {
 Register Dst = MI.getOperand(0).getReg();



[llvm-branch-commits] [llvm] AMDGPU: Fix trying to constrain physical registers in spill handling (PR #161793)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

It's nonsensical to call constrainRegClass on a physical register,
and we should not see virtual registers here.

---
Full diff: https://github.com/llvm/llvm-project/pull/161793.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+2-1) 
- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+3-7) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index e4b3528b432bb..0189e7b90ca94 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -306,7 +306,8 @@ class PrologEpilogSGPRSpillBuilder {
 
   buildEpilogRestore(ST, TRI, *FuncInfo, LiveUnits, MF, MBB, MI, DL,
  TmpVGPR, FI, FrameReg, DwordOff);
-  MRI.constrainRegClass(SubReg, &AMDGPU::SReg_32_XM0RegClass);
+  assert(SubReg.isPhysical());
+
   BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), SubReg)
   .addReg(TmpVGPR, RegState::Kill);
   DwordOff += 4;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 205237fefe785..3c2dd4252c583 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -,8 +,6 @@ bool 
SIRegisterInfo::spillEmergencySGPR(MachineBasicBlock::iterator MI,
 // Don't need to write VGPR out.
   }
 
-  MachineRegisterInfo &MRI = MI->getMF()->getRegInfo();
-
   // Restore clobbered registers in the specified restore block.
   MI = RestoreMBB.end();
   SB.setMI(&RestoreMBB, MI);
@@ -2238,7 +2236,8 @@ bool 
SIRegisterInfo::spillEmergencySGPR(MachineBasicBlock::iterator MI,
   SB.NumSubRegs == 1
   ? SB.SuperReg
   : Register(getSubReg(SB.SuperReg, SB.SplitParts[i]));
-  MRI.constrainRegClass(SubReg, &AMDGPU::SReg_32_XM0RegClass);
+
+  assert(SubReg.isPhysical());
   bool LastSubReg = (i + 1 == e);
   auto MIB = BuildMI(*SB.MBB, MI, SB.DL, 
SB.TII.get(AMDGPU::V_READLANE_B32),
  SubReg)
@@ -3059,8 +3058,7 @@ bool 
SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
 if (IsSALU && LiveSCC) {
   Register NewDest;
   if (IsCopy) {
-MF->getRegInfo().constrainRegClass(ResultReg,
-   &AMDGPU::SReg_32_XM0RegClass);
+assert(ResultReg.isPhysical());
 NewDest = ResultReg;
   } else {
 NewDest = 
RS->scavengeRegisterBackwards(AMDGPU::SReg_32_XM0RegClass,
@@ -3190,8 +3188,6 @@ bool 
SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
 
 Register NewDest;
 if (IsCopy) {
-  MF->getRegInfo().constrainRegClass(ResultReg,
- &AMDGPU::SReg_32_XM0RegClass);
   NewDest = ResultReg;
 } else {
   NewDest = RS->scavengeRegisterBackwards(

```




https://github.com/llvm/llvm-project/pull/161793


[llvm-branch-commits] [llvm] AMDGPU: Fix constrain register logic for physregs (PR #161794)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

We do not need to reconstrain physical registers. Enables an
additional fold for constant physregs.

---

Patch is 260.21 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/161794.diff


6 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-gas.ll (+9-16) 
- (modified) llvm/test/CodeGen/AMDGPU/atomics-system-scope.ll (+200-210) 
- (modified) llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll (+416-665) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+17-37) 
- (modified) llvm/test/CodeGen/AMDGPU/scale-offset-flat.ll (+8-12) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index fed37788802b9..82789bc4968c5 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -722,7 +722,8 @@ bool SIFoldOperandsImpl::updateOperand(FoldCandidate &Fold) 
const {
 return false;
 }
 
-if (!MRI->constrainRegClass(New->getReg(), ConstrainRC)) {
+if (New->getReg().isVirtual() &&
+!MRI->constrainRegClass(New->getReg(), ConstrainRC)) {
   LLVM_DEBUG(dbgs() << "Cannot constrain " << printReg(New->getReg(), TRI)
 << TRI->getRegClassName(ConstrainRC) << '\n');
   return false;
diff --git a/llvm/test/CodeGen/AMDGPU/addrspacecast-gas.ll 
b/llvm/test/CodeGen/AMDGPU/addrspacecast-gas.ll
index aac499f2fc602..b486fabb19497 100644
--- a/llvm/test/CodeGen/AMDGPU/addrspacecast-gas.ll
+++ b/llvm/test/CodeGen/AMDGPU/addrspacecast-gas.ll
@@ -9,15 +9,14 @@ target triple = "amdgcn-amd-amdhsa"
 define amdgpu_kernel void @use_private_to_flat_addrspacecast(ptr addrspace(5) 
%ptr) {
 ; GFX1250-SDAG-LABEL: use_private_to_flat_addrspacecast:
 ; GFX1250-SDAG:   ; %bb.0:
-; GFX1250-SDAG-NEXT:s_load_b32 s2, s[4:5], 0x24
+; GFX1250-SDAG-NEXT:s_load_b32 s0, s[4:5], 0x24
; GFX1250-SDAG-NEXT:v_mbcnt_lo_u32_b32 v0, -1, 0
-; GFX1250-SDAG-NEXT:s_mov_b64 s[0:1], src_flat_scratch_base_lo
; GFX1250-SDAG-NEXT:s_wait_kmcnt 0x0
; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)
-; GFX1250-SDAG-NEXT:v_dual_mov_b32 v0, s2 :: v_dual_lshlrev_b32 v1, 20, v0
-; GFX1250-SDAG-NEXT:s_cmp_lg_u32 s2, -1
+; GFX1250-SDAG-NEXT:v_dual_mov_b32 v0, s0 :: v_dual_lshlrev_b32 v1, 20, v0
+; GFX1250-SDAG-NEXT:s_cmp_lg_u32 s0, -1
; GFX1250-SDAG-NEXT:s_cselect_b32 vcc_lo, -1, 0
-; GFX1250-SDAG-NEXT:v_add_nc_u64_e32 v[0:1], s[0:1], v[0:1]
+; GFX1250-SDAG-NEXT:v_add_nc_u64_e32 v[0:1], src_flat_scratch_base_lo, v[0:1]
; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
; GFX1250-SDAG-NEXT:v_dual_mov_b32 v2, 0 :: v_dual_cndmask_b32 v1, 0, v1
; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v0, 0, v0, vcc_lo
@@ -56,13 +55,11 @@ define amdgpu_kernel void @use_private_to_flat_addrspacecast_nonnull(ptr addrspa
; GFX1250-SDAG:   ; %bb.0:
; GFX1250-SDAG-NEXT:s_load_b32 s0, s[4:5], 0x24
; GFX1250-SDAG-NEXT:v_mbcnt_lo_u32_b32 v0, -1, 0
-; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1)
+; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_1)
; GFX1250-SDAG-NEXT:v_dual_mov_b32 v2, 0 :: v_dual_lshlrev_b32 v1, 20, v0
; GFX1250-SDAG-NEXT:s_wait_kmcnt 0x0
; GFX1250-SDAG-NEXT:v_mov_b32_e32 v0, s0
-; GFX1250-SDAG-NEXT:s_mov_b64 s[0:1], src_flat_scratch_base_lo
-; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
-; GFX1250-SDAG-NEXT:v_add_nc_u64_e32 v[0:1], s[0:1], v[0:1]
+; GFX1250-SDAG-NEXT:v_add_nc_u64_e32 v[0:1], src_flat_scratch_base_lo, v[0:1]
; GFX1250-SDAG-NEXT:flat_store_b32 v[0:1], v2 scope:SCOPE_SYS
; GFX1250-SDAG-NEXT:s_wait_storecnt 0x0
; GFX1250-SDAG-NEXT:s_endpgm
@@ -91,10 +88,9 @@ define amdgpu_kernel void @use_flat_to_private_addrspacecast(ptr %ptr) {
; GFX1250-LABEL: use_flat_to_private_addrspacecast:
; GFX1250:   ; %bb.0:
; GFX1250-NEXT:s_load_b64 s[0:1], s[4:5], 0x24
-; GFX1250-NEXT:s_mov_b32 s2, src_flat_scratch_base_lo
; GFX1250-NEXT:v_mov_b32_e32 v0, 0
; GFX1250-NEXT:s_wait_kmcnt 0x0
-; GFX1250-NEXT:s_sub_co_i32 s2, s0, s2
+; GFX1250-NEXT:s_sub_co_i32 s2, s0, src_flat_scratch_base_lo
; GFX1250-NEXT:s_cmp_lg_u64 s[0:1], 0
; GFX1250-NEXT:s_cselect_b32 s0, s2, -1
; GFX1250-NEXT:scratch_store_b32 off, v0, s0 scope:SCOPE_SYS
@@ -110,9 +106,8 @@ define amdgpu_kernel void @use_flat_to_private_addrspacecast_nonnull(ptr %ptr) {
; GFX1250-SDAG:   ; %bb.0:
; GFX1250-SDAG-NEXT:s_load_b32 s0, s[4:5], 0x24
; GFX1250-SDAG-NEXT:v_mov_b32_e32 v0, 0
-; GFX1250-SDAG-NEXT:s_mov_b32 s1, src_flat_scratch_base_lo
; GFX1250-SDAG-NEXT:s_wait_kmcnt 0x0
-; G

[llvm-branch-commits] [llvm] AMDGPU: Stop trying to constrain register class of post-RA-pseudos (PR #161792)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack
> [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/161792).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#161795**
* **#161794**
* **#161793**
* **#161792** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/161792)
* **#161790**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev).
Learn more about [stacking](https://stacking.dev/).


https://github.com/llvm/llvm-project/pull/161792
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Stop trying to constrain register class of post-RA-pseudos (PR #161792)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/161792


[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack
> [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/161795).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#161795** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/161795)
* **#161794**
* **#161793**
* **#161792**
* **#161790**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev).
Learn more about [stacking](https://stacking.dev/).


https://github.com/llvm/llvm-project/pull/161795


[llvm-branch-commits] [llvm] CodeGen: Stop checking for physregs in constrainRegClass (PR #161795)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/161795


[llvm-branch-commits] [llvm] AMDGPU: Fix constrain register logic for physregs (PR #161794)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/161794


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5

zmodem wrote:

I think it would be clearer to hand-write the test, focused entirely on the 
call instructions and their metadata.

https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [llvm] [LV] Bundle partial reductions inside VPExpressionRecipe (PR #147302)

2025-10-02 Thread Sam Tebbs via llvm-branch-commits


@@ -2955,12 +2966,14 @@ tryToMatchAndCreateMulAccumulateReduction(VPReductionRecipe *Red,
 
 // Match reduce.add(mul(ext, ext)).
 if (RecipeA && RecipeB &&
-(RecipeA->getOpcode() == RecipeB->getOpcode() || A == B) &&
+(RecipeA->getOpcode() == RecipeB->getOpcode() || A == B ||
+ IsPartialReduction) &&
 match(RecipeA, m_ZExtOrSExt(m_VPValue())) &&
 match(RecipeB, m_ZExtOrSExt(m_VPValue())) &&
-IsMulAccValidAndClampRange(RecipeA->getOpcode() ==
-   Instruction::CastOps::ZExt,
-   MulR, RecipeA, RecipeB, nullptr, Sub)) {
+(IsPartialReduction ||

SamTebbs33 wrote:

Currently, we collect the scaled reductions in `collectScaledReductions` in 
`LoopVectorize.cpp` and clamp the range there with those then becoming partial 
reductions later on in `LoopVectorize.cpp` via `tryToCreatePartialReduction`. 
This all happens before the abstract recipe conversion transform runs so, as  
things stand, we need to have created the partial reductions before this 
transform pass. If we were to move the clamping from `LoopVectorize.cpp` to 
here, then (in `LoopVectorize.cpp`) we'd have to create partial reductions for 
all VFs or none of them, which won't work well.

My ideal outcome is to move the clamping and partial reduction creation code to 
this transform pass, but that will be a bigger change that is outside of the 
scope of this PR. Having the shortcut here should be fine since the VF ranges 
have already been clamped properly so the plan state is valid.

https://github.com/llvm/llvm-project/pull/147302


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6

zmodem wrote:

I would suggest hand-writing the expectations instead, to make it clearer what the interesting bits being checked actually are.

I put some comments below, but IIUC this test should just boil down to 
verifying that `test_malloc` calls `__alloc_token_malloc` and 
`no_sanitize_malloc` calls regular `malloc`. I think that can be expressed much 
more clearly.

https://github.com/llvm/llvm-project/pull/156839


[llvm-branch-commits] [llvm] [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (PR #161638)

2025-10-02 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: undef deprecator found issues in your code. :warning:



You can test this locally with the following command:


```bash
git diff -U0 --pickaxe-regex -S \
  '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD \
  llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp \
  llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll \
  llvm/test/CodeGen/AMDGPU/lds-dma-workgroup-release.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-barriers.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-cluster.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll \
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
```




The following files introduce new uses of undef:
 - llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll

[Undef](https://llvm.org/docs/LangRef.html#undefined-values) is now deprecated 
and should only be used in the rare cases where no replacement is possible. For 
example, a load of uninitialized memory yields `undef`. You should use `poison` 
values for placeholders instead.

In tests, avoid using `undef` and having tests that trigger undefined behavior. 
If you need an operand with some unimportant value, you can add a new argument 
to the function and use that instead.

For example, this is considered a bad practice:
```llvm
define void @fn() {
  ...
  br i1 undef, ...
}
```

Please use the following instead:
```llvm
define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}
```

Please refer to the [Undefined Behavior 
Manual](https://llvm.org/docs/UndefinedBehavior.html) for more information.



https://github.com/llvm/llvm-project/pull/161638


[llvm-branch-commits] [llvm] [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (PR #161638)

2025-10-02 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/161638


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-10-02 Thread Hans Wennborg via llvm-branch-commits


@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 6
+//
+// Test optimization pipelines do not interfere with AllocToken lowering, and we
+// pass on function attributes correctly.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O0 %s
+// RUN: %clang_cc1 -O1 -fsanitize=alloc-token -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O1 %s
+// RUN: %clang_cc1 -O2 -fsanitize=alloc-token -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck --check-prefix=CHECK-O2 %s
+
+typedef __typeof(sizeof(int)) size_t;
+
+void *malloc(size_t size);
+
+// CHECK-O0-LABEL: define dso_local ptr @test_malloc(
+// CHECK-O0-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-O0-NEXT:  [[ENTRY:.*:]]

zmodem wrote:

The entry label is not interesting, and neither is the name of the return value 
or the return instruction. I think the checks should just be for the 
"test_malloc" definition and what the call looks like.

https://github.com/llvm/llvm-project/pull/156839


[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-10-02 Thread Marco Elver via llvm-branch-commits


@@ -1353,6 +1354,92 @@ void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, QualType AllocType) {
   CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
 }
 
+/// Infer type from a simple sizeof expression.
+static QualType inferTypeFromSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) {
+if (UET->getKind() == UETT_SizeOf) {
+  if (UET->isArgumentType())
+return UET->getArgumentTypeInfo()->getType();
+  else
+return UET->getArgumentExpr()->getType();
+}
+  }
+  return QualType();
+}
+
+/// Infer type from an arithmetic expression involving a sizeof.
+static QualType inferTypeFromArithSizeofExpr(const Expr *E) {

melver wrote:

inferPossibleType it is.

FYI @thejh - this is probably what you'll be able to use to fill 
DW_AT_alloc_type  for malloc and friends for 
https://dwarfstd.org/issues/250407.1.html

https://github.com/llvm/llvm-project/pull/156841


[llvm-branch-commits] [llvm] Greedy: Move physreg check when trying to recolor vregs (NFC) (PR #160484)

2025-10-02 Thread Quentin Colombet via llvm-branch-commits

https://github.com/qcolombet approved this pull request.

Good catch!

https://github.com/llvm/llvm-project/pull/160484


[llvm-branch-commits] [llvm] Greedy: Use initializer list for recoloring candidates (NFC) (PR #160486)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/160486

>From f6250bed3d3b224fc5c7a4110e35e1c3c6d416c4 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 24 Sep 2025 19:14:06 +0900
Subject: [PATCH] Greedy: Use initializer list for recoloring candidates (NFC)

---
 llvm/lib/CodeGen/RegAllocGreedy.cpp | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index f0f313050cf92..803ecdc89bb7a 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -2482,15 +2482,13 @@ void RAGreedy::tryHintRecoloring(const LiveInterval &VirtReg) {
   // We have a broken hint, check if it is possible to fix it by
   // reusing PhysReg for the copy-related live-ranges. Indeed, we evicted
   // some register and PhysReg may be available for the other live-ranges.
-  SmallSet Visited;
-  SmallVector RecoloringCandidates;
   HintsInfo Info;
   Register Reg = VirtReg.reg();
   MCRegister PhysReg = VRM->getPhys(Reg);
   // Start the recoloring algorithm from the input live-interval, then
   // it will propagate to the ones that are copy-related with it.
-  Visited.insert(Reg);
-  RecoloringCandidates.push_back(Reg);
+  SmallSet Visited = {Reg};
+  SmallVector RecoloringCandidates = {Reg};
 
   LLVM_DEBUG(dbgs() << "Trying to reconcile hints for: " << printReg(Reg, TRI)
 << '(' << printReg(PhysReg, TRI) << ")\n");



[llvm-branch-commits] [llvm] Greedy: Take hints from copy to physical subreg (PR #160467)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/160467

>From 9604dab28900f63fce615c31f5f624b031a94fcd Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 24 Sep 2025 16:53:33 +0900
Subject: [PATCH] Greedy: Take hints from copy to physical subreg

Previously this took hints from subregister extract of physreg,
like  %vreg.sub = COPY $physreg

This now also handles the rarer case:
  $physreg_sub = COPY %vreg

Also make explicit an accidental bug here: this was only using the
superregister as a hint if it was already in the copy, and not when
using the existing assignment. There are a handful of regressions in
that case, so leave that extension for a future change.
---
 llvm/lib/CodeGen/RegAllocGreedy.cpp | 35 -
 llvm/test/CodeGen/X86/shift-i128.ll |  3 +--
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index 5638f98b8163d..8a418080ea666 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -2435,25 +2435,28 @@ void RAGreedy::collectHintInfo(Register Reg, HintsInfo &Out) {
 unsigned SubReg = Opnd.getSubReg();
 
 // Get the current assignment.
-MCRegister OtherPhysReg =
-OtherReg.isPhysical() ? OtherReg.asMCReg() : VRM->getPhys(OtherReg);
-if (OtherSubReg) {
-  if (OtherReg.isPhysical()) {
-MCRegister Tuple =
-TRI->getMatchingSuperReg(OtherPhysReg, OtherSubReg, RC);
-if (!Tuple)
-  continue;
-OtherPhysReg = Tuple;
-  } else {
-// TODO: There should be a hinting mechanism for subregisters
-if (SubReg != OtherSubReg)
-  continue;
-  }
+MCRegister OtherPhysReg;
+if (OtherReg.isPhysical()) {
+  if (OtherSubReg)
+OtherPhysReg = TRI->getMatchingSuperReg(OtherReg, OtherSubReg, RC);
+  else if (SubReg)
+OtherPhysReg = TRI->getMatchingSuperReg(OtherReg, SubReg, RC);
+  else
+OtherPhysReg = OtherReg;
+} else {
+  OtherPhysReg = VRM->getPhys(OtherReg);
+  // TODO: Should find matching superregister, but applying this in the
+  // non-hint case currently causes regressions
+
+  if (SubReg && OtherSubReg && SubReg != OtherSubReg)
+continue;
 }
 
 // Push the collected information.
-Out.push_back(HintInfo(MBFI->getBlockFreq(Instr.getParent()), OtherReg,
-   OtherPhysReg));
+if (OtherPhysReg) {
+  Out.push_back(HintInfo(MBFI->getBlockFreq(Instr.getParent()), OtherReg,
+ OtherPhysReg));
+}
   }
 }
 
diff --git a/llvm/test/CodeGen/X86/shift-i128.ll b/llvm/test/CodeGen/X86/shift-i128.ll
index 7462c77482827..049ee47af9681 100644
--- a/llvm/test/CodeGen/X86/shift-i128.ll
+++ b/llvm/test/CodeGen/X86/shift-i128.ll
@@ -613,8 +613,7 @@ define void @test_shl_v2i128(<2 x i128> %x, <2 x i128> %a, ptr nocapture %r) nou
 ; i686-NEXT:shldl %cl, %esi, %ebx
 ; i686-NEXT:movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload
 ; i686-NEXT:movl %edi, %esi
-; i686-NEXT:movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
-; i686-NEXT:movl %eax, %ecx
+; i686-NEXT:movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
 ; i686-NEXT:shll %cl, %esi
 ; i686-NEXT:shldl %cl, %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Spill
 ; i686-NEXT:negl %edx



[llvm-branch-commits] [llvm] Greedy: Merge VirtRegMap queries into one use (NFC) (PR #160485)

2025-10-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/160485

>From 585006d84dcf14f95d91ed48bc9f528e649b3261 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 24 Sep 2025 19:06:39 +0900
Subject: [PATCH] Greedy: Merge VirtRegMap queries into one use (NFC)

---
 llvm/lib/CodeGen/RegAllocGreedy.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index 24fe838e4b7d8..f0f313050cf92 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -2498,8 +2498,10 @@ void RAGreedy::tryHintRecoloring(const LiveInterval &VirtReg) {
   do {
 Reg = RecoloringCandidates.pop_back_val();
 
+MCRegister CurrPhys = VRM->getPhys(Reg);
+
 // This may be a skipped register.
-if (!VRM->hasPhys(Reg)) {
+if (!CurrPhys) {
   assert(!shouldAllocateRegister(Reg) &&
  "We have an unallocated variable which should have been handled");
   continue;
@@ -2508,7 +2510,6 @@ void RAGreedy::tryHintRecoloring(const LiveInterval &VirtReg) {
 // Get the live interval mapped with this virtual register to be able
 // to check for the interference with the new color.
 LiveInterval &LI = LIS->getInterval(Reg);
-MCRegister CurrPhys = VRM->getPhys(Reg);
 // Check that the new color matches the register class constraints and
 // that it is free for this live range.
 if (CurrPhys != PhysReg && (!MRI->getRegClass(Reg)->contains(PhysReg) ||



[llvm-branch-commits] [llvm] [llvm][mustache] Avoid extra allocations in parseSection (PR #159199)

2025-10-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/159199

>From f93d7387c377d5247b7b819c60524f983059b963 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Tue, 16 Sep 2025 09:40:04 -0700
Subject: [PATCH] [llvm][mustache] Avoid extra allocations in parseSection

We don't need to have extra allocations when concatenating raw bodies.
---
 llvm/lib/Support/Mustache.cpp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index c2c6479544ed9..0174da49c9f41 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -587,9 +587,16 @@ void Parser::parseSection(ASTNode *Parent, ASTNode::Type Ty,
   size_t Start = CurrentPtr;
   parseMustache(CurrentNode);
   const size_t End = CurrentPtr - 1;
+
+  size_t RawBodySize = 0;
+  for (size_t I = Start; I < End; ++I)
+RawBodySize += Tokens[I].RawBody.size();
+
   SmallString<128> RawBody;
-  for (std::size_t I = Start; I < End; I++)
+  RawBody.reserve(RawBodySize);
+  for (std::size_t I = Start; I < End; ++I)
 RawBody += Tokens[I].RawBody;
+
   CurrentNode->setRawBody(Ctx.Saver.save(StringRef(RawBody)));
   Parent->addChild(CurrentNode);
 }



[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Vitaly Buka via llvm-branch-commits


@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5

vitalybuka wrote:

Short term on review - yes,
long term it's just unnecessary pain to update them

https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka edited 
https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka approved this pull request.

I'd prefer this patch be LLVM-only, with the clang part going into the next patch.

https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Vitaly Buka via llvm-branch-commits


@@ -69,19 +69,30 @@ enum class TokenMode : unsigned {
 
   /// Token ID based on allocated type hash.
   TypeHash = 2,
+
+  /// Token ID based on allocated type hash, where the top half ID-space is
+  /// reserved for types that contain pointers and the bottom half for types
+  /// that do not contain pointers.
+  TypeHashPointerSplit = 3,
 };
 
 //===--- Command-line options 
-===//
 
-cl::opt<TokenMode>
-ClMode("alloc-token-mode", cl::Hidden, cl::desc("Token assignment mode"),
-   cl::init(TokenMode::TypeHash),
-   cl::values(clEnumValN(TokenMode::Increment, "increment",
- "Incrementally increasing token ID"),
-  clEnumValN(TokenMode::Random, "random",
- "Statically-assigned random token ID"),
-  clEnumValN(TokenMode::TypeHash, "typehash",
- "Token ID based on allocated type hash")));
+cl::opt<TokenMode> ClMode(
+"alloc-token-mode", cl::Hidden, cl::desc("Token assignment mode"),
+cl::init(TokenMode::TypeHashPointerSplit),
+cl::values(

vitalybuka wrote:

Unless it's some experimental parameters to make them FUNCTION_PASS_WITH_PARAMS


https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [llvm] [IR2Vec] Refactor MIR vocabulary to use opcode-based indexing (PR #161713)

2025-10-02 Thread S. VenkataKeerthy via llvm-branch-commits

https://github.com/svkeerthy ready_for_review 
https://github.com/llvm/llvm-project/pull/161713


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-10-02 Thread Marco Elver via llvm-branch-commits


@@ -0,0 +1,301 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5

melver wrote:

I dunno what the right answer is here - for the .ll tests, definitely
auto-generated; for .cpp/.c tests I'm unsure. The Clang auto-generated tests
add a lot more cruft that tends to change with unrelated commits and cause
spurious failures for developers of unrelated changes, which also induces pain.

My proposal: For the Clang tests, let's try manual tests first. If we end up 
changing some of these tests a lot, we can switch back to auto-generated ones 
for a subset that needs to change a lot.

https://github.com/llvm/llvm-project/pull/156840


[llvm-branch-commits] [llvm] [llvm][mustache] Use StringRef parameters (PR #159190)

2025-10-02 Thread Erick Velez via llvm-branch-commits

https://github.com/evelez7 edited 
https://github.com/llvm/llvm-project/pull/159190


[llvm-branch-commits] [clang] bdb8b2b - Revert "[HLSL] Update Frontend to support version 1.2 of root signature (#160…"

2025-10-02 Thread via llvm-branch-commits

Author: joaosaffran
Date: 2025-10-02T16:45:31-04:00
New Revision: bdb8b2b33ce4ec74802b83e8189cb5d56a112231

URL: 
https://github.com/llvm/llvm-project/commit/bdb8b2b33ce4ec74802b83e8189cb5d56a112231
DIFF: 
https://github.com/llvm/llvm-project/commit/bdb8b2b33ce4ec74802b83e8189cb5d56a112231.diff

LOG: Revert "[HLSL] Update Frontend to support version 1.2 of root signature 
(#160…"

This reverts commit f2c8c42821a8c6de8984a1e7a932233cf221d5c1.

Added: 


Modified: 
clang/include/clang/Basic/LangOptions.h
clang/include/clang/Driver/Options.td
clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def
clang/include/clang/Parse/ParseHLSLRootSignature.h
clang/lib/AST/TextNodeDumper.cpp
clang/lib/Driver/ToolChains/HLSL.cpp
clang/lib/Parse/ParseHLSLRootSignature.cpp
clang/test/AST/HLSL/RootSignature-Target-AST.hlsl
clang/test/AST/HLSL/RootSignatures-AST.hlsl
clang/test/CodeGenHLSL/RootSignature.hlsl
clang/test/SemaHLSL/RootSignature-err.hlsl
clang/test/SemaHLSL/RootSignature-flags-err.hlsl
clang/unittests/Lex/LexHLSLRootSignatureTest.cpp
clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
llvm/include/llvm/BinaryFormat/DXContainer.h
llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h
llvm/lib/Frontend/HLSL/HLSLRootSignature.cpp
llvm/lib/Frontend/HLSL/RootSignatureValidations.cpp
llvm/lib/ObjectYAML/DXContainerYAML.cpp
llvm/unittests/Frontend/HLSLRootSignatureDumpTest.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/LangOptions.h 
b/clang/include/clang/Basic/LangOptions.h
index 41595ec2a060d..a8943df5b39aa 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -549,7 +549,8 @@ class LangOptions : public LangOptionsBase {
   bool CheckNew = false;
 
   /// The HLSL root signature version for dxil.
-  llvm::dxbc::RootSignatureVersion HLSLRootSigVer;
+  llvm::dxbc::RootSignatureVersion HLSLRootSigVer =
+  llvm::dxbc::RootSignatureVersion::V1_1;
 
   /// The HLSL root signature that will be used to overide the root signature
   /// used for the shader entry point.

diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2ef609831637e..2f865d8c30318 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -9476,7 +9476,7 @@ def target_profile : DXCJoinedOrSeparate<"T">, 
MetaVarName<"">,
  "lib_6_3, lib_6_4, lib_6_5, lib_6_6, lib_6_7, lib_6_x,"
  "ms_6_5, ms_6_6, ms_6_7,"
  "as_6_5, as_6_6, as_6_7,"
- "rootsig_1_0, rootsig_1_1, rootsig_1_2">;
+ "rootsig_1_0, rootsig_1_1">;
 def emit_pristine_llvm : DXCFlag<"emit-pristine-llvm">,
   HelpText<"Emit pristine LLVM IR from the frontend by not running any LLVM 
passes at all."
"Same as -S + -emit-llvm + -disable-llvm-passes.">;
@@ -9489,9 +9489,9 @@ def fdx_rootsignature_version :
   Group,
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"Root Signature Version">,
-  Values<"rootsig_1_0,rootsig_1_1,rootsig_1_2">,
+  Values<"rootsig_1_0,rootsig_1_1">,
   NormalizedValuesScope<"llvm::dxbc::RootSignatureVersion">,
-  NormalizedValues<["V1_0", "V1_1", "V1_2"]>,
+  NormalizedValues<["V1_0", "V1_1"]>,
   MarshallingInfoEnum, "V1_1">;
 def dxc_rootsig_ver :
   Separate<["/", "-"], "force-rootsig-ver">,

diff  --git a/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def 
b/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def
index 1d7f7adbe076f..a5cfeb34b2b51 100644
--- a/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def
+++ b/clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def
@@ -65,9 +65,6 @@
 #ifndef STATIC_BORDER_COLOR_ENUM
 #define STATIC_BORDER_COLOR_ENUM(NAME, LIT) ENUM(NAME, LIT)
 #endif
-#ifndef STATIC_SAMPLER_FLAG_ENUM
-#define STATIC_SAMPLER_FLAG_ENUM(NAME, LIT) ENUM(NAME, LIT)
-#endif
 
 // General Tokens:
 TOK(invalid, "invalid identifier")
@@ -231,10 +228,6 @@ STATIC_BORDER_COLOR_ENUM(OpaqueWhite, 
"STATIC_BORDER_COLOR_OPAQUE_WHITE")
 STATIC_BORDER_COLOR_ENUM(OpaqueBlackUint, 
"STATIC_BORDER_COLOR_OPAQUE_BLACK_UINT")
 STATIC_BORDER_COLOR_ENUM(OpaqueWhiteUint, 
"STATIC_BORDER_COLOR_OPAQUE_WHITE_UINT")
 
-// Root Descriptor Flag Enums:
-STATIC_SAMPLER_FLAG_ENUM(UintBorderColor, "UINT_BORDER_COLOR")
-STATIC_SAMPLER_FLAG_ENUM(NonNormalizedCoordinates, 
"NON_NORMALIZED_COORDINATES")
-
 #undef STATIC_BORDER_COLOR_ENUM
 #undef COMPARISON_FUNC_ENUM
 #undef TEXTURE_ADDRESS_MODE_ENUM
@@ -244,7 +237,6 @@ STATIC_SAMPLER_FLAG_ENUM(NonNormalizedCoordinates, 
"NON_NORMALIZED_COORDINATES")
 #undef DESCRIPTOR_RANGE_FLAG_ENUM_OFF
 #undef DESCRIPTOR_RANGE_FLAG_ENUM_ON
 #undef ROOT_DESCRIPTOR_FLAG_ENUM
-#undef STATIC_SAMPLER_FLAG_ENUM
 #undef ROOT_FLAG_ENUM
 #undef DESCRIPTOR_RANGE_OFFSET_ENUM
 #undef UNBOUNDED_ENUM

diff  --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/P

[llvm-branch-commits] [llvm] [PowerPC] Implement paddis (PR #161572)

2025-10-02 Thread Lei Huang via llvm-branch-commits

https://github.com/lei137 updated 
https://github.com/llvm/llvm-project/pull/161572

>From 012b638031fb72d36525234115f9d7b87d8c98e3 Mon Sep 17 00:00:00 2001
From: Lei Huang 
Date: Tue, 30 Sep 2025 18:09:31 +
Subject: [PATCH 1/3] [PowerPC] Implement paddis

---
 .../Target/PowerPC/AsmParser/PPCAsmParser.cpp |  4 ++
 .../PowerPC/MCTargetDesc/PPCAsmBackend.cpp|  9 
 .../PowerPC/MCTargetDesc/PPCFixupKinds.h  |  6 +++
 .../PowerPC/MCTargetDesc/PPCInstPrinter.cpp   | 12 +
 .../PowerPC/MCTargetDesc/PPCInstPrinter.h |  2 +
 .../PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp |  1 +
 llvm/lib/Target/PowerPC/PPCInstrFuture.td | 44 +++
 llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 19 
 .../PowerPC/ppc-encoding-ISAFuture.txt|  6 +++
 .../PowerPC/ppc64le-encoding-ISAFuture.txt|  6 +++
 llvm/test/MC/PowerPC/ppc-encoding-ISAFuture.s |  8 
 11 files changed, 117 insertions(+)

diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp 
b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 561a9c51b9cc2..b07f95018ca90 100644
--- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -365,6 +365,10 @@ struct PPCOperand : public MCParsedAsmOperand {
   bool isS16ImmX4() const { return isExtImm<16>(/*Signed*/ true, 4); }
   bool isS16ImmX16() const { return isExtImm<16>(/*Signed*/ true, 16); }
   bool isS17Imm() const { return isExtImm<17>(/*Signed*/ true, 1); }
+  bool isS32Imm() const {
+// TODO: Is ContextImmediate needed?
+return Kind == Expression || isSImm<32>();
+  }
   bool isS34Imm() const {
 // Once the PC-Rel ABI is finalized, evaluate whether a 34-bit
 // ContextImmediate is needed.
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 04b886ae74993..558351b515a2e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -47,6 +47,9 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t 
Value) {
   case PPC::fixup_ppc_half16ds:
   case PPC::fixup_ppc_half16dq:
 return Value & 0xfffc;
+  case PPC::fixup_ppc_pcrel32:
+  case PPC::fixup_ppc_imm32:
+return Value & 0xffffffff;
   case PPC::fixup_ppc_pcrel34:
   case PPC::fixup_ppc_imm34:
 return Value & 0x3ffffffff;
@@ -71,6 +74,8 @@ static unsigned getFixupKindNumBytes(unsigned Kind) {
   case PPC::fixup_ppc_br24abs:
   case PPC::fixup_ppc_br24_notoc:
 return 4;
+  case PPC::fixup_ppc_pcrel32:
+  case PPC::fixup_ppc_imm32:
   case PPC::fixup_ppc_pcrel34:
   case PPC::fixup_ppc_imm34:
   case FK_Data_8:
@@ -154,6 +159,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind 
Kind) const {
   {"fixup_ppc_brcond14abs", 16, 14, 0},
   {"fixup_ppc_half16", 0, 16, 0},
   {"fixup_ppc_half16ds", 0, 14, 0},
+  {"fixup_ppc_pcrel32", 0, 32, 0},
+  {"fixup_ppc_imm32", 0, 32, 0},
   {"fixup_ppc_pcrel34", 0, 34, 0},
   {"fixup_ppc_imm34", 0, 34, 0},
   {"fixup_ppc_nofixup", 0, 0, 0}};
@@ -166,6 +173,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind 
Kind) const {
   {"fixup_ppc_brcond14abs", 2, 14, 0},
   {"fixup_ppc_half16", 0, 16, 0},
   {"fixup_ppc_half16ds", 2, 14, 0},
+  {"fixup_ppc_pcrel32", 0, 32, 0},
+  {"fixup_ppc_imm32", 0, 32, 0},
   {"fixup_ppc_pcrel34", 0, 34, 0},
   {"fixup_ppc_imm34", 0, 34, 0},
   {"fixup_ppc_nofixup", 0, 0, 0}};
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
index df0c666f5b113..4164b697649cd 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
@@ -40,6 +40,12 @@ enum Fixups {
   /// instrs like 'std'.
   fixup_ppc_half16ds,
 
+  // A 32-bit fixup corresponding to PC-relative paddis.
+  fixup_ppc_pcrel32,
+
+  // A 32-bit fixup corresponding to Non-PC-relative paddis.
+  fixup_ppc_imm32,
+
   // A 34-bit fixup corresponding to PC-relative paddi.
   fixup_ppc_pcrel34,
 
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
index b27bc3bd49315..e2afb9378cbf0 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
@@ -430,6 +430,18 @@ void PPCInstPrinter::printS16ImmOperand(const MCInst *MI, 
unsigned OpNo,
 printOperand(MI, OpNo, STI, O);
 }
 
+void PPCInstPrinter::printS32ImmOperand(const MCInst *MI, unsigned OpNo,
+const MCSubtargetInfo &STI,
+raw_ostream &O) {
+  if (MI->getOperand(OpNo).isImm()) {
+long long Value = MI->getOperand(OpNo).getImm();
+assert(isInt<32>(Value) && "Invalid s32imm argument!");
+O << (long long)Value;
+  }
+  else
+printOperand(MI
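For reference, the masking that the PPCAsmBackend hunk above applies to the new 32-bit fixups (alongside the existing 34-bit ones) can be sketched as follows. The function name and string-valued kinds are illustrative only, not the real MC API:

```python
def adjust_fixup_value(kind: str, value: int) -> int:
    # Sketch of the masking added in PPCAsmBackend.cpp above.
    if kind in ("fixup_ppc_pcrel32", "fixup_ppc_imm32"):
        return value & 0xFFFF_FFFF      # keep the low 32 bits
    if kind in ("fixup_ppc_pcrel34", "fixup_ppc_imm34"):
        return value & 0x3_FFFF_FFFF    # keep the low 34 bits
    return value

# A negative displacement is truncated to its 32-bit two's-complement form:
print(hex(adjust_fixup_value("fixup_ppc_imm32", -4)))  # 0xfffffffc
```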

[llvm-branch-commits] [llvm] 220bac1 - [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618)

2025-10-02 Thread via llvm-branch-commits

Author: Ikhlas Ajbar
Date: 2025-10-02T15:58:31Z
New Revision: 220bac16a417e97bf97fdcb34855e28b2e6dfdf7

URL: 
https://github.com/llvm/llvm-project/commit/220bac16a417e97bf97fdcb34855e28b2e6dfdf7
DIFF: 
https://github.com/llvm/llvm-project/commit/220bac16a417e97bf97fdcb34855e28b2e6dfdf7.diff

LOG: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618)

Check for a valid offset for unaligned vector store V6_vS32Ub_npred_ai.
isValidOffset() is updated to evaluate offset of this instruction.
Fixes #160647

(cherry picked from commit daa4e57ccf38ff6ac22243e98a035c87b9f9f3ae)

Added: 
llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll

Modified: 
llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp

Removed: 




diff  --git a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp 
b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
index 64bc5ca134c86..35863f790eae4 100644
--- a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
@@ -2803,6 +2803,7 @@ bool HexagonInstrInfo::isValidOffset(unsigned Opcode, int 
Offset,
   case Hexagon::V6_vL32b_nt_cur_npred_ai:
   case Hexagon::V6_vL32b_nt_tmp_pred_ai:
   case Hexagon::V6_vL32b_nt_tmp_npred_ai:
+  case Hexagon::V6_vS32Ub_npred_ai:
   case Hexagon::V6_vgathermh_pseudo:
   case Hexagon::V6_vgathermw_pseudo:
   case Hexagon::V6_vgathermhw_pseudo:

diff  --git a/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll 
b/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll
new file mode 100644
index 0..267e365243711
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/unaligned-vec-store.ll
@@ -0,0 +1,23 @@
+; RUN: llc -march=hexagon -mcpu=hexagonv68 -mattr=+hvxv68,+hvx-length128B < %s 
| FileCheck %s
+; REQUIRES: asserts
+
+; Check that the test does not assert when unaligned vector store 
V6_vS32Ub_npred_ai is generated.
+; CHECK: if (!p{{[0-3]}}) vmemu
+
+target triple = "hexagon-unknown-unknown-elf"
+
+define fastcc void @test(i1 %cmp.i.i) {
+entry:
+  %call.i.i.i172 = load ptr, ptr null, align 4
+  %add.ptr = getelementptr i8, ptr %call.i.i.i172, i32 1
+  store <32 x i32> zeroinitializer, ptr %add.ptr, align 128
+  %add.ptr4.i4 = getelementptr i8, ptr %call.i.i.i172, i32 129
+  br i1 %cmp.i.i, label %common.ret, label %if.end.i.i
+
+common.ret:   ; preds = %if.end.i.i, %entry
+  ret void
+
+if.end.i.i:   ; preds = %entry
+  store <32 x i32> zeroinitializer, ptr %add.ptr4.i4, align 1
+  br label %common.ret
+}





[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread via llvm-branch-commits

https://github.com/dyung closed https://github.com/llvm/llvm-project/pull/161692


[llvm-branch-commits] [llvm] release/21.x: [Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618) (PR #161692)

2025-10-02 Thread via llvm-branch-commits

github-actions[bot] wrote:

@androm3da (or anyone else): if you would like to add a note about this fix in
the release notes (completely optional), please reply to this comment with a
one- or two-sentence description of the fix. When you are done, please add the
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/161692


[llvm-branch-commits] [llvm] [IR2Vec] Refactor MIR vocabulary to use opcode-based indexing (PR #161713)

2025-10-02 Thread S. VenkataKeerthy via llvm-branch-commits

https://github.com/svkeerthy edited 
https://github.com/llvm/llvm-project/pull/161713


[llvm-branch-commits] [clang] port 5b4819e to release (PR #159209)

2025-10-02 Thread David Blaikie via llvm-branch-commits

dwblaikie wrote:

> Hi @dwblaikie, before I merge your change into the release branch, can you 
> confirm that the premerge failures in libcxx were not caused by your change 
> here?

So far as I can tell, yeah, they seem unrelated.

@ldionne - any idea what's going on with libcxx presubmit checks here? Failures 
like "Error: Failed to remove 
"/scratch/powerllvm/cpap8006/llvm-project/libcxx-ci" (unlinkat 
/scratch/powerllvm/cpap8006/llvm-project/libcxx-ci/.ci/all_requirements.txt: 
permission denied)" - https://buildkite.com/llvm-project/libcxx-ci/builds/83843



https://github.com/llvm/llvm-project/pull/159209


[llvm-branch-commits] [llvm] [PowerPC] Implement paddis (PR #161572)

2025-10-02 Thread Lei Huang via llvm-branch-commits

https://github.com/lei137 updated 
https://github.com/llvm/llvm-project/pull/161572

>From 012b638031fb72d36525234115f9d7b87d8c98e3 Mon Sep 17 00:00:00 2001
From: Lei Huang 
Date: Tue, 30 Sep 2025 18:09:31 +
Subject: [PATCH 1/2] [PowerPC] Implement paddis

---
 .../Target/PowerPC/AsmParser/PPCAsmParser.cpp |  4 ++
 .../PowerPC/MCTargetDesc/PPCAsmBackend.cpp|  9 
 .../PowerPC/MCTargetDesc/PPCFixupKinds.h  |  6 +++
 .../PowerPC/MCTargetDesc/PPCInstPrinter.cpp   | 12 +
 .../PowerPC/MCTargetDesc/PPCInstPrinter.h |  2 +
 .../PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp |  1 +
 llvm/lib/Target/PowerPC/PPCInstrFuture.td | 44 +++
 llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 19 
 .../PowerPC/ppc-encoding-ISAFuture.txt|  6 +++
 .../PowerPC/ppc64le-encoding-ISAFuture.txt|  6 +++
 llvm/test/MC/PowerPC/ppc-encoding-ISAFuture.s |  8 
 11 files changed, 117 insertions(+)

diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp 
b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 561a9c51b9cc2..b07f95018ca90 100644
--- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -365,6 +365,10 @@ struct PPCOperand : public MCParsedAsmOperand {
   bool isS16ImmX4() const { return isExtImm<16>(/*Signed*/ true, 4); }
   bool isS16ImmX16() const { return isExtImm<16>(/*Signed*/ true, 16); }
   bool isS17Imm() const { return isExtImm<17>(/*Signed*/ true, 1); }
+  bool isS32Imm() const {
+// TODO: Is ContextImmediate needed?
+return Kind == Expression || isSImm<32>();
+  }
   bool isS34Imm() const {
 // Once the PC-Rel ABI is finalized, evaluate whether a 34-bit
 // ContextImmediate is needed.
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 04b886ae74993..558351b515a2e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -47,6 +47,9 @@ static uint64_t adjustFixupValue(unsigned Kind, uint64_t 
Value) {
   case PPC::fixup_ppc_half16ds:
   case PPC::fixup_ppc_half16dq:
 return Value & 0xfffc;
+  case PPC::fixup_ppc_pcrel32:
+  case PPC::fixup_ppc_imm32:
+return Value & 0xffffffff;
   case PPC::fixup_ppc_pcrel34:
   case PPC::fixup_ppc_imm34:
 return Value & 0x3ffffffff;
@@ -71,6 +74,8 @@ static unsigned getFixupKindNumBytes(unsigned Kind) {
   case PPC::fixup_ppc_br24abs:
   case PPC::fixup_ppc_br24_notoc:
 return 4;
+  case PPC::fixup_ppc_pcrel32:
+  case PPC::fixup_ppc_imm32:
   case PPC::fixup_ppc_pcrel34:
   case PPC::fixup_ppc_imm34:
   case FK_Data_8:
@@ -154,6 +159,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind 
Kind) const {
   {"fixup_ppc_brcond14abs", 16, 14, 0},
   {"fixup_ppc_half16", 0, 16, 0},
   {"fixup_ppc_half16ds", 0, 14, 0},
+  {"fixup_ppc_pcrel32", 0, 32, 0},
+  {"fixup_ppc_imm32", 0, 32, 0},
   {"fixup_ppc_pcrel34", 0, 34, 0},
   {"fixup_ppc_imm34", 0, 34, 0},
   {"fixup_ppc_nofixup", 0, 0, 0}};
@@ -166,6 +173,8 @@ MCFixupKindInfo PPCAsmBackend::getFixupKindInfo(MCFixupKind 
Kind) const {
   {"fixup_ppc_brcond14abs", 2, 14, 0},
   {"fixup_ppc_half16", 0, 16, 0},
   {"fixup_ppc_half16ds", 2, 14, 0},
+  {"fixup_ppc_pcrel32", 0, 32, 0},
+  {"fixup_ppc_imm32", 0, 32, 0},
   {"fixup_ppc_pcrel34", 0, 34, 0},
   {"fixup_ppc_imm34", 0, 34, 0},
   {"fixup_ppc_nofixup", 0, 0, 0}};
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
index df0c666f5b113..4164b697649cd 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCFixupKinds.h
@@ -40,6 +40,12 @@ enum Fixups {
   /// instrs like 'std'.
   fixup_ppc_half16ds,
 
+  // A 32-bit fixup corresponding to PC-relative paddis.
+  fixup_ppc_pcrel32,
+
+  // A 32-bit fixup corresponding to Non-PC-relative paddis.
+  fixup_ppc_imm32,
+
   // A 34-bit fixup corresponding to PC-relative paddi.
   fixup_ppc_pcrel34,
 
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp 
b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
index b27bc3bd49315..e2afb9378cbf0 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
@@ -430,6 +430,18 @@ void PPCInstPrinter::printS16ImmOperand(const MCInst *MI, 
unsigned OpNo,
 printOperand(MI, OpNo, STI, O);
 }
 
+void PPCInstPrinter::printS32ImmOperand(const MCInst *MI, unsigned OpNo,
+const MCSubtargetInfo &STI,
+raw_ostream &O) {
+  if (MI->getOperand(OpNo).isImm()) {
+long long Value = MI->getOperand(OpNo).getImm();
+assert(isInt<32>(Value) && "Invalid s32imm argument!");
+O << (long long)Value;
+  }
+  else
+printOperand(MI

[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)

2025-10-02 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin created 
https://github.com/llvm/llvm-project/pull/161739

None

>From 1b6920213b57235b62daf431f11cce74f3b5a5c3 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Wed, 1 Oct 2025 17:08:48 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Handle branch weights in
 `simplifySwitchLookup`

---
 llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 22 +
 .../Transforms/SimplifyCFG/rangereduce.ll | 24 +++
 2 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 63f4b2e030b69..fa3ac273b39f9 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7319,19 +7319,33 @@ static bool simplifySwitchLookup(SwitchInst *SI, 
IRBuilder<> &Builder,
   if (DTU)
 Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
 
+  SmallVector<uint64_t> BranchWeights;
+  const bool HasBranchWeights = RangeCheckBranch &&
+!ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+  uint64_t ToLookupWeight = 0;
+  uint64_t ToDefaultWeight = 0;
+
   // Remove the switch.
   SmallPtrSet<BasicBlock *, 8> RemovedSuccessors;
-  for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+  for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
 
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+  if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
   continue;
+}
 Succ->removePredecessor(BB);
 if (DTU && RemovedSuccessors.insert(Succ).second)
   Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+  ToLookupWeight += BranchWeights[I];
   }
   SI->eraseFromParent();
-
+  if (HasBranchWeights)
+setFittedBranchWeights(*RangeCheckBranch, {ToLookupWeight, 
ToDefaultWeight},
+   /*IsExpected=*/false);
   if (DTU)
 DTU->applyUpdates(Updates);
 
diff --git a/llvm/test/Transforms/SimplifyCFG/rangereduce.ll 
b/llvm/test/Transforms/SimplifyCFG/rangereduce.ll
index 17d65a4d4fa5e..d1fba91d1e505 100644
--- a/llvm/test/Transforms/SimplifyCFG/rangereduce.ll
+++ b/llvm/test/Transforms/SimplifyCFG/rangereduce.ll
@@ -1,15 +1,22 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-globals
 ; RUN: opt < %s -passes=simplifycfg 
-simplifycfg-require-and-preserve-domtree=1 -switch-to-lookup -S | FileCheck %s
 ; RUN: opt < %s -passes='simplifycfg' -S | FileCheck %s
 
 target datalayout = "e-n32"
 
-define i32 @test1(i32 %a) {
+;.
+; CHECK: @switch.table.test1 = private unnamed_addr constant [4 x i32] [i32 
11984, i32 1143, i32 99783, i32 99783], align 4
+; CHECK: @switch.table.test3 = private unnamed_addr constant [3 x i32] [i32 
11984, i32 1143, i32 99783], align 4
+; CHECK: @switch.table.test6 = private unnamed_addr constant [4 x i32] [i32 
99783, i32 99783, i32 1143, i32 11984], align 4
+; CHECK: @switch.table.test8 = private unnamed_addr constant [5 x i32] [i32 
11984, i32 1143, i32 99783, i32 8867, i32 99783], align 4
+; CHECK: @switch.table.test9 = private unnamed_addr constant [8 x i32] [i32 
99783, i32 8867, i32 99783, i32 8867, i32 8867, i32 8867, i32 11984, i32 1143], 
align 4
+;.
+define i32 @test1(i32 %a) !prof !0 {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT:[[TMP1:%.*]] = sub i32 [[A:%.*]], 97
 ; CHECK-NEXT:[[TMP2:%.*]] = call i32 @llvm.fshl.i32(i32 [[TMP1]], i32 
[[TMP1]], i32 30)
 ; CHECK-NEXT:[[TMP3:%.*]] = icmp ult i32 [[TMP2]], 4
-; CHECK-NEXT:br i1 [[TMP3]], label [[SWITCH_LOOKUP:%.*]], label 
[[COMMON_RET:%.*]]
+; CHECK-NEXT:br i1 [[TMP3]], label [[SWITCH_LOOKUP:%.*]], label 
[[COMMON_RET:%.*]], !prof [[PROF1:![0-9]+]]
 ; CHECK:   switch.lookup:
 ; CHECK-NEXT:[[TMP4:%.*]] = zext nneg i32 [[TMP2]] to i64
 ; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [4 x i32], ptr 
@switch.table.test1, i64 0, i64 [[TMP4]]
@@ -24,7 +31,7 @@ define i32 @test1(i32 %a) {
   i32 101, label %two
   i32 105, label %three
   i32 109, label %three
-  ]
+  ], !prof !1
 
 def:
   ret i32 8867
@@ -310,3 +317,12 @@ three:
   ret i32 99783
 }
 
+!0 = !{!"function_entry_count", i32 100}
+!1 = !{!"branch_weights", i32 5, i32 7, i32 11, i32 13, i32 17}
+;.
+; CHECK: attributes #[[ATTR0:[0-9]+]] = { optsize }
+; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind 
speculatable willreturn memory(none) }
+;.
+; CHECK: [[META0:![0-9]+]] = !{!"function_entry_count", i32 100}
+; CHECK: [[PROF1]] = !{!"branch_weights", i32 48, i32 5}
+;.


[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)

2025-10-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Mircea Trofin (mtrofin)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/161739.diff


2 Files Affected:

- (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+18-4) 
- (modified) llvm/test/Transforms/SimplifyCFG/rangereduce.ll (+20-4) 


```diff
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 63f4b2e030b69..fa3ac273b39f9 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7319,19 +7319,33 @@ static bool simplifySwitchLookup(SwitchInst *SI, 
IRBuilder<> &Builder,
   if (DTU)
 Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
 
+  SmallVector<uint64_t> BranchWeights;
+  const bool HasBranchWeights = RangeCheckBranch &&
+!ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+  uint64_t ToLookupWeight = 0;
+  uint64_t ToDefaultWeight = 0;
+
   // Remove the switch.
   SmallPtrSet<BasicBlock *, 8> RemovedSuccessors;
-  for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+  for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
 
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+  if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
   continue;
+}
 Succ->removePredecessor(BB);
 if (DTU && RemovedSuccessors.insert(Succ).second)
   Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+  ToLookupWeight += BranchWeights[I];
   }
   SI->eraseFromParent();
-
+  if (HasBranchWeights)
+setFittedBranchWeights(*RangeCheckBranch, {ToLookupWeight, 
ToDefaultWeight},
+   /*IsExpected=*/false);
   if (DTU)
 DTU->applyUpdates(Updates);
 
diff --git a/llvm/test/Transforms/SimplifyCFG/rangereduce.ll 
b/llvm/test/Transforms/SimplifyCFG/rangereduce.ll
index 17d65a4d4fa5e..d1fba91d1e505 100644
--- a/llvm/test/Transforms/SimplifyCFG/rangereduce.ll
+++ b/llvm/test/Transforms/SimplifyCFG/rangereduce.ll
@@ -1,15 +1,22 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-globals
 ; RUN: opt < %s -passes=simplifycfg 
-simplifycfg-require-and-preserve-domtree=1 -switch-to-lookup -S | FileCheck %s
 ; RUN: opt < %s -passes='simplifycfg' -S | FileCheck %s
 
 target datalayout = "e-n32"
 
-define i32 @test1(i32 %a) {
+;.
+; CHECK: @switch.table.test1 = private unnamed_addr constant [4 x i32] [i32 
11984, i32 1143, i32 99783, i32 99783], align 4
+; CHECK: @switch.table.test3 = private unnamed_addr constant [3 x i32] [i32 
11984, i32 1143, i32 99783], align 4
+; CHECK: @switch.table.test6 = private unnamed_addr constant [4 x i32] [i32 
99783, i32 99783, i32 1143, i32 11984], align 4
+; CHECK: @switch.table.test8 = private unnamed_addr constant [5 x i32] [i32 
11984, i32 1143, i32 99783, i32 8867, i32 99783], align 4
+; CHECK: @switch.table.test9 = private unnamed_addr constant [8 x i32] [i32 
99783, i32 8867, i32 99783, i32 8867, i32 8867, i32 8867, i32 11984, i32 1143], 
align 4
+;.
+define i32 @test1(i32 %a) !prof !0 {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT:[[TMP1:%.*]] = sub i32 [[A:%.*]], 97
 ; CHECK-NEXT:[[TMP2:%.*]] = call i32 @llvm.fshl.i32(i32 [[TMP1]], i32 
[[TMP1]], i32 30)
 ; CHECK-NEXT:[[TMP3:%.*]] = icmp ult i32 [[TMP2]], 4
-; CHECK-NEXT:br i1 [[TMP3]], label [[SWITCH_LOOKUP:%.*]], label 
[[COMMON_RET:%.*]]
+; CHECK-NEXT:br i1 [[TMP3]], label [[SWITCH_LOOKUP:%.*]], label 
[[COMMON_RET:%.*]], !prof [[PROF1:![0-9]+]]
 ; CHECK:   switch.lookup:
 ; CHECK-NEXT:[[TMP4:%.*]] = zext nneg i32 [[TMP2]] to i64
 ; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [4 x i32], ptr 
@switch.table.test1, i64 0, i64 [[TMP4]]
@@ -24,7 +31,7 @@ define i32 @test1(i32 %a) {
   i32 101, label %two
   i32 105, label %three
   i32 109, label %three
-  ]
+  ], !prof !1
 
 def:
   ret i32 8867
@@ -310,3 +317,12 @@ three:
   ret i32 99783
 }
 
+!0 = !{!"function_entry_count", i32 100}
+!1 = !{!"branch_weights", i32 5, i32 7, i32 11, i32 13, i32 17}
+;.
+; CHECK: attributes #[[ATTR0:[0-9]+]] = { optsize }
+; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind 
speculatable willreturn memory(none) }
+;.
+; CHECK: [[META0:![0-9]+]] = !{!"function_entry_count", i32 100}
+; CHECK: [[PROF1]] = !{!"branch_weights", i32 48, i32 5}
+;.

```




https://github.com/llvm/llvm-project/pull/161739


[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)

2025-10-02 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin ready_for_review 
https://github.com/llvm/llvm-project/pull/161739


[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)

2025-10-02 Thread Mircea Trofin via llvm-branch-commits

mtrofin wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/161739).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#161739** 👈 (this PR)
* **#161549**
* **#160629**
* **#159645**
* **#159644**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/161739


[llvm-branch-commits] [llvm] [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (PR #161739)

2025-10-02 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/161739

>From 24ab181f625ab0dc4fe7953034312551f50a4189 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Wed, 1 Oct 2025 17:08:48 -0700
Subject: [PATCH] [SimplifyCFG][profcheck] Handle branch weights in
 `simplifySwitchLookup`

---
 llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 25 +++
 .../SimplifyCFG/X86/switch_to_lookup_table.ll | 13 +++---
 .../Transforms/SimplifyCFG/rangereduce.ll | 24 +++---
 3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 63f4b2e030b69..5aff662bc3586 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -7227,6 +7227,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, 
IRBuilder<> &Builder,
   Mod.getContext(), "switch.lookup", CommonDest->getParent(), CommonDest);
 
   BranchInst *RangeCheckBranch = nullptr;
+  BranchInst *CondBranch = nullptr;
 
   Builder.SetInsertPoint(SI);
   const bool GeneratingCoveredLookupTable = (MaxTableSize == TableSize);
@@ -7241,6 +7242,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, 
IRBuilder<> &Builder,
 TableIndex, ConstantInt::get(MinCaseVal->getType(), TableSize));
 RangeCheckBranch =
 Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
+CondBranch = RangeCheckBranch;
 if (DTU)
   Updates.push_back({DominatorTree::Insert, BB, LookupBB});
   }
@@ -7279,7 +7281,7 @@ static bool simplifySwitchLookup(SwitchInst *SI, 
IRBuilder<> &Builder,
 Value *Shifted = Builder.CreateLShr(TableMask, MaskIndex, 
"switch.shifted");
 Value *LoBit = Builder.CreateTrunc(
 Shifted, Type::getInt1Ty(Mod.getContext()), "switch.lobit");
-Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
+CondBranch = Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
 if (DTU) {
   Updates.push_back({DominatorTree::Insert, MaskBB, LookupBB});
   Updates.push_back({DominatorTree::Insert, MaskBB, SI->getDefaultDest()});
@@ -7319,19 +7321,32 @@ static bool simplifySwitchLookup(SwitchInst *SI, 
IRBuilder<> &Builder,
   if (DTU)
 Updates.push_back({DominatorTree::Insert, LookupBB, CommonDest});
 
+  SmallVector<uint32_t> BranchWeights;
+  const bool HasBranchWeights = CondBranch && !ProfcheckDisableMetadataFixes &&
+extractBranchWeights(*SI, BranchWeights);
+  uint64_t ToLookupWeight = 0;
+  uint64_t ToDefaultWeight = 0;
+
   // Remove the switch.
   SmallPtrSet<BasicBlock *, 8> RemovedSuccessors;
-  for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
-BasicBlock *Succ = SI->getSuccessor(i);
+  for (unsigned I = 0, E = SI->getNumSuccessors(); I < E; ++I) {
+BasicBlock *Succ = SI->getSuccessor(I);
 
-if (Succ == SI->getDefaultDest())
+if (Succ == SI->getDefaultDest()) {
+  if (HasBranchWeights)
+ToDefaultWeight += BranchWeights[I];
   continue;
+}
 Succ->removePredecessor(BB);
 if (DTU && RemovedSuccessors.insert(Succ).second)
   Updates.push_back({DominatorTree::Delete, BB, Succ});
+if (HasBranchWeights)
+  ToLookupWeight += BranchWeights[I];
   }
   SI->eraseFromParent();
-
+  if (HasBranchWeights)
+setFittedBranchWeights(*CondBranch, {ToLookupWeight, ToDefaultWeight},
+   /*IsExpected=*/false);
   if (DTU)
 DTU->applyUpdates(Updates);
 
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll 
b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
index f9e79cabac51d..bee6b375ea11a 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
@@ -1565,14 +1565,14 @@ end:
 ; lookup (since i3 can only hold values in the range of explicit
 ; values) and simultaneously trying to generate a branch to deal with
 ; the fact that we have holes in the range.
-define i32 @covered_switch_with_bit_tests(i3) {
+define i32 @covered_switch_with_bit_tests(i3) !prof !0 {
 ; CHECK-LABEL: @covered_switch_with_bit_tests(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[SWITCH_TABLEIDX:%.*]] = sub i3 [[TMP0:%.*]], -4
 ; CHECK-NEXT:[[SWITCH_MASKINDEX:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i8
 ; CHECK-NEXT:[[SWITCH_SHIFTED:%.*]] = lshr i8 -61, [[SWITCH_MASKINDEX]]
 ; CHECK-NEXT:[[SWITCH_LOBIT:%.*]] = trunc i8 [[SWITCH_SHIFTED]] to i1
-; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label 
[[L6:%.*]]
+; CHECK-NEXT:br i1 [[SWITCH_LOBIT]], label [[SWITCH_LOOKUP:%.*]], label 
[[L6:%.*]], !prof [[PROF1:![0-9]+]]
 ; CHECK:   switch.lookup:
 ; CHECK-NEXT:[[TMP1:%.*]] = zext i3 [[SWITCH_TABLEIDX]] to i64
 ; CHECK-NEXT:[[SWITCH_GEP:%.*]] = getelementptr inbounds [8 x i32], ptr 
@switch.table.covered_switch_with_bit_tests, i64 0, i64 [[TMP1]]
@@ -1588,7 +1588,7 @@ entry:
   i3 -4, label

[llvm-branch-commits] [llvm] [llvm][mustache] Refactor template rendering (PR #159189)

2025-10-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/159189

>From 7ca57bb5bfcf4c257bea3f77e39c042e5e482435 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Fri, 12 Sep 2025 00:06:14 -0700
Subject: [PATCH] [llvm][mustache] Refactor template rendering

Move the rendering logic into the ASTNode, and break the logic down into
individual methods.
---
 llvm/lib/Support/Mustache.cpp | 132 --
 1 file changed, 80 insertions(+), 52 deletions(-)

diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index 07443eb84dfbe..ad5a7bcc3e3e6 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -180,6 +180,14 @@ class ASTNode {
 
   const llvm::json::Value *findContext();
 
+  void renderRoot(const json::Value &CurrentCtx, raw_ostream &OS);
+  void renderText(raw_ostream &OS);
+  void renderPartial(const json::Value &CurrentCtx, raw_ostream &OS);
+  void renderVariable(const json::Value &CurrentCtx, raw_ostream &OS);
+  void renderUnescapeVariable(const json::Value &CurrentCtx, raw_ostream &OS);
+  void renderSection(const json::Value &CurrentCtx, raw_ostream &OS);
+  void renderInvertSection(const json::Value &CurrentCtx, raw_ostream &OS);
+
   StringMap<AstPtr> &Partials;
   StringMap<Lambda> &Lambdas;
   StringMap<SectionLambda> &SectionLambdas;
@@ -672,76 +680,96 @@ static void toMustacheString(const json::Value &Data, 
raw_ostream &OS) {
   }
 }
 
+void ASTNode::renderRoot(const json::Value &CurrentCtx, raw_ostream &OS) {
+  renderChild(CurrentCtx, OS);
+}
+
+void ASTNode::renderText(raw_ostream &OS) { OS << Body; }
+
+void ASTNode::renderPartial(const json::Value &CurrentCtx, raw_ostream &OS) {
+  auto Partial = Partials.find(AccessorValue[0]);
+  if (Partial != Partials.end())
+renderPartial(CurrentCtx, OS, Partial->getValue().get());
+}
+
+void ASTNode::renderVariable(const json::Value &CurrentCtx, raw_ostream &OS) {
+  auto Lambda = Lambdas.find(AccessorValue[0]);
+  if (Lambda != Lambdas.end()) {
+renderLambdas(CurrentCtx, OS, Lambda->getValue());
+  } else if (const json::Value *ContextPtr = findContext()) {
+EscapeStringStream ES(OS, Escapes);
+toMustacheString(*ContextPtr, ES);
+  }
+}
+
+void ASTNode::renderUnescapeVariable(const json::Value &CurrentCtx,
+ raw_ostream &OS) {
+  auto Lambda = Lambdas.find(AccessorValue[0]);
+  if (Lambda != Lambdas.end()) {
+renderLambdas(CurrentCtx, OS, Lambda->getValue());
+  } else if (const json::Value *ContextPtr = findContext()) {
+toMustacheString(*ContextPtr, OS);
+  }
+}
+
+void ASTNode::renderSection(const json::Value &CurrentCtx, raw_ostream &OS) {
+  auto SectionLambda = SectionLambdas.find(AccessorValue[0]);
+  if (SectionLambda != SectionLambdas.end()) {
+renderSectionLambdas(CurrentCtx, OS, SectionLambda->getValue());
+return;
+  }
+
+  const json::Value *ContextPtr = findContext();
+  if (isContextFalsey(ContextPtr))
+return;
+
+  if (const json::Array *Arr = ContextPtr->getAsArray()) {
+for (const json::Value &V : *Arr)
+  renderChild(V, OS);
+return;
+  }
+  renderChild(*ContextPtr, OS);
+}
+
+void ASTNode::renderInvertSection(const json::Value &CurrentCtx,
+  raw_ostream &OS) {
+  bool IsLambda = SectionLambdas.contains(AccessorValue[0]);
+  const json::Value *ContextPtr = findContext();
+  if (isContextFalsey(ContextPtr) && !IsLambda) {
+renderChild(CurrentCtx, OS);
+  }
+}
+
 void ASTNode::render(const json::Value &CurrentCtx, raw_ostream &OS) {
   if (Ty != Root && Ty != Text && AccessorValue.empty())
 return;
   // Set the parent context to the incoming context so that we
   // can walk up the context tree correctly in findContext().
   ParentContext = &CurrentCtx;
-  const json::Value *ContextPtr = Ty == Root ? ParentContext : findContext();
 
   switch (Ty) {
   case Root:
-renderChild(CurrentCtx, OS);
+renderRoot(CurrentCtx, OS);
 return;
   case Text:
-OS << Body;
+renderText(OS);
 return;
-  case Partial: {
-auto Partial = Partials.find(AccessorValue[0]);
-if (Partial != Partials.end())
-  renderPartial(CurrentCtx, OS, Partial->getValue().get());
+  case Partial:
+renderPartial(CurrentCtx, OS);
 return;
-  }
-  case Variable: {
-auto Lambda = Lambdas.find(AccessorValue[0]);
-if (Lambda != Lambdas.end()) {
-  renderLambdas(CurrentCtx, OS, Lambda->getValue());
-} else if (ContextPtr) {
-  EscapeStringStream ES(OS, Escapes);
-  toMustacheString(*ContextPtr, ES);
-}
+  case Variable:
+renderVariable(CurrentCtx, OS);
 return;
-  }
-  case UnescapeVariable: {
-auto Lambda = Lambdas.find(AccessorValue[0]);
-if (Lambda != Lambdas.end()) {
-  renderLambdas(CurrentCtx, OS, Lambda->getValue());
-} else if (ContextPtr) {
-  toMustacheString(*ContextPtr, OS);
-}
+  case UnescapeVariable:
+renderUnescapeVariable(CurrentCtx, OS);
 return;
-  }
-  case Section

[llvm-branch-commits] [llvm] [llvm][mustache] Optimize accessor splitting with a single pass (PR #159198)

2025-10-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/159198

>From 66ea253c076cffb6aaf6569806c7b32c069e754d Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Tue, 16 Sep 2025 00:24:43 -0700
Subject: [PATCH] [llvm][mustache] Optimize accessor splitting with a single
 pass

The splitMustacheString function previously used a loop of
StringRef::split and StringRef::trim. This was inefficient as
it scanned each segment of the accessor string multiple times.

This change introduces a custom splitAndTrim function that
performs both operations in a single pass over the string,
reducing redundant work and improving performance, most notably
in the number of CPU cycles executed.

  Metric | Baseline | Optimized | Change
  -- |  | - | ---
  Time (ms)  | 35.57| 35.36 | -0.59%
  Cycles | 34.91M   | 34.26M| -1.86%
  Instructions   | 85.54M   | 85.24M| -0.35%
  Branch Misses  | 111.9K   | 112.2K| +0.27%
  Cache Misses   | 242.1K   | 239.9K| -0.91%
---
 llvm/lib/Support/Mustache.cpp | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index 4786242cdfba9..8eebeaec11925 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -34,6 +34,32 @@ static bool isContextFalsey(const json::Value *V) {
   return isFalsey(*V);
 }
 
+static void splitAndTrim(StringRef Str, SmallVectorImpl<StringRef> &Tokens) {
+  size_t CurrentPos = 0;
+  while (CurrentPos < Str.size()) {
+// Find the next delimiter.
+size_t DelimiterPos = Str.find('.', CurrentPos);
+
+// If no delimiter is found, process the rest of the string.
+if (DelimiterPos == StringRef::npos) {
+  DelimiterPos = Str.size();
+}
+
+// Get the current part, which may have whitespace.
+StringRef Part = Str.slice(CurrentPos, DelimiterPos);
+
+// Manually trim the part without creating a new string object.
+size_t Start = Part.find_first_not_of(" \t\r\n");
+if (Start != StringRef::npos) {
+  size_t End = Part.find_last_not_of(" \t\r\n");
+  Tokens.push_back(Part.slice(Start, End + 1));
+}
+
+// Move past the delimiter for the next iteration.
+CurrentPos = DelimiterPos + 1;
+  }
+}
+
 static Accessor splitMustacheString(StringRef Str, MustacheContext &Ctx) {
   // We split the mustache string into an accessor.
   // For example:
@@ -46,13 +72,7 @@ static Accessor splitMustacheString(StringRef Str, 
MustacheContext &Ctx) {
 // It's a literal, so it doesn't need to be saved.
 Tokens.push_back(".");
   } else {
-while (!Str.empty()) {
-  StringRef Part;
-  std::tie(Part, Str) = Str.split('.');
-  // Each part of the accessor needs to be saved to the arena
-  // to ensure it has a stable address.
-  Tokens.push_back(Part.trim());
-}
+splitAndTrim(Str, Tokens);
   }
   // Now, allocate memory for the array of StringRefs in the arena.
  StringRef *ArenaTokens = Ctx.Allocator.Allocate<StringRef>(Tokens.size());



[llvm-branch-commits] [llvm] release/21.x: [NVPTX] Disable relative lookup tables (#159748) (PR #160064)

2025-10-02 Thread via llvm-branch-commits

github-actions[bot] wrote:

@nikic (or anyone else). If you would like to add a note about this fix in the 
release notes (completely optional). Please reply to this comment with a one or 
two sentence description of the fix.  When you are done, please add the 
release:note label to this PR. 

https://github.com/llvm/llvm-project/pull/160064


[llvm-branch-commits] [llvm] [llvm][mustache] Avoid redundant saves in accessor splitting (PR #159197)

2025-10-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/159197

>From c6534b6ad31f12f8050be911df89961b5596499b Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Tue, 16 Sep 2025 00:11:47 -0700
Subject: [PATCH] [llvm][mustache] Avoid redundant saves in accessor splitting

The splitMustacheString function was saving StringRefs that
were already backed by an arena-allocated string. This was
unnecessary work. This change removes the redundant
Ctx.Saver.save() call.

This optimization provides a small but measurable performance
improvement on top of the single-pass tokenizer, most notably
reducing branch misses.

  Metric | Baseline | Optimized | Change
  -- |  | - | ---
  Time (ms)  | 35.77| 35.57 | -0.56%
  Cycles | 35.16M   | 34.91M| -0.71%
  Instructions   | 85.77M   | 85.54M| -0.27%
  Branch Misses  | 113.9K   | 111.9K| -1.76%
  Cache Misses   | 237.7K   | 242.1K| +1.85%
---
 llvm/lib/Support/Mustache.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index 0053a425b758d..4786242cdfba9 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -51,7 +51,7 @@ static Accessor splitMustacheString(StringRef Str, 
MustacheContext &Ctx) {
   std::tie(Part, Str) = Str.split('.');
   // Each part of the accessor needs to be saved to the arena
   // to ensure it has a stable address.
-  Tokens.push_back(Ctx.Saver.save(Part.trim()));
+  Tokens.push_back(Part.trim());
 }
   }
   // Now, allocate memory for the array of StringRefs in the arena.



[llvm-branch-commits] [clang] [CIR] Upstream AddressSpace casting support (PR #161212)

2025-10-02 Thread David Rivera via llvm-branch-commits

https://github.com/RiverDave updated 
https://github.com/llvm/llvm-project/pull/161212

>From a6313408ef8e8f9ad129e39cb0d8d0f2fb2f0ee3 Mon Sep 17 00:00:00 2001
From: David Rivera 
Date: Mon, 29 Sep 2025 11:05:44 -0400
Subject: [PATCH] [CIR] Upstream AddressSpace casting support

---
 .../CIR/Dialect/Builder/CIRBaseBuilder.h  |  9 +++
 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp   | 41 +++
 clang/lib/CIR/CodeGen/CIRGenExpr.cpp  | 19 +-
 clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp| 22 ++
 clang/lib/CIR/CodeGen/CIRGenFunction.h|  4 ++
 clang/lib/CIR/CodeGen/CIRGenModule.cpp| 17 +
 clang/lib/CIR/CodeGen/CIRGenModule.h  |  6 ++
 clang/lib/CIR/CodeGen/CIRGenTypes.cpp |  2 +-
 clang/lib/CIR/CodeGen/TargetInfo.cpp  | 13 
 clang/lib/CIR/CodeGen/TargetInfo.h| 12 
 clang/test/CIR/address-space-conversion.cpp   | 68 +++
 11 files changed, 195 insertions(+), 18 deletions(-)
 create mode 100644 clang/test/CIR/address-space-conversion.cpp

diff --git a/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h 
b/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
index cef8624e65d57..bf4a9b8438982 100644
--- a/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
+++ b/clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
@@ -424,6 +424,15 @@ class CIRBaseBuilderTy : public mlir::OpBuilder {
 return createBitcast(src, getPointerTo(newPointeeTy));
   }
 
+  mlir::Value createAddrSpaceCast(mlir::Location loc, mlir::Value src,
+  mlir::Type newTy) {
+return createCast(loc, cir::CastKind::address_space, src, newTy);
+  }
+
+  mlir::Value createAddrSpaceCast(mlir::Value src, mlir::Type newTy) {
+return createAddrSpaceCast(src.getLoc(), src, newTy);
+  }
+
   
//======//
   // Binary Operators
   
//======//
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp 
b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
index cf17de144f4d9..95e392d860518 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
@@ -58,6 +58,24 @@ static RValue emitBuiltinBitOp(CIRGenFunction &cgf, const 
CallExpr *e,
   return RValue::get(result);
 }
 
+// Initialize the alloca with the given size and alignment according to the 
lang
+// opts. Supporting only the trivial non-initialization for now.
+static void initializeAlloca(CIRGenFunction &CGF,
+ [[maybe_unused]] mlir::Value AllocaAddr,
+ [[maybe_unused]] mlir::Value Size,
+ [[maybe_unused]] CharUnits AlignmentInBytes) {
+
+  switch (CGF.getLangOpts().getTrivialAutoVarInit()) {
+  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
+// Nothing to initialize.
+return;
+  case LangOptions::TrivialAutoVarInitKind::Zero:
+  case LangOptions::TrivialAutoVarInitKind::Pattern:
+assert(false && "unexpected trivial auto var init kind NYI");
+return;
+  }
+}
+
 RValue CIRGenFunction::emitRotate(const CallExpr *e, bool isRotateLeft) {
   mlir::Value input = emitScalarExpr(e->getArg(0));
   mlir::Value amount = emitScalarExpr(e->getArg(1));
@@ -172,21 +190,8 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl 
&gd, unsigned builtinID,
 builder.getUInt8Ty(), "bi_alloca", suitableAlignmentInBytes, size);
 
 // Initialize the allocated buffer if required.
-if (builtinID != Builtin::BI__builtin_alloca_uninitialized) {
-  // Initialize the alloca with the given size and alignment according to
-  // the lang opts. Only the trivial non-initialization is supported for
-  // now.
-
-  switch (getLangOpts().getTrivialAutoVarInit()) {
-  case LangOptions::TrivialAutoVarInitKind::Uninitialized:
-// Nothing to initialize.
-break;
-  case LangOptions::TrivialAutoVarInitKind::Zero:
-  case LangOptions::TrivialAutoVarInitKind::Pattern:
-cgm.errorNYI("trivial auto var init");
-break;
-  }
-}
+if (builtinID != Builtin::BI__builtin_alloca_uninitialized)
+  initializeAlloca(*this, allocaAddr, size, suitableAlignmentInBytes);
 
 // An alloca will always return a pointer to the alloca (stack) address
 // space. This address space need not be the same as the AST / Language
@@ -194,6 +199,12 @@ RValue CIRGenFunction::emitBuiltinExpr(const GlobalDecl 
&gd, unsigned builtinID,
 // the AST level this is handled within CreateTempAlloca et al., but for 
the
 // builtin / dynamic alloca we have to handle it here.
 assert(!cir::MissingFeatures::addressSpace());
+cir::AddressSpace aas = getCIRAllocaAddressSpace();
+cir::AddressSpace eas = cir::toCIRAddressSpace(
+e->getType()->getPointeeType().getAddressSpace());
+if (eas != aas) {
+  assert(false && "Non-default address

  1   2   >