[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-16 Thread Andrzej Warzyński via llvm-branch-commits


@@ -83,16 +84,48 @@ func.func @transfer_read_dims_mismatch_contiguous(
   return %res : vector<1x1x2x2xi8>
 }
 
-// CHECK-LABEL:   func.func @transfer_read_dims_mismatch_contiguous(
+// CHECK-LABEL:   func.func @transfer_read_dims_mismatch_contiguous_unit_dims(
 // CHECK-SAME:  %[[MEM:.*]]: memref<5x4x3x2xi8, strided<[24, 6, 2, 1], 
offset: ?>>) -> vector<1x1x2x2xi8> {
 // CHECK:   %[[VAL_1:.*]] = arith.constant 0 : i8
 // CHECK:   %[[VAL_2:.*]] = arith.constant 0 : index
-// CHECK:   %[[VAL_3:.*]] = memref.collapse_shape %[[MEM]] {{\[\[}}0, 
1, 2, 3]] : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>> into 
memref<120xi8, strided<[1], offset: ?>>
-// CHECK:   %[[VAL_4:.*]] = vector.transfer_read 
%[[VAL_3]]{{\[}}%[[VAL_2]]], %[[VAL_1]] {in_bounds = [true]} : memref<120xi8, 
strided<[1], offset: ?>>, vector<4xi8>
+// CHECK:   %[[VAL_3:.*]] = memref.collapse_shape %[[MEM]]
+// CHECK-SAME{LITERAL}: [[0], [1], [2, 3]]
+// CHECK-SAME:: memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>> 
into memref<5x4x6xi8, strided<[24, 6, 1], offset: ?>>
+// CHECK:   %[[VAL_4:.*]] = vector.transfer_read 
%[[VAL_3]][%[[VAL_2]], %[[VAL_2]], %[[VAL_2]]], %[[VAL_1]] {in_bounds = [true]} 
: memref<5x4x6xi8, strided<[24, 6, 1], offset: ?>>, vector<4xi8>
 // CHECK:   %[[VAL_5:.*]] = vector.shape_cast %[[VAL_4]] : 
vector<4xi8> to vector<1x1x2x2xi8>
 // CHECK:   return %[[VAL_5]] : vector<1x1x2x2xi8>
 
-// CHECK-128B-LABEL: func @transfer_read_dims_mismatch_contiguous(
+// CHECK-128B-LABEL: func @transfer_read_dims_mismatch_contiguous_unit_dims(
+//   CHECK-128B:   memref.collapse_shape
+
+// -
+
+// The shape of the memref and the vector don't match, but the vector is a
+// contiguous subset of the memref, so "flattenable"
+
+func.func @transfer_read_dims_mismatch_contiguous_non_unit_dims(
+%mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> 
vector<2x3x2xi8> {
+
+  %c0 = arith.constant 0 : index
+  %cst = arith.constant 0 : i8
+  %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst :
+memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, vector<2x3x2xi8>
+  return %res : vector<2x3x2xi8>
+}
+
+// CHECK-LABEL: func.func 
@transfer_read_dims_mismatch_contiguous_non_unit_dims(
+// CHECK-SAME:%[[MEM:.+]]: memref<5x4x3x2xi8, {{.+}}>) -> vector<2x3x2xi8> 
{
+// CHECK: %[[C0_I8:.+]] = arith.constant 0 : i8
+// CHECK: %[[C0:.+]] = arith.constant 0 : index
+// CHECK: %[[COLLAPSED_MEM:.+]] = memref.collapse_shape %[[MEM]]
+// CHECK-SAME{LITERAL}: [[0], [1, 2, 3]]
+// CHECK-SAME:  : memref<5x4x3x2xi8, {{.+}}> into memref<5x24xi8, 
{{.+}}>
+// CHECK: %[[VEC_1D:.+]] = vector.transfer_read 
%[[COLLAPSED_MEM]][%[[C0]], %[[C0]]], %[[C0_I8]] {in_bounds = [true]}
+// CHECK-SAME:  : memref<5x24xi8, strided<[24, 1], offset: ?>>, 
vector<12xi8>
+// CHECK: %[[VEC:.+]] = vector.shape_cast %[[VEC_1D]] : vector<12xi8> 
to vector<2x3x2xi8>
+// CHECK: return %[[VEC]] : vector<2x3x2xi8>

banach-space wrote:

> I don't understand the rationale behind having these in a particular order.

The current ordering feels reversed to me.

In my head, it's clearer to start with the most basic case - e.g., the one with 
no leading unit dims - and then progressively build on it by adding complexity, 
such as leading unit dims. Right now, the more complex case comes first, which 
makes the overall structure harder to follow.

From another angle: the naming in

* `@transfer_read_dims_mismatch_contiguous_non_unit_dims`

is confusing, especially when compared to tests like

* `@transfer_read_dims_match_contiguous_empty_stride`.

Why is the absence of unit dims significant here? That may be obvious now, but 
it's not something I’m likely to remember when revisiting this file in the 
future.

To improve readability and flow, I suggest:
* Rename `@transfer_read_dims_mismatch_contiguous_non_unit_dims` -> 
`@transfer_read_dims_mismatch_contiguous`
* Move it before `@transfer_read_dims_mismatch_contiguous_unit_dims` to 
preserve a "simple-to-complex" test progression.

Thanks!

https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020) (PR #144322)

2025-06-16 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic created 
https://github.com/llvm/llvm-project/pull/144322

Backport of https://github.com/llvm/llvm-project/pull/143020 for 
https://github.com/llvm/llvm-project/issues/139050.

>From 9792f981063d6ddadd3678ac31e2254daa6aa9cf Mon Sep 17 00:00:00 2001
From: Nikita Popov 
Date: Fri, 6 Jun 2025 17:50:16 +0200
Subject: [PATCH 1/2] [PhaseOrdering] Add test for #139050 (NFC)

(cherry picked from commit cef5a3155bab9a2db5389f782471d56f1dd15b61)
---
 .../PhaseOrdering/X86/vector-reductions.ll| 50 +++
 1 file changed, 50 insertions(+)

diff --git a/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll 
b/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
index 254136b0b841a..f8450766037b2 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
@@ -325,3 +325,53 @@ cleanup:
   %retval.0 = phi i1 [ false, %if.then ], [ true, %if.end ]
   ret i1 %retval.0
 }
+
+; From https://github.com/llvm/llvm-project/issues/139050.
+; FIXME: This should be vectorized.
+define i8 @masked_min_reduction(ptr %data, ptr %mask) {
+; CHECK-LABEL: @masked_min_reduction(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[ACC:%.*]] = phi i8 [ -1, [[ENTRY]] ], [ [[TMP21:%.*]], 
[[VECTOR_BODY]] ]
+; CHECK-NEXT:[[DATA:%.*]] = getelementptr i8, ptr [[DATA1:%.*]], i64 
[[INDEX]]
+; CHECK-NEXT:[[VAL:%.*]] = load i8, ptr [[DATA]], align 1
+; CHECK-NEXT:[[TMP7:%.*]] = getelementptr i8, ptr [[MASK:%.*]], i64 
[[INDEX]]
+; CHECK-NEXT:[[M:%.*]] = load i8, ptr [[TMP7]], align 1
+; CHECK-NEXT:[[COND:%.*]] = icmp eq i8 [[M]], 0
+; CHECK-NEXT:[[TMP0:%.*]] = tail call i8 @llvm.umin.i8(i8 [[ACC]], i8 
[[VAL]])
+; CHECK-NEXT:[[TMP21]] = select i1 [[COND]], i8 [[TMP0]], i8 [[ACC]]
+; CHECK-NEXT:[[INDEX_NEXT]] = add nuw nsw i64 [[INDEX]], 1
+; CHECK-NEXT:[[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
+; CHECK-NEXT:br i1 [[TMP20]], label [[EXIT:%.*]], label [[VECTOR_BODY]]
+; CHECK:   exit:
+; CHECK-NEXT:ret i8 [[TMP21]]
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %next, %loop ]
+  %acc = phi i8 [ 255, %entry ], [ %acc_next, %loop ]
+
+  %ptr_i = getelementptr i8, ptr %data, i64 %i
+  %val = load i8, ptr %ptr_i, align 1
+
+  %mask_ptr = getelementptr i8, ptr %mask, i64 %i
+  %m = load i8, ptr %mask_ptr, align 1
+  %cond = icmp eq i8 %m, 0
+
+  ; Use select to implement masking
+  %masked_val = select i1 %cond, i8 %val, i8 255
+
+  ; min reduction
+  %acc_next = call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
+
+  %next = add i64 %i, 1
+  %cmp = icmp ult i64 %next, 1024
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i8 %acc_next
+}

>From 3d02b33c4b791b1e19bfdc5a7a6d372fa6e527ac Mon Sep 17 00:00:00 2001
From: Konstantin Bogdanov 
Date: Sat, 14 Jun 2025 09:32:54 +0300
Subject: [PATCH 2/2] [InstCombine] Avoid folding `select(umin(X, Y), X)` with
 min/max values in false arm (#143020)

Fixes https://github.com/llvm/llvm-project/issues/139050.

This patch adds a check to avoid folding min/max reduction into select, which 
may block loop vectorization.

The issue is that the following snippet:
```
declare i8 @llvm.umin.i8(i8, i8)

define i8 @masked_min_fold_bug(i8 %acc, i8 %val, i8 %mask) {
; CHECK-LABEL: @masked_min_fold_bug(
; CHECK:   %cond = icmp eq i8 %mask, 0
; CHECK:   %masked_val = select i1 %cond, i8 %val, i8 255
; CHECK:   call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
;
  %cond = icmp eq i8 %mask, 0
  %masked_val = select i1 %cond, i8 %val, i8 255
  %res = call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
  ret i8 %res
}
```

is being optimized to the following code, which can not be vectorized
later.
```
declare i8 @llvm.umin.i8(i8, i8) #0

define i8 @masked_min_fold_bug(i8 %acc, i8 %val, i8 %mask) {
  %cond = icmp eq i8 %mask, 0
  %1 = call i8 @llvm.umin.i8(i8 %acc, i8 %val)
  %res = select i1 %cond, i8 %1, i8 %acc
  ret i8 %res
}

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn 
memory(none) }
```

Expected:
```
declare i8 @llvm.umin.i8(i8, i8) #0

define i8 @masked_min_fold_bug(i8 %acc, i8 %val, i8 %mask) {
  %cond = icmp eq i8 %mask, 0
  %masked_val = select i1 %cond, i8 %val, i8 -1
  %res = call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
  ret i8 %res
}

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn 
memory(none) }
```

https://godbolt.org/z/cYMheKE5r
(cherry picked from commit 07fa6d1d90c714fa269529c3e5004a063d814c4a)
---
 .../InstCombine/InstructionCombining.cpp  |  9 
 llvm/test/Transforms/InstCombine/select.ll| 47 +
 .../PhaseOrdering/X86/vector-reductions.ll| 50 ++-
 3 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Transfo

[llvm-branch-commits] [llvm] release/20.x: [InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020) (PR #144322)

2025-06-16 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)


Changes

Backport of https://github.com/llvm/llvm-project/pull/143020 for 
https://github.com/llvm/llvm-project/issues/139050.

---
Full diff: https://github.com/llvm/llvm-project/pull/144322.diff


3 Files Affected:

- (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+9) 
- (modified) llvm/test/Transforms/InstCombine/select.ll (+47) 
- (modified) llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll (+76) 


``diff
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp 
b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index a64c188575e6c..0f5e867877da2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -1697,6 +1697,15 @@ Instruction 
*InstCombinerImpl::FoldOpIntoSelect(Instruction &Op, SelectInst *SI,
   if (SI->getType()->isIntOrIntVectorTy(1))
 return nullptr;
 
+  // Avoid breaking min/max reduction pattern,
+  // which is necessary for vectorization later.
+  if (isa(&Op))
+for (Value *IntrinOp : Op.operands())
+  if (auto *PN = dyn_cast(IntrinOp))
+for (Value *PhiOp : PN->operands())
+  if (PhiOp == &Op)
+return nullptr;
+
   // Test if a FCmpInst instruction is used exclusively by a select as
   // part of a minimum or maximum operation. If so, refrain from doing
   // any other folding. This helps out other analyses which understand
diff --git a/llvm/test/Transforms/InstCombine/select.ll 
b/llvm/test/Transforms/InstCombine/select.ll
index 3c3111492fc68..e8a32cb1697a5 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -4901,3 +4901,50 @@ define i32 @src_simplify_2x_at_once_and(i32 %x, i32 %y) {
   %cond = select i1 %and0, i32 %sub, i32 %xor
   ret i32 %cond
 }
+
+define void @no_fold_masked_min_loop(ptr nocapture readonly %vals, ptr 
nocapture readonly %masks, ptr nocapture %out, i64 %n) {
+; CHECK-LABEL: @no_fold_masked_min_loop(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ 
[[NEXT_INDEX:%.*]], [[LOOP]] ]
+; CHECK-NEXT:[[ACC:%.*]] = phi i8 [ -1, [[ENTRY]] ], [ [[RES:%.*]], 
[[LOOP]] ]
+; CHECK-NEXT:[[VAL_PTR:%.*]] = getelementptr inbounds i8, ptr 
[[VALS:%.*]], i64 [[INDEX]]
+; CHECK-NEXT:[[MASK_PTR:%.*]] = getelementptr inbounds i8, ptr 
[[MASKS:%.*]], i64 [[INDEX]]
+; CHECK-NEXT:[[VAL:%.*]] = load i8, ptr [[VAL_PTR]], align 1
+; CHECK-NEXT:[[MASK:%.*]] = load i8, ptr [[MASK_PTR]], align 1
+; CHECK-NEXT:[[COND:%.*]] = icmp eq i8 [[MASK]], 0
+; CHECK-NEXT:[[MASKED_VAL:%.*]] = select i1 [[COND]], i8 [[VAL]], i8 -1
+; CHECK-NEXT:[[RES]] = call i8 @llvm.umin.i8(i8 [[ACC]], i8 [[MASKED_VAL]])
+; CHECK-NEXT:[[NEXT_INDEX]] = add i64 [[INDEX]], 1
+; CHECK-NEXT:[[DONE:%.*]] = icmp eq i64 [[NEXT_INDEX]], [[N:%.*]]
+; CHECK-NEXT:br i1 [[DONE]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:   exit:
+; CHECK-NEXT:store i8 [[RES]], ptr [[OUT:%.*]], align 1
+; CHECK-NEXT:ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %index = phi i64 [0, %entry], [%next_index, %loop]
+  %acc = phi i8 [255, %entry], [%res, %loop]
+
+  %val_ptr = getelementptr inbounds i8, ptr %vals, i64 %index
+  %mask_ptr = getelementptr inbounds i8, ptr %masks, i64 %index
+
+  %val = load i8, ptr %val_ptr, align 1
+  %mask = load i8, ptr %mask_ptr, align 1
+
+  %cond = icmp eq i8 %mask, 0
+  %masked_val = select i1 %cond, i8 %val, i8 -1
+  %res = call i8 @llvm.umin.i8(i8 %acc, i8 %masked_val)
+
+  %next_index = add i64 %index, 1
+  %done = icmp eq i64 %next_index, %n
+  br i1 %done, label %exit, label %loop
+
+exit:
+  store i8 %res, ptr %out, align 1
+  ret void
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll 
b/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
index 254136b0b841a..2ec48a8637dae 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
@@ -325,3 +325,79 @@ cleanup:
   %retval.0 = phi i1 [ false, %if.then ], [ true, %if.end ]
   ret i1 %retval.0
 }
+
+define i8 @masked_min_reduction(ptr %data, ptr %mask) {
+; CHECK-LABEL: @masked_min_reduction(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[VEC_PHI:%.*]] = phi <32 x i8> [ splat (i8 -1), [[ENTRY]] ], 
[ [[TMP16:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[VEC_PHI1:%.*]] = phi <32 x i8> [ splat (i8 -1), [[ENTRY]] 
], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[VEC_PHI2:%.*]] = phi <32 x i8> [ splat (i8 -1), [[ENTRY]] 
], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[VEC_PHI3:%.*]] = p

[llvm-branch-commits] [llvm] release/20.x: [InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020) (PR #144322)

2025-06-16 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic milestoned 
https://github.com/llvm/llvm-project/pull/144322
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020) (PR #144322)

2025-06-16 Thread Nikita Popov via llvm-branch-commits

nikic wrote:

I'm not entirely sure about this one, as this is a performance rather than 
correctness fix and we're late in the release cycle. But the fix itself does 
seem quite safe and it blocks clickhouse from updating to LLVM 20.

https://github.com/llvm/llvm-project/pull/144322
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020) (PR #144322)

2025-06-16 Thread via llvm-branch-commits

github-actions[bot] wrote:

⚠️ We detected that you are using a GitHub private e-mail address to contribute 
to the repo. Please turn off [Keep my email addresses 
private](https://github.com/settings/emails) setting in your account. See 
[LLVM 
Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it)
 for more information.

https://github.com/llvm/llvm-project/pull/144322
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [LoongArch] Precommit test case to show bug in LoongArchISelDagToDag (PR #144296)

2025-06-16 Thread via llvm-branch-commits

https://github.com/leecheechen closed 
https://github.com/llvm/llvm-project/pull/144296
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Backport 90a52f494296 to release/20.x (PR #144299)

2025-06-16 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic milestoned 
https://github.com/llvm/llvm-project/pull/144299
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/138655

>From bc69705cc0d77104af6d3b37d49c982c6ec4cfee Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 28 Apr 2025 18:35:48 +0300
Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC)

Move MCInstReference representing a constant reference to an instruction
inside a parent entity - either inside a basic block (which has a
reference to its parent function) or directly to the function (when CFG
information is not available).
---
 bolt/include/bolt/Core/MCInstUtils.h  | 168 +
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +-
 bolt/lib/Core/CMakeLists.txt  |   1 +
 bolt/lib/Core/MCInstUtils.cpp |  57 ++
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +-
 5 files changed, 269 insertions(+), 237 deletions(-)
 create mode 100644 bolt/include/bolt/Core/MCInstUtils.h
 create mode 100644 bolt/lib/Core/MCInstUtils.cpp

diff --git a/bolt/include/bolt/Core/MCInstUtils.h 
b/bolt/include/bolt/Core/MCInstUtils.h
new file mode 100644
index 0..69bf5e6159b74
--- /dev/null
+++ b/bolt/include/bolt/Core/MCInstUtils.h
@@ -0,0 +1,168 @@
+//===- bolt/Core/MCInstUtils.h --*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef BOLT_CORE_MCINSTUTILS_H
+#define BOLT_CORE_MCINSTUTILS_H
+
+#include "bolt/Core/BinaryBasicBlock.h"
+
+#include 
+#include 
+#include 
+
+namespace llvm {
+namespace bolt {
+
+class BinaryFunction;
+
+/// MCInstReference represents a reference to a constant MCInst as stored 
either
+/// in a BinaryFunction (i.e. before a CFG is created), or in a 
BinaryBasicBlock
+/// (after a CFG is created).
+class MCInstReference {
+  using nocfg_const_iterator = std::map::const_iterator;
+
+  // Two cases are possible:
+  // * functions with CFG reconstructed - a function stores a collection of
+  //   basic blocks, each basic block stores a contiguous vector of MCInst
+  // * functions without CFG - there are no basic blocks created,
+  //   the instructions are directly stored in std::map in BinaryFunction
+  //
+  // In both cases, the direct parent of MCInst is stored together with an
+  // iterator pointing to the instruction.
+
+  // Helper struct: CFG is available, the direct parent is a basic block,
+  // iterator's type is `MCInst *`.
+  struct RefInBB {
+RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst)
+: BB(BB), It(Inst) {}
+RefInBB(const RefInBB &Other) = default;
+RefInBB &operator=(const RefInBB &Other) = default;
+
+const BinaryBasicBlock *BB;
+BinaryBasicBlock::const_iterator It;
+
+bool operator<(const RefInBB &Other) const {
+  return std::tie(BB, It) < std::tie(Other.BB, Other.It);
+}
+
+bool operator==(const RefInBB &Other) const {
+  return BB == Other.BB && It == Other.It;
+}
+  };
+
+  // Helper struct: CFG is *not* available, the direct parent is a function,
+  // iterator's type is std::map::iterator (the mapped value
+  // is an instruction's offset).
+  struct RefInBF {
+RefInBF(const BinaryFunction *BF, nocfg_const_iterator It)
+: BF(BF), It(It) {}
+RefInBF(const RefInBF &Other) = default;
+RefInBF &operator=(const RefInBF &Other) = default;
+
+const BinaryFunction *BF;
+nocfg_const_iterator It;
+
+bool operator<(const RefInBF &Other) const {
+  return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first);
+}
+
+bool operator==(const RefInBF &Other) const {
+  return BF == Other.BF && It->first == Other.It->first;
+}
+  };
+
+  std::variant Reference;
+
+  // Utility methods to be used like this:
+  //
+  // if (auto *Ref = tryGetRefInBB())
+  //   return Ref->doSomething(...);
+  // return getRefInBF().doSomethingElse(...);
+  const RefInBB *tryGetRefInBB() const {
+assert(std::get_if(&Reference) ||
+   std::get_if(&Reference));
+return std::get_if(&Reference);
+  }
+  const RefInBF &getRefInBF() const {
+assert(std::get_if(&Reference));
+return *std::get_if(&Reference);
+  }
+
+public:
+  /// Constructs an empty reference.
+  MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {}
+  /// Constructs a reference to the instruction inside the basic block.
+  MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst)
+  : Reference(RefInBB(BB, Inst)) {
+assert(BB && Inst && "Neither BB nor Inst should be nullptr");
+  }
+  /// Constructs a reference to the instruction inside the basic block.
+  MCInstReference(const BinaryBasicBlock *BB, unsigned Index)
+  : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/141824

>From 54c1b3aeaa9b269c57f3d121e974156204cf2ed7 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 14 May 2025 23:12:13 +0300
Subject: [PATCH] [BOLT] Gadget scanner: fix LR to be safe in leaf functions
 without CFG

After a label in a function without CFG information, use a reasonably
pessimistic estimation of register state (assume that any register that
can be clobbered in this function was actually clobbered) instead of the
most pessimistic "all registers are unsafe". This is the same estimation
as used by the dataflow variant of the analysis when the preceding
instruction is not known for sure.

Without this, leaf functions without CFG information are likely to have
false positive reports about non-protected return instructions, as
1) LR is unlikely to be signed and authenticated in a leaf function and
2) LR is likely to be used by a return instruction near the end of the
   function and
3) the register state is likely to be reset at least once during the
   linear scan through the function
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 14 +++--
 .../AArch64/gs-pacret-autiasp.s   | 31 +--
 .../AArch64/gs-pauth-authentication-oracles.s | 20 
 .../AArch64/gs-pauth-debug-output.s   | 30 ++
 .../AArch64/gs-pauth-signing-oracles.s| 27 
 5 files changed, 29 insertions(+), 93 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index e5bdade032488..05309a47aba40 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -737,19 +737,14 @@ template  class CFGUnawareAnalysis {
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
 public CFGUnawareAnalysis {
   using SrcSafetyAnalysis::BC;
   BinaryFunction &BF;
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -759,6 +754,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -773,7 +769,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Attach the state *before* this instruction executes.
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index df0a83be00986..627f8eb20ab9c 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -224,20 +224,33 @@ f_unreachable_instruction:
 ret
 .size f_unreachable_instruction, .-f_unreachable_instruction
 
-// Expected false positive: without CFG, the state is reset to all-unsafe
-// after an unconditional branch.
-
-.globl  state_is_reset_after_indirect_branch_nocfg
-.type   state_is_reset_after_indirect_branch_nocfg,@function
-state_is_reset_after_indirect_branch_nocfg:
-// CHECK-LABEL: GS-PAUTH: non-protected ret found in function 
state_is_reset_after_indirect_branch_nocfg, at address
-// CHECK-NEXT:  The instruction is {{[0-9a-f]+}}: ret
+// Without CFG, the state is reset at labels, assuming every register that can
+// be clobbered in the function was actually clobbered.
+
+.globl  lr_untouched_nocfg
+.type   lr_untouched_nocfg,@function
+lr_untouched_nocfg:
+// CHECK-NOT: lr_untouched_nocfg
+adr x2, 1f
+br  x2
+1:
+ret
+.size lr_untouched_nocfg, .-lr_untouched_nocfg
+
+.globl  lr_clobbered_nocfg
+.type   lr_clobbered_nocfg,@function
+lr_clobbered_nocfg:
+// CHECK-LABEL: GS-PAUTH: non-protected ret found in function 
lr_clobbered_nocfg, at address
+// CHECK-NEXT:  The instruction is

[llvm-branch-commits] [llvm] PowerPC: Fix using long double libm functions for f128 intrinsics (PR #144382)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/144382

This wasn't setting the correct libcall names, which default to the
l suffixed libm names.

>From 2c41a2154737d1e237fcfefecf1638a80dc09da1 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 17 Jun 2025 01:03:13 +0900
Subject: [PATCH] PowerPC: Fix using long double libm functions for f128
 intrinsics

This wasn't setting the correct libcall names, which default to the
l suffixed libm names.
---
 llvm/lib/IR/RuntimeLibcalls.cpp | 25 +
 llvm/test/CodeGen/PowerPC/f128-arith.ll | 48 -
 2 files changed, 49 insertions(+), 24 deletions(-)

diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index d63d398e243f9..51d7738151237 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -321,6 +321,24 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::OGT_F128, "__gtkf2");
 setLibcallName(RTLIB::UO_F128, "__unordkf2");
 
+setLibcallName(RTLIB::ACOS_F128, "acosf128");
+setLibcallName(RTLIB::ASIN_F128, "asinf128");
+setLibcallName(RTLIB::ATAN2_F128, "atan2f128");
+setLibcallName(RTLIB::ATAN_F128, "atanf128");
+setLibcallName(RTLIB::CBRT_F128, "cbrtf128");
+setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
+setLibcallName(RTLIB::COSH_F128, "coshf128");
+setLibcallName(RTLIB::EXP10_F128, "exp10f128");
+setLibcallName(RTLIB::FMAXIMUM_F128, "fmaximumf128");
+setLibcallName(RTLIB::FMAXIMUM_NUM_F128, "fmaximum_numf128");
+setLibcallName(RTLIB::FMINIMUM_F128, "fminimumf128");
+setLibcallName(RTLIB::FMINIMUM_NUM_F128, "fminimum_numf128");
+setLibcallName(RTLIB::LDEXP_F128, "ldexpf128");
+setLibcallName(RTLIB::MODF_F128, "modff128");
+setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
+setLibcallName(RTLIB::SINH_F128, "sinhf128");
+setLibcallName(RTLIB::TANH_F128, "tanhf128");
+setLibcallName(RTLIB::TAN_F128, "tanf128");
 setLibcallName(RTLIB::LOG_F128, "logf128");
 setLibcallName(RTLIB::LOG2_F128, "log2f128");
 setLibcallName(RTLIB::LOG10_F128, "log10f128");
@@ -347,6 +365,13 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::FMA_F128, "fmaf128");
 setLibcallName(RTLIB::FREXP_F128, "frexpf128");
 
+// TODO: Do these exist?
+setLibcallName(RTLIB::EXP_FINITE_F128, nullptr);
+setLibcallName(RTLIB::EXP2_FINITE_F128, nullptr);
+setLibcallName(RTLIB::LOG_FINITE_F128, nullptr);
+setLibcallName(RTLIB::LOG2_FINITE_F128, nullptr);
+setLibcallName(RTLIB::LOG10_FINITE_F128, nullptr);
+
 if (TT.isOSAIX()) {
   bool isPPC64 = TT.isPPC64();
   setLibcallName(RTLIB::MEMCPY, isPPC64 ? "___memmove64" : "___memmove");
diff --git a/llvm/test/CodeGen/PowerPC/f128-arith.ll 
b/llvm/test/CodeGen/PowerPC/f128-arith.ll
index ffa7ac6cb0078..f9c953d483ff2 100644
--- a/llvm/test/CodeGen/PowerPC/f128-arith.ll
+++ b/llvm/test/CodeGen/PowerPC/f128-arith.ll
@@ -1413,7 +1413,7 @@ define dso_local fp128 @acos_f128(fp128 %x) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 16
-; CHECK-NEXT:bl acosl
+; CHECK-NEXT:bl acosf128
 ; CHECK-NEXT:nop
 ; CHECK-NEXT:addi r1, r1, 32
 ; CHECK-NEXT:ld r0, 16(r1)
@@ -1427,7 +1427,7 @@ define dso_local fp128 @acos_f128(fp128 %x) {
 ; CHECK-P8-NEXT:std r0, 48(r1)
 ; CHECK-P8-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-P8-NEXT:.cfi_offset lr, 16
-; CHECK-P8-NEXT:bl acosl
+; CHECK-P8-NEXT:bl acosf128
 ; CHECK-P8-NEXT:nop
 ; CHECK-P8-NEXT:addi r1, r1, 32
 ; CHECK-P8-NEXT:ld r0, 16(r1)
@@ -1445,7 +1445,7 @@ define dso_local fp128 @asin_f128(fp128 %x) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 16
-; CHECK-NEXT:bl asinl
+; CHECK-NEXT:bl asinf128
 ; CHECK-NEXT:nop
 ; CHECK-NEXT:addi r1, r1, 32
 ; CHECK-NEXT:ld r0, 16(r1)
@@ -1459,7 +1459,7 @@ define dso_local fp128 @asin_f128(fp128 %x) {
 ; CHECK-P8-NEXT:std r0, 48(r1)
 ; CHECK-P8-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-P8-NEXT:.cfi_offset lr, 16
-; CHECK-P8-NEXT:bl asinl
+; CHECK-P8-NEXT:bl asinf128
 ; CHECK-P8-NEXT:nop
 ; CHECK-P8-NEXT:addi r1, r1, 32
 ; CHECK-P8-NEXT:ld r0, 16(r1)
@@ -1477,7 +1477,7 @@ define dso_local fp128 @atan_f128(fp128 %x) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 16
-; CHECK-NEXT:bl atanl
+; CHECK-NEXT:bl atanf128
 ; CHECK-NEXT:nop
 ; CHECK-NEXT:addi r1, r1, 32
 ; CHECK-NEXT:ld r0, 16(r1)
@@ -1491,7 +1491,7 @@ define dso_local fp128 @atan_f128(fp128 %x) {
 ; CHECK-P8-NEXT:std r0, 48(r1)
 ; CHECK-P8-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-P8-NEXT:.cfi_offset lr, 16
-; CHECK-P8-NEXT:bl atanl
+; CHECK-P8-NEXT:bl atanf128
 ; CHECK-P8-NEXT:nop

[llvm-branch-commits] [llvm] PowerPC: Fix using long double libm functions for f128 intrinsics (PR #144382)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/144382?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#144382** https://app.graphite.dev/github/pr/llvm/llvm-project/144382?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/144382?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#144381** https://app.graphite.dev/github/pr/llvm/llvm-project/144381?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/144382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] PowerPC: Fix using long double libm functions for f128 intrinsics (PR #144382)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/144382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] PowerPC: Fix using long double libm functions for f128 intrinsics (PR #144382)

2025-06-16 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-powerpc

Author: Matt Arsenault (arsenm)


Changes

This wasn't setting the correct libcall names, which default to the
l suffixed libm names.

---
Full diff: https://github.com/llvm/llvm-project/pull/144382.diff


2 Files Affected:

- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (+25) 
- (modified) llvm/test/CodeGen/PowerPC/f128-arith.ll (+24-24) 


``diff
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index d63d398e243f9..51d7738151237 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -321,6 +321,24 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::OGT_F128, "__gtkf2");
 setLibcallName(RTLIB::UO_F128, "__unordkf2");
 
+setLibcallName(RTLIB::ACOS_F128, "acosf128");
+setLibcallName(RTLIB::ASIN_F128, "asinf128");
+setLibcallName(RTLIB::ATAN2_F128, "atan2f128");
+setLibcallName(RTLIB::ATAN_F128, "atanf128");
+setLibcallName(RTLIB::CBRT_F128, "cbrtf128");
+setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
+setLibcallName(RTLIB::COSH_F128, "coshf128");
+setLibcallName(RTLIB::EXP10_F128, "exp10f128");
+setLibcallName(RTLIB::FMAXIMUM_F128, "fmaximumf128");
+setLibcallName(RTLIB::FMAXIMUM_NUM_F128, "fmaximum_numf128");
+setLibcallName(RTLIB::FMINIMUM_F128, "fminimumf128");
+setLibcallName(RTLIB::FMINIMUM_NUM_F128, "fminimum_numf128");
+setLibcallName(RTLIB::LDEXP_F128, "ldexpf128");
+setLibcallName(RTLIB::MODF_F128, "modff128");
+setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
+setLibcallName(RTLIB::SINH_F128, "sinhf128");
+setLibcallName(RTLIB::TANH_F128, "tanhf128");
+setLibcallName(RTLIB::TAN_F128, "tanf128");
 setLibcallName(RTLIB::LOG_F128, "logf128");
 setLibcallName(RTLIB::LOG2_F128, "log2f128");
 setLibcallName(RTLIB::LOG10_F128, "log10f128");
@@ -347,6 +365,13 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::FMA_F128, "fmaf128");
 setLibcallName(RTLIB::FREXP_F128, "frexpf128");
 
+// TODO: Do these exist?
+setLibcallName(RTLIB::EXP_FINITE_F128, nullptr);
+setLibcallName(RTLIB::EXP2_FINITE_F128, nullptr);
+setLibcallName(RTLIB::LOG_FINITE_F128, nullptr);
+setLibcallName(RTLIB::LOG2_FINITE_F128, nullptr);
+setLibcallName(RTLIB::LOG10_FINITE_F128, nullptr);
+
 if (TT.isOSAIX()) {
   bool isPPC64 = TT.isPPC64();
   setLibcallName(RTLIB::MEMCPY, isPPC64 ? "___memmove64" : "___memmove");
diff --git a/llvm/test/CodeGen/PowerPC/f128-arith.ll 
b/llvm/test/CodeGen/PowerPC/f128-arith.ll
index ffa7ac6cb0078..f9c953d483ff2 100644
--- a/llvm/test/CodeGen/PowerPC/f128-arith.ll
+++ b/llvm/test/CodeGen/PowerPC/f128-arith.ll
@@ -1413,7 +1413,7 @@ define dso_local fp128 @acos_f128(fp128 %x) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 16
-; CHECK-NEXT:bl acosl
+; CHECK-NEXT:bl acosf128
 ; CHECK-NEXT:nop
 ; CHECK-NEXT:addi r1, r1, 32
 ; CHECK-NEXT:ld r0, 16(r1)
@@ -1427,7 +1427,7 @@ define dso_local fp128 @acos_f128(fp128 %x) {
 ; CHECK-P8-NEXT:std r0, 48(r1)
 ; CHECK-P8-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-P8-NEXT:.cfi_offset lr, 16
-; CHECK-P8-NEXT:bl acosl
+; CHECK-P8-NEXT:bl acosf128
 ; CHECK-P8-NEXT:nop
 ; CHECK-P8-NEXT:addi r1, r1, 32
 ; CHECK-P8-NEXT:ld r0, 16(r1)
@@ -1445,7 +1445,7 @@ define dso_local fp128 @asin_f128(fp128 %x) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 16
-; CHECK-NEXT:bl asinl
+; CHECK-NEXT:bl asinf128
 ; CHECK-NEXT:nop
 ; CHECK-NEXT:addi r1, r1, 32
 ; CHECK-NEXT:ld r0, 16(r1)
@@ -1459,7 +1459,7 @@ define dso_local fp128 @asin_f128(fp128 %x) {
 ; CHECK-P8-NEXT:std r0, 48(r1)
 ; CHECK-P8-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-P8-NEXT:.cfi_offset lr, 16
-; CHECK-P8-NEXT:bl asinl
+; CHECK-P8-NEXT:bl asinf128
 ; CHECK-P8-NEXT:nop
 ; CHECK-P8-NEXT:addi r1, r1, 32
 ; CHECK-P8-NEXT:ld r0, 16(r1)
@@ -1477,7 +1477,7 @@ define dso_local fp128 @atan_f128(fp128 %x) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 16
-; CHECK-NEXT:bl atanl
+; CHECK-NEXT:bl atanf128
 ; CHECK-NEXT:nop
 ; CHECK-NEXT:addi r1, r1, 32
 ; CHECK-NEXT:ld r0, 16(r1)
@@ -1491,7 +1491,7 @@ define dso_local fp128 @atan_f128(fp128 %x) {
 ; CHECK-P8-NEXT:std r0, 48(r1)
 ; CHECK-P8-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-P8-NEXT:.cfi_offset lr, 16
-; CHECK-P8-NEXT:bl atanl
+; CHECK-P8-NEXT:bl atanf128
 ; CHECK-P8-NEXT:nop
 ; CHECK-P8-NEXT:addi r1, r1, 32
 ; CHECK-P8-NEXT:ld r0, 16(r1)
@@ -1509,7 +1509,7 @@ define dso_local fp128 @atan2_f128(fp128 %x, fp128 %y) {
 ; CHECK-NEXT:std r0, 48(r1)
 ; CHECK-NEXT:.cfi_def_cfa_offset 32
 ; CHECK-NEXT:.cfi_offset lr, 1

[llvm-branch-commits] [llvm] PowerPC: Fix using long double libm functions for f128 intrinsics (PR #144382)

2025-06-16 Thread Nikita Popov via llvm-branch-commits


@@ -321,6 +321,24 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::OGT_F128, "__gtkf2");
 setLibcallName(RTLIB::UO_F128, "__unordkf2");
 
+setLibcallName(RTLIB::ACOS_F128, "acosf128");
+setLibcallName(RTLIB::ASIN_F128, "asinf128");
+setLibcallName(RTLIB::ATAN2_F128, "atan2f128");
+setLibcallName(RTLIB::ATAN_F128, "atanf128");
+setLibcallName(RTLIB::CBRT_F128, "cbrtf128");
+setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
+setLibcallName(RTLIB::COSH_F128, "coshf128");
+setLibcallName(RTLIB::EXP10_F128, "exp10f128");
+setLibcallName(RTLIB::FMAXIMUM_F128, "fmaximumf128");
+setLibcallName(RTLIB::FMAXIMUM_NUM_F128, "fmaximum_numf128");
+setLibcallName(RTLIB::FMINIMUM_F128, "fminimumf128");
+setLibcallName(RTLIB::FMINIMUM_NUM_F128, "fminimum_numf128");
+setLibcallName(RTLIB::LDEXP_F128, "ldexpf128");
+setLibcallName(RTLIB::MODF_F128, "modff128");
+setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
+setLibcallName(RTLIB::SINH_F128, "sinhf128");
+setLibcallName(RTLIB::TANH_F128, "tanhf128");
+setLibcallName(RTLIB::TAN_F128, "tanf128");

nikic wrote:

We should share this list with the x86 code above.

(Side note: This is probably only correct for gnu environments, similar to x86.)

https://github.com/llvm/llvm-project/pull/144382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] PowerPC: Fix using long double libm functions for f128 intrinsics (PR #144382)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits


@@ -321,6 +321,24 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
 setLibcallName(RTLIB::OGT_F128, "__gtkf2");
 setLibcallName(RTLIB::UO_F128, "__unordkf2");
 
+setLibcallName(RTLIB::ACOS_F128, "acosf128");
+setLibcallName(RTLIB::ASIN_F128, "asinf128");
+setLibcallName(RTLIB::ATAN2_F128, "atan2f128");
+setLibcallName(RTLIB::ATAN_F128, "atanf128");
+setLibcallName(RTLIB::CBRT_F128, "cbrtf128");
+setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
+setLibcallName(RTLIB::COSH_F128, "coshf128");
+setLibcallName(RTLIB::EXP10_F128, "exp10f128");
+setLibcallName(RTLIB::FMAXIMUM_F128, "fmaximumf128");
+setLibcallName(RTLIB::FMAXIMUM_NUM_F128, "fmaximum_numf128");
+setLibcallName(RTLIB::FMINIMUM_F128, "fminimumf128");
+setLibcallName(RTLIB::FMINIMUM_NUM_F128, "fminimum_numf128");
+setLibcallName(RTLIB::LDEXP_F128, "ldexpf128");
+setLibcallName(RTLIB::MODF_F128, "modff128");
+setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
+setLibcallName(RTLIB::SINH_F128, "sinhf128");
+setLibcallName(RTLIB::TANH_F128, "tanhf128");
+setLibcallName(RTLIB::TAN_F128, "tanf128");

arsenm wrote:

This is throwaway code that extracts the functional change out of a future 
patch that replaces this with shared tablegen 

https://github.com/llvm/llvm-project/pull/144382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] c901c3e - Revert "[InstCombine] Iterative replacement in PtrReplacer (#137215)"

2025-06-16 Thread via llvm-branch-commits

Author: Anshil Gandhi
Date: 2025-06-16T13:04:15-04:00
New Revision: c901c3e286832d3e2dee008a5e35ea7b80b6e9eb

URL: 
https://github.com/llvm/llvm-project/commit/c901c3e286832d3e2dee008a5e35ea7b80b6e9eb
DIFF: 
https://github.com/llvm/llvm-project/commit/c901c3e286832d3e2dee008a5e35ea7b80b6e9eb.diff

LOG: Revert "[InstCombine] Iterative replacement in PtrReplacer (#137215)"

This reverts commit 8bbef3d1c9115b3c64365e9b8e4ee84275a4d001.

Added: 


Modified: 
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

Removed: 
llvm/test/Transforms/InstCombine/AMDGPU/ptr-replace-alloca.ll



diff  --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp 
b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index 9aec90120d8b0..a9751ab03e20e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -243,10 +243,11 @@ class PointerReplacer {
   void replacePointer(Value *V);
 
 private:
+  bool collectUsersRecursive(Instruction &I);
   void replace(Instruction *I);
-  Value *getReplacement(Value *V) const { return WorkMap.lookup(V); }
+  Value *getReplacement(Value *I);
   bool isAvailable(Instruction *I) const {
-return I == &Root || UsersToReplace.contains(I);
+return I == &Root || Worklist.contains(I);
   }
 
   bool isEqualOrValidAddrSpaceCast(const Instruction *I,
@@ -258,7 +259,8 @@ class PointerReplacer {
 return (FromAS == ToAS) || IC.isValidAddrSpaceCast(FromAS, ToAS);
   }
 
-  SmallSetVector UsersToReplace;
+  SmallPtrSet ValuesToRevisit;
+  SmallSetVector Worklist;
   MapVector WorkMap;
   InstCombinerImpl &IC;
   Instruction &Root;
@@ -267,79 +269,72 @@ class PointerReplacer {
 } // end anonymous namespace
 
 bool PointerReplacer::collectUsers() {
-  SmallVector Worklist;
-  SmallSetVector ValuesToRevisit;
-
-  auto PushUsersToWorklist = [&](Instruction *Inst) {
-for (auto *U : Inst->users())
-  if (auto *I = dyn_cast(U))
-if (!isAvailable(I) && !ValuesToRevisit.contains(I))
-  Worklist.emplace_back(I);
-  };
+  if (!collectUsersRecursive(Root))
+return false;
 
-  PushUsersToWorklist(&Root);
-  while (!Worklist.empty()) {
-Instruction *Inst = Worklist.pop_back_val();
+  // Ensure that all outstanding (indirect) users of I
+  // are inserted into the Worklist. Return false
+  // otherwise.
+  return llvm::set_is_subset(ValuesToRevisit, Worklist);
+}
+
+bool PointerReplacer::collectUsersRecursive(Instruction &I) {
+  for (auto *U : I.users()) {
+auto *Inst = cast(&*U);
 if (auto *Load = dyn_cast(Inst)) {
   if (Load->isVolatile())
 return false;
-  UsersToReplace.insert(Load);
+  Worklist.insert(Load);
 } else if (auto *PHI = dyn_cast(Inst)) {
-  /// TODO: Handle poison and null pointers for PHI and select.
-  // If all incoming values are available, mark this PHI as
-  // replacable and push it's users into the worklist.
-  bool IsReplacable = true;
-  if (all_of(PHI->incoming_values(), [&](Value *V) {
-if (!isa(V))
-  return IsReplacable = false;
-return isAvailable(cast(V));
+  // All incoming values must be instructions for replacability
+  if (any_of(PHI->incoming_values(),
+ [](Value *V) { return !isa(V); }))
+return false;
+
+  // If at least one incoming value of the PHI is not in Worklist,
+  // store the PHI for revisiting and skip this iteration of the
+  // loop.
+  if (any_of(PHI->incoming_values(), [this](Value *V) {
+return !isAvailable(cast(V));
   })) {
-UsersToReplace.insert(PHI);
-PushUsersToWorklist(PHI);
+ValuesToRevisit.insert(Inst);
 continue;
   }
 
-  // Either an incoming value is not an instruction or not all
-  // incoming values are available. If this PHI was already
-  // visited prior to this iteration, return false.
-  if (!IsReplacable || !ValuesToRevisit.insert(PHI))
+  Worklist.insert(PHI);
+  if (!collectUsersRecursive(*PHI))
 return false;
-
-  // Push PHI back into the stack, followed by unavailable
-  // incoming values.
-  Worklist.emplace_back(PHI);
-  for (unsigned Idx = 0; Idx < PHI->getNumIncomingValues(); ++Idx) {
-auto *IncomingValue = cast(PHI->getIncomingValue(Idx));
-if (UsersToReplace.contains(IncomingValue))
-  continue;
-if (!ValuesToRevisit.insert(IncomingValue))
-  return false;
-Worklist.emplace_back(IncomingValue);
-  }
 } else if (auto *SI = dyn_cast(Inst)) {
-  auto *TrueInst = dyn_cast(SI->getTrueValue());
-  auto *FalseInst = dyn_cast(SI->getFalseValue());
-  if (!TrueInst || !FalseInst)
+  if (!isa(SI->getTrueValue()) ||
+  !isa(SI->getFalseValue()))
 return false;
 

[llvm-branch-commits] [BOLT] Explicitly check for returns when extending call continuation profile (PR #143295)

2025-06-16 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/143295


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Explicitly check for returns when extending call continuation profile (PR #143295)

2025-06-16 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/143295


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][OpenMP][NFC] Refactor to avoid global variable (PR #144087)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits

ergawy wrote:

I have no idea how I closed this! Sorry @tblah, can you please reopen it?

https://github.com/llvm/llvm-project/pull/144087
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/137201
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits


@@ -5488,40 +5483,172 @@ convertDeclareTargetAttr(Operation *op, 
mlir::omp::DeclareTargetAttr attribute,
   return success();
 }
 
-// Returns true if the operation is inside a TargetOp or
-// is part of a declare target function.
-static bool isTargetDeviceOp(Operation *op) {
+namespace {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the OpenMP dialect to LLVM IR.
+class OpenMPDialectLLVMIRTranslationInterface
+: public LLVMTranslationDialectInterface {
+public:
+  using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+  /// Translates the given operation to LLVM IR using the provided IR builder
+  /// and saving the state in `moduleTranslation`.
+  LogicalResult
+  convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+   LLVM::ModuleTranslation &moduleTranslation) const final;
+
+  /// Given an OpenMP MLIR attribute, create the corresponding LLVM-IR,
+  /// runtime calls, or operation amendments
+  LogicalResult
+  amendOperation(Operation *op, ArrayRef instructions,
+ NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace
+
+LogicalResult OpenMPDialectLLVMIRTranslationInterface::amendOperation(
+Operation *op, ArrayRef instructions,
+NamedAttribute attribute,
+LLVM::ModuleTranslation &moduleTranslation) const {
+  return llvm::StringSwitch>(
+ attribute.getName())
+  .Case("omp.is_target_device",
+[&](Attribute attr) {
+  if (auto deviceAttr = dyn_cast(attr)) {
+llvm::OpenMPIRBuilderConfig &config =
+moduleTranslation.getOpenMPBuilder()->Config;
+config.setIsTargetDevice(deviceAttr.getValue());
+return success();
+  }
+  return failure();
+})
+  .Case("omp.is_gpu",
+[&](Attribute attr) {
+  if (auto gpuAttr = dyn_cast(attr)) {
+llvm::OpenMPIRBuilderConfig &config =
+moduleTranslation.getOpenMPBuilder()->Config;
+config.setIsGPU(gpuAttr.getValue());
+return success();
+  }
+  return failure();
+})
+  .Case("omp.host_ir_filepath",
+[&](Attribute attr) {
+  if (auto filepathAttr = dyn_cast(attr)) {
+llvm::OpenMPIRBuilder *ompBuilder =
+moduleTranslation.getOpenMPBuilder();
+ompBuilder->loadOffloadInfoMetadata(filepathAttr.getValue());
+return success();
+  }
+  return failure();
+})
+  .Case("omp.flags",
+[&](Attribute attr) {
+  if (auto rtlAttr = dyn_cast(attr))
+return convertFlagsAttr(op, rtlAttr, moduleTranslation);
+  return failure();
+})
+  .Case("omp.version",
+[&](Attribute attr) {
+  if (auto versionAttr = dyn_cast(attr)) {
+llvm::OpenMPIRBuilder *ompBuilder =
+moduleTranslation.getOpenMPBuilder();
+ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp",
+versionAttr.getVersion());
+return success();
+  }
+  return failure();
+})
+  .Case("omp.declare_target",
+[&](Attribute attr) {
+  if (auto declareTargetAttr =
+  dyn_cast(attr))
+return convertDeclareTargetAttr(op, declareTargetAttr,
+moduleTranslation);
+  return failure();
+})
+  .Case("omp.requires",
+[&](Attribute attr) {
+  if (auto requiresAttr = dyn_cast(attr)) 
{
+using Requires = omp::ClauseRequires;
+Requires flags = requiresAttr.getValue();
+llvm::OpenMPIRBuilderConfig &config =
+moduleTranslation.getOpenMPBuilder()->Config;
+config.setHasRequiresReverseOffload(
+bitEnumContainsAll(flags, Requires::reverse_offload));
+config.setHasRequiresUnifiedAddress(
+bitEnumContainsAll(flags, Requires::unified_address));
+config.setHasRequiresUnifiedSharedMemory(
+bitEnumContainsAll(flags, 
Requires::unified_shared_memory));
+config.setHasRequiresDynamicAllocators(
+bitEnumContainsAll(flags, Requires::dynamic_allocators));
+return success();
+  }
+  return failure();
+})
+  .Case("omp.target_triples",
+[&](Attribute attr) {
+  if (auto triplesAttr = dyn_cast(attr)) {
+llvm::OpenMPIRBuilderConfig &config =
+moduleTra

[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits


@@ -3173,19 +3173,14 @@ convertOmpThreadprivate(Operation &opInst, 
llvm::IRBuilderBase &builder,
   LLVM::GlobalOp global =
   addressOfOp.getGlobal(moduleTranslation.symbolTable());
   llvm::GlobalValue *globalValue = moduleTranslation.lookupGlobal(global);
-
-  if (!ompBuilder->Config.isTargetDevice()) {
-llvm::Type *type = globalValue->getValueType();
-llvm::TypeSize typeSize =
-
builder.GetInsertBlock()->getModule()->getDataLayout().getTypeStoreSize(
-type);
-llvm::ConstantInt *size = builder.getInt64(typeSize.getFixedValue());
-llvm::Value *callInst = ompBuilder->createCachedThreadPrivate(
-ompLoc, globalValue, size, global.getSymName() + ".cache");
-moduleTranslation.mapValue(opInst.getResult(0), callInst);
-  } else {
-moduleTranslation.mapValue(opInst.getResult(0), globalValue);

ergawy wrote:

Just to verify my understanding, we do not need the `else` here since for 
`target` regions, we pass `threadprivate` values as kernel arguments, right?

https://github.com/llvm/llvm-project/pull/137201
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-16 Thread Momchil Velikov via llvm-branch-commits

https://github.com/momchil-velikov edited 
https://github.com/llvm/llvm-project/pull/142422
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [InstCombine] Avoid folding select(umin(X, Y), X) with min/max values in false arm (#143020) (PR #144322)

2025-06-16 Thread Yingwei Zheng via llvm-branch-commits

https://github.com/dtcxzyw approved this pull request.


https://github.com/llvm/llvm-project/pull/144322
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137224

>From ec07ae02a991c5730fa0529e2479eadae4886211 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp|  80 +++
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 2 files changed, 677 insertions(+)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 05309a47aba40..b5b46390d4586 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1319,6 +1319,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_gadget_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return true;
+
+  // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the
+  // below is a simplified condition from BinaryContext::printInstruction.
+  bool IsUnknownControlFlow =
+  BC.MIB->isIndirectBranch(Inst) && !BC.MIB->getJumpTable(Inst);
+
+  if (BF.hasCFG() && IsUnknownControlFlow)
+return true;
+
+  return false;
+}
+
+static std::optional>
+shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF,
+   const MCInstReference &Inst, const SrcState &S) {
+  static const GadgetKind UntrustedLRKind(
+  "untrusted link register found before tail call");
+
+  if (!shouldAnalyzeTailCallInst(BC, BF, Inst))
+return std::nullopt;
+
+  // Not only the set of registers returned by getTrustedLiveInRegs() can be
+  // seen as a reasonable target-independent _approximation_ of "the LR", these
+  // are *exactly* those registers used by SrcSafetyAnalysis to initialize the
+  // set of trusted registers on function entry.
+  // Thus, this function basically checks that the precondition expected to be
+  // imposed by a function call instruction (which is hardcoded into the 
target-
+  // specific getTrustedLiveInRegs() function) is also respected on tail calls.
+  SmallVector RegsToCheck = BC.MIB->getTrustedLiveInRegs();
+  LLVM_DEBUG({
+traceInst(BC, "Found tail call inst", Inst);
+traceRegMask(BC, "Trusted regs", S.TrustedRegs);
+  });
+
+  // In musl on AArch64, the _start function sets LR to zero and calls the next
+  // stage initialization function at the end, something along these lines:
+  //
+  //   _start:
+  // mov x30, #0
+  // ; ... other initialization ...
+  // b   _start_c ; performs "exit" system call at some point
+  //
+  // As this would produce a false positive for every executable linked with
+  // such libc, ignore tail calls performed by ELF entry function.
+  if (BC.StartFunctionAddress &&
+  *BC.StartFunctionAddress == Inst.getFunction()->getAddress()) {
+LLVM_DEBUG({ dbgs() << "  Skipping tail call in ELF entry function.\n"; });
+return std::nullopt;
+  }
+
+  // Returns at most one report per instruction - this is probably OK...
+  for (auto Reg : RegsToCheck)
+if (!S.TrustedRegs[Reg])
+  return make_gadget_report(UntrustedLRKind, Inst, Reg);
+
+  return std::nullopt;
+}
+
 static std::optional>
 shouldReportCallGadget(const BinaryContext &BC, const MCInstReference &Inst,
const SrcState &S) {
@@ -1473,6 +1550,9 @@ void FunctionAnalysisContext::findUnsafeUses(
 if (PacRetGadgetsOnly)
   return;
 
+if (auto Report = shouldReportUnsafeTailCall(BC, BF, Inst, S))
+  Reports.push_back(*Report);
+
 if (auto Report = shouldReportCallGadget(BC, Inst, S))

[llvm-branch-commits] [clang] [CIR] Upstream __real__ for ComplexType (PR #144261)

2025-06-16 Thread Andy Kaylor via llvm-branch-commits

andykaylor wrote:

My comments on https://github.com/llvm/llvm-project/pull/144235 apply more 
directly here, of course.

https://github.com/llvm/llvm-project/pull/144261
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [RISCV] Integrate RISCV target in baremetal toolchain object and deprecate RISCVToolchain object (PR #121831)

2025-06-16 Thread Garvit Gupta via llvm-branch-commits

https://github.com/quic-garvgupt updated 
https://github.com/llvm/llvm-project/pull/121831

>From c280afd49551479a2f9d861263cffa64a96df1a8 Mon Sep 17 00:00:00 2001
From: Garvit Gupta 
Date: Mon, 6 Jan 2025 10:05:08 -0800
Subject: [PATCH] [RISCV] Integrate RISCV target in baremetal toolchain object
 and deprecate RISCVToolchain object

This patch:
- Adds CXXStdlib, runtimelib and unwindlib defaults for riscv target to
  BareMetal toolchain object.
- Add riscv 32 and 64-bit emulation flags to linker job of BareMetal
  toolchain.
- Removes call to RISCVToolChain object from llvm.

This PR is last patch in the series of patches of merging RISCVToolchain
object into BareMetal toolchain object.

RFC:
https: 
//discourse.llvm.org/t/merging-riscvtoolchain-and-baremetal-toolchains/75524
Change-Id: Ic5d64a4ed3ebc58c30c12d9827e7e57a02eb13ca
---
 clang/lib/Driver/CMakeLists.txt   |   1 -
 clang/lib/Driver/Driver.cpp   |  10 +-
 clang/lib/Driver/ToolChains/BareMetal.cpp |  20 ++
 clang/lib/Driver/ToolChains/BareMetal.h   |  10 +-
 .../lib/Driver/ToolChains/RISCVToolchain.cpp  | 232 --
 clang/lib/Driver/ToolChains/RISCVToolchain.h  |  67 -
 .../test/Driver/baremetal-undefined-symbols.c |  14 +-
 clang/test/Driver/riscv32-toolchain-extra.c   |   7 +-
 clang/test/Driver/riscv32-toolchain.c |  26 +-
 clang/test/Driver/riscv64-toolchain-extra.c   |   7 +-
 clang/test/Driver/riscv64-toolchain.c |  20 +-
 11 files changed, 60 insertions(+), 354 deletions(-)
 delete mode 100644 clang/lib/Driver/ToolChains/RISCVToolchain.cpp
 delete mode 100644 clang/lib/Driver/ToolChains/RISCVToolchain.h

diff --git a/clang/lib/Driver/CMakeLists.txt b/clang/lib/Driver/CMakeLists.txt
index 5bdb6614389cf..eee29af5d181a 100644
--- a/clang/lib/Driver/CMakeLists.txt
+++ b/clang/lib/Driver/CMakeLists.txt
@@ -74,7 +74,6 @@ add_clang_library(clangDriver
   ToolChains/OHOS.cpp
   ToolChains/OpenBSD.cpp
   ToolChains/PS4CPU.cpp
-  ToolChains/RISCVToolchain.cpp
   ToolChains/Solaris.cpp
   ToolChains/SPIRV.cpp
   ToolChains/SPIRVOpenMP.cpp
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 07e36ea2efba4..cfc0ba63d5749 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -41,7 +41,6 @@
 #include "ToolChains/PPCFreeBSD.h"
 #include "ToolChains/PPCLinux.h"
 #include "ToolChains/PS4CPU.h"
-#include "ToolChains/RISCVToolchain.h"
 #include "ToolChains/SPIRV.h"
 #include "ToolChains/SPIRVOpenMP.h"
 #include "ToolChains/SYCL.h"
@@ -6889,16 +6888,11 @@ const ToolChain &Driver::getToolChain(const ArgList 
&Args,
 TC = std::make_unique(*this, Target, Args);
 break;
   case llvm::Triple::msp430:
-TC =
-std::make_unique(*this, Target, Args);
+TC = std::make_unique(*this, Target, 
Args);
 break;
   case llvm::Triple::riscv32:
   case llvm::Triple::riscv64:
-if (toolchains::RISCVToolChain::hasGCCToolchain(*this, Args))
-  TC =
-  std::make_unique(*this, Target, 
Args);
-else
-  TC = std::make_unique(*this, Target, Args);
+TC = std::make_unique(*this, Target, Args);
 break;
   case llvm::Triple::ve:
 TC = std::make_unique(*this, Target, Args);
diff --git a/clang/lib/Driver/ToolChains/BareMetal.cpp 
b/clang/lib/Driver/ToolChains/BareMetal.cpp
index 396197cad6429..5c0c7da1e4ab2 100644
--- a/clang/lib/Driver/ToolChains/BareMetal.cpp
+++ b/clang/lib/Driver/ToolChains/BareMetal.cpp
@@ -377,6 +377,26 @@ BareMetal::OrderedMultilibs 
BareMetal::getOrderedMultilibs() const {
   return llvm::reverse(Default);
 }
 
+ToolChain::CXXStdlibType BareMetal::GetDefaultCXXStdlibType() const {
+  if (getTriple().isRISCV() && IsGCCInstallationValid)
+return ToolChain::CST_Libstdcxx;
+  return ToolChain::CST_Libcxx;
+}
+
+ToolChain::RuntimeLibType BareMetal::GetDefaultRuntimeLibType() const {
+  if (getTriple().isRISCV() && IsGCCInstallationValid)
+return ToolChain::RLT_Libgcc;
+  return ToolChain::RLT_CompilerRT;
+}
+
+ToolChain::UnwindLibType
+BareMetal::GetUnwindLibType(const llvm::opt::ArgList &Args) const {
+  if (getTriple().isRISCV())
+return ToolChain::UNW_None;
+
+  return ToolChain::GetUnwindLibType(Args);
+}
+
 void BareMetal::AddClangSystemIncludeArgs(const ArgList &DriverArgs,
   ArgStringList &CC1Args) const {
   if (DriverArgs.hasArg(options::OPT_nostdinc))
diff --git a/clang/lib/Driver/ToolChains/BareMetal.h 
b/clang/lib/Driver/ToolChains/BareMetal.h
index 54805530bae82..cc57fa21867a2 100644
--- a/clang/lib/Driver/ToolChains/BareMetal.h
+++ b/clang/lib/Driver/ToolChains/BareMetal.h
@@ -56,13 +56,11 @@ class LLVM_LIBRARY_VISIBILITY BareMetal : public 
Generic_ELF {
 return UnwindTableLevel::None;
   }
 
-  RuntimeLibType GetDefaultRuntimeLibType() const override {
-return ToolChain::RLT_CompilerRT;
-  }
+  CXXStdlibType GetDefaultCXXStdlibType() const override;
 

[llvm-branch-commits] [clang] [RISCV][Driver] Add support for `-m` flag to linker job of Baremetal toolchain (PR #134442)

2025-06-16 Thread Garvit Gupta via llvm-branch-commits

https://github.com/quic-garvgupt edited 
https://github.com/llvm/llvm-project/pull/134442
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [RISCV][Driver] Add riscv emulation mode to linker job of BareMetal toolchain (PR #134442)

2025-06-16 Thread Garvit Gupta via llvm-branch-commits

https://github.com/quic-garvgupt updated 
https://github.com/llvm/llvm-project/pull/134442

>From 3bc40adcf27c61a0c7bb00c43b6de14d9cf823b3 Mon Sep 17 00:00:00 2001
From: Garvit Gupta 
Date: Fri, 4 Apr 2025 12:51:19 -0700
Subject: [PATCH] [RISCV][Driver] Add support for `-m` flag to linker job of
 Baremetal toolchain.

Change-Id: Ifce8a3a7f1df9c12561d35ca3c923595e3619428
---
 clang/lib/Driver/ToolChains/BareMetal.cpp  | 15 -
 clang/lib/Driver/ToolChains/CommonArgs.cpp | 70 ++
 clang/lib/Driver/ToolChains/CommonArgs.h   |  2 +
 clang/lib/Driver/ToolChains/Gnu.cpp| 70 --
 clang/test/Driver/aarch64-toolchain.c  | 14 ++---
 clang/test/Driver/arm-toolchain.c  | 14 ++---
 clang/test/Driver/baremetal.cpp| 51 
 7 files changed, 125 insertions(+), 111 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/BareMetal.cpp 
b/clang/lib/Driver/ToolChains/BareMetal.cpp
index 65f3d0a8eefb1..396197cad6429 100644
--- a/clang/lib/Driver/ToolChains/BareMetal.cpp
+++ b/clang/lib/Driver/ToolChains/BareMetal.cpp
@@ -575,8 +575,19 @@ void baremetal::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
 
   CmdArgs.push_back("-Bstatic");
 
-  if (TC.getTriple().isRISCV() && Args.hasArg(options::OPT_mno_relax))
-CmdArgs.push_back("--no-relax");
+  if (const char *LDMOption = getLDMOption(TC.getTriple(), Args)) {
+CmdArgs.push_back("-m");
+CmdArgs.push_back(LDMOption);
+  } else {
+D.Diag(diag::err_target_unknown_triple) << Triple.str();
+return;
+  }
+
+  if (Triple.isRISCV()) {
+CmdArgs.push_back("-X");
+if (Args.hasArg(options::OPT_mno_relax))
+  CmdArgs.push_back("--no-relax");
+  }
 
   if (Triple.isARM() || Triple.isThumb()) {
 bool IsBigEndian = arm::isARMBigEndian(Triple, Args);
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index ddeadff8f6dfb..292d52acdc002 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -535,6 +535,76 @@ void tools::AddLinkerInputs(const ToolChain &TC, const 
InputInfoList &Inputs,
   }
 }
 
+const char *tools::getLDMOption(const llvm::Triple &T, const ArgList &Args) {
+  switch (T.getArch()) {
+  case llvm::Triple::x86:
+if (T.isOSIAMCU())
+  return "elf_iamcu";
+return "elf_i386";
+  case llvm::Triple::aarch64:
+return "aarch64linux";
+  case llvm::Triple::aarch64_be:
+return "aarch64linuxb";
+  case llvm::Triple::arm:
+  case llvm::Triple::thumb:
+  case llvm::Triple::armeb:
+  case llvm::Triple::thumbeb:
+return tools::arm::isARMBigEndian(T, Args) ? "armelfb_linux_eabi"
+   : "armelf_linux_eabi";
+  case llvm::Triple::m68k:
+return "m68kelf";
+  case llvm::Triple::ppc:
+if (T.isOSLinux())
+  return "elf32ppclinux";
+return "elf32ppc";
+  case llvm::Triple::ppcle:
+if (T.isOSLinux())
+  return "elf32lppclinux";
+return "elf32lppc";
+  case llvm::Triple::ppc64:
+return "elf64ppc";
+  case llvm::Triple::ppc64le:
+return "elf64lppc";
+  case llvm::Triple::riscv32:
+return "elf32lriscv";
+  case llvm::Triple::riscv64:
+return "elf64lriscv";
+  case llvm::Triple::sparc:
+  case llvm::Triple::sparcel:
+return "elf32_sparc";
+  case llvm::Triple::sparcv9:
+return "elf64_sparc";
+  case llvm::Triple::loongarch32:
+return "elf32loongarch";
+  case llvm::Triple::loongarch64:
+return "elf64loongarch";
+  case llvm::Triple::mips:
+return "elf32btsmip";
+  case llvm::Triple::mipsel:
+return "elf32ltsmip";
+  case llvm::Triple::mips64:
+if (tools::mips::hasMipsAbiArg(Args, "n32") || T.isABIN32())
+  return "elf32btsmipn32";
+return "elf64btsmip";
+  case llvm::Triple::mips64el:
+if (tools::mips::hasMipsAbiArg(Args, "n32") || T.isABIN32())
+  return "elf32ltsmipn32";
+return "elf64ltsmip";
+  case llvm::Triple::systemz:
+return "elf64_s390";
+  case llvm::Triple::x86_64:
+if (T.isX32())
+  return "elf32_x86_64";
+return "elf_x86_64";
+  case llvm::Triple::ve:
+return "elf64ve";
+  case llvm::Triple::csky:
+return "cskyelf_linux";
+  default:
+return nullptr;
+  }
+}
+
 void tools::addLinkerCompressDebugSectionsOption(
 const ToolChain &TC, const llvm::opt::ArgList &Args,
 llvm::opt::ArgStringList &CmdArgs) {
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.h 
b/clang/lib/Driver/ToolChains/CommonArgs.h
index 96bc0619dcbc0..875354e969a2a 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.h
+++ b/clang/lib/Driver/ToolChains/CommonArgs.h
@@ -31,6 +31,8 @@ void AddLinkerInputs(const ToolChain &TC, const InputInfoList 
&Inputs,
  const llvm::opt::ArgList &Args,
  llvm::opt::ArgStringList &CmdArgs, const JobAction &JA);
 
+const char *getLDMOption(const llvm::Triple &T, const llvm::opt::ArgList 
&Args);
+
 void addLinkerCompressDebugSectionsOpt

[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AArch64: Fix outline atomic libcall names for arm64ec (PR #144378)

2025-06-16 Thread Daniel Paoliello via llvm-branch-commits

https://github.com/dpaoliello approved this pull request.

Can you please add a description?

https://github.com/llvm/llvm-project/pull/144378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Explicitly check for returns when extending call continuation profile (PR #143295)

2025-06-16 Thread Rafael Auler via llvm-branch-commits

https://github.com/rafaelauler approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/143295
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup sincos predicate functions (PR #143081)

2025-06-16 Thread Daniel Paoliello via llvm-branch-commits

https://github.com/dpaoliello approved this pull request.


https://github.com/llvm/llvm-project/pull/143081
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-06-16 Thread Kai Nacke via llvm-branch-commits

https://github.com/redstar updated 
https://github.com/llvm/llvm-project/pull/137235

>From c9b5b19f6f6cb74f0aaf5eac950158342d3a3ada Mon Sep 17 00:00:00 2001
From: Kai Nacke 
Date: Wed, 9 Apr 2025 15:08:52 -0400
Subject: [PATCH 1/6] [GOFF] Add writing of text records

Sections which are not allowed to carry data are marked as virtual.
Only complication when writing out the text is that it must be
written in chunks of 32k-1 bytes, which is done by having a wrapper
stream writing those records.
Data of BSS sections is not written, since the contents is known to
be zero. Instead, the fill byte value is used.
---
 llvm/include/llvm/MC/MCContext.h  |  6 +-
 llvm/include/llvm/MC/MCSectionGOFF.h  | 47 +++
 .../CodeGen/TargetLoweringObjectFileImpl.cpp  |  5 +-
 llvm/lib/MC/GOFFObjectWriter.cpp  | 78 +++
 llvm/lib/MC/MCContext.cpp | 19 +++--
 llvm/lib/MC/MCObjectFileInfo.cpp  |  9 +--
 llvm/test/CodeGen/SystemZ/zos-section-1.ll| 27 +--
 llvm/test/CodeGen/SystemZ/zos-section-2.ll| 30 +--
 8 files changed, 180 insertions(+), 41 deletions(-)

diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 8eb904966f4de..2cdba64be116d 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -366,7 +366,8 @@ class MCContext {
 
   template 
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
-TAttr SDAttributes, MCSection *Parent);
+TAttr SDAttributes, MCSection *Parent,
+bool IsVirtual);
 
   /// Map of currently defined macros.
   StringMap MacroMap;
@@ -607,7 +608,8 @@ class MCContext {
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
 GOFF::SDAttr SDAttributes);
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
-GOFF::EDAttr EDAttributes, MCSection *Parent);
+GOFF::EDAttr EDAttributes, MCSection *Parent,
+bool IsVirtual);
   MCSectionGOFF *getGOFFSection(SectionKind Kind, StringRef Name,
 GOFF::PRAttr PRAttributes, MCSection *Parent);
 
diff --git a/llvm/include/llvm/MC/MCSectionGOFF.h 
b/llvm/include/llvm/MC/MCSectionGOFF.h
index b8b8cf112a34d..b2ca74c3ba78a 100644
--- a/llvm/include/llvm/MC/MCSectionGOFF.h
+++ b/llvm/include/llvm/MC/MCSectionGOFF.h
@@ -39,6 +39,9 @@ class MCSectionGOFF final : public MCSection {
   // The type of this section.
   GOFF::ESDSymbolType SymbolType;
 
+  // This section is a BSS section.
+  unsigned IsBSS : 1;
+
   // Indicates that the PR symbol needs to set the length of the section to a
   // non-zero value. This is only a problem with the ADA PR - the binder will
   // generate an error in this case.
@@ -50,26 +53,26 @@ class MCSectionGOFF final : public MCSection {
   friend class MCContext;
   friend class MCSymbolGOFF;
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::SDAttr SDAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::SDAttr SDAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), SDAttributes(SDAttributes),
-SymbolType(GOFF::ESD_ST_SectionDefinition), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_SectionDefinition), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::EDAttr EDAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::EDAttr EDAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), EDAttributes(EDAttributes),
-SymbolType(GOFF::ESD_ST_ElementDefinition), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_ElementDefinition), IsBSS(K.isBSS()),
+RequiresNonZeroLength(0), Emitted(0) {}
 
-  MCSectionGOFF(StringRef Name, SectionKind K, GOFF::PRAttr PRAttributes,
-MCSectionGOFF *Parent)
-  : MCSection(SV_GOFF, Name, K.isText(), /*IsVirtual=*/false, nullptr),
+  MCSectionGOFF(StringRef Name, SectionKind K, bool IsVirtual,
+GOFF::PRAttr PRAttributes, MCSectionGOFF *Parent)
+  : MCSection(SV_GOFF, Name, K.isText(), IsVirtual, nullptr),
 Parent(Parent), PRAttributes(PRAttributes),
-SymbolType(GOFF::ESD_ST_PartReference), RequiresNonZeroLength(0),
-Emitted(0) {}
+SymbolType(GOFF::ESD_ST_PartReference), IsBSS(K.isBSS()

[llvm-branch-commits] [clang] [RISCV] Integrate RISCV target in baremetal toolchain object and deprecate RISCVToolchain object (PR #121831)

2025-06-16 Thread Garvit Gupta via llvm-branch-commits

https://github.com/quic-garvgupt updated 
https://github.com/llvm/llvm-project/pull/121831

>From 5c9c5f2a05fe8104910ac9ef03349876dd445cd3 Mon Sep 17 00:00:00 2001
From: Garvit Gupta 
Date: Mon, 6 Jan 2025 10:05:08 -0800
Subject: [PATCH] [RISCV] Integrate RISCV target in baremetal toolchain object
 and deprecate RISCVToolchain object

This patch:
- Adds CXXStdlib, runtimelib and unwindlib defaults for riscv target to
  BareMetal toolchain object.
- Add riscv 32 and 64-bit emulation flags to linker job of BareMetal
  toolchain.
- Removes call to RISCVToolChain object from llvm.

This PR is last patch in the series of patches of merging RISCVToolchain
object into BareMetal toolchain object.

RFC:
https: 
//discourse.llvm.org/t/merging-riscvtoolchain-and-baremetal-toolchains/75524
Change-Id: Ic5d64a4ed3ebc58c30c12d9827e7e57a02eb13ca
---
 clang/lib/Driver/CMakeLists.txt   |   1 -
 clang/lib/Driver/Driver.cpp   |  10 +-
 clang/lib/Driver/ToolChains/BareMetal.cpp |  20 ++
 clang/lib/Driver/ToolChains/BareMetal.h   |  10 +-
 .../lib/Driver/ToolChains/RISCVToolchain.cpp  | 232 --
 clang/lib/Driver/ToolChains/RISCVToolchain.h  |  67 -
 .../test/Driver/baremetal-undefined-symbols.c |  14 +-
 clang/test/Driver/riscv32-toolchain-extra.c   |   7 +-
 clang/test/Driver/riscv32-toolchain.c |  26 +-
 clang/test/Driver/riscv64-toolchain-extra.c   |   7 +-
 clang/test/Driver/riscv64-toolchain.c |  20 +-
 11 files changed, 60 insertions(+), 354 deletions(-)
 delete mode 100644 clang/lib/Driver/ToolChains/RISCVToolchain.cpp
 delete mode 100644 clang/lib/Driver/ToolChains/RISCVToolchain.h

diff --git a/clang/lib/Driver/CMakeLists.txt b/clang/lib/Driver/CMakeLists.txt
index 5bdb6614389cf..eee29af5d181a 100644
--- a/clang/lib/Driver/CMakeLists.txt
+++ b/clang/lib/Driver/CMakeLists.txt
@@ -74,7 +74,6 @@ add_clang_library(clangDriver
   ToolChains/OHOS.cpp
   ToolChains/OpenBSD.cpp
   ToolChains/PS4CPU.cpp
-  ToolChains/RISCVToolchain.cpp
   ToolChains/Solaris.cpp
   ToolChains/SPIRV.cpp
   ToolChains/SPIRVOpenMP.cpp
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 07e36ea2efba4..cfc0ba63d5749 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -41,7 +41,6 @@
 #include "ToolChains/PPCFreeBSD.h"
 #include "ToolChains/PPCLinux.h"
 #include "ToolChains/PS4CPU.h"
-#include "ToolChains/RISCVToolchain.h"
 #include "ToolChains/SPIRV.h"
 #include "ToolChains/SPIRVOpenMP.h"
 #include "ToolChains/SYCL.h"
@@ -6889,16 +6888,11 @@ const ToolChain &Driver::getToolChain(const ArgList 
&Args,
 TC = std::make_unique(*this, Target, Args);
 break;
   case llvm::Triple::msp430:
-TC =
-std::make_unique(*this, Target, Args);
+TC = std::make_unique(*this, Target, 
Args);
 break;
   case llvm::Triple::riscv32:
   case llvm::Triple::riscv64:
-if (toolchains::RISCVToolChain::hasGCCToolchain(*this, Args))
-  TC =
-  std::make_unique(*this, Target, 
Args);
-else
-  TC = std::make_unique(*this, Target, Args);
+TC = std::make_unique(*this, Target, Args);
 break;
   case llvm::Triple::ve:
 TC = std::make_unique(*this, Target, Args);
diff --git a/clang/lib/Driver/ToolChains/BareMetal.cpp 
b/clang/lib/Driver/ToolChains/BareMetal.cpp
index 396197cad6429..ffc92fb66dfe9 100644
--- a/clang/lib/Driver/ToolChains/BareMetal.cpp
+++ b/clang/lib/Driver/ToolChains/BareMetal.cpp
@@ -377,6 +377,26 @@ BareMetal::OrderedMultilibs 
BareMetal::getOrderedMultilibs() const {
   return llvm::reverse(Default);
 }
 
+ToolChain::CXXStdlibType BareMetal::GetDefaultCXXStdlibType() const {
+  if (getTriple().isRISCV() && GCCInstallation.isValid())
+return ToolChain::CST_Libstdcxx;
+  return ToolChain::CST_Libcxx;
+}
+
+ToolChain::RuntimeLibType BareMetal::GetDefaultRuntimeLibType() const {
+  if (getTriple().isRISCV() && GCCInstallation.isValid())
+return ToolChain::RLT_Libgcc;
+  return ToolChain::RLT_CompilerRT;
+}
+
+ToolChain::UnwindLibType
+BareMetal::GetUnwindLibType(const llvm::opt::ArgList &Args) const {
+  if (getTriple().isRISCV())
+return ToolChain::UNW_None;
+
+  return ToolChain::GetUnwindLibType(Args);
+}
+
 void BareMetal::AddClangSystemIncludeArgs(const ArgList &DriverArgs,
   ArgStringList &CC1Args) const {
   if (DriverArgs.hasArg(options::OPT_nostdinc))
diff --git a/clang/lib/Driver/ToolChains/BareMetal.h 
b/clang/lib/Driver/ToolChains/BareMetal.h
index 54805530bae82..cc57fa21867a2 100644
--- a/clang/lib/Driver/ToolChains/BareMetal.h
+++ b/clang/lib/Driver/ToolChains/BareMetal.h
@@ -56,13 +56,11 @@ class LLVM_LIBRARY_VISIBILITY BareMetal : public 
Generic_ELF {
 return UnwindTableLevel::None;
   }
 
-  RuntimeLibType GetDefaultRuntimeLibType() const override {
-return ToolChain::RLT_CompilerRT;
-  }
+  CXXStdlibType GetDefaultCXXStdlibType() const overr

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-06-16 Thread Kai Nacke via llvm-branch-commits


@@ -0,0 +1,113 @@
+//===- MCGOFFAttributes.h - Attributes of GOFF symbols 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines the various attribute collections defining GOFF symbols.
+//
+//===--===//
+
+#ifndef LLVM_MC_MCGOFFATTRIBUTES_H
+#define LLVM_MC_MCGOFFATTRIBUTES_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/GOFF.h"
+
+namespace llvm {
+namespace GOFF {
+// An "External Symbol Definition" in the GOFF file has a type, and depending 
on
+// the type a different subset of the fields is used.
+//
+// Unlike other formats, a 2 dimensional structure is used to define the
+// location of data. For example, the equivalent of the ELF .text section is
+// made up of a Section Definition (SD) and a class (Element Definition; ED).
+// The name of the SD symbol depends on the application, while the class has 
the
+// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.
+//
+// Data can be placed into this structure in 2 ways. First, the data (in a text
+// record) can be associated with an ED symbol. To refer to data, a Label
+// Definition (LD) is used to give an offset into the data a name. When 
binding,
+// the whole data is pulled into the resulting executable, and the addresses
+// given by the LD symbols are resolved.
+//
+// The alternative is to use a Part Definition (PR). In this case, the data (in
+// a text record) is associated with the part. When binding, only the data of
+// referenced PRs is pulled into the resulting binary.
+//
+// Both approaches are used, which means that the equivalent of a section in 
ELF
+// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain
+// sections are fine with just defining SD/ED symbols. The SymbolMapper takes
+// care of all those details.
+
+// Attributes for SD symbols.
+struct SDAttr {
+  GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for ED symbols.
+struct EDAttr {
+  bool IsReadOnly = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDAmode Amode;

redstar wrote:

To confirm, setting the Amode is not necessary on PR symbols.

https://github.com/llvm/llvm-project/pull/133799
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Implement type formatting utilities for metadata validation (PR #144425)

2025-06-16 Thread via llvm-branch-commits

https://github.com/joaosaffran closed 
https://github.com/llvm/llvm-project/pull/144425
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137975

>From a9c870806c510766b52ae046cf7d9dce55c92db4 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 30 Apr 2025 16:08:10 +0300
Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for
 auth oracles

An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false positive reports about authentication oracles:

autia   x2, x3
autia   x0, x1
; neither x0 nor x2 are checked at this point
eor x16, x0, x0, lsl #1
tbz x16, #62, on_success ; marks x0 as checked
; end of BB: for x2 to be checked here, it must be checked in both
; successor basic blocks
  on_failure:
brk 0xc470
  on_success:
; x2 is checked
ldr x1, [x2] ; marks x2 as checked
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +-
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 24 --
 .../AArch64/gs-pauth-address-checks.s | 44 +--
 .../AArch64/gs-pauth-authentication-oracles.s |  9 ++--
 .../AArch64/gs-pauth-signing-oracles.s|  6 +--
 6 files changed, 75 insertions(+), 35 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index b233452985502..c8cbcaf33f4b5 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -707,6 +707,20 @@ class MCPlusBuilder {
 return false;
   }
 
+  /// Returns true if Inst is a trap instruction.
+  ///
+  /// Tests if Inst is an instruction that immediately causes an abnormal
+  /// program termination, for example when a security violation is detected
+  /// by a compiler-inserted check.
+  ///
+  /// @note An implementation of this method should likely return false for
+  /// calls to library functions like abort(), as it is possible that the
+  /// execution state is partially attacker-controlled at this point.
+  virtual bool isTrap(const MCInst &Inst) const {
+llvm_unreachable("not implemented");
+return false;
+  }
+
   virtual bool isBreakpoint(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
 return false;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index b5b46390d4586..98bb84f6f965d 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1078,6 +1078,15 @@ class DstSafetyAnalysis {
   dbgs() << ")\n";
 });
 
+// If this instruction terminates the program immediately, no
+// authentication oracles are possible past this point.
+if (BC.MIB->isTrap(Point)) {
+  LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); });
+  DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  Next.CannotEscapeUnchecked.set();
+  return Next;
+}
+
 // If this instruction is reachable by the analysis, a non-empty state will
 // be propagated to it sooner or later. Until then, skip computeNext().
 if (Cur.empty()) {
@@ -1185,8 +1194,8 @@ class DataflowDstSafetyAnalysis
 //
 // A basic block without any successors, on the other hand, can be
 // pessimistically initialized to everything-is-unsafe: this will naturally
-// handle both return and tail call instructions and is harmless for
-// internal indirect branch instructions (such as computed gotos).
+// handle return, trap and tail call instructions. At the same time, it is
+// harmless for internal indirect branch instructions, like computed gotos.
 if (BB.succ_empty())
   return createUnsafeState();
 
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp 
b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 9d5a578cfbdff..b669d32cc2032 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 // the list of successors of this basic block as appropriate.
 
 // Any of the above code sequences assume the fall-through basic block
-// is a dead-end BRK instruction (any immediate operand is accepted).
+// is a dead-end trap instruction.
 const BinaryBasicBlock *BreakBB = BB.getFallthrough();
-if (!BreakBB || BreakBB->empty() ||
-BreakBB->front().getOpcode() != AArch64::BRK)
+if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front()))
   return std::nullopt;
 
 // Iterate over the instructions of BB in reverse order, matching opcodes
@@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 Inst.addOperand(MCOperand::createImm(0));
   }
 

[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/138883

>From 8fa355f56ea745e775ae78f3087e062e198235dd Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 7 May 2025 16:42:00 +0300
Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time
 (NFC)

Introduce matchInst helper function to capture and/or match the operands
of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery,
matchInst is intended for the use cases when precise control over the
instruction order is required. For example, when validating PtrAuth
hardening, all registers are usually considered unsafe after a function
call, even though callee-saved registers should preserve their old
values *under normal operation*.
---
 bolt/include/bolt/Core/MCInstUtils.h  | 128 ++
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   |  90 +---
 2 files changed, 162 insertions(+), 56 deletions(-)

diff --git a/bolt/include/bolt/Core/MCInstUtils.h 
b/bolt/include/bolt/Core/MCInstUtils.h
index 69bf5e6159b74..50b7d56470c99 100644
--- a/bolt/include/bolt/Core/MCInstUtils.h
+++ b/bolt/include/bolt/Core/MCInstUtils.h
@@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS,
   return Ref.print(OS);
 }
 
+/// Instruction-matching helpers operating on a single instruction at a time.
+///
+/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on
+/// the cases where a precise control over the instruction order is important:
+///
+/// // Bring the short names into the local scope:
+/// using namespace MCInstMatcher;
+/// // Declare the registers to capture:
+/// Reg Xn, Xm;
+/// // Capture the 0th and 1st operands, match the 2nd operand against the
+/// // just captured Xm register, match the 3rd operand against literal 0:
+/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0))
+///   return AArch64::NoRegister;
+/// // Match the 0th operand against Xm:
+/// if (!matchInst(MaybeBr, AArch64::BR, Xm))
+///   return AArch64::NoRegister;
+/// // Return the matched register:
+/// return Xm.get();
+namespace MCInstMatcher {
+
+// The base class to match an operand of type T.
+//
+// The subclasses of OpMatcher are intended to be allocated on the stack and
+// to only be used by passing them to matchInst() and by calling their get()
+// function, thus the peculiar `mutable` specifiers: to make the calling code
+// compact and readable, the templated matchInst() function has to accept both
+// long-lived Imm/Reg wrappers declared as local variables (intended to capture
+// the first operand's value and match the subsequent operands, whether inside
+// a single instruction or across multiple instructions), as well as temporary
+// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR).
+template  class OpMatcher {
+  mutable std::optional Value;
+  mutable std::optional SavedValue;
+
+  // Remember/restore the last Value - to be called by matchInst.
+  void remember() const { SavedValue = Value; }
+  void restore() const { Value = SavedValue; }
+
+  template 
+  friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...);
+
+protected:
+  OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {}
+
+  bool matchValue(T OpValue) const {
+// Check that OpValue does not contradict the existing Value.
+bool MatchResult = !Value || *Value == OpValue;
+// If MatchResult is false, all matchers will be reset before returning 
from
+// matchInst, including this one, thus no need to assign conditionally.
+Value = OpValue;
+
+return MatchResult;
+  }
+
+public:
+  /// Returns the captured value.
+  T get() const {
+assert(Value.has_value());
+return *Value;
+  }
+};
+
+class Reg : public OpMatcher {
+  bool matches(const MCOperand &Op) const {
+if (!Op.isReg())
+  return false;
+
+return matchValue(Op.getReg());
+  }
+
+  template 
+  friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...);
+
+public:
+  Reg(std::optional RegToMatch = std::nullopt)
+  : OpMatcher(RegToMatch) {}
+};
+
+class Imm : public OpMatcher {
+  bool matches(const MCOperand &Op) const {
+if (!Op.isImm())
+  return false;
+
+return matchValue(Op.getImm());
+  }
+
+  template 
+  friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...);
+
+public:
+  Imm(std::optional ImmToMatch = std::nullopt)
+  : OpMatcher(ImmToMatch) {}
+};
+
+/// Tries to match Inst and updates Ops on success.
+///
+/// If Inst has the specified Opcode and its operand list prefix matches Ops,
+/// this function returns true and updates Ops, otherwise false is returned and
+/// values of Ops are kept as before matchInst was called.
+///
+/// Please note that while Ops are technically passed by a const reference to
+/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their
+/// fields are marked mut

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From e27f4844cb5bd24a1d5758568c75ad83aa79b560 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: improve handling of unreachable
 basic blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 95e831fe9c8ca..17d4f5c256a2a 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -342,6 +342,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -585,6 +591,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -689,12 +702,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
   using SrcSafetyAnalysis::BC;
   BinaryFunction &BF;
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1375,19 +1382,30 @@ void FunctionAnalysisContext::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructions that write t

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136151

>From fcfd5f61bb011859121bc70c7abdeddb1a72d06b Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 15 Apr 2025 21:47:18 +0300
Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI
 instructions

Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++
 .../AArch64/gs-pauth-debug-output.s   | 32 +++
 2 files changed, 48 insertions(+)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 7682d7fe2c542..95e831fe9c8ca 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -430,6 +430,9 @@ class SrcSafetyAnalysis {
   }
 
   SrcState computeNext(const MCInst &Point, const SrcState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 SrcStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  SrcSafetyAnalysis::ComputeNext(";
@@ -704,6 +707,8 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If there is a label before this instruction, it is possible that it
   // can be jumped-to, thus conservatively resetting S. As an exception,
@@ -1010,6 +1015,9 @@ class DstSafetyAnalysis {
   }
 
   DstState computeNext(const MCInst &Point, const DstState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 DstStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  DstSafetyAnalysis::ComputeNext(";
@@ -1177,6 +1185,8 @@ class CFGUnawareDstSafetyAnalysis : public 
DstSafetyAnalysis,
 DstState S = createUnsafeState();
 for (auto &I : llvm::reverse(BF.instrs())) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If Inst can change the control flow, we cannot be sure that the next
   // instruction (to be executed in analyzed program) is the one processed
@@ -1366,6 +1376,9 @@ void FunctionAnalysisContext::findUnsafeUses(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const SrcState &S = Analysis->getStateBefore(Inst);
 
 // If non-empty state was never propagated from the entry basic block
@@ -1429,6 +1442,9 @@ void FunctionAnalysisContext::findUnsafeDefs(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const DstState &S = Analysis->getStateAfter(Inst);
 
 if (auto Report = shouldReportAuthOracle(BC, Inst, S))
diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s 
b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
index 686557eb1e529..fbb96a63d41ed 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
@@ -329,6 +329,38 @@ auth_oracle:
 // PAUTH-EMPTY:
 // PAUTH-NEXT:   Attaching leakage info to: :  autia   x0, x1 
# DataflowDstSafetyAnalysis: dst-state
 
+// Gadget scanner should not crash on CFI instructions, including when 
debug-printing them.
+// Note that the particular debug output is not checked, but BOLT should be
+// compiled with assertions enabled to support -debug-only argument.
+
+.globl  cfi_inst_df
+.type   cfi_inst_df,@function
+cfi_inst_df:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_df, .-cfi_inst_df
+.cfi_endproc
+
+.globl  cfi_inst_nocfg
+.type   cfi_inst_nocfg,@function
+cfi_inst_nocfg:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+
+adr x0, 1f
+br  x0
+1:
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_nocfg, .-cfi_inst_nocfg
+.cfi_endproc
+
 // CHECK-LABEL:Analyzing function main, AllocatorId = 1
 .globl  main
 .type   main,@function

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From e27f4844cb5bd24a1d5758568c75ad83aa79b560 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH 1/3] [BOLT] Gadget scanner: improve handling of unreachable
 basic blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 95e831fe9c8ca..17d4f5c256a2a 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -342,6 +342,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -585,6 +591,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -689,12 +702,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
   using SrcSafetyAnalysis::BC;
   BinaryFunction &BF;
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1375,19 +1382,30 @@ void FunctionAnalysisContext::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructions that write t

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/139778

>From 090fb1a82e96ff4f8902e1d395e3e17d1ec20a43 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 13 May 2025 19:50:41 +0300
Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on
 failure

On AArch64 it is possible for an auth instruction to either return an
invalid address value on failure (without FEAT_FPAC) or generate an
error (with FEAT_FPAC). It thus may be possible to never emit explicit
pointer checks, if the target CPU is known to support FEAT_FPAC.

This commit implements an --auth-traps-on-failure command line option,
which essentially makes "safe-to-dereference" and "trusted" register
properties identical and disables scanning for authentication oracles
completely.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++
 .../binary-analysis/AArch64/cmdline-args.test |   1 +
 .../AArch64/gs-pauth-authentication-oracles.s |   6 +-
 .../binary-analysis/AArch64/gs-pauth-calls.s  |   5 +-
 .../AArch64/gs-pauth-debug-output.s   | 177 ++---
 .../AArch64/gs-pauth-jump-table.s |   6 +-
 .../AArch64/gs-pauth-signing-oracles.s|  54 ++---
 .../AArch64/gs-pauth-tail-calls.s | 184 +-
 8 files changed, 318 insertions(+), 227 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 3514c953030a6..b4d2fe150d514 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -14,6 +14,7 @@
 #include "bolt/Passes/PAuthGadgetScanner.h"
 #include "bolt/Core/ParallelUtilities.h"
 #include "bolt/Passes/DataflowAnalysis.h"
+#include "bolt/Utils/CommandLineOpts.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/MC/MCInst.h"
@@ -26,6 +27,11 @@ namespace llvm {
 namespace bolt {
 namespace PAuthGadgetScanner {
 
+static cl::opt AuthTrapsOnFailure(
+"auth-traps-on-failure",
+cl::desc("Assume authentication instructions always trap on failure"),
+cl::cat(opts::BinaryAnalysisCategory));
+
 [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef 
Label,
const MCInst &MI) {
   dbgs() << "  " << Label << ": ";
@@ -364,6 +370,34 @@ class SrcSafetyAnalysis {
 return Clobbered;
   }
 
+  std::optional getRegMadeTrustedByChecking(const MCInst &Inst,
+   SrcState Cur) const {
+// This functions cannot return multiple registers. This is never the case
+// on AArch64.
+std::optional RegCheckedByInst =
+BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false);
+if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst])
+  return *RegCheckedByInst;
+
+auto It = CheckerSequenceInfo.find(&Inst);
+if (It == CheckerSequenceInfo.end())
+  return std::nullopt;
+
+MCPhysReg RegCheckedBySequence = It->second.first;
+const MCInst *FirstCheckerInst = It->second.second;
+
+// FirstCheckerInst should belong to the same basic block (see the
+// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was
+// deterministically processed a few steps before this instruction.
+const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst);
+
+// The sequence checks the register, but it should be authenticated before.
+if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence])
+  return std::nullopt;
+
+return RegCheckedBySequence;
+  }
+
   // Returns all registers that can be treated as if they are written by an
   // authentication instruction.
   SmallVector getRegsMadeSafeToDeref(const MCInst &Point,
@@ -386,18 +420,38 @@ class SrcSafetyAnalysis {
 Regs.push_back(DstAndSrc->first);
 }
 
+// Make sure explicit checker sequence keeps register safe-to-dereference
+// when the register would be clobbered according to the regular rules:
+//
+//; LR is safe to dereference here
+//mov   x16, x30  ; start of the sequence, LR is s-t-d right before
+//xpaclri ; clobbers LR, LR is not safe anymore
+//cmp   x30, x16
+//b.eq  1f; end of the sequence: LR is marked as trusted
+//brk   0x1234
+//  1:
+//; at this point LR would be marked as trusted,
+//; but not safe-to-dereference
+//
+// or even just
+//
+//; X1 is safe to dereference here
+//ldr x0, [x1, #8]!
+//; X1 is trusted here, but it was clobbered due to address write-back
+if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur))
+  Regs.push_back(*CheckedReg);
+
 return Regs;
   }
 
   // Returns all registers made trusted by this instruction.
   SmallVector getRegsMadeTrusted(const MCInst &Point,
 const SrcState &Cur) const {
+assert(!AuthTrapsOnFailure &&

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/141665

>From 5fb7d950bbccc11f27fc44942585827ba494cc76 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 27 May 2025 21:06:03 +0300
Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM
 helpers

Perform trivial syntactical cleanups:
* make use of structured binding declarations
* use LLVM utility functions when appropriate
* omit braces around single expression inside single-line LLVM_DEBUG()

This patch is NFC aside from minor debug output changes.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +--
 .../AArch64/gs-pauth-debug-output.s   | 14 ++--
 2 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index b4d2fe150d514..f455229d9ea7a 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -88,8 +88,8 @@ class TrackedRegisters {
   TrackedRegisters(ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
-for (unsigned I = 0; I < RegsToTrack.size(); ++I)
-  RegToIndexMapping[RegsToTrack[I]] = I;
+for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack))
+  RegToIndexMapping[Reg] = MappedIndex;
   }
 
   ArrayRef getRegisters() const { return Registers; }
@@ -203,9 +203,9 @@ struct SrcState {
 
 SafeToDerefRegs &= StateIn.SafeToDerefRegs;
 TrustedRegs &= StateIn.TrustedRegs;
-for (unsigned I = 0; I < LastInstWritingReg.size(); ++I)
-  for (const MCInst *J : StateIn.LastInstWritingReg[I])
-LastInstWritingReg[I].insert(J);
+for (auto [ThisSet, OtherSet] :
+ llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg))
+  ThisSet.insert_range(OtherSet);
 return *this;
   }
 
@@ -224,11 +224,9 @@ struct SrcState {
 static void printInstsShort(raw_ostream &OS,
 ArrayRef Insts) {
   OS << "Insts: ";
-  for (unsigned I = 0; I < Insts.size(); ++I) {
-auto &Set = Insts[I];
+  for (auto [I, PtrSet] : llvm::enumerate(Insts)) {
 OS << "[" << I << "](";
-for (const MCInst *MCInstP : Set)
-  OS << MCInstP << " ";
+interleave(PtrSet, OS, " ");
 OS << ")";
   }
 }
@@ -416,8 +414,9 @@ class SrcSafetyAnalysis {
 // ... an address can be updated in a safe manner, producing the result
 // which is as trusted as the input address.
 if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) {
-  if (Cur.SafeToDerefRegs[DstAndSrc->second])
-Regs.push_back(DstAndSrc->first);
+  auto [DstReg, SrcReg] = *DstAndSrc;
+  if (Cur.SafeToDerefRegs[SrcReg])
+Regs.push_back(DstReg);
 }
 
 // Make sure explicit checker sequence keeps register safe-to-dereference
@@ -469,8 +468,9 @@ class SrcSafetyAnalysis {
 // ... an address can be updated in a safe manner, producing the result
 // which is as trusted as the input address.
 if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) {
-  if (Cur.TrustedRegs[DstAndSrc->second])
-Regs.push_back(DstAndSrc->first);
+  auto [DstReg, SrcReg] = *DstAndSrc;
+  if (Cur.TrustedRegs[SrcReg])
+Regs.push_back(DstReg);
 }
 
 return Regs;
@@ -868,9 +868,9 @@ struct DstState {
   return (*this = StateIn);
 
 CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked;
-for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I)
-  for (const MCInst *J : StateIn.FirstInstLeakingReg[I])
-FirstInstLeakingReg[I].insert(J);
+for (auto [ThisSet, OtherSet] :
+ llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg))
+  ThisSet.insert_range(OtherSet);
 return *this;
   }
 
@@ -1036,8 +1036,7 @@ class DstSafetyAnalysis {
 
 // ... an address can be updated in a safe manner, or
 if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) {
-  MCPhysReg DstReg, SrcReg;
-  std::tie(DstReg, SrcReg) = *DstAndSrc;
+  auto [DstReg, SrcReg] = *DstAndSrc;
   // Note that *all* registers containing the derived values must be safe,
   // both source and destination ones. No temporaries are supported at now.
   if (Cur.CannotEscapeUnchecked[SrcReg] &&
@@ -1077,7 +1076,7 @@ class DstSafetyAnalysis {
 // If this instruction terminates the program immediately, no
 // authentication oracles are possible past this point.
 if (BC.MIB->isTrap(Point)) {
-  LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); });
+  LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point));
   DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
   Next.CannotEscapeUnchecked.set();
   return Next;
@@ -1255,7 +1254,7 @@ class CFGUnawareDstSafetyAnalysis : public 
DstSafetyAnalysis,
   // starting to analyze Inst.
   

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/141824

>From 54c1b3aeaa9b269c57f3d121e974156204cf2ed7 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 14 May 2025 23:12:13 +0300
Subject: [PATCH] [BOLT] Gadget scanner: fix LR to be safe in leaf functions
 without CFG

After a label in a function without CFG information, use a reasonably
pessimistic estimation of register state (assume that any register that
can be clobbered in this function was actually clobbered) instead of the
most pessimistic "all registers are unsafe". This is the same estimation
as used by the dataflow variant of the analysis when the preceding
instruction is not known for sure.

Without this, leaf functions without CFG information are likely to have
false positive reports about non-protected return instructions, as
1) LR is unlikely to be signed and authenticated in a leaf function and
2) LR is likely to be used by a return instruction near the end of the
   function and
3) the register state is likely to be reset at least once during the
   linear scan through the function
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 14 +++--
 .../AArch64/gs-pacret-autiasp.s   | 31 +--
 .../AArch64/gs-pauth-authentication-oracles.s | 20 
 .../AArch64/gs-pauth-debug-output.s   | 30 ++
 .../AArch64/gs-pauth-signing-oracles.s| 27 
 5 files changed, 29 insertions(+), 93 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index e5bdade032488..05309a47aba40 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -737,19 +737,14 @@ template  class CFGUnawareAnalysis {
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
 public CFGUnawareAnalysis {
   using SrcSafetyAnalysis::BC;
   BinaryFunction &BF;
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -759,6 +754,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -773,7 +769,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis,
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Attach the state *before* this instruction executes.
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index df0a83be00986..627f8eb20ab9c 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -224,20 +224,33 @@ f_unreachable_instruction:
 ret
 .size f_unreachable_instruction, .-f_unreachable_instruction
 
-// Expected false positive: without CFG, the state is reset to all-unsafe
-// after an unconditional branch.
-
-.globl  state_is_reset_after_indirect_branch_nocfg
-.type   state_is_reset_after_indirect_branch_nocfg,@function
-state_is_reset_after_indirect_branch_nocfg:
-// CHECK-LABEL: GS-PAUTH: non-protected ret found in function 
state_is_reset_after_indirect_branch_nocfg, at address
-// CHECK-NEXT:  The instruction is {{[0-9a-f]+}}: ret
+// Without CFG, the state is reset at labels, assuming every register that can
+// be clobbered in the function was actually clobbered.
+
+.globl  lr_untouched_nocfg
+.type   lr_untouched_nocfg,@function
+lr_untouched_nocfg:
+// CHECK-NOT: lr_untouched_nocfg
+adr x2, 1f
+br  x2
+1:
+ret
+.size lr_untouched_nocfg, .-lr_untouched_nocfg
+
+.globl  lr_clobbered_nocfg
+.type   lr_clobbered_nocfg,@function
+lr_clobbered_nocfg:
+// CHECK-LABEL: GS-PAUTH: non-protected ret found in function 
lr_clobbered_nocfg, at address
+// CHECK-NEXT:  The instruction is

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/138884

>From ecdc3d88913e18a513e57fbaff0a58c78a2f5c5a Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 6 May 2025 11:31:03 +0300
Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump
 tables

As part of PAuth hardening, AArch64 LLVM backend can use a special
BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening
Clang option) which is expanded in the AsmPrinter into a contiguous
sequence without unsafe instructions in the middle.

This commit adds another target-specific callback to MCPlusBuilder
to make it possible to inhibit false positives for known-safe jump
table dispatch sequences. Without special handling, the branch
instruction is likely to be reported as a non-protected call (as its
destination is not produced by an auth instruction, PC-relative address
materialization, etc.) and possibly as a tail call being performed with
unsafe link register (as the detection whether the branch instruction
is a tail call is an heuristic).

For now, only the specific instruction sequence used by the AArch64
LLVM backend is matched.
---
 bolt/include/bolt/Core/MCInstUtils.h  |   9 +
 bolt/include/bolt/Core/MCPlusBuilder.h|  14 +
 bolt/lib/Core/MCInstUtils.cpp |  20 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp|  10 +
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   |  73 ++
 .../AArch64/gs-pauth-jump-table.s | 703 ++
 6 files changed, 829 insertions(+)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s

diff --git a/bolt/include/bolt/Core/MCInstUtils.h 
b/bolt/include/bolt/Core/MCInstUtils.h
index 50b7d56470c99..33d36cccbcfff 100644
--- a/bolt/include/bolt/Core/MCInstUtils.h
+++ b/bolt/include/bolt/Core/MCInstUtils.h
@@ -154,6 +154,15 @@ class MCInstReference {
 return nullptr;
   }
 
+  /// Returns the only preceding instruction, or std::nullopt if multiple or no
+  /// predecessors are possible.
+  ///
+  /// If CFG information is available, basic block boundary can be crossed,
+  /// provided there is exactly one predecessor. If CFG is not available, the
+  /// preceding instruction in the offset order is returned, unless this is the
+  /// first instruction of the function.
+  std::optional getSinglePredecessor();
+
   raw_ostream &print(raw_ostream &OS) const;
 };
 
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index c8cbcaf33f4b5..3abf4d18e94da 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -14,6 +14,7 @@
 #ifndef BOLT_CORE_MCPLUSBUILDER_H
 #define BOLT_CORE_MCPLUSBUILDER_H
 
+#include "bolt/Core/MCInstUtils.h"
 #include "bolt/Core/MCPlus.h"
 #include "bolt/Core/Relocation.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -700,6 +701,19 @@ class MCPlusBuilder {
 return std::nullopt;
   }
 
+  /// Tests if BranchInst corresponds to an instruction sequence which is known
+  /// to be a safe dispatch via jump table.
+  ///
+  /// The target can decide which instruction sequences to consider "safe" from
+  /// the Pointer Authentication point of view, such as any jump table dispatch
+  /// sequence without function calls inside, any sequence which is contiguous,
+  /// or only some specific well-known sequences.
+  virtual bool
+  isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const {
+llvm_unreachable("not implemented");
+return false;
+  }
+
   virtual bool isTerminator(const MCInst &Inst) const;
 
   virtual bool isNoop(const MCInst &Inst) const {
diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp
index 40f6edd59135c..b7c6d898988af 100644
--- a/bolt/lib/Core/MCInstUtils.cpp
+++ b/bolt/lib/Core/MCInstUtils.cpp
@@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const {
   OS << ">";
   return OS;
 }
+
+std::optional MCInstReference::getSinglePredecessor() {
+  if (const RefInBB *Ref = tryGetRefInBB()) {
+if (Ref->It != Ref->BB->begin())
+  return MCInstReference(Ref->BB, &*std::prev(Ref->It));
+
+if (Ref->BB->pred_size() != 1)
+  return std::nullopt;
+
+BinaryBasicBlock *PredBB = *Ref->BB->pred_begin();
+assert(!PredBB->empty() && "Empty basic blocks are not supported yet");
+return MCInstReference(PredBB, &*PredBB->rbegin());
+  }
+
+  const RefInBF &Ref = getRefInBF();
+  if (Ref.It == Ref.BF->instrs().begin())
+return std::nullopt;
+
+  return MCInstReference(Ref.BF, std::prev(Ref.It));
+}
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index ee873f7c2c21d..3514c953030a6 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1363,6 +1363,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, 
const BinaryFunction &BF,
 return std::nullopt;
   }
 
+  if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) {
+LL

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137224

>From ec07ae02a991c5730fa0529e2479eadae4886211 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp|  80 +++
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 2 files changed, 677 insertions(+)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 05309a47aba40..b5b46390d4586 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1319,6 +1319,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_gadget_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return true;
+
+  // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the
+  // below is a simplified condition from BinaryContext::printInstruction.
+  bool IsUnknownControlFlow =
+  BC.MIB->isIndirectBranch(Inst) && !BC.MIB->getJumpTable(Inst);
+
+  if (BF.hasCFG() && IsUnknownControlFlow)
+return true;
+
+  return false;
+}
+
+static std::optional>
+shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF,
+   const MCInstReference &Inst, const SrcState &S) {
+  static const GadgetKind UntrustedLRKind(
+  "untrusted link register found before tail call");
+
+  if (!shouldAnalyzeTailCallInst(BC, BF, Inst))
+return std::nullopt;
+
+  // Not only the set of registers returned by getTrustedLiveInRegs() can be
+  // seen as a reasonable target-independent _approximation_ of "the LR", these
+  // are *exactly* those registers used by SrcSafetyAnalysis to initialize the
+  // set of trusted registers on function entry.
+  // Thus, this function basically checks that the precondition expected to be
+  // imposed by a function call instruction (which is hardcoded into the 
target-
+  // specific getTrustedLiveInRegs() function) is also respected on tail calls.
+  SmallVector RegsToCheck = BC.MIB->getTrustedLiveInRegs();
+  LLVM_DEBUG({
+traceInst(BC, "Found tail call inst", Inst);
+traceRegMask(BC, "Trusted regs", S.TrustedRegs);
+  });
+
+  // In musl on AArch64, the _start function sets LR to zero and calls the next
+  // stage initialization function at the end, something along these lines:
+  //
+  //   _start:
+  // mov x30, #0
+  // ; ... other initialization ...
+  // b   _start_c ; performs "exit" system call at some point
+  //
+  // As this would produce a false positive for every executable linked with
+  // such libc, ignore tail calls performed by ELF entry function.
+  if (BC.StartFunctionAddress &&
+  *BC.StartFunctionAddress == Inst.getFunction()->getAddress()) {
+LLVM_DEBUG({ dbgs() << "  Skipping tail call in ELF entry function.\n"; });
+return std::nullopt;
+  }
+
+  // Returns at most one report per instruction - this is probably OK...
+  for (auto Reg : RegsToCheck)
+if (!S.TrustedRegs[Reg])
+  return make_gadget_report(UntrustedLRKind, Inst, Reg);
+
+  return std::nullopt;
+}
+
 static std::optional>
 shouldReportCallGadget(const BinaryContext &BC, const MCInstReference &Inst,
const SrcState &S) {
@@ -1473,6 +1550,9 @@ void FunctionAnalysisContext::findUnsafeUses(
 if (PacRetGadgetsOnly)
   return;
 
+if (auto Report = shouldReportUnsafeTailCall(BC, BF, Inst, S))
+  Reports.push_back(*Report);
+
 if (auto Report = shouldReportCallGadget(BC, Inst, S))

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137975

>From a9c870806c510766b52ae046cf7d9dce55c92db4 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 30 Apr 2025 16:08:10 +0300
Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for
 auth oracles

An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false positive reports about authentication oracles:

autia   x2, x3
autia   x0, x1
; neither x0 nor x2 are checked at this point
eor x16, x0, x0, lsl #1
tbz x16, #62, on_success ; marks x0 as checked
; end of BB: for x2 to be checked here, it must be checked in both
; successor basic blocks
  on_failure:
brk 0xc470
  on_success:
; x2 is checked
ldr x1, [x2] ; marks x2 as checked
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +-
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 24 --
 .../AArch64/gs-pauth-address-checks.s | 44 +--
 .../AArch64/gs-pauth-authentication-oracles.s |  9 ++--
 .../AArch64/gs-pauth-signing-oracles.s|  6 +--
 6 files changed, 75 insertions(+), 35 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index b233452985502..c8cbcaf33f4b5 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -707,6 +707,20 @@ class MCPlusBuilder {
 return false;
   }
 
+  /// Returns true if Inst is a trap instruction.
+  ///
+  /// Tests if Inst is an instruction that immediately causes an abnormal
+  /// program termination, for example when a security violation is detected
+  /// by a compiler-inserted check.
+  ///
+  /// @note An implementation of this method should likely return false for
+  /// calls to library functions like abort(), as it is possible that the
+  /// execution state is partially attacker-controlled at this point.
+  virtual bool isTrap(const MCInst &Inst) const {
+llvm_unreachable("not implemented");
+return false;
+  }
+
   virtual bool isBreakpoint(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
 return false;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index b5b46390d4586..98bb84f6f965d 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1078,6 +1078,15 @@ class DstSafetyAnalysis {
   dbgs() << ")\n";
 });
 
+// If this instruction terminates the program immediately, no
+// authentication oracles are possible past this point.
+if (BC.MIB->isTrap(Point)) {
+  LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); });
+  DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  Next.CannotEscapeUnchecked.set();
+  return Next;
+}
+
 // If this instruction is reachable by the analysis, a non-empty state will
 // be propagated to it sooner or later. Until then, skip computeNext().
 if (Cur.empty()) {
@@ -1185,8 +1194,8 @@ class DataflowDstSafetyAnalysis
 //
 // A basic block without any successors, on the other hand, can be
 // pessimistically initialized to everything-is-unsafe: this will naturally
-// handle both return and tail call instructions and is harmless for
-// internal indirect branch instructions (such as computed gotos).
+// handle return, trap and tail call instructions. At the same time, it is
+// harmless for internal indirect branch instructions, like computed gotos.
 if (BB.succ_empty())
   return createUnsafeState();
 
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp 
b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 9d5a578cfbdff..b669d32cc2032 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 // the list of successors of this basic block as appropriate.
 
 // Any of the above code sequences assume the fall-through basic block
-// is a dead-end BRK instruction (any immediate operand is accepted).
+// is a dead-end trap instruction.
 const BinaryBasicBlock *BreakBB = BB.getFallthrough();
-if (!BreakBB || BreakBB->empty() ||
-BreakBB->front().getOpcode() != AArch64::BRK)
+if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front()))
   return std::nullopt;
 
 // Iterate over the instructions of BB in reverse order, matching opcodes
@@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 Inst.addOperand(MCOperand::createImm(0));
   }
 

[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)

2025-06-16 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/138883

>From 8fa355f56ea745e775ae78f3087e062e198235dd Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 7 May 2025 16:42:00 +0300
Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time
 (NFC)

Introduce matchInst helper function to capture and/or match the operands
of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery,
matchInst is intended for the use cases when precise control over the
instruction order is required. For example, when validating PtrAuth
hardening, all registers are usually considered unsafe after a function
call, even though callee-saved registers should preserve their old
values *under normal operation*.
---
 bolt/include/bolt/Core/MCInstUtils.h  | 128 ++
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   |  90 +---
 2 files changed, 162 insertions(+), 56 deletions(-)

diff --git a/bolt/include/bolt/Core/MCInstUtils.h 
b/bolt/include/bolt/Core/MCInstUtils.h
index 69bf5e6159b74..50b7d56470c99 100644
--- a/bolt/include/bolt/Core/MCInstUtils.h
+++ b/bolt/include/bolt/Core/MCInstUtils.h
@@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS,
   return Ref.print(OS);
 }
 
+/// Instruction-matching helpers operating on a single instruction at a time.
+///
+/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on
+/// the cases where a precise control over the instruction order is important:
+///
+/// // Bring the short names into the local scope:
+/// using namespace MCInstMatcher;
+/// // Declare the registers to capture:
+/// Reg Xn, Xm;
+/// // Capture the 0th and 1st operands, match the 2nd operand against the
+/// // just captured Xm register, match the 3rd operand against literal 0:
+/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0))
+///   return AArch64::NoRegister;
+/// // Match the 0th operand against Xm:
+/// if (!matchInst(MaybeBr, AArch64::BR, Xm))
+///   return AArch64::NoRegister;
+/// // Return the matched register:
+/// return Xm.get();
+namespace MCInstMatcher {
+
+// The base class to match an operand of type T.
+//
+// The subclasses of OpMatcher are intended to be allocated on the stack and
+// to only be used by passing them to matchInst() and by calling their get()
+// function, thus the peculiar `mutable` specifiers: to make the calling code
+// compact and readable, the templated matchInst() function has to accept both
+// long-lived Imm/Reg wrappers declared as local variables (intended to capture
+// the first operand's value and match the subsequent operands, whether inside
+// a single instruction or across multiple instructions), as well as temporary
+// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR).
+template  class OpMatcher {
+  mutable std::optional Value;
+  mutable std::optional SavedValue;
+
+  // Remember/restore the last Value - to be called by matchInst.
+  void remember() const { SavedValue = Value; }
+  void restore() const { Value = SavedValue; }
+
+  template 
+  friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...);
+
+protected:
+  OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {}
+
+  bool matchValue(T OpValue) const {
+// Check that OpValue does not contradict the existing Value.
+bool MatchResult = !Value || *Value == OpValue;
+// If MatchResult is false, all matchers will be reset before returning 
from
+// matchInst, including this one, thus no need to assign conditionally.
+Value = OpValue;
+
+return MatchResult;
+  }
+
+public:
+  /// Returns the captured value.
+  T get() const {
+assert(Value.has_value());
+return *Value;
+  }
+};
+
+class Reg : public OpMatcher {
+  bool matches(const MCOperand &Op) const {
+if (!Op.isReg())
+  return false;
+
+return matchValue(Op.getReg());
+  }
+
+  template 
+  friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...);
+
+public:
+  Reg(std::optional RegToMatch = std::nullopt)
+  : OpMatcher(RegToMatch) {}
+};
+
+class Imm : public OpMatcher {
+  bool matches(const MCOperand &Op) const {
+if (!Op.isImm())
+  return false;
+
+return matchValue(Op.getImm());
+  }
+
+  template 
+  friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...);
+
+public:
+  Imm(std::optional ImmToMatch = std::nullopt)
+  : OpMatcher(ImmToMatch) {}
+};
+
+/// Tries to match Inst and updates Ops on success.
+///
+/// If Inst has the specified Opcode and its operand list prefix matches Ops,
+/// this function returns true and updates Ops, otherwise false is returned and
+/// values of Ops are kept as before matchInst was called.
+///
+/// Please note that while Ops are technically passed by a const reference to
+/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their
+/// fields are marked mut

[llvm-branch-commits] [mlir] [MLIR] Fix incorrect slice contiguity inference in `vector::isContiguousSlice` (PR #142422)

2025-06-16 Thread Momchil Velikov via llvm-branch-commits

https://github.com/momchil-velikov updated 
https://github.com/llvm/llvm-project/pull/142422

>From b950757c234900db941ed950ea3469b520d2e28a Mon Sep 17 00:00:00 2001
From: Momchil Velikov 
Date: Mon, 2 Jun 2025 15:13:13 +
Subject: [PATCH 1/9] [MLIR] Fix incorrect slice contiguity inference in
 `vector::isContiguousSlice`

Previously, slices were sometimes marked as non-contiguous when
they were actually contiguous. This occurred when the vector type had
leading unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>``.
In such cases, only the trailing n dimensions of the memref need to be
contiguous, not the entire vector rank.

This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern`
flattens `transfer_read` and `transfer_write`` ops. The pattern used
to collapse a number of dimensions equal the vector rank, which
may be is incorrect when leading dimensions are unit-sized.

This patch fixes the issue by collapsing only as many trailing memref
dimensions as are actually contiguous.
---
 .../mlir/Dialect/Vector/Utils/VectorUtils.h   |  54 -
 .../Transforms/VectorTransferOpTransforms.cpp |   8 +-
 mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp |  25 ++--
 .../Vector/vector-transfer-flatten.mlir   | 108 +-
 4 files changed, 120 insertions(+), 75 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h 
b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index 6609b28d77b6c..ed06d7a029494 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -49,35 +49,37 @@ FailureOr> 
isTranspose2DSlice(vector::TransposeOp op);
 
 /// Return true if `vectorType` is a contiguous slice of `memrefType`.
 ///
-/// Only the N = vectorType.getRank() trailing dims of `memrefType` are
-/// checked (the other dims are not relevant). Note that for `vectorType` to be
-/// a contiguous slice of `memrefType`, the trailing dims of the latter have
-/// to be contiguous - this is checked by looking at the corresponding strides.
+/// The leading unit dimensions of the vector type are ignored as they
+/// are not relevant to the result. Let N be the number of the vector
+/// dimensions after ignoring a leading sequence of unit ones.
 ///
-/// There might be some restriction on the leading dim of `VectorType`:
+/// For `vectorType` to be a contiguous slice of `memrefType`
+///   a) the N trailing dimensions of the latter must be contiguous, and
+///   b) the trailing N dimensions of `vectorType` and `memrefType`,
+///  except the first of them, must match.
 ///
-/// Case 1. If all the trailing dims of `vectorType` match the trailing dims
-/// of `memrefType` then the leading dim of `vectorType` can be
-/// arbitrary.
-///
-///Ex. 1.1 contiguous slice, perfect match
-///  vector<4x3x2xi32> from memref<5x4x3x2xi32>
-///Ex. 1.2 contiguous slice, the leading dim does not match (2 != 4)
-///  vector<2x3x2xi32> from memref<5x4x3x2xi32>
-///
-/// Case 2. If an "internal" dim of `vectorType` does not match the
-/// corresponding trailing dim in `memrefType` then the remaining
-/// leading dims of `vectorType` have to be 1 (the first non-matching
-/// dim can be arbitrary).
+/// Examples:
 ///
-///Ex. 2.1 non-contiguous slice, 2 != 3 and the leading dim != <1>
-///  vector<2x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.2  contiguous slice, 2 != 3 and the leading dim == <1>
-///  vector<1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.3. contiguous slice, 2 != 3 and the leading dims == <1x1>
-///  vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
-///Ex. 2.4. non-contiguous slice, 2 != 3 and the leading dims != <1x1>
-/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.1 contiguous slice, perfect match
+/// vector<4x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.2 contiguous slice, the leading dim does not match (2 != 4)
+/// vector<2x3x2xi32> from memref<5x4x3x2xi32>
+///   Ex.3 non-contiguous slice, 2 != 3
+/// vector<2x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.4 contiguous slice, leading unit dimension of the vector ignored,
+///2 != 3 (allowed)
+/// vector<1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.5. contiguous slice, leasing two unit dims of the vector ignored,
+/// 2 != 3 (allowed)
+/// vector<1x1x2x2xi32> from memref<5x4x3x2xi32>
+///   Ex.6. non-contiguous slice, 2 != 3, no leading sequence of unit dims
+/// vector<2x1x2x2xi32> from memref<5x4x3x2xi32>)
+///   Ex.7 contiguous slice, memref needs to be contiguous only on the last
+///dimension
+/// vector<1x1x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
+///   Ex.8 non-contiguous slice, memref needs to be contiguous one the last
+///two dimensions, and it isn't
+/// vector<1x2x2xi32> from memref<2x2x2xi32, strided<[8, 4, 1]>>
 bool isContiguo

[llvm-branch-commits] [llvm] [DLCov] Origin-Tracking: Collect stack traces in DebugLoc (PR #143592)

2025-06-16 Thread Jeremy Morse via llvm-branch-commits


@@ -27,6 +27,21 @@ namespace llvm {
   class Function;
 
 #if LLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+  struct DbgLocOrigin {
+static constexpr unsigned long MaxDepth = 16;
+using StackTracesTy =
+SmallVector>, 0>;

jmorse wrote:

Am I right in reading this as a 16-wide array stored in a vector that's either 
zero-lengthed or one (given that on initialization we add a single element to 
the vector)? I feel there must be a better pattern than storing an array in a 
vector -- a potentially empty unique_ptr to a std::array, and allocate the 
array off the heap when it's needed?

https://github.com/llvm/llvm-project/pull/143592
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DLCov] Origin-Tracking: Collect stack traces in DebugLoc (PR #143592)

2025-06-16 Thread Jeremy Morse via llvm-branch-commits

https://github.com/jmorse commented:

Looking good with some questions

https://github.com/llvm/llvm-project/pull/143592
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DLCov] Origin-Tracking: Collect stack traces in DebugLoc (PR #143592)

2025-06-16 Thread Jeremy Morse via llvm-branch-commits

https://github.com/jmorse edited 
https://github.com/llvm/llvm-project/pull/143592
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DLCov] Origin-Tracking: Collect stack traces in DebugLoc (PR #143592)

2025-06-16 Thread Jeremy Morse via llvm-branch-commits


@@ -55,22 +70,29 @@ namespace llvm {
 Temporary
   };
 
-  // Extends TrackingMDNodeRef to also store a DebugLocKind, allowing Debugify
-  // to ignore intentionally-empty DebugLocs.
-  class DILocAndCoverageTracking : public TrackingMDNodeRef {
+  // Extends TrackingMDNodeRef to also store a DebugLocKind and Origin,
+  // allowing Debugify to ignore intentionally-empty DebugLocs and display the
+  // code responsible for generating unintentionally-empty DebugLocs.
+  // Currently we only need to track the Origin of this DILoc when using a
+  // DebugLoc that is not annotated (i.e. has DebugLocKind::Normal) and has a
+  // null DILocation, so only collect the origin stacktrace in those cases.
+  class DILocAndCoverageTracking : public TrackingMDNodeRef,
+   public DbgLocOrigin {

jmorse wrote:

A reflex inside my head says "prefer composition to inheritance" -- is there a 
reason we need to inherit from DbgLocOrigin, instead of having it as a named 
member of `DILocAndCoverageTracking`?

https://github.com/llvm/llvm-project/pull/143592
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DLCov] Origin-Tracking: Collect stack traces in DebugLoc (PR #143592)

2025-06-16 Thread Jeremy Morse via llvm-branch-commits


@@ -9,11 +9,31 @@
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/IR/DebugInfo.h"
+
+#if LLVM_ENABLE_DEBUGLOC_ORIGIN_TRACKING
+#include "llvm/Support/Signals.h"
+
+namespace llvm {
+DbgLocOrigin::DbgLocOrigin(bool ShouldCollectTrace) {
+  if (ShouldCollectTrace) {

jmorse wrote:

Early exit instead?

https://github.com/llvm/llvm-project/pull/143592
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AArch64: Fix outline atomic libcall names for arm64ec (PR #144378)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/144378

None

>From e92b0c6c776fba1ea7ba125ada0663676f1090b1 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 17 Jun 2025 00:46:22 +0900
Subject: [PATCH] AArch64: Fix outline atomic libcall names for arm64ec

---
 llvm/lib/IR/RuntimeLibcalls.cpp   | 14 +++---
 llvm/test/CodeGen/AArch64/arm64ec-builtins.ll |  3 +--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 09200f380305a..ec08b17cfcd73 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -36,9 +36,6 @@ static void setAArch64LibcallNames(RuntimeLibcallsInfo &Info,
   LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDSET, __aarch64_ldset)
   LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDCLR, __aarch64_ldclr)
   LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDEOR, __aarch64_ldeor)
-#undef LCALLNAMES
-#undef LCALLNAME4
-#undef LCALLNAME5
 
   if (TT.isWindowsArm64EC()) {
 // FIXME: are there calls we need to exclude from this?
@@ -54,7 +51,18 @@ static void setAArch64LibcallNames(RuntimeLibcallsInfo &Info,
 #include "llvm/IR/RuntimeLibcalls.def"
 #undef HANDLE_LIBCALL
 #undef LIBCALL_NO_NAME
+
+LCALLNAME5(RTLIB::OUTLINE_ATOMIC_CAS, #__aarch64_cas)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_SWP, #__aarch64_swp)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDADD, #__aarch64_ldadd)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDSET, #__aarch64_ldset)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDCLR, #__aarch64_ldclr)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDEOR, #__aarch64_ldeor)
   }
+
+#undef LCALLNAMES
+#undef LCALLNAME4
+#undef LCALLNAME5
 }
 
 static void setARMLibcallNames(RuntimeLibcallsInfo &Info, const Triple &TT) {
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll 
b/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll
index 92b95a90d89a0..cc4ec9c2eebd6 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll
@@ -28,10 +28,9 @@ define i128 @f4(i128 %x, i128 %y) {
   ret i128 %r
 }
 
-; FIXME: This is wrong; should be "#__aarch64_cas1_relax"
 define i8 @f5(i8 %expected, i8 %new, ptr %ptr) 
"target-features"="+outline-atomics" {
 ; CHECK-LABEL: "#f5":
-; CHECK: bl __aarch64_cas1_relax
+; CHECK: bl "#__aarch64_cas1_relax"
 %pair = cmpxchg ptr %ptr, i8 %expected, i8 %new monotonic monotonic, align 
1
%r = extractvalue { i8, i1 } %pair, 0
 ret i8 %r

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AArch64: Fix outline atomic libcall names for arm64ec (PR #144378)

2025-06-16 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/144378.diff


2 Files Affected:

- (modified) llvm/lib/IR/RuntimeLibcalls.cpp (+11-3) 
- (modified) llvm/test/CodeGen/AArch64/arm64ec-builtins.ll (+1-2) 


``diff
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 09200f380305a..ec08b17cfcd73 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -36,9 +36,6 @@ static void setAArch64LibcallNames(RuntimeLibcallsInfo &Info,
   LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDSET, __aarch64_ldset)
   LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDCLR, __aarch64_ldclr)
   LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDEOR, __aarch64_ldeor)
-#undef LCALLNAMES
-#undef LCALLNAME4
-#undef LCALLNAME5
 
   if (TT.isWindowsArm64EC()) {
 // FIXME: are there calls we need to exclude from this?
@@ -54,7 +51,18 @@ static void setAArch64LibcallNames(RuntimeLibcallsInfo &Info,
 #include "llvm/IR/RuntimeLibcalls.def"
 #undef HANDLE_LIBCALL
 #undef LIBCALL_NO_NAME
+
+LCALLNAME5(RTLIB::OUTLINE_ATOMIC_CAS, #__aarch64_cas)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_SWP, #__aarch64_swp)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDADD, #__aarch64_ldadd)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDSET, #__aarch64_ldset)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDCLR, #__aarch64_ldclr)
+LCALLNAME4(RTLIB::OUTLINE_ATOMIC_LDEOR, #__aarch64_ldeor)
   }
+
+#undef LCALLNAMES
+#undef LCALLNAME4
+#undef LCALLNAME5
 }
 
 static void setARMLibcallNames(RuntimeLibcallsInfo &Info, const Triple &TT) {
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll 
b/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll
index 92b95a90d89a0..cc4ec9c2eebd6 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-builtins.ll
@@ -28,10 +28,9 @@ define i128 @f4(i128 %x, i128 %y) {
   ret i128 %r
 }
 
-; FIXME: This is wrong; should be "#__aarch64_cas1_relax"
 define i8 @f5(i8 %expected, i8 %new, ptr %ptr) 
"target-features"="+outline-atomics" {
 ; CHECK-LABEL: "#f5":
-; CHECK: bl __aarch64_cas1_relax
+; CHECK: bl "#__aarch64_cas1_relax"
 %pair = cmpxchg ptr %ptr, i8 %expected, i8 %new monotonic monotonic, align 
1
%r = extractvalue { i8, i1 } %pair, 0
 ret i8 %r

``




https://github.com/llvm/llvm-project/pull/144378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AArch64: Fix outline atomic libcall names for arm64ec (PR #144378)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/144378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AArch64: Fix outline atomic libcall names for arm64ec (PR #144378)

2025-06-16 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/144378?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#144378** https://app.graphite.dev/github/pr/llvm/llvm-project/144378?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/144378?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#144374** https://app.graphite.dev/github/pr/llvm/llvm-project/144374?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/144378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Support pre-aggregated returns (PR #143296)

2025-06-16 Thread Rafael Auler via llvm-branch-commits

https://github.com/rafaelauler approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/143296
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Craig Topper via llvm-branch-commits


@@ -2954,20 +2954,12 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   }
 
   if (IsZeroCmp && ST->hasVInstructions()) {
-unsigned RealMinVLen = ST->getRealMinVLen();
-// Support Fractional LMULs if the lengths are larger than XLen.
-// TODO: Support non-power-of-2 types.
-for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) {
-  unsigned Len = RealMinVLen / FLMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
-for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors();
- LMUL *= 2) {
-  unsigned Len = RealMinVLen * LMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
+unsigned VLenB = ST->getRealMinVLen() / 8;
+// The minimum size should be `XLen + 1`.

topperc wrote:

Comment doesn't match the code. Is it Xlen+1 or Xlen/8 + 1?

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Philip Reames via llvm-branch-commits


@@ -16190,13 +16186,20 @@ combineVectorSizedSetCCEquality(EVT VT, SDValue X, 
SDValue Y, ISD::CondCode CC,
 return SDValue();
 
   unsigned VecSize = OpSize / 8;

preames wrote:

Where in the code above do we have a guarantee that OpSize is a multiple of 8?  

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] d644265 - Revert "[RISCV] Remove B and Zbc extension from Andes series cpus. (#144022)"

2025-06-16 Thread via llvm-branch-commits

Author: Aaron Ballman
Date: 2025-06-16T13:41:58-04:00
New Revision: d644265d459731aaf4ed38855167262d957a95eb

URL: 
https://github.com/llvm/llvm-project/commit/d644265d459731aaf4ed38855167262d957a95eb
DIFF: 
https://github.com/llvm/llvm-project/commit/d644265d459731aaf4ed38855167262d957a95eb.diff

LOG: Revert "[RISCV] Remove B and Zbc extension from Andes series cpus. 
(#144022)"

This reverts commit 24c8d900c47edeefb85643a06bc32235d9f42ea3.

Added: 


Modified: 
clang/test/Driver/print-enabled-extensions/riscv-andes-a25.c
clang/test/Driver/print-enabled-extensions/riscv-andes-a45.c
clang/test/Driver/print-enabled-extensions/riscv-andes-ax25.c
clang/test/Driver/print-enabled-extensions/riscv-andes-ax45.c
clang/test/Driver/print-enabled-extensions/riscv-andes-n45.c
clang/test/Driver/print-enabled-extensions/riscv-andes-nx45.c
llvm/lib/Target/RISCV/RISCVProcessors.td
llvm/test/tools/llvm-mca/RISCV/Andes45/gpr.s

Removed: 




diff  --git a/clang/test/Driver/print-enabled-extensions/riscv-andes-a25.c 
b/clang/test/Driver/print-enabled-extensions/riscv-andes-a25.c
index cfb4d0ed58d11..d8b3848d84520 100644
--- a/clang/test/Driver/print-enabled-extensions/riscv-andes-a25.c
+++ b/clang/test/Driver/print-enabled-extensions/riscv-andes-a25.c
@@ -10,6 +10,7 @@
 // CHECK-NEXT: f2.2   'F' (Single-Precision 
Floating-Point)
 // CHECK-NEXT: d2.2   'D' (Double-Precision 
Floating-Point)
 // CHECK-NEXT: c2.0   'C' (Compressed Instructions)
+// CHECK-NEXT: b1.0   'B' (the collection of the 
Zba, Zbb, Zbs extensions)
 // CHECK-NEXT: zicsr2.0   'Zicsr' (CSRs)
 // CHECK-NEXT: zifencei 2.0   'Zifencei' (fence.i)
 // CHECK-NEXT: zmmul1.0   'Zmmul' (Integer 
Multiplication)
@@ -18,8 +19,12 @@
 // CHECK-NEXT: zca  1.0   'Zca' (part of the C 
extension, excluding compressed floating point loads/stores)
 // CHECK-NEXT: zcd  1.0   'Zcd' (Compressed 
Double-Precision Floating-Point Instructions)
 // CHECK-NEXT: zcf  1.0   'Zcf' (Compressed 
Single-Precision Floating-Point Instructions)
+// CHECK-NEXT: zba  1.0   'Zba' (Address Generation 
Instructions)
+// CHECK-NEXT: zbb  1.0   'Zbb' (Basic 
Bit-Manipulation)
+// CHECK-NEXT: zbc  1.0   'Zbc' (Carry-Less 
Multiplication)
+// CHECK-NEXT: zbs  1.0   'Zbs' (Single-Bit 
Instructions)
 // CHECK-NEXT: xandesperf   5.0   'XAndesPerf' (Andes 
Performance Extension)
 // CHECK-EMPTY:
 // CHECK-NEXT: Experimental extensions
 // CHECK-EMPTY:
-// CHECK-NEXT: ISA String: 
rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0_xandesperf5p0
+// CHECK-NEXT: ISA String: 
rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_b1p0_zicsr2p0_zifencei2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0_zba1p0_zbb1p0_zbc1p0_zbs1p0_xandesperf5p0

diff  --git a/clang/test/Driver/print-enabled-extensions/riscv-andes-a45.c 
b/clang/test/Driver/print-enabled-extensions/riscv-andes-a45.c
index 3c3c554dffc57..a0a1c35911409 100644
--- a/clang/test/Driver/print-enabled-extensions/riscv-andes-a45.c
+++ b/clang/test/Driver/print-enabled-extensions/riscv-andes-a45.c
@@ -10,6 +10,7 @@
 // CHECK-NEXT: f2.2   'F' (Single-Precision 
Floating-Point)
 // CHECK-NEXT: d2.2   'D' (Double-Precision 
Floating-Point)
 // CHECK-NEXT: c2.0   'C' (Compressed Instructions)
+// CHECK-NEXT: b1.0   'B' (the collection of the 
Zba, Zbb, Zbs extensions)
 // CHECK-NEXT: zicsr2.0   'Zicsr' (CSRs)
 // CHECK-NEXT: zifencei 2.0   'Zifencei' (fence.i)
 // CHECK-NEXT: zmmul1.0   'Zmmul' (Integer 
Multiplication)
@@ -18,8 +19,11 @@
 // CHECK-NEXT: zca  1.0   'Zca' (part of the C 
extension, excluding compressed floating point loads/stores)
 // CHECK-NEXT: zcd  1.0   'Zcd' (Compressed 
Double-Precision Floating-Point Instructions)
 // CHECK-NEXT: zcf  1.0   'Zcf' (Compressed 
Single-Precision Floating-Point Instructions)
+// CHECK-NEXT: zba  1.0   'Zba' (Address Generation 
Instructions)
+// CHECK-NEXT: zbb  1.0   'Zbb' (Basic 
Bit-Manipulation)
+// CHECK-NEXT: zbs  1.0   'Zbs' (Single-Bit 
Instructions)
 // CHECK-NEXT: xandesperf   5.0   'XAndesPerf' (Andes 
Performance Extension)
 // CHECK-EMPTY:
 // CHECK-NEXT: Experimental extensions
 // CHECK-EMPTY:
-// CHECK-NEXT: ISA String: 
rv32i2p1_m2p0_a2p1_f2p2_d2p

[llvm-branch-commits] [llvm] release/20.x: [objcopy][MachO] Revert special handling of encryptable binaries (#144058) (PR #144449)

2025-06-16 Thread via llvm-branch-commits

llvmbot wrote:

@dianqk What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/19
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [objcopy][MachO] Revert special handling of encryptable binaries (#144058) (PR #144449)

2025-06-16 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-binary-utilities

Author: None (llvmbot)


Changes

Backport a0662ceba83cf8782da4047b8ee6d175591f168f

Requested by: @dianqk

---
Full diff: https://github.com/llvm/llvm-project/pull/19.diff


5 Files Affected:

- (modified) llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp (-8) 
- (modified) llvm/lib/ObjCopy/MachO/MachOObject.cpp (-4) 
- (modified) llvm/lib/ObjCopy/MachO/MachOObject.h (-3) 
- (modified) llvm/lib/ObjCopy/MachO/MachOReader.cpp (-4) 
- (modified) llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test 
(+106-50) 


``diff
diff --git a/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp 
b/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
index 8ecd669e67178..93bc6631e64c8 100644
--- a/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
+++ b/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
@@ -116,10 +116,6 @@ uint64_t MachOLayoutBuilder::layoutSegments() {
   const bool IsObjectFile =
   O.Header.FileType == MachO::HeaderFileType::MH_OBJECT;
   uint64_t Offset = IsObjectFile ? (HeaderSize + O.Header.SizeOfCmds) : 0;
-  // If we are emitting an encryptable binary, our load commands must have a
-  // separate (non-encrypted) page to themselves.
-  bool RequiresFirstSectionOutsideFirstPage =
-  O.EncryptionInfoCommandIndex.has_value();
   for (LoadCommand &LC : O.LoadCommands) {
 auto &MLC = LC.MachOLoadCommand;
 StringRef Segname;
@@ -173,10 +169,6 @@ uint64_t MachOLayoutBuilder::layoutSegments() {
 if (!Sec->hasValidOffset()) {
   Sec->Offset = 0;
 } else {
-  if (RequiresFirstSectionOutsideFirstPage) {
-SectOffset = alignToPowerOf2(SectOffset, PageSize);
-RequiresFirstSectionOutsideFirstPage = false;
-  }
   Sec->Offset = SegOffset + SectOffset;
   Sec->Size = Sec->Content.size();
   SegFileSize = std::max(SegFileSize, SectOffset + Sec->Size);
diff --git a/llvm/lib/ObjCopy/MachO/MachOObject.cpp 
b/llvm/lib/ObjCopy/MachO/MachOObject.cpp
index e0819d89d24ff..8d2c02dc37c99 100644
--- a/llvm/lib/ObjCopy/MachO/MachOObject.cpp
+++ b/llvm/lib/ObjCopy/MachO/MachOObject.cpp
@@ -98,10 +98,6 @@ void Object::updateLoadCommandIndexes() {
 case MachO::LC_DYLD_EXPORTS_TRIE:
   ExportsTrieCommandIndex = Index;
   break;
-case MachO::LC_ENCRYPTION_INFO:
-case MachO::LC_ENCRYPTION_INFO_64:
-  EncryptionInfoCommandIndex = Index;
-  break;
 }
   }
 }
diff --git a/llvm/lib/ObjCopy/MachO/MachOObject.h 
b/llvm/lib/ObjCopy/MachO/MachOObject.h
index 79eb0133c2802..a454c4f502fd6 100644
--- a/llvm/lib/ObjCopy/MachO/MachOObject.h
+++ b/llvm/lib/ObjCopy/MachO/MachOObject.h
@@ -341,9 +341,6 @@ struct Object {
   /// The index of the LC_SEGMENT or LC_SEGMENT_64 load command
   /// corresponding to the __TEXT segment.
   std::optional TextSegmentCommandIndex;
-  /// The index of the LC_ENCRYPTION_INFO or LC_ENCRYPTION_INFO_64 load command
-  /// if present.
-  std::optional EncryptionInfoCommandIndex;
 
   BumpPtrAllocator Alloc;
   StringSaver NewSectionsContents;
diff --git a/llvm/lib/ObjCopy/MachO/MachOReader.cpp 
b/llvm/lib/ObjCopy/MachO/MachOReader.cpp
index ef0e0262f9395..2b344f36d8e78 100644
--- a/llvm/lib/ObjCopy/MachO/MachOReader.cpp
+++ b/llvm/lib/ObjCopy/MachO/MachOReader.cpp
@@ -184,10 +184,6 @@ Error MachOReader::readLoadCommands(Object &O) const {
 case MachO::LC_DYLD_CHAINED_FIXUPS:
   O.ChainedFixupsCommandIndex = O.LoadCommands.size();
   break;
-case MachO::LC_ENCRYPTION_INFO:
-case MachO::LC_ENCRYPTION_INFO_64:
-  O.EncryptionInfoCommandIndex = O.LoadCommands.size();
-  break;
 }
 #define HANDLE_LOAD_COMMAND(LCName, LCValue, LCStruct) 
\
   case MachO::LCName:  
\
diff --git a/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test 
b/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test
index 2b2bd670613de..d6f6fe10d88c2 100644
--- a/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test
+++ b/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test
@@ -16,7 +16,11 @@
 # CHECK:   fileoff: 0
 
 # The YAML below is the following code
+# ```
+# static int foo = 12345;
+# int bar = 4567;
 # int main(int argc, char **argv) { return 0; }
+# ```
 # Compiled on macOS against the macOS SDK and passing `-Wl,-encryptable`
 # Contents are removed, since they are not important for the test. We need a
 # small text segment (smaller than a page).
@@ -26,8 +30,8 @@ FileHeader:
   cputype: 0x10C
   cpusubtype:  0x0
   filetype:0x2
-  ncmds:   15
-  sizeofcmds:  696
+  ncmds:   18
+  sizeofcmds:  920
   flags:   0x200085
   reserved:0x0
 LoadCommands:
@@ -69,7 +73,7 @@ LoadCommands:
   - sectname:__unwind_info
 segname: __TEXT
 addr:0x14020
-size:41

[llvm-branch-commits] [llvm] release/20.x: [objcopy][MachO] Revert special handling of encryptable binaries (#144058) (PR #144449)

2025-06-16 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/19

Backport a0662ceba83cf8782da4047b8ee6d175591f168f

Requested by: @dianqk

>From 5016e20c7002d5f93e0843129f4a4679ce4c088b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20Rodr=C3=ADguez=20Troiti=C3=B1o?=
 
Date: Mon, 16 Jun 2025 12:06:25 -0700
Subject: [PATCH] [objcopy][MachO] Revert special handling of encryptable
 binaries (#144058)

Code originally added in #120995 and later corrected in #130517 but
apparently still not correct according to #141494 and
rust-lang/rust#141913.

Revert the special handling because the test written in #120995 and
#130517 still passes without those changes. Kept the test and improved
it with a `__DATA` section to keep the current behaviour checked in case
other changes modify the behaviour and break this edge case.

(cherry picked from commit a0662ceba83cf8782da4047b8ee6d175591f168f)
---
 llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp |   8 -
 llvm/lib/ObjCopy/MachO/MachOObject.cpp|   4 -
 llvm/lib/ObjCopy/MachO/MachOObject.h  |   3 -
 llvm/lib/ObjCopy/MachO/MachOReader.cpp|   4 -
 .../MachO/strip-with-encryption-info.test | 156 --
 5 files changed, 106 insertions(+), 69 deletions(-)

diff --git a/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp 
b/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
index 8ecd669e67178..93bc6631e64c8 100644
--- a/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
+++ b/llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
@@ -116,10 +116,6 @@ uint64_t MachOLayoutBuilder::layoutSegments() {
   const bool IsObjectFile =
   O.Header.FileType == MachO::HeaderFileType::MH_OBJECT;
   uint64_t Offset = IsObjectFile ? (HeaderSize + O.Header.SizeOfCmds) : 0;
-  // If we are emitting an encryptable binary, our load commands must have a
-  // separate (non-encrypted) page to themselves.
-  bool RequiresFirstSectionOutsideFirstPage =
-  O.EncryptionInfoCommandIndex.has_value();
   for (LoadCommand &LC : O.LoadCommands) {
 auto &MLC = LC.MachOLoadCommand;
 StringRef Segname;
@@ -173,10 +169,6 @@ uint64_t MachOLayoutBuilder::layoutSegments() {
 if (!Sec->hasValidOffset()) {
   Sec->Offset = 0;
 } else {
-  if (RequiresFirstSectionOutsideFirstPage) {
-SectOffset = alignToPowerOf2(SectOffset, PageSize);
-RequiresFirstSectionOutsideFirstPage = false;
-  }
   Sec->Offset = SegOffset + SectOffset;
   Sec->Size = Sec->Content.size();
   SegFileSize = std::max(SegFileSize, SectOffset + Sec->Size);
diff --git a/llvm/lib/ObjCopy/MachO/MachOObject.cpp 
b/llvm/lib/ObjCopy/MachO/MachOObject.cpp
index e0819d89d24ff..8d2c02dc37c99 100644
--- a/llvm/lib/ObjCopy/MachO/MachOObject.cpp
+++ b/llvm/lib/ObjCopy/MachO/MachOObject.cpp
@@ -98,10 +98,6 @@ void Object::updateLoadCommandIndexes() {
 case MachO::LC_DYLD_EXPORTS_TRIE:
   ExportsTrieCommandIndex = Index;
   break;
-case MachO::LC_ENCRYPTION_INFO:
-case MachO::LC_ENCRYPTION_INFO_64:
-  EncryptionInfoCommandIndex = Index;
-  break;
 }
   }
 }
diff --git a/llvm/lib/ObjCopy/MachO/MachOObject.h 
b/llvm/lib/ObjCopy/MachO/MachOObject.h
index 79eb0133c2802..a454c4f502fd6 100644
--- a/llvm/lib/ObjCopy/MachO/MachOObject.h
+++ b/llvm/lib/ObjCopy/MachO/MachOObject.h
@@ -341,9 +341,6 @@ struct Object {
   /// The index of the LC_SEGMENT or LC_SEGMENT_64 load command
   /// corresponding to the __TEXT segment.
   std::optional TextSegmentCommandIndex;
-  /// The index of the LC_ENCRYPTION_INFO or LC_ENCRYPTION_INFO_64 load command
-  /// if present.
-  std::optional EncryptionInfoCommandIndex;
 
   BumpPtrAllocator Alloc;
   StringSaver NewSectionsContents;
diff --git a/llvm/lib/ObjCopy/MachO/MachOReader.cpp 
b/llvm/lib/ObjCopy/MachO/MachOReader.cpp
index ef0e0262f9395..2b344f36d8e78 100644
--- a/llvm/lib/ObjCopy/MachO/MachOReader.cpp
+++ b/llvm/lib/ObjCopy/MachO/MachOReader.cpp
@@ -184,10 +184,6 @@ Error MachOReader::readLoadCommands(Object &O) const {
 case MachO::LC_DYLD_CHAINED_FIXUPS:
   O.ChainedFixupsCommandIndex = O.LoadCommands.size();
   break;
-case MachO::LC_ENCRYPTION_INFO:
-case MachO::LC_ENCRYPTION_INFO_64:
-  O.EncryptionInfoCommandIndex = O.LoadCommands.size();
-  break;
 }
 #define HANDLE_LOAD_COMMAND(LCName, LCValue, LCStruct) 
\
   case MachO::LCName:  
\
diff --git a/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test 
b/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test
index 2b2bd670613de..d6f6fe10d88c2 100644
--- a/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test
+++ b/llvm/test/tools/llvm-objcopy/MachO/strip-with-encryption-info.test
@@ -16,7 +16,11 @@
 # CHECK:   fileoff: 0
 
 # The YAML below is the following code
+# ```
+# static int foo = 12345;
+# int bar = 4567;
 # int main(int argc, char **argv) { r

[llvm-branch-commits] [llvm] Backport 90a52f494296 to release/20.x (PR #144299)

2025-06-16 Thread via llvm-branch-commits

leecheechen wrote:

I am the first time to cherry-pick two commits which depend on each other. I 
have been told that manually cherry-pick may lead two commits to merge with 
squash and how to use cherry-pick command for more than one commit. So I close 
this PR. The newly PR is #144459. Sorry for making you confused.

https://github.com/llvm/llvm-project/pull/144299
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Backport 90a52f494296 to release/20.x (PR #144299)

2025-06-16 Thread via llvm-branch-commits

https://github.com/leecheechen closed 
https://github.com/llvm/llvm-project/pull/144299
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [objcopy][MachO] Revert special handling of encryptable binaries (#144058) (PR #144449)

2025-06-16 Thread Daniel Rodríguez Troitiño via llvm-branch-commits

https://github.com/drodriguez approved this pull request.


https://github.com/llvm/llvm-project/pull/19
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [objcopy][MachO] Revert special handling of encryptable binaries (#144058) (PR #144449)

2025-06-16 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/19
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [LoongArch] Pass OptLevel to LoongArchDAGToDAGISel correctly (PR #144459)

2025-06-16 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/144459
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [LoongArch] Pass OptLevel to LoongArchDAGToDAGISel correctly (PR #144459)

2025-06-16 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/144459

Backport fcc82cf 90a52f4

Requested by: @leecheechen

>From 1443f5093fda88e3fe91159e3029fd3c9f15f178 Mon Sep 17 00:00:00 2001
From: Weining Lu 
Date: Sat, 7 Jun 2025 15:10:24 +0800
Subject: [PATCH 1/2] [LoongArch] Precommit test case to show bug in
 LoongArchISelDagToDag

The optimization level should not be restored into O2.

(cherry picked from commit fcc82cfa9394b2bd4380acdcf0e2854caee5a47a)
---
 llvm/test/CodeGen/LoongArch/isel-optnone.ll | 13 +
 1 file changed, 13 insertions(+)
 create mode 100644 llvm/test/CodeGen/LoongArch/isel-optnone.ll

diff --git a/llvm/test/CodeGen/LoongArch/isel-optnone.ll 
b/llvm/test/CodeGen/LoongArch/isel-optnone.ll
new file mode 100644
index 0..d44f1405d0c18
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/isel-optnone.ll
@@ -0,0 +1,13 @@
+; REQUIRES: asserts
+; RUN: llc %s -O0 -mtriple=loongarch64 -o /dev/null -debug-only=isel 2>&1 | 
FileCheck %s
+
+define void @fooOptnone() #0 {
+; CHECK: Changing optimization level for Function fooOptnone
+; CHECK: Before: -O2 ; After: -O0
+
+; CHECK: Restoring optimization level for Function fooOptnone
+; CHECK: Before: -O0 ; After: -O2
+  ret void
+}
+
+attributes #0 = { nounwind optnone noinline }

>From 75b652d4766c5444b8bdb35f63f61d0a38543fd3 Mon Sep 17 00:00:00 2001
From: Weining Lu 
Date: Sat, 7 Jun 2025 11:45:39 +0800
Subject: [PATCH 2/2] [LoongArch] Pass OptLevel to LoongArchDAGToDAGISel
 correctly

Like many other targets did. And see RISCV for similar fix.

Fix https://github.com/llvm/llvm-project/issues/143239

(cherry picked from commit 90a52f4942961a5c32afc69d69470c6b7e5bcb8a)
---
 llvm/lib/Target/LoongArch/LoongArch.h|  3 ++-
 llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp  | 10 ++
 llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h|  8 +---
 llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp |  2 +-
 llvm/test/CodeGen/LoongArch/O0-pipeline.ll   |  8 
 llvm/test/CodeGen/LoongArch/isel-optnone.ll  |  7 ++-
 llvm/test/CodeGen/LoongArch/spill-ra-without-kill.ll |  1 +
 7 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArch.h 
b/llvm/lib/Target/LoongArch/LoongArch.h
index db60523738880..6635c57ff0476 100644
--- a/llvm/lib/Target/LoongArch/LoongArch.h
+++ b/llvm/lib/Target/LoongArch/LoongArch.h
@@ -35,7 +35,8 @@ bool lowerLoongArchMachineOperandToMCOperand(const 
MachineOperand &MO,
 
 FunctionPass *createLoongArchDeadRegisterDefinitionsPass();
 FunctionPass *createLoongArchExpandAtomicPseudoPass();
-FunctionPass *createLoongArchISelDag(LoongArchTargetMachine &TM);
+FunctionPass *createLoongArchISelDag(LoongArchTargetMachine &TM,
+ CodeGenOptLevel OptLevel);
 FunctionPass *createLoongArchMergeBaseOffsetOptPass();
 FunctionPass *createLoongArchOptWInstrsPass();
 FunctionPass *createLoongArchPreRAExpandPseudoPass();
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
index cb0fb9bc9c7f9..7169cdc9a2bf9 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
@@ -25,8 +25,9 @@ using namespace llvm;
 char LoongArchDAGToDAGISelLegacy::ID;
 
 LoongArchDAGToDAGISelLegacy::LoongArchDAGToDAGISelLegacy(
-LoongArchTargetMachine &TM)
-: SelectionDAGISelLegacy(ID, std::make_unique(TM)) 
{}
+LoongArchTargetMachine &TM, CodeGenOptLevel OptLevel)
+: SelectionDAGISelLegacy(
+  ID, std::make_unique(TM, OptLevel)) {}
 
 INITIALIZE_PASS(LoongArchDAGToDAGISelLegacy, DEBUG_TYPE, PASS_NAME, false,
 false)
@@ -456,6 +457,7 @@ bool LoongArchDAGToDAGISel::selectVSplatUimmPow2(SDValue N,
 
 // This pass converts a legalized DAG into a LoongArch-specific DAG, ready
 // for instruction scheduling.
-FunctionPass *llvm::createLoongArchISelDag(LoongArchTargetMachine &TM) {
-  return new LoongArchDAGToDAGISelLegacy(TM);
+FunctionPass *llvm::createLoongArchISelDag(LoongArchTargetMachine &TM,
+   CodeGenOptLevel OptLevel) {
+  return new LoongArchDAGToDAGISelLegacy(TM, OptLevel);
 }
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h 
b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
index 8a7eba418d804..2e6bc9951e9e7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
@@ -26,8 +26,9 @@ class LoongArchDAGToDAGISel : public SelectionDAGISel {
 public:
   LoongArchDAGToDAGISel() = delete;
 
-  explicit LoongArchDAGToDAGISel(LoongArchTargetMachine &TM)
-  : SelectionDAGISel(TM) {}
+  explicit LoongArchDAGToDAGISel(LoongArchTargetMachine &TM,
+ CodeGenOptLevel OptLevel)
+  : SelectionDAGISel(TM, OptLevel) {}
 
   bool runOnMachineFunction(MachineFunction &MF) override {
 Subtarget

[llvm-branch-commits] [llvm] release/20.x: [LoongArch] Pass OptLevel to LoongArchDAGToDAGISel correctly (PR #144459)

2025-06-16 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-loongarch

Author: None (llvmbot)


Changes

Backport fcc82cf 90a52f4

Requested by: @leecheechen

---
Full diff: https://github.com/llvm/llvm-project/pull/144459.diff


7 Files Affected:

- (modified) llvm/lib/Target/LoongArch/LoongArch.h (+2-1) 
- (modified) llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp (+6-4) 
- (modified) llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h (+5-3) 
- (modified) llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp (+1-1) 
- (modified) llvm/test/CodeGen/LoongArch/O0-pipeline.ll (-8) 
- (added) llvm/test/CodeGen/LoongArch/isel-optnone.ll (+10) 
- (modified) llvm/test/CodeGen/LoongArch/spill-ra-without-kill.ll (+1) 


``diff
diff --git a/llvm/lib/Target/LoongArch/LoongArch.h 
b/llvm/lib/Target/LoongArch/LoongArch.h
index db60523738880..6635c57ff0476 100644
--- a/llvm/lib/Target/LoongArch/LoongArch.h
+++ b/llvm/lib/Target/LoongArch/LoongArch.h
@@ -35,7 +35,8 @@ bool lowerLoongArchMachineOperandToMCOperand(const 
MachineOperand &MO,
 
 FunctionPass *createLoongArchDeadRegisterDefinitionsPass();
 FunctionPass *createLoongArchExpandAtomicPseudoPass();
-FunctionPass *createLoongArchISelDag(LoongArchTargetMachine &TM);
+FunctionPass *createLoongArchISelDag(LoongArchTargetMachine &TM,
+ CodeGenOptLevel OptLevel);
 FunctionPass *createLoongArchMergeBaseOffsetOptPass();
 FunctionPass *createLoongArchOptWInstrsPass();
 FunctionPass *createLoongArchPreRAExpandPseudoPass();
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
index cb0fb9bc9c7f9..7169cdc9a2bf9 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
@@ -25,8 +25,9 @@ using namespace llvm;
 char LoongArchDAGToDAGISelLegacy::ID;
 
 LoongArchDAGToDAGISelLegacy::LoongArchDAGToDAGISelLegacy(
-LoongArchTargetMachine &TM)
-: SelectionDAGISelLegacy(ID, std::make_unique(TM)) 
{}
+LoongArchTargetMachine &TM, CodeGenOptLevel OptLevel)
+: SelectionDAGISelLegacy(
+  ID, std::make_unique(TM, OptLevel)) {}
 
 INITIALIZE_PASS(LoongArchDAGToDAGISelLegacy, DEBUG_TYPE, PASS_NAME, false,
 false)
@@ -456,6 +457,7 @@ bool LoongArchDAGToDAGISel::selectVSplatUimmPow2(SDValue N,
 
 // This pass converts a legalized DAG into a LoongArch-specific DAG, ready
 // for instruction scheduling.
-FunctionPass *llvm::createLoongArchISelDag(LoongArchTargetMachine &TM) {
-  return new LoongArchDAGToDAGISelLegacy(TM);
+FunctionPass *llvm::createLoongArchISelDag(LoongArchTargetMachine &TM,
+   CodeGenOptLevel OptLevel) {
+  return new LoongArchDAGToDAGISelLegacy(TM, OptLevel);
 }
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h 
b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
index 8a7eba418d804..2e6bc9951e9e7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
@@ -26,8 +26,9 @@ class LoongArchDAGToDAGISel : public SelectionDAGISel {
 public:
   LoongArchDAGToDAGISel() = delete;
 
-  explicit LoongArchDAGToDAGISel(LoongArchTargetMachine &TM)
-  : SelectionDAGISel(TM) {}
+  explicit LoongArchDAGToDAGISel(LoongArchTargetMachine &TM,
+ CodeGenOptLevel OptLevel)
+  : SelectionDAGISel(TM, OptLevel) {}
 
   bool runOnMachineFunction(MachineFunction &MF) override {
 Subtarget = &MF.getSubtarget();
@@ -71,7 +72,8 @@ class LoongArchDAGToDAGISel : public SelectionDAGISel {
 class LoongArchDAGToDAGISelLegacy : public SelectionDAGISelLegacy {
 public:
   static char ID;
-  explicit LoongArchDAGToDAGISelLegacy(LoongArchTargetMachine &TM);
+  explicit LoongArchDAGToDAGISelLegacy(LoongArchTargetMachine &TM,
+   CodeGenOptLevel OptLevel);
 };
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp 
b/llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp
index 62b08be5435cd..27f97b2cebb0c 100644
--- a/llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp
@@ -188,7 +188,7 @@ void LoongArchPassConfig::addCodeGenPrepare() {
 }
 
 bool LoongArchPassConfig::addInstSelector() {
-  addPass(createLoongArchISelDag(getLoongArchTargetMachine()));
+  addPass(createLoongArchISelDag(getLoongArchTargetMachine(), getOptLevel()));
 
   return false;
 }
diff --git a/llvm/test/CodeGen/LoongArch/O0-pipeline.ll 
b/llvm/test/CodeGen/LoongArch/O0-pipeline.ll
index 24bd4c75a9821..73d0bda895de0 100644
--- a/llvm/test/CodeGen/LoongArch/O0-pipeline.ll
+++ b/llvm/test/CodeGen/LoongArch/O0-pipeline.ll
@@ -34,15 +34,7 @@
 ; CHECK-NEXT:   Safe Stack instrumentation pass
 ; CHECK-NEXT:   Insert stack protectors
 ; CHECK-NEXT:   Module Verifier
-; CHECK-NEXT:   Dominator Tree Construction
-; CHECK-NEXT:   Basic Alias Analysis (s

[llvm-branch-commits] [flang] [flang][OpenMP][NFC] Refactor to avoid global variable (PR #144087)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy closed 
https://github.com/llvm/llvm-project/pull/144087
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][OpenMP][NFC] Refactor to avoid global variable (PR #144087)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits

ergawy wrote:

Reopening since I closed it by mistake somehow.

https://github.com/llvm/llvm-project/pull/144087
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Pengcheng Wang via llvm-branch-commits

https://github.com/wangpc-pp updated 
https://github.com/llvm/llvm-project/pull/114971

>From 3fd27bd1405a8b2c068786a200d610b9cacb65ef Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Tue, 5 Nov 2024 20:38:44 +0800
Subject: [PATCH 1/3] Set max bytes

Created using spr 1.3.6-beta.1
---
 llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index c65feb9755633..a1c5f76bae009 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2508,7 +2508,10 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
 Options.LoadSizes = {4, 2, 1};
   if (IsZeroCmp && ST->hasVInstructions()) {
 unsigned VLenB = ST->getRealMinVLen() / 8;
-for (unsigned Size = ST->getXLen() / 8 + 1;
+// The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
+// and `XLen + 8`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() / 8 + 1);
+for (unsigned Size = MinSize;
  Size <= VLenB * ST->getMaxLMULForFixedLengthVectors(); Size++)
   Options.LoadSizes.insert(Options.LoadSizes.begin(), Size);
   }

>From 17115212f1d7af68f5374896d1ddadf464b2bc11 Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Fri, 13 Jun 2025 18:24:15 +0800
Subject: [PATCH 2/3] Change to XLen + 1

Created using spr 1.3.6-beta.1
---
 .../Target/RISCV/RISCVTargetTransformInfo.cpp |   4 +-
 llvm/test/CodeGen/RISCV/memcmp-optsize.ll | 324 +++---
 llvm/test/CodeGen/RISCV/memcmp.ll | 324 +++---
 3 files changed, 570 insertions(+), 82 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 4b9ea30a92c99..3aa0fcbb723a1 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2956,8 +2956,8 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   if (IsZeroCmp && ST->hasVInstructions()) {
 unsigned VLenB = ST->getRealMinVLen() / 8;
 // The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
-// and `XLen * 2`.
-unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8);
+// and `XLen + 1`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() / 8 + 1);
 for (unsigned Size = MinSize;
  Size <= VLenB * ST->getMaxLMULForFixedLengthVectors(); Size++)
   Options.LoadSizes.insert(Options.LoadSizes.begin(), Size);
diff --git a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll 
b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
index d4d12a932d0ec..0d57e4201512e 100644
--- a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
+++ b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
@@ -517,17 +517,99 @@ define i32 @bcmp_size_5(ptr %s1, ptr %s2) nounwind 
optsize {
 ; CHECK-ALIGNED-RV64-V-NEXT:addi sp, sp, 16
 ; CHECK-ALIGNED-RV64-V-NEXT:ret
 ;
-; CHECK-UNALIGNED-LABEL: bcmp_size_5:
-; CHECK-UNALIGNED:   # %bb.0: # %entry
-; CHECK-UNALIGNED-NEXT:lw a2, 0(a0)
-; CHECK-UNALIGNED-NEXT:lbu a0, 4(a0)
-; CHECK-UNALIGNED-NEXT:lw a3, 0(a1)
-; CHECK-UNALIGNED-NEXT:lbu a1, 4(a1)
-; CHECK-UNALIGNED-NEXT:xor a2, a2, a3
-; CHECK-UNALIGNED-NEXT:xor a0, a0, a1
-; CHECK-UNALIGNED-NEXT:or a0, a2, a0
-; CHECK-UNALIGNED-NEXT:snez a0, a0
-; CHECK-UNALIGNED-NEXT:ret
+; CHECK-UNALIGNED-RV32-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV32:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV32-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV32-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV32-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV32-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV32-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV32-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV32-NEXT:or a0, a2, a0
+; CHECK-UNALIGNED-RV32-NEXT:snez a0, a0
+; CHECK-UNALIGNED-RV32-NEXT:ret
+;
+; CHECK-UNALIGNED-RV64-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV64:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV64-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV64-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV64-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV64-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV64-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV64-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV64-NEXT:or a0, a2, a0
+; CHECK-UNALIGNED-RV64-NEXT:snez a0, a0
+; CHECK-UNALIGNED-RV64-NEXT:ret
+;
+; CHECK-UNALIGNED-RV32-ZBB-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV32-ZBB:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:or a0, a2, a0

[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Pengcheng Wang via llvm-branch-commits

https://github.com/wangpc-pp updated 
https://github.com/llvm/llvm-project/pull/114971

>From 3fd27bd1405a8b2c068786a200d610b9cacb65ef Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Tue, 5 Nov 2024 20:38:44 +0800
Subject: [PATCH 1/3] Set max bytes

Created using spr 1.3.6-beta.1
---
 llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index c65feb9755633..a1c5f76bae009 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2508,7 +2508,10 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
 Options.LoadSizes = {4, 2, 1};
   if (IsZeroCmp && ST->hasVInstructions()) {
 unsigned VLenB = ST->getRealMinVLen() / 8;
-for (unsigned Size = ST->getXLen() / 8 + 1;
+// The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
+// and `XLen + 8`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() / 8 + 1);
+for (unsigned Size = MinSize;
  Size <= VLenB * ST->getMaxLMULForFixedLengthVectors(); Size++)
   Options.LoadSizes.insert(Options.LoadSizes.begin(), Size);
   }

>From 17115212f1d7af68f5374896d1ddadf464b2bc11 Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Fri, 13 Jun 2025 18:24:15 +0800
Subject: [PATCH 2/3] Change to XLen + 1

Created using spr 1.3.6-beta.1
---
 .../Target/RISCV/RISCVTargetTransformInfo.cpp |   4 +-
 llvm/test/CodeGen/RISCV/memcmp-optsize.ll | 324 +++---
 llvm/test/CodeGen/RISCV/memcmp.ll | 324 +++---
 3 files changed, 570 insertions(+), 82 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 4b9ea30a92c99..3aa0fcbb723a1 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2956,8 +2956,8 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   if (IsZeroCmp && ST->hasVInstructions()) {
 unsigned VLenB = ST->getRealMinVLen() / 8;
 // The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
-// and `XLen * 2`.
-unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8);
+// and `XLen + 1`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() / 8 + 1);
 for (unsigned Size = MinSize;
  Size <= VLenB * ST->getMaxLMULForFixedLengthVectors(); Size++)
   Options.LoadSizes.insert(Options.LoadSizes.begin(), Size);
diff --git a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll 
b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
index d4d12a932d0ec..0d57e4201512e 100644
--- a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
+++ b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
@@ -517,17 +517,99 @@ define i32 @bcmp_size_5(ptr %s1, ptr %s2) nounwind 
optsize {
 ; CHECK-ALIGNED-RV64-V-NEXT:addi sp, sp, 16
 ; CHECK-ALIGNED-RV64-V-NEXT:ret
 ;
-; CHECK-UNALIGNED-LABEL: bcmp_size_5:
-; CHECK-UNALIGNED:   # %bb.0: # %entry
-; CHECK-UNALIGNED-NEXT:lw a2, 0(a0)
-; CHECK-UNALIGNED-NEXT:lbu a0, 4(a0)
-; CHECK-UNALIGNED-NEXT:lw a3, 0(a1)
-; CHECK-UNALIGNED-NEXT:lbu a1, 4(a1)
-; CHECK-UNALIGNED-NEXT:xor a2, a2, a3
-; CHECK-UNALIGNED-NEXT:xor a0, a0, a1
-; CHECK-UNALIGNED-NEXT:or a0, a2, a0
-; CHECK-UNALIGNED-NEXT:snez a0, a0
-; CHECK-UNALIGNED-NEXT:ret
+; CHECK-UNALIGNED-RV32-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV32:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV32-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV32-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV32-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV32-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV32-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV32-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV32-NEXT:or a0, a2, a0
+; CHECK-UNALIGNED-RV32-NEXT:snez a0, a0
+; CHECK-UNALIGNED-RV32-NEXT:ret
+;
+; CHECK-UNALIGNED-RV64-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV64:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV64-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV64-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV64-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV64-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV64-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV64-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV64-NEXT:or a0, a2, a0
+; CHECK-UNALIGNED-RV64-NEXT:snez a0, a0
+; CHECK-UNALIGNED-RV64-NEXT:ret
+;
+; CHECK-UNALIGNED-RV32-ZBB-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV32-ZBB:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:or a0, a2, a0

[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Pengcheng Wang via llvm-branch-commits

https://github.com/wangpc-pp updated 
https://github.com/llvm/llvm-project/pull/114971

>From 3fd27bd1405a8b2c068786a200d610b9cacb65ef Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Tue, 5 Nov 2024 20:38:44 +0800
Subject: [PATCH 1/4] Set max bytes

Created using spr 1.3.6-beta.1
---
 llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index c65feb9755633..a1c5f76bae009 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2508,7 +2508,10 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
 Options.LoadSizes = {4, 2, 1};
   if (IsZeroCmp && ST->hasVInstructions()) {
 unsigned VLenB = ST->getRealMinVLen() / 8;
-for (unsigned Size = ST->getXLen() / 8 + 1;
+// The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
+// and `XLen + 8`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() / 8 + 1);
+for (unsigned Size = MinSize;
  Size <= VLenB * ST->getMaxLMULForFixedLengthVectors(); Size++)
   Options.LoadSizes.insert(Options.LoadSizes.begin(), Size);
   }

>From 17115212f1d7af68f5374896d1ddadf464b2bc11 Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Fri, 13 Jun 2025 18:24:15 +0800
Subject: [PATCH 2/4] Change to XLen + 1

Created using spr 1.3.6-beta.1
---
 .../Target/RISCV/RISCVTargetTransformInfo.cpp |   4 +-
 llvm/test/CodeGen/RISCV/memcmp-optsize.ll | 324 +++---
 llvm/test/CodeGen/RISCV/memcmp.ll | 324 +++---
 3 files changed, 570 insertions(+), 82 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 4b9ea30a92c99..3aa0fcbb723a1 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2956,8 +2956,8 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   if (IsZeroCmp && ST->hasVInstructions()) {
 unsigned VLenB = ST->getRealMinVLen() / 8;
 // The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
-// and `XLen * 2`.
-unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8);
+// and `XLen + 1`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() / 8 + 1);
 for (unsigned Size = MinSize;
  Size <= VLenB * ST->getMaxLMULForFixedLengthVectors(); Size++)
   Options.LoadSizes.insert(Options.LoadSizes.begin(), Size);
diff --git a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll 
b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
index d4d12a932d0ec..0d57e4201512e 100644
--- a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
+++ b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
@@ -517,17 +517,99 @@ define i32 @bcmp_size_5(ptr %s1, ptr %s2) nounwind 
optsize {
 ; CHECK-ALIGNED-RV64-V-NEXT:addi sp, sp, 16
 ; CHECK-ALIGNED-RV64-V-NEXT:ret
 ;
-; CHECK-UNALIGNED-LABEL: bcmp_size_5:
-; CHECK-UNALIGNED:   # %bb.0: # %entry
-; CHECK-UNALIGNED-NEXT:lw a2, 0(a0)
-; CHECK-UNALIGNED-NEXT:lbu a0, 4(a0)
-; CHECK-UNALIGNED-NEXT:lw a3, 0(a1)
-; CHECK-UNALIGNED-NEXT:lbu a1, 4(a1)
-; CHECK-UNALIGNED-NEXT:xor a2, a2, a3
-; CHECK-UNALIGNED-NEXT:xor a0, a0, a1
-; CHECK-UNALIGNED-NEXT:or a0, a2, a0
-; CHECK-UNALIGNED-NEXT:snez a0, a0
-; CHECK-UNALIGNED-NEXT:ret
+; CHECK-UNALIGNED-RV32-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV32:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV32-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV32-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV32-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV32-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV32-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV32-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV32-NEXT:or a0, a2, a0
+; CHECK-UNALIGNED-RV32-NEXT:snez a0, a0
+; CHECK-UNALIGNED-RV32-NEXT:ret
+;
+; CHECK-UNALIGNED-RV64-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV64:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV64-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV64-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV64-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV64-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV64-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV64-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV64-NEXT:or a0, a2, a0
+; CHECK-UNALIGNED-RV64-NEXT:snez a0, a0
+; CHECK-UNALIGNED-RV64-NEXT:ret
+;
+; CHECK-UNALIGNED-RV32-ZBB-LABEL: bcmp_size_5:
+; CHECK-UNALIGNED-RV32-ZBB:   # %bb.0: # %entry
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lw a2, 0(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lw a3, 0(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:lbu a1, 4(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:xor a2, a2, a3
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:xor a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT:or a0, a2, a0

[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Simplify OpenMP device codegen (PR #137201)

2025-06-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy approved this pull request.

LGTM! Thanks Sergio and sorry for the very late review. Just 2 questions.

https://github.com/llvm/llvm-project/pull/137201
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [LoongArch] Pass OptLevel to LoongArchDAGToDAGISel correctly (PR #144459)

2025-06-16 Thread Lu Weining via llvm-branch-commits

https://github.com/SixWeining approved this pull request.


https://github.com/llvm/llvm-project/pull/144459
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits