[llvm-branch-commits] [mlir] [MLIR] Integration tests for lowering vector.contract to SVE FEAT_I8MM (PR #140573)
banach-space wrote:

Thanks - great to finally be reaching this stage! I have a few high-level questions and suggestions:

**1. Why is the scalable dimension always [4]?**

From the current tests, it looks like the scalable dim is always `[4]`. Could you remind me why that value is chosen?

**2. Reduce duplication in the 4x8x4 tests**

The current tests differ only in terms of **input**/**output** and `extsi` vs `extui`. It should be possible to reduce duplication by extracting shared logic into helpers, and writing 4 separate entry points (set via `entry_point`) to isolate the differences. For example:

```mlir
func.func @main_smmla() {
  // Init LHS, RHS, ACC

  // CHECK-LINES for LHS
  print(lhs);

  // CHECK-LINES for RHS
  print(rhs);

  arith.extsi (lhs)
  arith.extsi (rhs)
  vector.contract

  // CHECK-LINES for ACC
  print(acc);
}
```

This would keep the test logic focused and easier to maintain.

**3. Add checks for generated IR (LLVM dialect)**

It would be good to verify that the lowered IR includes the correct SVE MMLA intrinsics. For example:

```mlir
// CHECK-COUNT-4: llvm.intr.smmla
```

This would help confirm both correctness and that the expected number of operations is emitted.

**4. Consider toggling VL within tests**

Have you considered toggling the scalable vector length (`VL`) within the test? That would allow verifying behaviour for multiple `VL` values. From what I can tell, this would only work if the inputs are generated inside a loop, similar to this example:

https://github.com/llvm/llvm-project/blob/88f61f2c5c0ad9dad9c8df2fb86352629e7572c1/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/load-vertical.mlir#L19-L37

That might be a nice validation of the "scalability" aspect.

https://github.com/llvm/llvm-project/pull/140573

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
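For context on what the integration tests discussed above exercise: each FEAT_I8MM `smmla` operation treats its two 16-byte inputs as 2x8 matrices of signed bytes and accumulates their product into a 2x2 matrix of 32-bit integers. The following is only a scalar reference model for illustration (it is not part of the patch under review):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reference model of a single 128-bit SMMLA tile: A and B are row-major
// 2x8 int8 matrices, Acc is a row-major 2x2 int32 accumulator, and the
// operation computes Acc += A * B^T.
using Tile = std::array<int32_t, 4>;

Tile smmla(Tile Acc, const std::array<int8_t, 16> &A,
           const std::array<int8_t, 16> &B) {
  for (int Row = 0; Row < 2; ++Row)
    for (int Col = 0; Col < 2; ++Col)
      for (int K = 0; K < 8; ++K)
        Acc[Row * 2 + Col] +=
            int32_t(A[Row * 8 + K]) * int32_t(B[Col * 8 + K]);
  return Acc;
}
```

Swapping the `int8_t` element type for `uint8_t` gives the unsigned (`ummla`) variant, which is why the tests differ only in `extsi` vs `extui`.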
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From c7a0fb8f9846faa98cd5dbf3d71d5149051fa8a8 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH 1/2] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allows the combiner +// to potentially fold these. +class lower_uniform_bfx<Instruction bfx> : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx<G_SBFX>; +def lower_uniform_ubfx : lower_uniform_bfx<G_UBFX>; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion.
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo<SIMachineFunctionInfo>()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/li
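The operand packing that `lowerUniformBFX` builds out of MIR (an AND, a SHL, and an OR) can be summarized in plain C++. This is only an illustration of the operand format described in the patch's comments (offset in bits [5:0], width in bits [22:16]), not code from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Build the second source operand of S_BFE_{I,U}{32,64}: the field offset
// goes in bits [5:0] and the field width in bits [22:16]. The offset is
// masked to 6 bits; the left shift by 16 guarantees the width contributes
// nothing to the low bits, so the two values can simply be ORed together.
uint32_t packBFEControlOperand(uint32_t Offset, uint32_t Width) {
  uint32_t ClampedOffset = Offset & 0x3f; // maskTrailingOnes<unsigned>(6)
  return ClampedOffset | (Width << 16);
}
```

Doing this packing in the combiner (rather than at selection time) is what lets later combines fold the AND/SHL/OR when the offset and width are constants.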
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 7c8f90225928c0dbffcfa03bd20da3419a80095f Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH 1/2] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; 
GFX6-NEXT:v_lshl_b64 v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_an
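As a sanity check on the regenerated assembly above: the test shifts an i65 value right by 33, where the value lives in three 32-bit registers (low word, high word, and bit 64 in the third register's LSB). Both the old and the new instruction sequences compute the same low result word; a hedged C++ model of that computation (word names are mine, chosen to match the GFX6 register roles):

```cpp
#include <cassert>
#include <cstdint>

// Model of the i65 logical-shift-right-by-33 being tested: W1 holds bits
// [63:32] of the input and W2's LSB holds bit 64. After >> 33, only the
// low result word is nonzero, built as (W1 >> 1) | ((W2 & 1) << 31) --
// exactly the and/lshl/lshr/or sequence in the checked assembly.
uint32_t lshrI65By33LowWord(uint32_t W1, uint32_t W2) {
  return (W1 >> 1) | ((W2 & 1u) << 31);
}
```

The diff above only reshuffles which registers hold the intermediate values; the known-bits combine does not change this result.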
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 9e7f29551b788d9060aec2168920554df41ff5df Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH 1/2] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; 
GFX6-NEXT:v_lshl_b64 v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_an
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From efa6a12fedf3c87678a1df1e5d03ff1e58531625 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH 1/2] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allows the combiner +// to potentially fold these. +class lower_uniform_bfx<Instruction bfx> : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx<G_SBFX>; +def lower_uniform_ubfx : lower_uniform_bfx<G_UBFX>; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion.
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/li
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFold; +MachineOperand *OpToFold = nullptr; uint64_t ImmToFold; int FrameIndexToFold; }; - int ShrinkOpcode; - unsigned UseOpNo; + + /// Register class of the originally defined value. + const TargetRegisterClass *DefRC = nullptr; + + /// Track the original defining instruction for the value. + const MachineInstr *DefMI = nullptr; + + /// Subregister to apply to the value at the use point. + unsigned DefSubReg = AMDGPU::NoSubRegister; + + /// Kind of value stored in the union. MachineOperand::MachineOperandType Kind; - bool Commuted; - FoldCandidate(MachineInstr *MI, unsigned OpNo, MachineOperand *FoldOp, -bool Commuted_ = false, -int ShrinkOp = -1) : -UseMI(MI), OpToFold(nullptr), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(FoldOp->getType()), -Commuted(Commuted_) { -if (FoldOp->isImm()) { - ImmToFold = FoldOp->getImm(); -} else if (FoldOp->isFI()) { - FrameIndexToFold = FoldOp->getIndex(); + FoldableDef() = delete; + FoldableDef(MachineOperand &FoldOp, const TargetRegisterClass *DefRC, + unsigned DefSubReg = AMDGPU::NoSubRegister) + : DefRC(DefRC), DefSubReg(DefSubReg), Kind(FoldOp.getType()) { + +if (FoldOp.isImm()) { + ImmToFold = FoldOp.getImm(); +} else if (FoldOp.isFI()) { + FrameIndexToFold = FoldOp.getIndex(); } else { - assert(FoldOp->isReg() || FoldOp->isGlobal()); - OpToFold = FoldOp; + assert(FoldOp.isReg() || FoldOp.isGlobal()); + OpToFold = &FoldOp; } + +DefMI = FoldOp.getParent(); } - FoldCandidate(MachineInstr *MI, unsigned OpNo, int64_t FoldImm, -bool Commuted_ = false, int ShrinkOp = -1) - : UseMI(MI), ImmToFold(FoldImm), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(MachineOperand::MO_Immediate), Commuted(Commuted_) {} + FoldableDef(int64_t FoldImm, const TargetRegisterClass *DefRC, + 
unsigned DefSubReg = AMDGPU::NoSubRegister) + : ImmToFold(FoldImm), DefRC(DefRC), DefSubReg(DefSubReg), +Kind(MachineOperand::MO_Immediate) {} + + /// Copy the current def and apply \p SubReg to the value. + FoldableDef getWithSubReg(const SIRegisterInfo &TRI, unsigned SubReg) const { +FoldableDef Copy(*this); +Copy.DefSubReg = TRI.composeSubRegIndices(DefSubReg, SubReg); +return Copy; + } + + bool isReg() const { return Kind == MachineOperand::MO_Register; } + + Register getReg() const { +assert(isReg()); +return OpToFold->getReg(); + } + + unsigned getSubReg() const { +assert(isReg()); +return OpToFold->getSubReg(); + } + + bool isImm() const { return Kind == MachineOperand::MO_Immediate; } bool isFI() const { return Kind == MachineOperand::MO_FrameIndex; } - bool isImm() const { -return Kind == MachineOperand::MO_Immediate; + int getFI() const { +assert(isFI()); +return FrameIndexToFold; } - bool isReg() const { -return Kind == MachineOperand::MO_Register; + bool isGlobal() const { return OpToFold->isGlobal(); } jayfoad wrote: Not safe to access `OpToFold` unless you check for Imm and FI first: ```suggestion bool isGlobal() const { return !isImm() && !isFI() && OpToFold->isGlobal(); } ``` https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
https://github.com/jayfoad commented: The idea seems good. I haven't reviewed it all in detail. https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFold; +MachineOperand *OpToFold = nullptr; uint64_t ImmToFold; int FrameIndexToFold; }; - int ShrinkOpcode; - unsigned UseOpNo; + + /// Register class of the originally defined value. + const TargetRegisterClass *DefRC = nullptr; + + /// Track the original defining instruction for the value. + const MachineInstr *DefMI = nullptr; + + /// Subregister to apply to the value at the use point. + unsigned DefSubReg = AMDGPU::NoSubRegister; + + /// Kind of value stored in the union. MachineOperand::MachineOperandType Kind; - bool Commuted; - FoldCandidate(MachineInstr *MI, unsigned OpNo, MachineOperand *FoldOp, -bool Commuted_ = false, -int ShrinkOp = -1) : -UseMI(MI), OpToFold(nullptr), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(FoldOp->getType()), -Commuted(Commuted_) { -if (FoldOp->isImm()) { - ImmToFold = FoldOp->getImm(); -} else if (FoldOp->isFI()) { - FrameIndexToFold = FoldOp->getIndex(); + FoldableDef() = delete; + FoldableDef(MachineOperand &FoldOp, const TargetRegisterClass *DefRC, + unsigned DefSubReg = AMDGPU::NoSubRegister) + : DefRC(DefRC), DefSubReg(DefSubReg), Kind(FoldOp.getType()) { + +if (FoldOp.isImm()) { + ImmToFold = FoldOp.getImm(); +} else if (FoldOp.isFI()) { + FrameIndexToFold = FoldOp.getIndex(); } else { - assert(FoldOp->isReg() || FoldOp->isGlobal()); - OpToFold = FoldOp; + assert(FoldOp.isReg() || FoldOp.isGlobal()); + OpToFold = &FoldOp; } + +DefMI = FoldOp.getParent(); } - FoldCandidate(MachineInstr *MI, unsigned OpNo, int64_t FoldImm, -bool Commuted_ = false, int ShrinkOp = -1) - : UseMI(MI), ImmToFold(FoldImm), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(MachineOperand::MO_Immediate), Commuted(Commuted_) {} + FoldableDef(int64_t FoldImm, const TargetRegisterClass *DefRC, + 
unsigned DefSubReg = AMDGPU::NoSubRegister) + : ImmToFold(FoldImm), DefRC(DefRC), DefSubReg(DefSubReg), +Kind(MachineOperand::MO_Immediate) {} + + /// Copy the current def and apply \p SubReg to the value. + FoldableDef getWithSubReg(const SIRegisterInfo &TRI, unsigned SubReg) const { +FoldableDef Copy(*this); +Copy.DefSubReg = TRI.composeSubRegIndices(DefSubReg, SubReg); +return Copy; + } + + bool isReg() const { return Kind == MachineOperand::MO_Register; } + + Register getReg() const { +assert(isReg()); +return OpToFold->getReg(); + } + + unsigned getSubReg() const { +assert(isReg()); +return OpToFold->getSubReg(); + } + + bool isImm() const { return Kind == MachineOperand::MO_Immediate; } bool isFI() const { return Kind == MachineOperand::MO_FrameIndex; } - bool isImm() const { -return Kind == MachineOperand::MO_Immediate; + int getFI() const { +assert(isFI()); +return FrameIndexToFold; } - bool isReg() const { -return Kind == MachineOperand::MO_Register; + bool isGlobal() const { return OpToFold->isGlobal(); } + + /// Return the effective immediate value defined by this instruction, after + /// application of any subregister extracts which may exist between the use + /// and def instruction. + std::optional getEffectiveImmVal() const { +assert(isImm()); +return SIInstrInfo::extractSubregFromImm(ImmToFold, DefSubReg); } - bool isGlobal() const { return Kind == MachineOperand::MO_GlobalAddress; } + /// Check if it is legal to fold this effective value into \p MI's \p OpNo + /// operand. 
+ bool isOperandLegal(const SIInstrInfo &TII, const MachineInstr &MI, + unsigned OpIdx) const { +switch (Kind) { +case MachineOperand::MO_Immediate: { + std::optional ImmToFold = getEffectiveImmVal(); + if (!ImmToFold) +return false; + + // TODO: Should verify the subregister index is supported by the class + // TODO: Avoid the temporary MachineOperand + MachineOperand TmpOp = MachineOperand::CreateImm(*ImmToFold); + return TII.isOperandLegal(MI, OpIdx, &TmpOp); +} +case MachineOperand::MO_FrameIndex: { + if (DefSubReg != AMDGPU::NoSubRegister) +return false; + MachineOperand TmpOp = MachineOperand::CreateFI(FrameIndexToFold); + return TII.isOperandLegal(MI, OpIdx, &TmpOp); +} +default: + // TODO: Try to apply DefSubReg, for global address we can extract + // low/high. + if (DefSubReg != AMDGPU::NoSubRegister) +return false; + return TII.isOperandLegal(MI, OpIdx, OpToFold); +} + +llvm_unreachable("covered MachineOperand kind switch"); + } +}; + +struct FoldCandidate { +
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Generalize names of delayed privatization CLI flags (PR #138816)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138816 >From e0eb1611a67579562edefe1c66263c2cc562c5d7 Mon Sep 17 00:00:00 2001 From: ergawy Date: Wed, 7 May 2025 02:41:14 -0500 Subject: [PATCH] [flang] Generlize names of delayed privatization CLI flags Remove the `openmp` prefix from delayed privatization/localization flags since they are now used for `do concurrent` as well. --- flang/include/flang/Support/Flags.h | 17 flang/lib/Lower/Bridge.cpp| 2 +- flang/lib/Lower/OpenMP/OpenMP.cpp | 1 + flang/lib/Lower/OpenMP/Utils.cpp | 12 --- flang/lib/Lower/OpenMP/Utils.h| 2 -- flang/lib/Support/CMakeLists.txt | 1 + flang/lib/Support/Flags.cpp | 20 +++ .../distribute-standalone-private.f90 | 4 ++-- .../DelayedPrivatization/equivalence.f90 | 4 ++-- .../target-private-allocatable.f90| 4 ++-- .../target-private-multiple-variables.f90 | 4 ++-- .../target-private-simple.f90 | 4 ++-- .../OpenMP/allocatable-multiple-vars.f90 | 4 ++-- .../OpenMP/cfg-conversion-omp.private.f90 | 2 +- .../test/Lower/OpenMP/debug_info_conflict.f90 | 2 +- ...elayed-privatization-allocatable-array.f90 | 4 ++-- ...privatization-allocatable-firstprivate.f90 | 6 +++--- ...ayed-privatization-allocatable-private.f90 | 4 ++-- .../OpenMP/delayed-privatization-array.f90| 12 +-- .../delayed-privatization-character-array.f90 | 8 .../delayed-privatization-character.f90 | 8 .../delayed-privatization-default-init.f90| 4 ++-- .../delayed-privatization-firstprivate.f90| 4 ++-- ...rivatization-lower-allocatable-to-llvm.f90 | 2 +- .../OpenMP/delayed-privatization-pointer.f90 | 4 ++-- ...yed-privatization-private-firstprivate.f90 | 4 ++-- .../OpenMP/delayed-privatization-private.f90 | 4 ++-- .../delayed-privatization-reduction-byref.f90 | 2 +- .../delayed-privatization-reduction.f90 | 4 ++-- .../different_vars_lastprivate_barrier.f90| 2 +- .../Lower/OpenMP/firstprivate-commonblock.f90 | 2 +- .../test/Lower/OpenMP/private-commonblock.f90 | 2 +- .../Lower/OpenMP/private-derived-type.f90 | 4 ++-- 
.../OpenMP/same_var_first_lastprivate.f90 | 2 +- .../Lower/do_concurrent_delayed_locality.f90 | 2 +- 35 files changed, 96 insertions(+), 71 deletions(-) create mode 100644 flang/include/flang/Support/Flags.h create mode 100644 flang/lib/Support/Flags.cpp diff --git a/flang/include/flang/Support/Flags.h b/flang/include/flang/Support/Flags.h new file mode 100644 index 0..bcbb72f8e50d0 --- /dev/null +++ b/flang/include/flang/Support/Flags.h @@ -0,0 +1,17 @@ +//===-- include/flang/Support/Flags.h ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef FORTRAN_SUPPORT_FLAGS_H_ +#define FORTRAN_SUPPORT_FLAGS_H_ + +#include "llvm/Support/CommandLine.h" + +extern llvm::cl::opt enableDelayedPrivatization; +extern llvm::cl::opt enableDelayedPrivatizationStaging; + +#endif // FORTRAN_SUPPORT_FLAGS_H_ diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp index 49675d34215a9..9f3c50a52973a 100644 --- a/flang/lib/Lower/Bridge.cpp +++ b/flang/lib/Lower/Bridge.cpp @@ -13,7 +13,6 @@ #include "flang/Lower/Bridge.h" #include "OpenMP/DataSharingProcessor.h" -#include "OpenMP/Utils.h" #include "flang/Lower/Allocatable.h" #include "flang/Lower/CallInterface.h" #include "flang/Lower/Coarray.h" @@ -63,6 +62,7 @@ #include "flang/Semantics/runtime-type-info.h" #include "flang/Semantics/symbol.h" #include "flang/Semantics/tools.h" +#include "flang/Support/Flags.h" #include "flang/Support/Version.h" #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h" #include "mlir/IR/BuiltinAttributes.h" diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index 5a975384bd371..f76afa2309233 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -34,6 +34,7 @@ #include "flang/Parser/parse-tree.h" #include 
"flang/Semantics/openmp-directive-sets.h" #include "flang/Semantics/tools.h" +#include "flang/Support/Flags.h" #include "flang/Support/OpenMP-utils.h" #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h" #include "mlir/Dialect/OpenMP/OpenMPDialect.h" diff --git a/flang/lib/Lower/OpenMP/Utils.cpp b/flang/lib/Lower/OpenMP/Utils.cpp index 711d4af287691..c226c2558e7aa 100644 --- a/flang/lib/Lower/OpenMP/Utils.cpp +++ b/flang/lib/Lower/OpenMP/Utils.cpp @@ -33,18 +33,6 @@ llvm::cl::opt treatIndexAsSection( llvm::cl::desc("In the OpenMP data clauses trea
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/kbeyls approved this pull request. https://github.com/llvm/llvm-project/pull/136151 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/kbeyls edited https://github.com/llvm/llvm-project/pull/136151 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] X86: Add X86TTIImpl::isProfitableToSinkOperands hook for immediate operands. (PR #141326)
@@ -7170,16 +7165,31 @@ bool X86TTIImpl::isProfitableToSinkOperands(Instruction *I, II->getIntrinsicID() == Intrinsic::fshr) ShiftAmountOpNum = 2; } - if (ShiftAmountOpNum == -1) return false; + auto *ShiftAmount = &I->getOperandUse(ShiftAmountOpNum); - auto *Shuf = dyn_cast(I->getOperand(ShiftAmountOpNum)); + // A uniform shift amount in a vector shift or funnel shift may be much + // cheaper than a generic variable vector shift, so make that pattern visible + // to SDAG by sinking the shuffle instruction next to the shift. + auto *Shuf = dyn_cast(ShiftAmount); if (Shuf && getSplatIndex(Shuf->getShuffleMask()) >= 0 && isVectorShiftByScalarCheap(I->getType())) { -Ops.push_back(&I->getOperandUse(ShiftAmountOpNum)); +Ops.push_back(ShiftAmount); return true; } + // Casts taking a constant expression (generally derived from a global + // variable address) as an operand are profitable to sink because they appear + // as subexpressions in the instruction sequence generated by the + // LowerTypeTests pass which is expected to pattern match to the rotate + // instruction's immediate operand. + if (auto *CI = dyn_cast(ShiftAmount)) { +if (isa(CI->getOperand(0))) { + Ops.push_back(ShiftAmount); + return true; +} + } nikic wrote: This check needs to be more specific. Even the `zext ptrtoint (ptr @g to i8)` pattern would not become a relocatable ror immediate under normal circumstances. The fact that the global has `!absolute_symbol` with an appropriate range is load-bearing here. (Looking at selectRelocImm, it does specifically check for getAbsoluteSymbolRange.) https://github.com/llvm/llvm-project/pull/141326 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
@@ -380,7 +477,8 @@ bool SIFoldOperandsImpl::canUseImmWithOpSel(FoldCandidate &Fold) const { return true; } -bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold) const { +bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold, + int64_t ImmVal) const { jayfoad wrote: Needs a comment explaining the `ImmVal` argument. Is it different from `Fold.getEffectiveImmVal()`? https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From efa6a12fedf3c87678a1df1e5d03ff1e58531625 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allow the combiner +// to potentially fold these. +class lower_uniform_bfx : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx; +def lower_uniform_ubfx : lower_uniform_bfx; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion. 
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Ta
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 62031c0316c73a3650223721347854fd0c45e730 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX6-NEXT:v_lshl_b64 
v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b3
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From efa6a12fedf3c87678a1df1e5d03ff1e58531625 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allow the combiner +// to potentially fold these. +class lower_uniform_bfx : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx; +def lower_uniform_ubfx : lower_uniform_bfx; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion. 
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Ta
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 62031c0316c73a3650223721347854fd0c45e730 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX6-NEXT:v_lshl_b64 
v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b3
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From e253bde72750576cab699ad1b6b872fbf60dffe9 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH 1/2] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allow the combiner +// to potentially fold these. +class lower_uniform_bfx : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx; +def lower_uniform_ubfx : lower_uniform_bfx; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion. 
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/li
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From b249611564844064031ca7be93aeda517fad37ea Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH 1/2] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; 
GFX6-NEXT:v_lshl_b64 v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_an
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/maksfb approved this pull request. As an NFC, this change looks good to me. I've left a few comments for a follow-up. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -1682,48 +1648,66 @@ void Analysis::runOnFunction(BinaryFunction &BF, } } +// Compute the instruction address for printing (may be slow). +static uint64_t getAddress(const MCInstReference &Inst) { + const BinaryFunction *BF = Inst.getFunction(); + + if (Inst.hasCFG()) { +const BinaryBasicBlock *BB = Inst.getBasicBlock(); + +auto It = static_cast(&Inst.getMCInst()); +unsigned IndexInBB = std::distance(BB->begin(), It); + +// FIXME: this assumes all instructions are 4 bytes in size. This is true maksfb wrote: We have `BinaryContext::computeCodeSize()`. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). +class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { maksfb wrote: Similar concern regarding the `BinaryFunction *` order. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). +class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { maksfb wrote: What are expected uses for this comparison? I'm concerned about non-deterministic order of `BinaryBasicBlock *`. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/maksfb edited https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + maksfb wrote: nit: drop empty line. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,57 @@ +//===- bolt/Passes/MCInstUtils.cpp ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "bolt/Core/MCInstUtils.h" + maksfb wrote: nit: empty line. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing of address params in StaticSampler (PR #140293)
https://github.com/inbelic edited https://github.com/llvm/llvm-project/pull/140293 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DirectX] Improve error message when a binding cannot be found for a resource (PR #140642)
https://github.com/hekota closed https://github.com/llvm/llvm-project/pull/140642 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
https://github.com/atrosinenko created https://github.com/llvm/llvm-project/pull/141824 After a label in a function without CFG information, use a reasonably pessimistic estimation of register state (assume that any register that can be clobbered in this function was actually clobbered) instead of the most pessimistic "all registers are unsafe". This is the same estimation as used by the dataflow variant of the analysis when the preceding instruction is not known for sure. Without this, leaf functions without CFG information are likely to have false positive reports about non-protected return instructions, as 1) LR is unlikely to be signed and authenticated in a leaf function and 2) LR is likely to be used by a return instruction near the end of the function and 3) the register state is likely to be reset at least once during the linear scan through the function >From 7d38c3ebb3dd7f67f87b494e2dfe6e6c4ca29787 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 14 May 2025 23:12:13 +0300 Subject: [PATCH] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG After a label in a function without CFG information, use a reasonably pessimistic estimation of register state (assume that any register that can be clobbered in this function was actually clobbered) instead of the most pessimistic "all registers are unsafe". This is the same estimation as used by the dataflow variant of the analysis when the preceding instruction is not known for sure. 
Without this, leaf functions without CFG information are likely to have false positive reports about non-protected return instructions, as 1) LR is unlikely to be signed and authenticated in a leaf function and 2) LR is likely to be used by a return instruction near the end of the function and 3) the register state is likely to be reset at least once during the linear scan through the function --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 14 +++-- .../AArch64/gs-pacret-autiasp.s | 31 +-- .../AArch64/gs-pauth-authentication-oracles.s | 20 .../AArch64/gs-pauth-debug-output.s | 30 ++ .../AArch64/gs-pauth-signing-oracles.s| 27 5 files changed, 29 insertions(+), 93 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 2aacb38ee19a9..6327a2da54d5b 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -737,19 +737,14 @@ template class CFGUnawareAnalysis { // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. +// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, public CFGUnawareAnalysis { using SrcSafetyAnalysis::BC; BinaryFunction &BF; - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). 
- SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -759,6 +754,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -773,7 +769,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Attach the state *before* this instruction executes. diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index df0a83be00986..627f8eb20ab9c 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -224,20 +224,33 @@ f_unreachable_instruction: ret .size f_unreachable_instruction, .-f_unreachable_instruction -// Expected false positive: without CFG, the state is reset to all-unsafe -// after an unconditional branch. - -.globl state_is_reset_after_indirect_branch_nocfg -.type state_is_reset_after_indirect_branch_nocfg,@function -state_is_reset_after_indirect_branch_nocfg: -// CHECK-LABEL: GS
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
atrosinenko wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. * **#141824** (this PR, view in Graphite) * **#136183**: 1 other dependent PR ([#137224](https://github.com/llvm/llvm-project/pull/137224)) * **#136151** * **#135663** * **#136147** * **#135662** * **#135661** * **#134146** * **#133461** * **#135073** * `main` This stack of pull requests is managed by [Graphite](https://graphite.dev). Learn more about [stacking](https://stacking.dev/). https://github.com/llvm/llvm-project/pull/141824 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
https://github.com/atrosinenko ready_for_review https://github.com/llvm/llvm-project/pull/141824 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Anatoly Trosinenko (atrosinenko) Changes After a label in a function without CFG information, use a reasonably pessimistic estimation of register state (assume that any register that can be clobbered in this function was actually clobbered) instead of the most pessimistic "all registers are unsafe". This is the same estimation as used by the dataflow variant of the analysis when the preceding instruction is not known for sure. Without this, leaf functions without CFG information are likely to have false positive reports about non-protected return instructions, as 1) LR is unlikely to be signed and authenticated in a leaf function and 2) LR is likely to be used by a return instruction near the end of the function and 3) the register state is likely to be reset at least once during the linear scan through the function --- Full diff: https://github.com/llvm/llvm-project/pull/141824.diff 5 Files Affected: - (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+5-9) - (modified) bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s (+22-9) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s (-20) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s (+2-28) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-signing-oracles.s (-27) ``diff diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 2aacb38ee19a9..6327a2da54d5b 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -737,19 +737,14 @@ template class CFGUnawareAnalysis { // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. 
+// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, public CFGUnawareAnalysis { using SrcSafetyAnalysis::BC; BinaryFunction &BF; - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -759,6 +754,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -773,7 +769,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Attach the state *before* this instruction executes. diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index df0a83be00986..627f8eb20ab9c 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -224,20 +224,33 @@ f_unreachable_instruction: ret .size f_unreachable_instruction, .-f_unreachable_instruction -// Expected false positive: without CFG, the state is reset to all-unsafe -// after an unconditional branch. 
- -.globl state_is_reset_after_indirect_branch_nocfg -.type state_is_reset_after_indirect_branch_nocfg,@function -state_is_reset_after_indirect_branch_nocfg: -// CHECK-LABEL: GS-PAUTH: non-protected ret found in function state_is_reset_after_indirect_branch_nocfg, at address -// CHECK-NEXT: The instruction is {{[0-9a-f]+}}: ret +// Without CFG, the state is reset at labels, assuming every register that can +// be clobbered in the function was actually clobbered. + +.globl lr_untouched_nocfg +.type lr_untouched_nocfg,@function +lr_untouched_nocfg: +// CHECK-NOT: lr_untouched_nocfg +adr x2, 1f +br x2 +1: +ret +.size lr_untouched_nocfg, .-lr_untouched_nocfg + +.globl lr_clobbered_nocfg +.type lr_clobbered_nocfg,@function +lr_clobbered_nocfg: +// CHECK-LABEL: GS-PAUTH: non-protected ret found in function lr_clobbered_nocfg, at address +// CHECK-NEXT: The instruction is {{[0-9a-f]+}}: ret // CHECK-NEXT: The 0 instructions that write to the affected registers after any authent
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko edited https://github.com/llvm/llvm-project/pull/137224 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From d3742598bbf2a248124fe1b297d1447c52e40be1 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 34b5b1d51de4e..dac274c0f4130 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : 
llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -858,9 +858,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1025,8 +1025,7 @@ class DstSafetyAnalysis { // ... an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. 
if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1065,7 +1064,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1243,7 +1242,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From 5b9848cf82a1f047d90c1482404ac60f730892cf Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 28 Apr 2025 18:35:48 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 237 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). 
+class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 74bbe1e6f6e759c369ecf517dbfa6f98c40e9ffb Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index b233452985502..c8cbcaf33f4b5 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. 
+ /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4c7ae3c880db4..11db51f6c6dd1 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1066,6 +1066,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1173,8 +1182,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 9d5a578cfbdff..b669d32cc2032 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From b0eeddba47f56f0b917c4a43a744f120ea8e1d6e Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, the AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by the -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with an unsafe link register (as the detection of whether the branch instruction is a tail call is a heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. 
+ /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c8cbcaf33f4b5..3abf4d18e94da 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 762c08ffd933e..e9ed44a47bf6f 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1351,6 +1351,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +LL
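To illustrate the "only some specific well-known sequences" policy described above, here is a standalone toy sketch of matching a contiguous jump-table dispatch ending in a branch. The `Op` names and the three-instruction sequence are purely illustrative assumptions, not BOLT's `MCInst` API or the real AArch64 hardened dispatch sequence:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model -- not BOLT's MCInst and not
// the real AArch64 sequence; only the shape of the check is shown.
enum class Op { LoadJTEntry, AddJTBase, Br, Other };
struct Inst {
  Op Opcode;
};

// Walk backwards from the branch and require the exact contiguous
// sequence "LoadJTEntry; AddJTBase; Br". Any unrelated instruction in
// between makes the dispatch unsafe in this toy model, mirroring the
// "contiguous sequence without unsafe instructions" requirement.
bool isSafeJumpTableDispatch(const std::vector<Inst> &BB, size_t BrIdx) {
  if (BrIdx >= BB.size() || BB[BrIdx].Opcode != Op::Br || BrIdx < 2)
    return false;
  return BB[BrIdx - 1].Opcode == Op::AddJTBase &&
         BB[BrIdx - 2].Opcode == Op::LoadJTEntry;
}
```

The real callback additionally has to follow register dataflow across the sequence; this sketch only captures the contiguity requirement.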
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From b096c6ba85935f7a090031eb693612d1a110d965 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index e9ed44a47bf6f..34b5b1d51de4e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), 
+cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This function cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
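The commit message above says the option "essentially makes 'safe-to-dereference' and 'trusted' register properties identical". A toy sketch of that collapse, with illustrative names and register numbering that are not BOLT's actual types:

```cpp
#include <bitset>
#include <cassert>

// Toy model of the two per-register properties the scanner tracks.
struct ToyRegState {
  std::bitset<32> SafeToDeref;
  std::bitset<32> Trusted;
};

// With --auth-traps-on-failure (i.e. assuming FEAT_FPAC), surviving an
// authentication instruction implies it did not fail, so the result can
// be marked trusted immediately; otherwise it is only safe-to-dereference
// until an explicit checker sequence runs.
void applyAuthInst(ToyRegState &S, unsigned Reg, bool AuthTrapsOnFailure) {
  S.SafeToDeref.set(Reg);
  if (AuthTrapsOnFailure)
    S.Trusted.set(Reg); // the two properties coincide under this option
}
```

With the option enabled, authentication-oracle scanning becomes unnecessary: there is no window in which a register is safe-to-dereference but not yet trusted.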
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From af01b4e2be6387240a8cbac90d937e37a3413148 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. 
+/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. 
+bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. +Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
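The capture-or-match pattern documented above can be sketched in a stripped-down, standalone form. This is an analogue for illustration only, not the BOLT API: operands are plain integers, and unlike the real helpers this sketch does not restore previously captured values when a later operand fails to match:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal stand-in for the Reg/Imm matchers: a wrapper that captures a
// value on first use and matches against the captured value afterwards.
struct Capture {
  std::optional<int64_t> Value;
  bool match(int64_t V) {
    if (Value && *Value != V)
      return false;
    Value = V;
    return true;
  }
  int64_t get() const { return *Value; }
};

struct ToyInst {
  unsigned Opcode;
  std::vector<int64_t> Ops;
};

// Matches the opcode and an operand-list prefix, one instruction at a time.
bool matchInst(const ToyInst &I, unsigned Opcode,
               const std::vector<Capture *> &Matchers) {
  if (I.Opcode != Opcode || I.Ops.size() < Matchers.size())
    return false;
  for (size_t K = 0; K < Matchers.size(); ++K)
    if (!Matchers[K]->match(I.Ops[K]))
      return false;
  return true;
}
```

Usage mirrors the doc comment: the same `Capture` passed twice first records an operand and then constrains later operands (or later instructions) to that recorded value.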
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From d3742598bbf2a248124fe1b297d1447c52e40be1 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 34b5b1d51de4e..dac274c0f4130 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : 
llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -858,9 +858,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1025,8 +1025,7 @@ class DstSafetyAnalysis { // ... an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. 
if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1065,7 +1064,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1243,7 +1242,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
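The hunks above repeatedly replace index-based nested loops with `llvm::zip_equal` plus a structured binding. A plain-C++ sketch of the same element-wise set merge, without the LLVM helpers (the equal-size assertion stands in for what `zip_equal` checks internally):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// The patch rewrites loops of the shape
//   for (unsigned I = 0; I < A.size(); ++I)
//     for (const MCInst *J : B[I])
//       A[I].insert(J);
// as: for (auto [ThisSet, OtherSet] : llvm::zip_equal(A, B))
//       ThisSet.insert_range(OtherSet);
// The equivalent element-wise merge in plain C++:
void mergeEach(std::vector<std::set<int>> &A,
               const std::vector<std::set<int>> &B) {
  assert(A.size() == B.size() && "zip_equal requires equal range sizes");
  for (size_t I = 0; I < A.size(); ++I)
    A[I].insert(B[I].begin(), B[I].end());
}
```

The refactor is behavior-preserving; the gain is that the pairing of the two ranges (and the equal-size invariant) is stated once instead of being implicit in the index arithmetic.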
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 74bbe1e6f6e759c369ecf517dbfa6f98c40e9ffb Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such a BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index b233452985502..c8cbcaf33f4b5 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. 
+ /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4c7ae3c880db4..11db51f6c6dd1 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1066,6 +1066,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1173,8 +1182,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 9d5a578cfbdff..b669d32cc2032 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
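The commit message's comment "for x2 to be checked here, it must be checked in both successor basic blocks" describes a meet over successors in which a trapping successor is neutral. A toy version of that intersection (register numbering and types are illustrative, not the pass's actual state representation):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// A register is considered checked after a basic block only if it is
// checked along every successor. A successor that immediately traps
// (e.g. ends in BRK) checks every register, so it never weakens the
// intersection.
std::bitset<32>
checkedAfterBlock(const std::vector<std::bitset<32>> &SuccChecked,
                  const std::vector<bool> &SuccIsTrap) {
  std::bitset<32> Result;
  Result.set(); // neutral element of the intersection
  for (size_t I = 0; I < SuccChecked.size(); ++I)
    if (!SuccIsTrap[I])
      Result &= SuccChecked[I];
  return Result;
}
```

This is why treating BRK as "checks everything" removes the false positive: the on_failure successor no longer drags registers like x2 out of the checked set.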
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137224 >From a75cab7070e2167a4be39a4467895a2d1622c4e8 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 80 +++ .../AArch64/gs-pauth-tail-calls.s | 597 ++ 2 files changed, 677 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 6327a2da54d5b..4c7ae3c880db4 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1307,6 +1307,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). 
+ if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return true; + + // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the + // below is a simplified condition from BinaryContext::printInstruction. + bool IsUnknownControlFlow = + BC.MIB->isIndirectBranch(Inst) && !BC.MIB->getJumpTable(Inst); + + if (BF.hasCFG() && IsUnknownControlFlow) +return true; + + return false; +} + +static std::optional> +shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, + const MCInstReference &Inst, const SrcState &S) { + static const GadgetKind UntrustedLRKind( + "untrusted link register found before tail call"); + + if (!shouldAnalyzeTailCallInst(BC, BF, Inst)) +return std::nullopt; + + // Not only the set of registers returned by getTrustedLiveInRegs() can be + // seen as a reasonable target-independent _approximation_ of "the LR", these + // are *exactly* those registers used by SrcSafetyAnalysis to initialize the + // set of trusted registers on function entry. + // Thus, this function basically checks that the precondition expected to be + // imposed by a function call instruction (which is hardcoded into the target- + // specific getTrustedLiveInRegs() function) is also respected on tail calls. + SmallVector RegsToCheck = BC.MIB->getTrustedLiveInRegs(); + LLVM_DEBUG({ +traceInst(BC, "Found tail call inst", Inst); +traceRegMask(BC, "Trusted regs", S.TrustedRegs); + }); + + // In musl on AArch64, the _start function sets LR to zero and calls the next + // stage initialization function at the end, something along these lines: + // + // _start: + // mov x30, #0 + // ; ... other initialization ... + // b _start_c ; performs "exit" system call at some point + // + // As this would produce a false positive for every executable linked with + // such libc, ignore tail calls performed by ELF entry function. 
+ if (BC.StartFunctionAddress && + *BC.StartFunctionAddress == Inst.getFunction()->getAddress()) { +LLVM_DEBUG({ dbgs() << " Skipping tail call in ELF entry function.\n"; }); +return std::nullopt; + } + + // Returns at most one report per instruction - this is probably OK... + for (auto Reg : RegsToCheck) +if (!S.TrustedRegs[Reg]) + return make_gadget_report(UntrustedLRKind, Inst, Reg); + + return std::nullopt; +} + static std::optional> shouldReportCallGadget(const BinaryContext &BC, const MCInstReference &Inst, const SrcState &S) { @@ -1462,6 +1539,9 @@ void FunctionAnalysisContext::findUnsafeUses( if (PacRetGadgetsOnly) return; +if (auto Report = shouldReportUnsafeTailCall(BC, BF, Inst, S)) + Reports.push_back(*Report); + if (auto Report = shouldReportCallGadget(BC, Inst, S))
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From af01b4e2be6387240a8cbac90d937e37a3413148 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. 
+/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. 
+bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. +Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From b0eeddba47f56f0b917c4a43a744f120ea8e1d6e Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. 
+ /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c8cbcaf33f4b5..3abf4d18e94da 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 762c08ffd933e..e9ed44a47bf6f 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1351,6 +1351,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +LL
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From 5b9848cf82a1f047d90c1482404ac60f730892cf Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 28 Apr 2025 18:35:48 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 237 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). 
+class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
atrosinenko wrote: Factored this out of #137224. https://github.com/llvm/llvm-project/pull/141824
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
arsenm wrote: ### Merge activity * **May 28, 7:25 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/141804). https://github.com/llvm/llvm-project/pull/141804
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
arsenm wrote: ### Merge activity * **May 28, 7:25 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/141803). https://github.com/llvm/llvm-project/pull/141803
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/141766 >From 2ef30aacee4d80c0e4a925aa5ba9416423d10b1b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Tue, 27 May 2025 07:55:04 -0500 Subject: [PATCH 1/3] [utils][TableGen] Handle versions on clause/directive spellings In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontent was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. 
For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../llvm/Frontend/Directive/Spelling.h| 39 + llvm/include/llvm/TableGen/DirectiveEmitter.h | 25 +-- llvm/lib/Frontend/CMakeLists.txt | 1 + llvm/lib/Frontend/Directive/CMakeLists.txt| 6 + llvm/lib/Frontend/Directive/Spelling.cpp | 31 llvm/lib/Frontend/OpenACC/CMakeLists.txt | 2 +- llvm/lib/Frontend/OpenMP/CMakeLists.txt | 1 + llvm/test/TableGen/directive1.td | 34 ++-- llvm/test/TableGen/directive2.td | 24 +-- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 146 +++--- 10 files changed, 212 insertions(+), 97 deletions(-) create mode 100644 llvm/include/llvm/Frontend/Directive/Spelling.h create mode 100644 llvm/lib/Frontend/Directive/CMakeLists.txt create mode 100644 llvm/lib/Frontend/Directive/Spelling.cpp diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same structure as the one emitted into the generated sources. 
-#define STRUCT_VERSION_RANGE \ - struct VersionRange { \ -int Min = 1; \ -int Max = INT_MAX; \ - } - -STRUCT_VERSION_RANGE; - class Spelling : public Versioned { public: - using Value = std::pair; + using Value = llvm::directive::Spelling; Spelling(const Record *Def) : Def(Def) {} StringRef getText() const { return Def->getValueAsString("spelling"); } - VersionRange getVersions() const { -return VersionRange{getMinVersion(Def), getMaxVersion(Def)}; + llvm::directive::VersionRange getVersions() const { +return llvm::directive::VersionRange{getMinVersion(Def), +
[llvm-branch-commits] [llvm] [utils][TableGen] Unify converting names to upper-camel case (PR #141762)
llvmbot wrote: @llvm/pr-subscribers-tablegen Author: Krzysztof Parzyszek (kparzysz) Changes There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators. --- Full diff: https://github.com/llvm/llvm-project/pull/141762.diff 2 Files Affected: - (modified) llvm/include/llvm/TableGen/DirectiveEmitter.h (+32-44) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+1-1) ``diff diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 8615442ebff9f..48e18de0904c0 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -113,14 +113,39 @@ class BaseRecord { // Returns the name of the directive formatted for output. Whitespace are // replaced with underscores. - static std::string formatName(StringRef Name) { + static std::string getSnakeName(StringRef Name) { std::string N = Name.str(); llvm::replace(N, ' ', '_'); return N; } + static std::string getUpperCamelName(StringRef Name, StringRef Sep) { +std::string Camel = Name.str(); +// Convert to uppercase +bool Cap = true; +llvm::transform(Camel, Camel.begin(), [&](unsigned char C) { + if (Sep.contains(C)) { +assert(!Cap && "No initial or repeated separators"); +Cap = true; + } else if (Cap) { +C = llvm::toUpper(C); +Cap = false; + } + return C; +}); +size_t Out = 0; +// Remove separators +for (size_t In = 0, End = Camel.size(); In != End; ++In) { + unsigned char C = Camel[In]; + if (!Sep.contains(C)) +Camel[Out++] = C; +} +Camel.resize(Out); +return Camel; + } + std::string getFormattedName() const { -return formatName(Def->getValueAsString("name")); +return getSnakeName(Def->getValueAsString("name")); } bool isDefault() const { return Def->getValueAsBit("isDefault"); } @@ -172,26 +197,13 @@ class Directive : 
public BaseRecord { // Clang uses a different format for names of its directives enum. std::string getClangAccSpelling() const { -std::string Name = Def->getValueAsString("name").str(); +StringRef Name = Def->getValueAsString("name"); // Clang calls the 'unknown' value 'invalid'. if (Name == "unknown") return "Invalid"; -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, spaces/underscores are handled by capitalizing the next -// letter of the name and removing the space/underscore. -for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == ' ' || Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != ' ' && Name[I] != '_' && - "No double spaces/underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +return BaseRecord::getUpperCamelName(Name, " _"); } }; @@ -218,19 +230,7 @@ class Clause : public BaseRecord { // num_threads -> NumThreads std::string getFormattedParserClassName() const { StringRef Name = Def->getValueAsString("name"); -std::string N = Name.str(); -bool Cap = true; -llvm::transform(N, N.begin(), [&Cap](unsigned char C) { - if (Cap == true) { -C = toUpper(C); -Cap = false; - } else if (C == '_') { -Cap = true; - } - return C; -}); -erase(N, '_'); -return N; +return BaseRecord::getUpperCamelName(Name, "_"); } // Clang uses a different format for names of its clause enum, which can be @@ -241,20 +241,8 @@ class Clause : public BaseRecord { !ClangSpelling.empty()) return ClangSpelling.str(); -std::string Name = Def->getValueAsString("name").str(); -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, underscores are handled by capitalizing the next letter of -// the name and removing the underscore. 
-for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != '_' && "No double underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +StringRef Name = Def->getValueAsString("name"); +return BaseRecord::getUpperCamelName(Name, "_"); } // Optional field. diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp index f459e7c98ebc1..9e79a83ed6e18 100644 --- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp @@ -839,7 +839,7 @@ static void generateGetDirectiveLanguages(const DirectiveLanguage &DirLang, D.getSourceLangu
[llvm-branch-commits] [llvm] [utils][TableGen] Unify converting names to upper-camel case (PR #141762)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/141762 There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators. >From 78d1f1b2344ab48902b44afd7fb84649b46d6749 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 21 May 2025 11:26:33 -0500 Subject: [PATCH] [utils][TableGen] Unify converting names to upper-camel case There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators. --- llvm/include/llvm/TableGen/DirectiveEmitter.h | 76 --- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 2 +- 2 files changed, 33 insertions(+), 45 deletions(-) diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 8615442ebff9f..48e18de0904c0 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -113,14 +113,39 @@ class BaseRecord { // Returns the name of the directive formatted for output. Whitespace are // replaced with underscores. 
- static std::string formatName(StringRef Name) { + static std::string getSnakeName(StringRef Name) { std::string N = Name.str(); llvm::replace(N, ' ', '_'); return N; } + static std::string getUpperCamelName(StringRef Name, StringRef Sep) { +std::string Camel = Name.str(); +// Convert to uppercase +bool Cap = true; +llvm::transform(Camel, Camel.begin(), [&](unsigned char C) { + if (Sep.contains(C)) { +assert(!Cap && "No initial or repeated separators"); +Cap = true; + } else if (Cap) { +C = llvm::toUpper(C); +Cap = false; + } + return C; +}); +size_t Out = 0; +// Remove separators +for (size_t In = 0, End = Camel.size(); In != End; ++In) { + unsigned char C = Camel[In]; + if (!Sep.contains(C)) +Camel[Out++] = C; +} +Camel.resize(Out); +return Camel; + } + std::string getFormattedName() const { -return formatName(Def->getValueAsString("name")); +return getSnakeName(Def->getValueAsString("name")); } bool isDefault() const { return Def->getValueAsBit("isDefault"); } @@ -172,26 +197,13 @@ class Directive : public BaseRecord { // Clang uses a different format for names of its directives enum. std::string getClangAccSpelling() const { -std::string Name = Def->getValueAsString("name").str(); +StringRef Name = Def->getValueAsString("name"); // Clang calls the 'unknown' value 'invalid'. if (Name == "unknown") return "Invalid"; -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, spaces/underscores are handled by capitalizing the next -// letter of the name and removing the space/underscore. 
-for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == ' ' || Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != ' ' && Name[I] != '_' && - "No double spaces/underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +return BaseRecord::getUpperCamelName(Name, " _"); } }; @@ -218,19 +230,7 @@ class Clause : public BaseRecord { // num_threads -> NumThreads std::string getFormattedParserClassName() const { StringRef Name = Def->getValueAsString("name"); -std::string N = Name.str(); -bool Cap = true; -llvm::transform(N, N.begin(), [&Cap](unsigned char C) { - if (Cap == true) { -C = toUpper(C); -Cap = false; - } else if (C == '_') { -Cap = true; - } - return C; -}); -erase(N, '_'); -return N; +return BaseRecord::getUpperCamelName(Name, "_"); } // Clang uses a different format for names of its clause enum, which can be @@ -241,20 +241,8 @@ class Clause : public BaseRecord { !ClangSpelling.empty()) return ClangSpelling.str(); -std::string Name = Def->getValueAsString("name").str(); -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, underscores are handled by capitalizing the next letter of -// the name and removing the underscore. -for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != '_' && "No double underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +StringRef Name = Def->getValueAsString("name"); +return BaseRecord::getUpperCamelName(Name, "_"); }
[llvm-branch-commits] [llvm] [utils][TableGen] Treat clause aliases equally with names (PR #141763)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/141763 The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be ``` ("fred" || "f") >> construct{} || "foo" << construct{} ``` is now ``` "fred" >> construct{} || "foo" >> construct{} || "f" >> construct{} ``` >From e7d2e0b40eae0bf37f76d0aa8a59520b529c760c Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 21 May 2025 14:23:38 -0500 Subject: [PATCH] [utils][TableGen] Treat clause aliases equally with names The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. 
What used to be ``` ("fred" || "f") >> construct{} || "foo" << construct{} ``` is now ``` "fred" >> construct{} || "foo" >> construct{} || "f" >> construct{} ``` --- llvm/test/TableGen/directive1.td | 4 +- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 75 ++- 2 files changed, 42 insertions(+), 37 deletions(-) diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 74091edfa2a66..f756f54c03bfb 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -34,6 +34,7 @@ def TDLC_ClauseB : Clause<"clauseb"> { } def TDLC_ClauseC : Clause<"clausec"> { + let aliases = ["ccc"]; let flangClass = "IntExpr"; let isValueList = 1; } @@ -260,7 +261,8 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: TYPE_PARSER( // IMPL-NEXT:"clausec" >> construct(construct(parenthesized(nonemptyList(Parser{} || // IMPL-NEXT:"clauseb" >> construct(construct(maybe(parenthesized(Parser{} || -// IMPL-NEXT:"clausea" >> construct(construct()) +// IMPL-NEXT:"clausea" >> construct(construct()) || +// IMPL-NEXT:"ccc" >> construct(construct(parenthesized(nonemptyList(Parser{} // IMPL-NEXT: ) // IMPL-EMPTY: // IMPL-NEXT: #endif // GEN_FLANG_CLAUSES_PARSER diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp index 9e79a83ed6e18..bd6c543e1741a 100644 --- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp @@ -608,7 +608,7 @@ static void emitLeafTable(const DirectiveLanguage &DirLang, raw_ostream &OS, std::vector Ordering(Directives.size()); std::iota(Ordering.begin(), Ordering.end(), 0); - sort(Ordering, [&](int A, int B) { + llvm::sort(Ordering, [&](int A, int B) { auto &LeavesA = LeafTable[A]; auto &LeavesB = LeafTable[B]; int DirA = LeavesA[0], DirB = LeavesB[0]; @@ -1113,59 +1113,63 @@ static void generateFlangClauseParserKindMap(const DirectiveLanguage &DirLang, << " Parser clause\");\n"; } -static bool compareClauseName(const Record *R1, 
const Record *R2) { - Clause C1(R1); - Clause C2(R2); - return (C1.getName() > C2.getName()); +using RecordWithText = std::pair; + +static bool compareRecordText(const RecordWithText &A, + const RecordWithText &B) { + return A.second > B.second; +} + +static std::vector +getSpellingTexts(ArrayRef Records) { + std::vector List; + for (const Record *R : Records) { +Clause C(R); +List.push_back(std::make_pair(R, C.getName())); +llvm::transform(C.getAliases(), std::back_inserter(List), +[R](StringRef S) { return std::make_pair(R, S); }); + } + return List; } // Generate the parser for the clauses. static void generateFlangClausesParser(const DirectiveLanguage &DirLang, raw_ostream &OS) { std::vector Clauses = DirLang.getClauses(); - // Sort clauses in reverse alphabetical order so with clauses with same - // beginning, the longer option is tried before. - sort(Clauses, compareClauseName); + /
[llvm-branch-commits] [llvm] [mlir] [utils][TableGen] Implement clause aliases as alternative spellings (PR #141765)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp Author: Krzysztof Parzyszek (kparzysz) Changes Use the spellings in the generated clause parser. The functions `getClauseKind` and `getClauseName` are not yet updated. The definitions of both clauses and directives now take a list of "Spelling"s instead of a single string. For example ``` def ACCC_Copyin : Clause<[Spelling<"copyin">, Spelling<"present_or_copyin">, Spelling<"pcopyin">]> { ... } ``` A "Spelling" is a versioned string, defaulting to "all versions". For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- Patch is 106.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141765.diff 9 Files Affected: - (modified) llvm/include/llvm/Frontend/Directive/DirectiveBase.td (+23-18) - (modified) llvm/include/llvm/Frontend/OpenACC/ACC.td (+73-73) - (modified) llvm/include/llvm/Frontend/OpenMP/OMP.td (+252-244) - (modified) llvm/include/llvm/TableGen/DirectiveEmitter.h (+80-5) - (modified) llvm/test/TableGen/directive1.td (+36-24) - (modified) llvm/test/TableGen/directive2.td (+38-25) - (modified) llvm/test/TableGen/directive3.td (+5-5) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+74-41) - (modified) mlir/test/mlir-tblgen/directive-common.td (+1-1) ``diff diff --git a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td index 582da20083aee..142ba0423f251 100644 --- a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td +++ b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td @@ -51,6 +51,20 @@ class DirectiveLanguage { string flangClauseBaseClass = ""; } +// Base class for versioned entities. +class Versioned { + // Mininum version number where this object is valid. + int minVersion = min; + + // Maximum version number where this object is valid. 
+ int maxVersion = max; +} + +class Spelling +: Versioned { + string spelling = s; +} + // Some clauses take an argument from a predefined list of allowed keyword // values. For example, assume a clause "someclause" with an argument from // the list "foo", "bar", "baz". In the user source code this would look @@ -81,12 +95,9 @@ class EnumVal { } // Information about a specific clause. -class Clause { - // Name of the clause. - string name = c; - - // Define aliases used in the parser. - list aliases = []; +class Clause ss> { + // Spellings of the clause. + list spellings = ss; // Optional class holding value of the clause in clang AST. string clangClass = ""; @@ -134,15 +145,9 @@ class Clause { } // Hold information about clause validity by version. -class VersionedClause { - // Actual clause. +class VersionedClause +: Versioned { Clause clause = c; - - // Mininum version number where this clause is valid. - int minVersion = min; - - // Maximum version number where this clause is valid. - int maxVersion = max; } // Kinds of directive associations. @@ -190,15 +195,15 @@ class SourceLanguage { string name = n; // Name of the enum value in enum class Association. } -// The C languages also implies C++ until there is a reason to add C++ +// The C language also implies C++ until there is a reason to add C++ // separately. def L_C : SourceLanguage<"C"> {} def L_Fortran : SourceLanguage<"Fortran"> {} // Information about a specific directive. -class Directive { - // Name of the directive. Can be composite directive sepearted by whitespace. - string name = d; +class Directive ss> { + // Spellings of the directive. + list spellings = ss; // Clauses cannot appear twice in the three allowed lists below. 
Also, since // required implies allowed, the same clause cannot appear in both the diff --git a/llvm/include/llvm/Frontend/OpenACC/ACC.td b/llvm/include/llvm/Frontend/OpenACC/ACC.td index b74cd6e5642ec..65751839ceb09 100644 --- a/llvm/include/llvm/Frontend/OpenACC/ACC.td +++ b/llvm/include/llvm/Frontend/OpenACC/ACC.td @@ -32,64 +32,65 @@ def OpenACC : DirectiveLanguage { //===--===// // 2.16.1 -def ACCC_Async : Clause<"async"> { +def ACCC_Async : Clause<[Spelling<"async">]> { let flangClass = "ScalarIntExpr"; let isValueOptional = true; } // 2.9.7 -def ACCC_Auto : Clause<"auto"> {} +def ACCC_Auto : Clause<[Spelling<"auto">]> {} // 2.7.12 -def ACCC_Attach : Clause<"attach"> { +def ACCC_Attach : Clause<[Spelling<"attach">]> { let flangClass = "AccObjectList"; } // 2.15.1 -def ACCC_Bind : Clause<"bind"> { +def ACCC_Bind : Clause<[Spelling<"bind">]> { let flangClass = "AccBindClause"; } // 2.12 -def ACCC_Capture : Clause<"capture"> { +def ACCC_Capture : Clause<[Spelling<"capture">]> { } // 2.9.1 -def ACCC_Collapse : Clause<"collapse"> { +def ACCC_Collapse : Clause<[Spel
[llvm-branch-commits] [llvm] [utils][TableGen] Treat clause aliases equally with names (PR #141763)
llvmbot wrote: @llvm/pr-subscribers-tablegen Author: Krzysztof Parzyszek (kparzysz) Changes The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be ``` ("fred" || "f") >> construct{} || "foo" << construct {} ``` is now ``` "fred" >> construct {} || "foo" >> construct {} || "f" >> construct {} ``` --- Full diff: https://github.com/llvm/llvm-project/pull/141763.diff 2 Files Affected: - (modified) llvm/test/TableGen/directive1.td (+3-1) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+39-36) ``diff diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 74091edfa2a66..f756f54c03bfb 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -34,6 +34,7 @@ def TDLC_ClauseB : Clause<"clauseb"> { } def TDLC_ClauseC : Clause<"clausec"> { + let aliases = ["ccc"]; let flangClass = "IntExpr"; let isValueList = 1; } @@ -260,7 +261,8 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: TYPE_PARSER( // IMPL-NEXT:"clausec" >> construct(construct(parenthesized(nonemptyList(Parser{} || // IMPL-NEXT:"clauseb" >> construct(construct(maybe(parenthesized(Parser{} || -// IMPL-NEXT:"clausea" >> construct(construct()) +// IMPL-NEXT:"clausea" >> construct(construct()) || +// IMPL-NEXT:"ccc" >> construct(construct(parenthesized(nonemptyList(Parser{} // IMPL-NEXT: ) // IMPL-EMPTY: // IMPL-NEXT: #endif 
// GEN_FLANG_CLAUSES_PARSER diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp index 9e79a83ed6e18..bd6c543e1741a 100644 --- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp @@ -608,7 +608,7 @@ static void emitLeafTable(const DirectiveLanguage &DirLang, raw_ostream &OS, std::vector Ordering(Directives.size()); std::iota(Ordering.begin(), Ordering.end(), 0); - sort(Ordering, [&](int A, int B) { + llvm::sort(Ordering, [&](int A, int B) { auto &LeavesA = LeafTable[A]; auto &LeavesB = LeafTable[B]; int DirA = LeavesA[0], DirB = LeavesB[0]; @@ -1113,59 +1113,63 @@ static void generateFlangClauseParserKindMap(const DirectiveLanguage &DirLang, << " Parser clause\");\n"; } -static bool compareClauseName(const Record *R1, const Record *R2) { - Clause C1(R1); - Clause C2(R2); - return (C1.getName() > C2.getName()); +using RecordWithText = std::pair; + +static bool compareRecordText(const RecordWithText &A, + const RecordWithText &B) { + return A.second > B.second; +} + +static std::vector +getSpellingTexts(ArrayRef Records) { + std::vector List; + for (const Record *R : Records) { +Clause C(R); +List.push_back(std::make_pair(R, C.getName())); +llvm::transform(C.getAliases(), std::back_inserter(List), +[R](StringRef S) { return std::make_pair(R, S); }); + } + return List; } // Generate the parser for the clauses. static void generateFlangClausesParser(const DirectiveLanguage &DirLang, raw_ostream &OS) { std::vector Clauses = DirLang.getClauses(); - // Sort clauses in reverse alphabetical order so with clauses with same - // beginning, the longer option is tried before. - sort(Clauses, compareClauseName); + // Sort clauses in the reverse alphabetical order with respect to their + // names and aliases, so that longer names are tried before shorter ones. 
+ std::vector> Names = + getSpellingTexts(Clauses); + llvm::sort(Names, compareRecordText); IfDefScope Scope("GEN_FLANG_CLAUSES_PARSER", OS); StringRef Base = DirLang.getFlangClauseBaseClass(); + unsigned LastIndex = Names.size() - 1; OS << "\n"; - unsigned Index = 0; - unsigned LastClauseIndex = Clauses.size() - 1; OS << "TYPE_PARSER(\n"; - for (const Clause Clause : Clauses) { -const std::vector &Aliases = Clause.getAliases(); -if (Aliases.empty()) { - OS << " \"" << Clause.getName() << "\""; -} else { - OS << " (" - << "\"" << Clause.getName() << "\"_tok"; - for (StringRef Alias : Aliases) { -OS << " || \"" << Alias << "\"_tok"; - } - OS << ")"; -} + for (auto [Index, RecTxt] : llvm::enumera
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/141766 In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontent was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 >From 2ef30aacee4d80c0e4a925aa5ba9416423d10b1b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Tue, 27 May 2025 07:55:04 -0500 Subject: [PATCH] [utils][TableGen] Handle versions on clause/directive spellings In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontent was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. 
For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../llvm/Frontend/Directive/Spelling.h| 39 + llvm/include/llvm/TableGen/DirectiveEmitter.h | 25 +-- llvm/lib/Frontend/CMakeLists.txt | 1 + llvm/lib/Frontend/Directive/CMakeLists.txt| 6 + llvm/lib/Frontend/Directive/Spelling.cpp | 31 llvm/lib/Frontend/OpenACC/CMakeLists.txt | 2 +- llvm/lib/Frontend/OpenMP/CMakeLists.txt | 1 + llvm/test/TableGen/directive1.td | 34 ++-- llvm/test/TableGen/directive2.td | 24 +-- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 146 +++--- 10 files changed, 212 insertions(+), 97 deletions(-) create mode 100644 llvm/include/llvm/Frontend/Directive/Spelling.h create mode 100644 llvm/lib/Frontend/Directive/CMakeLists.txt create mode 100644 llvm/lib/Frontend/Directive/Spelling.cpp diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
llvmbot wrote: @llvm/pr-subscribers-openacc Author: Krzysztof Parzyszek (kparzysz) Changes In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontend was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- Patch is 26.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141766.diff 10 Files Affected: - (added) llvm/include/llvm/Frontend/Directive/Spelling.h (+39) - (modified) llvm/include/llvm/TableGen/DirectiveEmitter.h (+8-17) - (modified) llvm/lib/Frontend/CMakeLists.txt (+1) - (added) llvm/lib/Frontend/Directive/CMakeLists.txt (+6) - (added) llvm/lib/Frontend/Directive/Spelling.cpp (+31) - (modified) llvm/lib/Frontend/OpenACC/CMakeLists.txt (+1-1) - (modified) llvm/lib/Frontend/OpenMP/CMakeLists.txt (+1) - (modified) llvm/test/TableGen/directive1.td (+20-14) - (modified) llvm/test/TableGen/directive2.td (+12-12) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+93-53) ``diff diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ 
-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same structure as the one emitted into the generated sources. 
-#define STRUCT_VERSION_RANGE \ - struct VersionRange { \ -int Min = 1; \ -int Max = INT_MAX; \ - } - -STRUCT_VERSION_RANGE; - class Spelling : public Versioned { public: - using Value = std::pair; + using Value = llvm::directive::Spelling; Spelling(const Record *Def) : Def(Def) {} StringRef getText() const { return Def->getValueAsString("spelling"); } - VersionRange getVersions() const { -return VersionRange{getMinVersion(Def), getMaxVersion(Def)}; + llvm::directive::VersionRange getVersions() const { +return llvm::directive::VersionRange{getMinVersion(Def), + getMaxVersion(Def)}; } - Value get() const { return std::make_pair(getText(), getVersions()); } + Value get() const { return Value{getText(), getVersions()}; } private: const Record *Def; @@ -177,11 +168,11 @@ class BaseRe
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing of floats for StaticSampler (PR #140181)
@@ -711,6 +734,35 @@ std::optional RootSignatureParser::parseRegister() { return Reg; } +std::optional RootSignatureParser::parseFloatParam() { + assert(CurToken.TokKind == TokenKind::pu_equal && + "Expects to only be invoked starting at given keyword"); + // Consume sign modifier + bool Signed = + tryConsumeExpectedToken({TokenKind::pu_plus, TokenKind::pu_minus}); + bool Negated = Signed && CurToken.TokKind == TokenKind::pu_minus; + + // DXC will treat a postive signed integer as unsigned + if (!Negated && tryConsumeExpectedToken(TokenKind::int_literal)) { +auto UInt = handleUIntLiteral(); +if (!UInt.has_value()) + return std::nullopt; +return (float)UInt.value(); + } else if (tryConsumeExpectedToken(TokenKind::int_literal)) { llvm-beanz wrote: Flyby style nit: Don't use `else` after a `return` https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return https://github.com/llvm/llvm-project/pull/140181 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -2478,3 +2479,76 @@ PreservedAnalyses LowerTypeTestsPass::run(Module &M, return PreservedAnalyses::all(); return PreservedAnalyses::none(); } + +PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, + ModuleAnalysisManager &AM) { + bool Changed = false; + // Figure out whether inlining has exposed a constant address to a lowered + // type test, and remove the test if so and the address is known to pass the + // test. Unfortunately this pass ends up needing to reverse engineer what + // LowerTypeTests did; this is currently inherent to the design of ThinLTO + // importing where LowerTypeTests needs to run at the start. + for (auto &GV : M.globals()) { +if (!GV.getName().starts_with("__typeid_") || +!GV.getName().ends_with("_global_addr")) + continue; +auto *MD = MDString::get(M.getContext(), + GV.getName().substr(9, GV.getName().size() - 21)); +auto MaySimplifyPtr = [&](Value *Ptr) { + if (auto *GV = dyn_cast(Ptr)) +if (auto *CFIGV = M.getNamedValue((GV->getName() + ".cfi").str())) + Ptr = CFIGV; + return isKnownTypeIdMember(MD, M.getDataLayout(), Ptr, 0); teresajohnson wrote: Are there cases where the GV will not have a ".cfi" extension? I notice the test has that extension. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
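To make the lookup under discussion concrete: the quoted `MaySimplifyPtr` prefers a "<name>.cfi" twin when one exists and otherwise tests the original pointer. A minimal sketch, with `std::set` standing in for `Module::getNamedValue` (the real code operates on LLVM IR values, not strings; only the ".cfi" suffix convention comes from the patch):

```cpp
#include <cassert>
#include <set>
#include <string>

// Hedged sketch of the symbol redirection in MaySimplifyPtr.
std::string resolveCfiTwin(const std::set<std::string> &ModuleSymbols,
                           const std::string &Name) {
  std::string Twin = Name + ".cfi";
  if (ModuleSymbols.count(Twin))
    return Twin; // LowerTypeTests produced a renamed CFI version
  return Name;   // no twin: the membership test runs on the original
}
```

teresajohnson's question amounts to asking whether the second return is ever taken, i.e. whether a global can reach this point without a ".cfi" twin.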
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -2478,3 +2479,76 @@ PreservedAnalyses LowerTypeTestsPass::run(Module &M, return PreservedAnalyses::all(); return PreservedAnalyses::none(); } + +PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, + ModuleAnalysisManager &AM) { + bool Changed = false; + // Figure out whether inlining has exposed a constant address to a lowered + // type test, and remove the test if so and the address is known to pass the + // test. Unfortunately this pass ends up needing to reverse engineer what + // LowerTypeTests did; this is currently inherent to the design of ThinLTO teresajohnson wrote: Can you add a more extensive comment with what this is looking for and why? I don't look at lower type test output often so I don't recall offhand what e.g. it would have looked like without inlining vs with. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -0,0 +1,40 @@ +; RUN: opt -S %s -passes=simplify-type-tests | FileCheck %s teresajohnson wrote: Add a comment about what this is testing https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -2478,3 +2479,76 @@ PreservedAnalyses LowerTypeTestsPass::run(Module &M, return PreservedAnalyses::all(); return PreservedAnalyses::none(); } + +PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, + ModuleAnalysisManager &AM) { + bool Changed = false; + // Figure out whether inlining has exposed a constant address to a lowered + // type test, and remove the test if so and the address is known to pass the + // test. Unfortunately this pass ends up needing to reverse engineer what + // LowerTypeTests did; this is currently inherent to the design of ThinLTO + // importing where LowerTypeTests needs to run at the start. + for (auto &GV : M.globals()) { +if (!GV.getName().starts_with("__typeid_") || +!GV.getName().ends_with("_global_addr")) + continue; +auto *MD = MDString::get(M.getContext(), teresajohnson wrote: Can you add a comment on this conversion? Figured it out by adding up the chars myself but it would be good to make it explicit. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
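The `substr(9, GV.getName().size() - 21)` that teresajohnson flags is pure prefix/suffix arithmetic: "__typeid_" is 9 characters, "_global_addr" is 12, and 9 + 12 = 21. A sketch of the same computation with named lengths (`std::string` here rather than `StringRef`):

```cpp
#include <cassert>
#include <string>

// Sketch: recover the type-id name from a "__typeid_<name>_global_addr"
// symbol by stripping the known prefix and suffix.
std::string stripTypeIdGlobalAddr(const std::string &SymbolName) {
  const std::string Prefix = "__typeid_";    // 9 characters
  const std::string Suffix = "_global_addr"; // 12 characters
  // Equivalent to the quoted substr(9, size() - 21): 21 == 9 + 12.
  return SymbolName.substr(Prefix.size(),
                           SymbolName.size() - Prefix.size() - Suffix.size());
}
```

Spelling the lengths as `Prefix.size()`/`Suffix.size()`, as above, is one way to make the requested comment unnecessary.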
[llvm-branch-commits] [llvm] [HLSL] Diagnose overlapping resource bindings (PR #140982)
@@ -50,15 +51,55 @@ static void reportInvalidDirection(Module &M, DXILResourceMap &DRM) { } } -} // namespace +static void reportOverlappingError(Module &M, ResourceInfo R1, + ResourceInfo R2) { + SmallString<64> Message; + raw_svector_ostream OS(Message); + OS << "resource " << R1.getName() << " at register " + << R1.getBinding().LowerBound << " overlaps with resource " << R2.getName() + << " at register " << R2.getBinding().LowerBound << ", space " + << R2.getBinding().Space; + M.getContext().diagnose(DiagnosticInfoGeneric(Message)); +} -PreservedAnalyses -DXILPostOptimizationValidation::run(Module &M, ModuleAnalysisManager &MAM) { - DXILResourceMap &DRM = MAM.getResult(M); +static void reportOverlappingBinding(Module &M, DXILResourceMap &DRM) { + if (DRM.empty()) +return; + for (auto ResList : + {DRM.srvs(), DRM.uavs(), DRM.cbuffers(), DRM.samplers()}) { +if (ResList.empty()) + continue; +const ResourceInfo *PrevRI = &*ResList.begin(); +for (auto *I = ResList.begin() + 1; I != ResList.end(); ++I) { + const ResourceInfo *RI = &*I; + if (PrevRI->getBinding().overlapsWith(RI->getBinding())) { inbelic wrote: Ah I see. Yep, then my issues are resolved and this LGTM https://github.com/llvm/llvm-project/pull/140982 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
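Reduced to bare intervals, the neighbour comparison in the quoted loop looks roughly like this (a sketch only: the real pass walks DXILResourceMap's per-class lists, whose bindings also carry a register space):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Binding = std::pair<unsigned, unsigned>; // {LowerBound, UpperBound}

// Sketch of the adjacent-pair overlap walk: after sorting by lower
// bound, two bindings overlap exactly when an earlier entry's upper
// bound reaches the next entry's lower bound.
bool hasOverlap(std::vector<Binding> Bindings) {
  std::sort(Bindings.begin(), Bindings.end());
  for (size_t I = 1; I < Bindings.size(); ++I)
    if (Bindings[I - 1].second >= Bindings[I].first)
      return true;
  return false;
}
```

Comparing only adjacent entries is sufficient once the list is sorted, which is why the pass can scan each resource class linearly.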
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/141766 >From 2ef30aacee4d80c0e4a925aa5ba9416423d10b1b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Tue, 27 May 2025 07:55:04 -0500 Subject: [PATCH 1/4] [utils][TableGen] Handle versions on clause/directive spellings In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontend was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. 
For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../llvm/Frontend/Directive/Spelling.h| 39 + llvm/include/llvm/TableGen/DirectiveEmitter.h | 25 +-- llvm/lib/Frontend/CMakeLists.txt | 1 + llvm/lib/Frontend/Directive/CMakeLists.txt| 6 + llvm/lib/Frontend/Directive/Spelling.cpp | 31 llvm/lib/Frontend/OpenACC/CMakeLists.txt | 2 +- llvm/lib/Frontend/OpenMP/CMakeLists.txt | 1 + llvm/test/TableGen/directive1.td | 34 ++-- llvm/test/TableGen/directive2.td | 24 +-- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 146 +++--- 10 files changed, 212 insertions(+), 97 deletions(-) create mode 100644 llvm/include/llvm/Frontend/Directive/Spelling.h create mode 100644 llvm/lib/Frontend/Directive/CMakeLists.txt create mode 100644 llvm/lib/Frontend/Directive/Spelling.cpp diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same structure as the one emitted into the generated sources. 
-#define STRUCT_VERSION_RANGE \ - struct VersionRange { \ -int Min = 1; \ -int Max = INT_MAX; \ - } - -STRUCT_VERSION_RANGE; - class Spelling : public Versioned { public: - using Value = std::pair; + using Value = llvm::directive::Spelling; Spelling(const Record *Def) : Def(Def) {} StringRef getText() const { return Def->getValueAsString("spelling"); } - VersionRange getVersions() const { -return VersionRange{getMinVersion(Def), getMaxVersion(Def)}; + llvm::directive::VersionRange getVersions() const { +return llvm::directive::VersionRange{getMinVersion(Def), +
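The runtime search the commit message describes (pick the spelling whose [Min, Max] range admits a requested version) can be sketched with simplified stand-ins for the new llvm::directive types. Only the field names mirror the patch; the rest is illustrative, not the actual LLVMFrontendDirective implementation:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for llvm::directive::VersionRange / Spelling.
struct VersionRange {
  int Min = 1;
  int Max = 2147483647; // all possible versions by default
};
struct Spelling {
  std::string Name;
  VersionRange Versions;
};

// Sketch of the FindName lookup: return the first spelling whose
// version range contains Version, or an empty string when none does.
std::string findName(const std::vector<Spelling> &Spellings, int Version) {
  for (const Spelling &S : Spellings)
    if (S.Versions.Min <= Version && Version <= S.Versions.Max)
      return S.Name;
  return "";
}
```

Keeping this search out-of-line in a small library component, rather than emitting it into every generated header, is the stated motivation for adding LLVMFrontendDirective.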
[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138403 >From 5b59eb6176ee2790e7b31e99ae7f7769bf630b1a Mon Sep 17 00:00:00 2001 From: Koakuma Date: Thu, 29 May 2025 11:04:46 +0700 Subject: [PATCH] Apply feedback Created using spr 1.3.5 --- .../Sparc/MCTargetDesc/SparcAsmBackend.cpp| 6 + .../Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp | 9 +- llvm/lib/Target/Sparc/SparcInstrAliases.td| 18 +- llvm/lib/Target/Sparc/SparcInstrFormats.td| 4 +- llvm/test/MC/Sparc/Relocations/expr.s | 16 +- llvm/test/MC/Sparc/sparc64-branch-offset.s| 508 +- 6 files changed, 289 insertions(+), 272 deletions(-) diff --git a/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp b/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp index c74f24d95523e..743752ad2c107 100644 --- a/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp +++ b/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp @@ -51,6 +51,9 @@ static unsigned adjustFixupValue(unsigned Kind, uint64_t Value) { } case ELF::R_SPARC_WDISP10: { +// FIXME this really should be an error reporting check. 
+assert((Value & 0x3) == 0); + // 7.17 Compare and Branch // Inst{20-19} = d10hi; // Inst{12-5} = d10lo; @@ -70,6 +73,9 @@ static unsigned adjustFixupValue(unsigned Kind, uint64_t Value) { case Sparc::fixup_sparc_13: return Value & 0x1fff; + case ELF::R_SPARC_5: +return Value & 0x1f; + case ELF::R_SPARC_LOX10: return (Value & 0x3ff) | 0x1c00; diff --git a/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp b/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp index b44d4361dacdb..2c8dbaa5aba60 100644 --- a/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp +++ b/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp @@ -164,7 +164,12 @@ unsigned SparcMCCodeEmitter::getSImm5OpValue(const MCInst &MI, unsigned OpNo, if (const MCConstantExpr *CE = dyn_cast(Expr)) return CE->getValue(); - llvm_unreachable("simm5 operands can only be used with constants!"); + if (const SparcMCExpr *SExpr = dyn_cast(Expr)) { +Fixups.push_back(MCFixup::create(0, Expr, SExpr->getFixupKind())); +return 0; + } + Fixups.push_back(MCFixup::create(0, Expr, ELF::R_SPARC_5)); + return 0; } unsigned @@ -247,7 +252,7 @@ unsigned SparcMCCodeEmitter::getCompareAndBranchTargetOpValue( const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, const MCSubtargetInfo &STI) const { const MCOperand &MO = MI.getOperand(OpNo); - if (MO.isReg() || MO.isImm()) + if (MO.isImm()) return getMachineOpValue(MI, MO, Fixups, STI); Fixups.push_back(MCFixup::create(0, MO.getExpr(), ELF::R_SPARC_WDISP10)); diff --git a/llvm/lib/Target/Sparc/SparcInstrAliases.td b/llvm/lib/Target/Sparc/SparcInstrAliases.td index fa2c62101d30e..459fd193db0ed 100644 --- a/llvm/lib/Target/Sparc/SparcInstrAliases.td +++ b/llvm/lib/Target/Sparc/SparcInstrAliases.td @@ -333,19 +333,19 @@ multiclass reg_cond_alias { // Instruction aliases for compare-and-branch. 
multiclass cwb_cond_alias { - def : InstAlias, Requires<[HasOSA2011]>; - def : InstAlias, Requires<[HasOSA2011]>; } multiclass cxb_cond_alias { - def : InstAlias, Requires<[HasOSA2011]>; - def : InstAlias, Requires<[HasOSA2011]>; } @@ -441,8 +441,7 @@ defm : cwb_cond_alias<"pos", 0b1110>; defm : cwb_cond_alias<"neg", 0b0110>; defm : cwb_cond_alias<"vc", 0b>; defm : cwb_cond_alias<"vs", 0b0111>; -let EmitPriority = 0 in -{ +let EmitPriority = 0 in { defm : cwb_cond_alias<"geu", 0b1101>; // same as cc defm : cwb_cond_alias<"lu", 0b0101>; // same as cs } @@ -461,8 +460,7 @@ defm : cxb_cond_alias<"pos", 0b1110>; defm : cxb_cond_alias<"neg", 0b0110>; defm : cxb_cond_alias<"vc", 0b>; defm : cxb_cond_alias<"vs", 0b0111>; -let EmitPriority = 0 in -{ +let EmitPriority = 0 in { defm : cxb_cond_alias<"geu", 0b1101>; // same as cc defm : cxb_cond_alias<"lu", 0b0101>; // same as cs } @@ -727,6 +725,6 @@ def : InstAlias<"sir", (SIR 0), 0>; // pause reg_or_imm -> wrasr %g0, reg_or_imm, %asr27 let Predicates = [HasOSA2011] in { -def : InstAlias<"pause $rs2", (WRASRrr ASR27, G0, IntRegs:$rs2), 1>; -def : InstAlias<"pause $simm13", (WRASRri ASR27, G0, simm13Op:$simm13), 1>; + def : InstAlias<"pause $rs2", (WRASRrr ASR27, G0, IntRegs:$rs2), 1>; + def : InstAlias<"pause $simm13", (WRASRri ASR27, G0, simm13Op:$simm13), 1>; } // Predicates = [HasOSA2011] diff --git a/llvm/lib/Target/Sparc/SparcInstrFormats.td b/llvm/lib/Target/Sparc/SparcInstrFormats.td index fe10bb443348a..79c4cb2128a0f 100644 --- a/llvm/lib/Target/Sparc/SparcInstrFormats.td +++ b/llvm/lib/Target/Sparc/SparcInstrFormats.td @@ -104,7 +104,7 @@ class F2_4 pattern = [], InstrItinClass itin = NoItinerary> - : InstSP { +: InstSP { bits<10> imm10; bits<5> rs1; bits<5> rs2; @@ -1
[llvm-branch-commits] [clang] Implement src:*=sanitize for UBSan. (PR #140489)
https://github.com/qinkunbao edited https://github.com/llvm/llvm-project/pull/140489 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#141803** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/141803)
* **#141801**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/141804 No change in the net output since these ultimately expand to setcc, but saves a step in the DAG. >From 6967e6460456e755ce0767243834847cabcfbc06 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 28 May 2025 18:37:25 +0200 Subject: [PATCH] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR No change in the net output since these ultimately expand to setcc, but saves a step in the DAG. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 12 + .../CodeGen/AMDGPU/combine-cond-add-sub.ll| 48 +++ 2 files changed, 60 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 7ad10454e7931..b124f02d32a8a 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11922,6 +11922,18 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::SMULO: case ISD::UMULO: return V.getResNo() == 1; + case ISD::INTRINSIC_WO_CHAIN: { +unsigned IntrinsicID = V.getConstantOperandVal(0); +switch (IntrinsicID) { +case Intrinsic::amdgcn_is_shared: +case Intrinsic::amdgcn_is_private: + return true; +default: + return false; +} + +return false; + } } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll index 1778fa42fbf7e..ba8abdc17fb05 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll @@ -740,6 +740,54 @@ bb: ret void } +define i32 @add_sext_bool_is_shared(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_shared: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe8 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_shared: +; GFX9: ; 
%bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_shared_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; GFX9-NEXT:s_setpc_b64 s[30:31] + %is.shared = call i1 @llvm.amdgcn.is.shared(ptr %ptr) + %sext = sext i1 %is.shared to i32 + %add = add i32 %sext, %y + ret i32 %add +} + +define i32 @add_sext_bool_is_private(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_private: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe4 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_private: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_private_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; GFX9-NEXT:s_setpc_b64 s[30:31] + %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr) + %sext = sext i1 %is.private to i32 + %add = add i32 %sext, %y + ret i32 %add +} + declare i1 @llvm.amdgcn.class.f32(float, i32) #0 declare i32 @llvm.amdgcn.workitem.id.x() #0 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes The particular use in the test doesn't seem to do anything for the expanded cases (i.e. the signed add/sub or multiplies). --- Full diff: https://github.com/llvm/llvm-project/pull/141803.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7) - (modified) llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll (+89) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index c9fd2948d669f..7ad10454e7931 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11915,6 +11915,13 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::OR: case ISD::XOR: return isBoolSGPR(V.getOperand(0)) && isBoolSGPR(V.getOperand(1)); + case ISD::SADDO: + case ISD::UADDO: + case ISD::SSUBO: + case ISD::USUBO: + case ISD::SMULO: + case ISD::UMULO: +return V.getResNo() == 1; } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll index bdad6f40480d3..b98c81db5da99 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll @@ -45,6 +45,95 @@ define i32 @and_sext_bool_fpclass(float %x, i32 %y) { ret i32 %and } +; GCN-LABEL: {{^}}and_sext_bool_uadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_uadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_usub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, 
vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_usub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_sadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_sadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_ssub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_ssub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_smul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_i32 v2, v0, v1 +; GCN-NEXT: v_mul_lo_u32 v0, v0, v1 +; GCN-NEXT: v_ashrrev_i32_e32 v0, 31, v0 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_smul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = 
sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_umul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_u32 v0, v0, v1 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_umul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + + declare i32 @llvm.amdgcn.workitem.id.x() #0 declare i32 @llvm.amdgcn.workitem.id.y() #0 `` https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commi
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/141803 The particular use in the test doesn't seem to do anything for the expanded cases (i.e. the signed add/sub or multiplies). >From 20482481b443e2d3422be8baa779498bb5c54574 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 28 May 2025 18:06:03 +0200 Subject: [PATCH] AMDGPU: Add overflow operations to isBoolSGPR The particular use in the test doesn't seem to do anything for the expanded cases (i.e. the signed add/sub or multiplies). --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 7 ++ .../CodeGen/AMDGPU/combine-and-sext-bool.ll | 89 +++ 2 files changed, 96 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index c9fd2948d669f..7ad10454e7931 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11915,6 +11915,13 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::OR: case ISD::XOR: return isBoolSGPR(V.getOperand(0)) && isBoolSGPR(V.getOperand(1)); + case ISD::SADDO: + case ISD::UADDO: + case ISD::SSUBO: + case ISD::USUBO: + case ISD::SMULO: + case ISD::UMULO: +return V.getResNo() == 1; } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll index bdad6f40480d3..b98c81db5da99 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll @@ -45,6 +45,95 @@ define i32 @and_sext_bool_fpclass(float %x, i32 %y) { ret i32 %and } +; GCN-LABEL: {{^}}and_sext_bool_uadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_uadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + 
%and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_usub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_usub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_sadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_sadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_ssub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_ssub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_smul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_i32 v2, v0, v1 +; GCN-NEXT: v_mul_lo_u32 v0, v0, v1 +; GCN-NEXT: v_ashrrev_i32_e32 v0, 31, v0 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v2, v0 +; GCN-NEXT: v_cndmask_b32_e32 
v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_smul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_umul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_u32 v0, v0, v1 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_umul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + + declare i32 @llvm.amdgcn.workitem.id.x() #0 declare i32 @llvm.amdgcn.workitem.id.y() #0 ___
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/141804 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#141804** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/141804)
* **#141803**
* **#141801**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/141804 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes No change in the net output since these ultimately expand to setcc, but saves a step in the DAG. --- Full diff: https://github.com/llvm/llvm-project/pull/141804.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+12) - (modified) llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll (+48) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 7ad10454e7931..b124f02d32a8a 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11922,6 +11922,18 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::SMULO: case ISD::UMULO: return V.getResNo() == 1; + case ISD::INTRINSIC_WO_CHAIN: { +unsigned IntrinsicID = V.getConstantOperandVal(0); +switch (IntrinsicID) { +case Intrinsic::amdgcn_is_shared: +case Intrinsic::amdgcn_is_private: + return true; +default: + return false; +} + +return false; + } } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll index 1778fa42fbf7e..ba8abdc17fb05 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll @@ -740,6 +740,54 @@ bb: ret void } +define i32 @add_sext_bool_is_shared(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_shared: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe8 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_shared: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_shared_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; 
GFX9-NEXT:s_setpc_b64 s[30:31] + %is.shared = call i1 @llvm.amdgcn.is.shared(ptr %ptr) + %sext = sext i1 %is.shared to i32 + %add = add i32 %sext, %y + ret i32 %add +} + +define i32 @add_sext_bool_is_private(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_private: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe4 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_private: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_private_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; GFX9-NEXT:s_setpc_b64 s[30:31] + %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr) + %sext = sext i1 %is.private to i32 + %add = add i32 %sext, %y + ret i32 %add +} + declare i1 @llvm.amdgcn.class.f32(float, i32) #0 declare i32 @llvm.amdgcn.workitem.id.x() #0 `` https://github.com/llvm/llvm-project/pull/141804 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
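The tests above rely on the identity that sign-extending an `i1` yields either 0 or -1 (all ones), so `add (sext i1 %c), %y` conditionally subtracts 1 from `%y` — which is why, once `isBoolSGPR` recognizes the intrinsic result, the backend can emit a single `v_subbrev` with the compare result in `vcc`. A standalone C++ model of that identity (an illustration, not LLVM code):

```cpp
#include <cstdint>

// Sign-extending an i1: 1 -> -1 (all ones), 0 -> 0.
int32_t sextI1(bool Flag) { return Flag ? -1 : 0; }

// add (sext i1 %c), %y  ==  %y - 1 when %c is set, %y otherwise.
int32_t addSextBool(bool Flag, int32_t Y) { return sextI1(Flag) + Y; }
```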
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/141804
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
@@ -11922,6 +11922,18 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::SMULO: case ISD::UMULO: return V.getResNo() == 1; + case ISD::INTRINSIC_WO_CHAIN: { +unsigned IntrinsicID = V.getConstantOperandVal(0); +switch (IntrinsicID) { +case Intrinsic::amdgcn_is_shared: +case Intrinsic::amdgcn_is_private: + return true; +default: + return false; +} + +return false; shiltian wrote: nit: llvm_unreachable? https://github.com/llvm/llvm-project/pull/141804
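The trailing `return false;` after the inner switch is dead code, since every path through the switch already returns; the reviewer's nit is to mark it `llvm_unreachable` instead. A standalone sketch of the shape being discussed, with a stand-in macro and made-up intrinsic IDs (not the actual patch):

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for llvm_unreachable: aborts if control ever reaches it.
#define UNREACHABLE(Msg) (assert(false && Msg), std::abort())

enum IntrinsicID { IsShared, IsPrivate, Other };

bool isBoolIntrinsic(IntrinsicID ID) {
  switch (ID) {
  case IsShared:
  case IsPrivate:
    return true;
  default:
    return false;
  }
  UNREACHABLE("covered switch"); // documents that the switch is exhaustive
}
```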
[llvm-branch-commits] [llvm] [utils][TableGen] Treat clause aliases equally with names (PR #141763)
https://github.com/clementval approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/141763
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From 7a71b56676323327d012a9500f3e107d9b16d83c Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 34b5b1d51de4e..dac274c0f4130 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : 
llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -858,9 +858,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1025,8 +1025,7 @@ class DstSafetyAnalysis { // ... an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. 
if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1065,7 +1064,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1243,7 +1242,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
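The cleanup above swaps index loops for `llvm::enumerate`/`llvm::zip_equal` and replaces `std::tie` with structured bindings. The same idioms in portable C++17, using a hypothetical plain loop in place of the LLVM range helpers (a sketch, not BOLT code):

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Structured binding instead of: std::tie(DstReg, SrcReg) = *DstAndSrc;
std::pair<int, int> splitExample() {
  std::pair<int, int> DstAndSrc{1, 2};
  auto [DstReg, SrcReg] = DstAndSrc;
  return {DstReg, SrcReg};
}

// Elementwise merge in the spirit of llvm::zip_equal + insert_range:
// the sets are merged pairwise, as in SrcState::merge above.
void mergeSets(std::vector<std::set<int>> &Dst,
               const std::vector<std::set<int>> &Src) {
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I].insert(Src[I].begin(), Src[I].end());
}
```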
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From 4e08d36fcde69e0c9eebbac4ab2261e8db797393 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. 
+/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. 
+bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. +Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
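The core idea of the patch's `OpMatcher` — a wrapper that captures a value on first use and only matches that same value afterwards — can be sketched in a few lines of portable C++, operating on plain ints rather than `MCOperand`s (a simplification, not the patch's code):

```cpp
#include <cassert>
#include <optional>

// Captures the first value it sees; afterwards only matches equal values.
class CaptureOrMatch {
  mutable std::optional<int> Value;

public:
  CaptureOrMatch() = default;                     // capture mode
  explicit CaptureOrMatch(int V) : Value(V) {}    // match a literal

  bool match(int V) const {
    if (Value && *Value != V)
      return false; // contradicts the previously captured value
    Value = V;
    return true;
  }

  int get() const {
    assert(Value.has_value());
    return *Value;
  }
};
```

Passing the same wrapper for several operands forces them to be equal, which is how the example in the comment reuses `Xm` captured from `ADDXrs` when matching the subsequent `BR`.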
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From 9ef4b06a50605ecb15d4d8ffacd39a835e7d43ff Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index e9ed44a47bf6f..34b5b1d51de4e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), 
+cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From d7167e871fbde24246f71ec1553c3b22d30ad526 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. 
+ /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c8cbcaf33f4b5..3abf4d18e94da 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 762c08ffd933e..e9ed44a47bf6f 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1351,6 +1351,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +LL
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From ff3dc1d1dce6b7ec9ef9fb5a103455b0e946aca0 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index b233452985502..c8cbcaf33f4b5 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. 
+ /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4c7ae3c880db4..11db51f6c6dd1 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1066,6 +1066,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1173,8 +1182,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 9d5a578cfbdff..b669d32cc2032 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From 9ef4b06a50605ecb15d4d8ffacd39a835e7d43ff Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index e9ed44a47bf6f..34b5b1d51de4e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), 
+cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From c41022206fbb32d177b2712f2a80d481e05735c8 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 28 Apr 2025 18:35:48 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 237 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). 
+class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136183 >From c63cd7528660a41bf95821648defc6cdb0e09d0a Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 17 Apr 2025 20:51:16 +0300 Subject: [PATCH 1/3] [BOLT] Gadget scanner: improve handling of unreachable basic blocks Instead of refusing to analyze an instruction completely, when it is unreachable according to the CFG reconstructed by BOLT, pessimistically assume all registers to be unsafe at the start of basic blocks without any predecessors. Nevertheless, unreachable basic blocks found in optimized code likely means imprecise CFG reconstruction, thus report a warning once per basic block without predecessors. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++- .../AArch64/gs-pacret-autiasp.s | 7 ++- .../binary-analysis/AArch64/gs-pauth-calls.s | 57 +++ 3 files changed, 95 insertions(+), 15 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 25be23d64463e..c20c0921d4a17 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -342,6 +342,12 @@ class SrcSafetyAnalysis { return S; } + /// Creates a state with all registers marked unsafe (not to be confused + /// with empty state). + SrcState createUnsafeState() const { +return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + } + BitVector getClobberedRegs(const MCInst &Point) const { BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved @@ -585,6 +591,13 @@ class DataflowSrcSafetyAnalysis if (BB.isEntryPoint()) return createEntryState(); +// If a basic block without any predecessors is found in an optimized code, +// this likely means that some CFG edges were not detected. Pessimistically +// assume all registers to be unsafe before this basic block and warn about +// this fact in FunctionAnalysis::findUnsafeUses(). 
+if (BB.pred_empty()) + return createUnsafeState(); + return SrcState(); } @@ -689,12 +702,6 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, using SrcSafetyAnalysis::BC; BinaryFunction &BF; - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -1364,19 +1371,30 @@ void FunctionAnalysisContext::findUnsafeUses( BF.dump(); }); + if (BF.hasCFG()) { +// Warn on basic blocks being unreachable according to BOLT, as this +// likely means CFG is imprecise. +for (BinaryBasicBlock &BB : BF) { + if (!BB.pred_empty() || BB.isEntryPoint()) +continue; + // Arbitrarily attach the report to the first instruction of BB. + MCInst *InstToReport = BB.getFirstNonPseudoInstr(); + if (!InstToReport) +continue; // BB has no real instructions + + Reports.push_back( + make_generic_report(MCInstReference::get(InstToReport, BF), + "Warning: no predecessor basic blocks detected " + "(possibly incomplete CFG)")); +} + } + iterateOverInstrs(BF, [&](MCInstReference Inst) { if (BC.MIB->isCFI(Inst)) return; const SrcState &S = Analysis->getStateBefore(Inst); - -// If non-empty state was never propagated from the entry basic block -// to Inst, assume it to be unreachable and report a warning. 
-if (S.empty()) { - Reports.push_back( - make_generic_report(Inst, "Warning: unreachable instruction found")); - return; -} +assert(!S.empty() && "Instruction has no associated state"); if (auto Report = shouldReportReturnGadget(BC, Inst, S)) Reports.push_back(*Report); diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index 284f0bea607a5..6559ba336e8de 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -215,12 +215,17 @@ f_callclobbered_calleesaved: .globl f_unreachable_instruction .type f_unreachable_instruction,@function f_unreachable_instruction: -// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address +// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected (possibly incomplete CFG) in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address // CHECK-NEXT:The instruction is {{[0-9a-f]+}}: add x0, x1, x2 // CHECK-NOT: instructions that write t
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136151 >From 9d8fedf678fe91ca1d7ac3334747227df335ff2c Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 15 Apr 2025 21:47:18 +0300 Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions Some instruction-printing code used under LLVM_DEBUG does not handle CFI instructions well. While CFI instructions seem to be harmless for the correctness of the analysis results, they do not convey any useful information to the analysis either, so skip them early. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++ .../AArch64/gs-pauth-debug-output.s | 32 +++ 2 files changed, 48 insertions(+) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 345af32650624..25be23d64463e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -430,6 +430,9 @@ class SrcSafetyAnalysis { } SrcState computeNext(const MCInst &Point, const SrcState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + SrcStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " SrcSafetyAnalysis::ComputeNext("; @@ -704,6 +707,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If there is a label before this instruction, it is possible that it // can be jumped-to, thus conservatively resetting S. 
As an exception, @@ -998,6 +1003,9 @@ class DstSafetyAnalysis { } DstState computeNext(const MCInst &Point, const DstState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + DstStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " DstSafetyAnalysis::ComputeNext("; @@ -1165,6 +1173,8 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, DstState S = createUnsafeState(); for (auto &I : llvm::reverse(BF.instrs())) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If Inst can change the control flow, we cannot be sure that the next // instruction (to be executed in analyzed program) is the one processed @@ -1355,6 +1365,9 @@ void FunctionAnalysisContext::findUnsafeUses( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const SrcState &S = Analysis->getStateBefore(Inst); // If non-empty state was never propagated from the entry basic block @@ -1418,6 +1431,9 @@ void FunctionAnalysisContext::findUnsafeDefs( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const DstState &S = Analysis->getStateAfter(Inst); if (auto Report = shouldReportAuthOracle(BC, Inst, S)) diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s index 61aa84377b88e..5aec945621987 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s +++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s @@ -329,6 +329,38 @@ auth_oracle: // PAUTH-EMPTY: // PAUTH-NEXT: Attaching leakage info to: : autia x0, x1 # DataflowDstSafetyAnalysis: dst-state +// Gadget scanner should not crash on CFI instructions, including when debug-printing them. +// Note that the particular debug output is not checked, but BOLT should be +// compiled with assertions enabled to support -debug-only argument. 
+ +.globl cfi_inst_df +.type cfi_inst_df,@function +cfi_inst_df: +.cfi_startproc +sub sp, sp, #16 +.cfi_def_cfa_offset 16 +add sp, sp, #16 +.cfi_def_cfa_offset 0 +ret +.size cfi_inst_df, .-cfi_inst_df +.cfi_endproc + +.globl cfi_inst_nocfg +.type cfi_inst_nocfg,@function +cfi_inst_nocfg: +.cfi_startproc +sub sp, sp, #16 +.cfi_def_cfa_offset 16 + +adr x0, 1f +br x0 +1: +add sp, sp, #16 +.cfi_def_cfa_offset 0 +ret +.size cfi_inst_nocfg, .-cfi_inst_nocfg +.cfi_endproc + // CHECK-LABEL:Analyzing function main, AllocatorId = 1 .globl main .type main,@function ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits