[llvm-branch-commits] [mlir] [MLIR] Integration tests for lowering vector.contract to SVE FEAT_I8MM (PR #140573)
banach-space wrote:

Thanks - great to finally be reaching this stage! I have a few high-level questions and suggestions:

**1. Why is the scalable dimension always [4]?**

From the current tests, it looks like the scalable dim is always `[4]`. Could you remind me why that value is chosen?

**2. Reduce duplication in the 4x8x4 tests**

The current tests differ only in terms of **input**/**output** and `extsi` vs `extui`. It should be possible to reduce duplication by extracting shared logic into helpers, and writing 4 separate entry points (set via `entry_point`) to isolate the differences. For example:

```mlir
func.func @main_smmla() {
  // Init LHS, RHS, ACC

  // CHECK-LINES for LHS
  print(lhs);

  // CHECK-LINES for RHS
  print(rhs);

  arith.extsi (lhs)
  arith.extsi (rhs)
  vector.contract

  // CHECK-LINES for ACC
  print(acc);
}
```

This would keep the test logic focused and easier to maintain.

**3. Add checks for generated IR (LLVM dialect)**

It would be good to verify that the lowered IR includes the correct SVE MMLA intrinsics. For example:

```mlir
// CHECK-COUNT-4: llvm.intr.smmla
```

This would help confirm both correctness and that the expected number of operations is emitted.

**4. Consider toggling VL within tests**

Have you considered toggling the scalable vector length (`VL`) within the test? That would allow verifying behaviour for multiple `VL` values. From what I can tell, this would only work if the inputs are generated inside a loop, similar to this example:

https://github.com/llvm/llvm-project/blob/88f61f2c5c0ad9dad9c8df2fb86352629e7572c1/mlir/test/Integration/Dialect/Vector/CPU/ArmSME/load-vertical.mlir#L19-L37

That might be a nice validation of the "scalability" aspect.

https://github.com/llvm/llvm-project/pull/140573

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
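For context on what the integration tests discussed above exercise: each FEAT_I8MM `smmla` operation treats its two 16-byte inputs as 2x8 matrices of signed bytes and accumulates their product into a 2x2 matrix of 32-bit integers. The following is only a scalar reference model for illustration (it is not part of the patch under review):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reference model of a single 128-bit SMMLA tile: A and B are row-major
// 2x8 int8 matrices, Acc is a row-major 2x2 int32 accumulator, and the
// operation computes Acc += A * B^T.
using Tile = std::array<int32_t, 4>;

Tile smmla(Tile Acc, const std::array<int8_t, 16> &A,
           const std::array<int8_t, 16> &B) {
  for (int Row = 0; Row < 2; ++Row)
    for (int Col = 0; Col < 2; ++Col)
      for (int K = 0; K < 8; ++K)
        Acc[Row * 2 + Col] +=
            int32_t(A[Row * 8 + K]) * int32_t(B[Col * 8 + K]);
  return Acc;
}
```

Swapping the `int8_t` element type for `uint8_t` gives the unsigned (`ummla`) variant, which is why the tests differ only in `extsi` vs `extui`.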
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From c7a0fb8f9846faa98cd5dbf3d71d5149051fa8a8 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH 1/2] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allows the combiner +// to potentially fold these. +class lower_uniform_bfx<Instruction bfx> : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx<G_SBFX>; +def lower_uniform_ubfx : lower_uniform_bfx<G_UBFX>; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion.
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo<SIMachineFunctionInfo>()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/li
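The operand packing that `lowerUniformBFX` builds out of MIR (an AND, a SHL, and an OR) can be summarized in plain C++. This is only an illustration of the operand format described in the patch's comments (offset in bits [5:0], width in bits [22:16]), not code from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Build the second source operand of S_BFE_{I,U}{32,64}: the field offset
// goes in bits [5:0] and the field width in bits [22:16]. The offset is
// masked to 6 bits; the left shift by 16 guarantees the width contributes
// nothing to the low bits, so the two values can simply be ORed together.
uint32_t packBFEControlOperand(uint32_t Offset, uint32_t Width) {
  uint32_t ClampedOffset = Offset & 0x3f; // maskTrailingOnes<unsigned>(6)
  return ClampedOffset | (Width << 16);
}
```

Doing this packing in the combiner (rather than at selection time) is what lets later combines fold the AND/SHL/OR when the offset and width are constants.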
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 7c8f90225928c0dbffcfa03bd20da3419a80095f Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH 1/2] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; 
GFX6-NEXT:v_lshl_b64 v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_an
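As a sanity check on the regenerated assembly above: the test shifts an i65 value right by 33, where the value lives in three 32-bit registers (low word, high word, and bit 64 in the third register's LSB). Both the old and the new instruction sequences compute the same low result word; a hedged C++ model of that computation (word names are mine, chosen to match the GFX6 register roles):

```cpp
#include <cassert>
#include <cstdint>

// Model of the i65 logical-shift-right-by-33 being tested: W1 holds bits
// [63:32] of the input and W2's LSB holds bit 64. After >> 33, only the
// low result word is nonzero, built as (W1 >> 1) | ((W2 & 1) << 31) --
// exactly the and/lshl/lshr/or sequence in the checked assembly.
uint32_t lshrI65By33LowWord(uint32_t W1, uint32_t W2) {
  return (W1 >> 1) | ((W2 & 1u) << 31);
}
```

The diff above only reshuffles which registers hold the intermediate values; the known-bits combine does not change this result.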
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 9e7f29551b788d9060aec2168920554df41ff5df Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH 1/2] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; 
GFX6-NEXT:v_lshl_b64 v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_an
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From efa6a12fedf3c87678a1df1e5d03ff1e58531625 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH 1/2] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allows the combiner +// to potentially fold these. +class lower_uniform_bfx<Instruction bfx> : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx<G_SBFX>; +def lower_uniform_ubfx : lower_uniform_bfx<G_UBFX>; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion.
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/li
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFold; +MachineOperand *OpToFold = nullptr; uint64_t ImmToFold; int FrameIndexToFold; }; - int ShrinkOpcode; - unsigned UseOpNo; + + /// Register class of the originally defined value. + const TargetRegisterClass *DefRC = nullptr; + + /// Track the original defining instruction for the value. + const MachineInstr *DefMI = nullptr; + + /// Subregister to apply to the value at the use point. + unsigned DefSubReg = AMDGPU::NoSubRegister; + + /// Kind of value stored in the union. MachineOperand::MachineOperandType Kind; - bool Commuted; - FoldCandidate(MachineInstr *MI, unsigned OpNo, MachineOperand *FoldOp, -bool Commuted_ = false, -int ShrinkOp = -1) : -UseMI(MI), OpToFold(nullptr), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(FoldOp->getType()), -Commuted(Commuted_) { -if (FoldOp->isImm()) { - ImmToFold = FoldOp->getImm(); -} else if (FoldOp->isFI()) { - FrameIndexToFold = FoldOp->getIndex(); + FoldableDef() = delete; + FoldableDef(MachineOperand &FoldOp, const TargetRegisterClass *DefRC, + unsigned DefSubReg = AMDGPU::NoSubRegister) + : DefRC(DefRC), DefSubReg(DefSubReg), Kind(FoldOp.getType()) { + +if (FoldOp.isImm()) { + ImmToFold = FoldOp.getImm(); +} else if (FoldOp.isFI()) { + FrameIndexToFold = FoldOp.getIndex(); } else { - assert(FoldOp->isReg() || FoldOp->isGlobal()); - OpToFold = FoldOp; + assert(FoldOp.isReg() || FoldOp.isGlobal()); + OpToFold = &FoldOp; } + +DefMI = FoldOp.getParent(); } - FoldCandidate(MachineInstr *MI, unsigned OpNo, int64_t FoldImm, -bool Commuted_ = false, int ShrinkOp = -1) - : UseMI(MI), ImmToFold(FoldImm), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(MachineOperand::MO_Immediate), Commuted(Commuted_) {} + FoldableDef(int64_t FoldImm, const TargetRegisterClass *DefRC, + 
unsigned DefSubReg = AMDGPU::NoSubRegister) + : ImmToFold(FoldImm), DefRC(DefRC), DefSubReg(DefSubReg), +Kind(MachineOperand::MO_Immediate) {} + + /// Copy the current def and apply \p SubReg to the value. + FoldableDef getWithSubReg(const SIRegisterInfo &TRI, unsigned SubReg) const { +FoldableDef Copy(*this); +Copy.DefSubReg = TRI.composeSubRegIndices(DefSubReg, SubReg); +return Copy; + } + + bool isReg() const { return Kind == MachineOperand::MO_Register; } + + Register getReg() const { +assert(isReg()); +return OpToFold->getReg(); + } + + unsigned getSubReg() const { +assert(isReg()); +return OpToFold->getSubReg(); + } + + bool isImm() const { return Kind == MachineOperand::MO_Immediate; } bool isFI() const { return Kind == MachineOperand::MO_FrameIndex; } - bool isImm() const { -return Kind == MachineOperand::MO_Immediate; + int getFI() const { +assert(isFI()); +return FrameIndexToFold; } - bool isReg() const { -return Kind == MachineOperand::MO_Register; + bool isGlobal() const { return OpToFold->isGlobal(); } jayfoad wrote: Not safe to access `OpToFold` unless you check for Imm and FI first: ```suggestion bool isGlobal() const { return !isImm() && !isFI() && OpToFold->isGlobal(); } ``` https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
https://github.com/jayfoad commented: The idea seems good. I haven't reviewed it all in detail. https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFold; +MachineOperand *OpToFold = nullptr; uint64_t ImmToFold; int FrameIndexToFold; }; - int ShrinkOpcode; - unsigned UseOpNo; + + /// Register class of the originally defined value. + const TargetRegisterClass *DefRC = nullptr; + + /// Track the original defining instruction for the value. + const MachineInstr *DefMI = nullptr; + + /// Subregister to apply to the value at the use point. + unsigned DefSubReg = AMDGPU::NoSubRegister; + + /// Kind of value stored in the union. MachineOperand::MachineOperandType Kind; - bool Commuted; - FoldCandidate(MachineInstr *MI, unsigned OpNo, MachineOperand *FoldOp, -bool Commuted_ = false, -int ShrinkOp = -1) : -UseMI(MI), OpToFold(nullptr), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(FoldOp->getType()), -Commuted(Commuted_) { -if (FoldOp->isImm()) { - ImmToFold = FoldOp->getImm(); -} else if (FoldOp->isFI()) { - FrameIndexToFold = FoldOp->getIndex(); + FoldableDef() = delete; + FoldableDef(MachineOperand &FoldOp, const TargetRegisterClass *DefRC, + unsigned DefSubReg = AMDGPU::NoSubRegister) + : DefRC(DefRC), DefSubReg(DefSubReg), Kind(FoldOp.getType()) { + +if (FoldOp.isImm()) { + ImmToFold = FoldOp.getImm(); +} else if (FoldOp.isFI()) { + FrameIndexToFold = FoldOp.getIndex(); } else { - assert(FoldOp->isReg() || FoldOp->isGlobal()); - OpToFold = FoldOp; + assert(FoldOp.isReg() || FoldOp.isGlobal()); + OpToFold = &FoldOp; } + +DefMI = FoldOp.getParent(); } - FoldCandidate(MachineInstr *MI, unsigned OpNo, int64_t FoldImm, -bool Commuted_ = false, int ShrinkOp = -1) - : UseMI(MI), ImmToFold(FoldImm), ShrinkOpcode(ShrinkOp), UseOpNo(OpNo), -Kind(MachineOperand::MO_Immediate), Commuted(Commuted_) {} + FoldableDef(int64_t FoldImm, const TargetRegisterClass *DefRC, + 
unsigned DefSubReg = AMDGPU::NoSubRegister) + : ImmToFold(FoldImm), DefRC(DefRC), DefSubReg(DefSubReg), +Kind(MachineOperand::MO_Immediate) {} + + /// Copy the current def and apply \p SubReg to the value. + FoldableDef getWithSubReg(const SIRegisterInfo &TRI, unsigned SubReg) const { +FoldableDef Copy(*this); +Copy.DefSubReg = TRI.composeSubRegIndices(DefSubReg, SubReg); +return Copy; + } + + bool isReg() const { return Kind == MachineOperand::MO_Register; } + + Register getReg() const { +assert(isReg()); +return OpToFold->getReg(); + } + + unsigned getSubReg() const { +assert(isReg()); +return OpToFold->getSubReg(); + } + + bool isImm() const { return Kind == MachineOperand::MO_Immediate; } bool isFI() const { return Kind == MachineOperand::MO_FrameIndex; } - bool isImm() const { -return Kind == MachineOperand::MO_Immediate; + int getFI() const { +assert(isFI()); +return FrameIndexToFold; } - bool isReg() const { -return Kind == MachineOperand::MO_Register; + bool isGlobal() const { return OpToFold->isGlobal(); } + + /// Return the effective immediate value defined by this instruction, after + /// application of any subregister extracts which may exist between the use + /// and def instruction. + std::optional getEffectiveImmVal() const { +assert(isImm()); +return SIInstrInfo::extractSubregFromImm(ImmToFold, DefSubReg); } - bool isGlobal() const { return Kind == MachineOperand::MO_GlobalAddress; } + /// Check if it is legal to fold this effective value into \p MI's \p OpNo + /// operand. 
+ bool isOperandLegal(const SIInstrInfo &TII, const MachineInstr &MI, + unsigned OpIdx) const { +switch (Kind) { +case MachineOperand::MO_Immediate: { + std::optional ImmToFold = getEffectiveImmVal(); + if (!ImmToFold) +return false; + + // TODO: Should verify the subregister index is supported by the class + // TODO: Avoid the temporary MachineOperand + MachineOperand TmpOp = MachineOperand::CreateImm(*ImmToFold); + return TII.isOperandLegal(MI, OpIdx, &TmpOp); +} +case MachineOperand::MO_FrameIndex: { + if (DefSubReg != AMDGPU::NoSubRegister) +return false; + MachineOperand TmpOp = MachineOperand::CreateFI(FrameIndexToFold); + return TII.isOperandLegal(MI, OpIdx, &TmpOp); +} +default: + // TODO: Try to apply DefSubReg, for global address we can extract + // low/high. + if (DefSubReg != AMDGPU::NoSubRegister) +return false; + return TII.isOperandLegal(MI, OpIdx, OpToFold); +} + +llvm_unreachable("covered MachineOperand kind switch"); + } +}; + +struct FoldCandidate { +
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Generalize names of delayed privatization CLI flags (PR #138816)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/138816 >From e0eb1611a67579562edefe1c66263c2cc562c5d7 Mon Sep 17 00:00:00 2001 From: ergawy Date: Wed, 7 May 2025 02:41:14 -0500 Subject: [PATCH] [flang] Generlize names of delayed privatization CLI flags Remove the `openmp` prefix from delayed privatization/localization flags since they are now used for `do concurrent` as well. --- flang/include/flang/Support/Flags.h | 17 flang/lib/Lower/Bridge.cpp| 2 +- flang/lib/Lower/OpenMP/OpenMP.cpp | 1 + flang/lib/Lower/OpenMP/Utils.cpp | 12 --- flang/lib/Lower/OpenMP/Utils.h| 2 -- flang/lib/Support/CMakeLists.txt | 1 + flang/lib/Support/Flags.cpp | 20 +++ .../distribute-standalone-private.f90 | 4 ++-- .../DelayedPrivatization/equivalence.f90 | 4 ++-- .../target-private-allocatable.f90| 4 ++-- .../target-private-multiple-variables.f90 | 4 ++-- .../target-private-simple.f90 | 4 ++-- .../OpenMP/allocatable-multiple-vars.f90 | 4 ++-- .../OpenMP/cfg-conversion-omp.private.f90 | 2 +- .../test/Lower/OpenMP/debug_info_conflict.f90 | 2 +- ...elayed-privatization-allocatable-array.f90 | 4 ++-- ...privatization-allocatable-firstprivate.f90 | 6 +++--- ...ayed-privatization-allocatable-private.f90 | 4 ++-- .../OpenMP/delayed-privatization-array.f90| 12 +-- .../delayed-privatization-character-array.f90 | 8 .../delayed-privatization-character.f90 | 8 .../delayed-privatization-default-init.f90| 4 ++-- .../delayed-privatization-firstprivate.f90| 4 ++-- ...rivatization-lower-allocatable-to-llvm.f90 | 2 +- .../OpenMP/delayed-privatization-pointer.f90 | 4 ++-- ...yed-privatization-private-firstprivate.f90 | 4 ++-- .../OpenMP/delayed-privatization-private.f90 | 4 ++-- .../delayed-privatization-reduction-byref.f90 | 2 +- .../delayed-privatization-reduction.f90 | 4 ++-- .../different_vars_lastprivate_barrier.f90| 2 +- .../Lower/OpenMP/firstprivate-commonblock.f90 | 2 +- .../test/Lower/OpenMP/private-commonblock.f90 | 2 +- .../Lower/OpenMP/private-derived-type.f90 | 4 ++-- 
.../OpenMP/same_var_first_lastprivate.f90 | 2 +- .../Lower/do_concurrent_delayed_locality.f90 | 2 +- 35 files changed, 96 insertions(+), 71 deletions(-) create mode 100644 flang/include/flang/Support/Flags.h create mode 100644 flang/lib/Support/Flags.cpp diff --git a/flang/include/flang/Support/Flags.h b/flang/include/flang/Support/Flags.h new file mode 100644 index 0..bcbb72f8e50d0 --- /dev/null +++ b/flang/include/flang/Support/Flags.h @@ -0,0 +1,17 @@ +//===-- include/flang/Support/Flags.h ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef FORTRAN_SUPPORT_FLAGS_H_ +#define FORTRAN_SUPPORT_FLAGS_H_ + +#include "llvm/Support/CommandLine.h" + +extern llvm::cl::opt enableDelayedPrivatization; +extern llvm::cl::opt enableDelayedPrivatizationStaging; + +#endif // FORTRAN_SUPPORT_FLAGS_H_ diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp index 49675d34215a9..9f3c50a52973a 100644 --- a/flang/lib/Lower/Bridge.cpp +++ b/flang/lib/Lower/Bridge.cpp @@ -13,7 +13,6 @@ #include "flang/Lower/Bridge.h" #include "OpenMP/DataSharingProcessor.h" -#include "OpenMP/Utils.h" #include "flang/Lower/Allocatable.h" #include "flang/Lower/CallInterface.h" #include "flang/Lower/Coarray.h" @@ -63,6 +62,7 @@ #include "flang/Semantics/runtime-type-info.h" #include "flang/Semantics/symbol.h" #include "flang/Semantics/tools.h" +#include "flang/Support/Flags.h" #include "flang/Support/Version.h" #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h" #include "mlir/IR/BuiltinAttributes.h" diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index 5a975384bd371..f76afa2309233 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -34,6 +34,7 @@ #include "flang/Parser/parse-tree.h" #include 
"flang/Semantics/openmp-directive-sets.h" #include "flang/Semantics/tools.h" +#include "flang/Support/Flags.h" #include "flang/Support/OpenMP-utils.h" #include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h" #include "mlir/Dialect/OpenMP/OpenMPDialect.h" diff --git a/flang/lib/Lower/OpenMP/Utils.cpp b/flang/lib/Lower/OpenMP/Utils.cpp index 711d4af287691..c226c2558e7aa 100644 --- a/flang/lib/Lower/OpenMP/Utils.cpp +++ b/flang/lib/Lower/OpenMP/Utils.cpp @@ -33,18 +33,6 @@ llvm::cl::opt treatIndexAsSection( llvm::cl::desc("In the OpenMP data clauses trea
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/kbeyls approved this pull request. https://github.com/llvm/llvm-project/pull/136151 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/kbeyls edited https://github.com/llvm/llvm-project/pull/136151 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] X86: Add X86TTIImpl::isProfitableToSinkOperands hook for immediate operands. (PR #141326)
@@ -7170,16 +7165,31 @@ bool X86TTIImpl::isProfitableToSinkOperands(Instruction *I, II->getIntrinsicID() == Intrinsic::fshr) ShiftAmountOpNum = 2; } - if (ShiftAmountOpNum == -1) return false; + auto *ShiftAmount = &I->getOperandUse(ShiftAmountOpNum); - auto *Shuf = dyn_cast(I->getOperand(ShiftAmountOpNum)); + // A uniform shift amount in a vector shift or funnel shift may be much + // cheaper than a generic variable vector shift, so make that pattern visible + // to SDAG by sinking the shuffle instruction next to the shift. + auto *Shuf = dyn_cast(ShiftAmount); if (Shuf && getSplatIndex(Shuf->getShuffleMask()) >= 0 && isVectorShiftByScalarCheap(I->getType())) { -Ops.push_back(&I->getOperandUse(ShiftAmountOpNum)); +Ops.push_back(ShiftAmount); return true; } + // Casts taking a constant expression (generally derived from a global + // variable address) as an operand are profitable to sink because they appear + // as subexpressions in the instruction sequence generated by the + // LowerTypeTests pass which is expected to pattern match to the rotate + // instruction's immediate operand. + if (auto *CI = dyn_cast(ShiftAmount)) { +if (isa(CI->getOperand(0))) { + Ops.push_back(ShiftAmount); + return true; +} + } nikic wrote: This check needs to be more specific. Even the `zext ptrtoint (ptr @g to i8)` pattern would not become a relocatable ror immediate under normal circumstances. The fact that the global has `!absolute_symbol` with an appropriate range is load-bearing here. (Looking at selectRelocImm, it does specifically check for getAbsoluteSymbolRange.) https://github.com/llvm/llvm-project/pull/141326 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)
@@ -380,7 +477,8 @@ bool SIFoldOperandsImpl::canUseImmWithOpSel(FoldCandidate &Fold) const { return true; } -bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold) const { +bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold, + int64_t ImmVal) const { jayfoad wrote: Needs a comment explaining the `ImmVal` argument. Is it different from `Fold.getEffectiveImmVal()`? https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From efa6a12fedf3c87678a1df1e5d03ff1e58531625 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allow the combiner +// to potentially fold these. +class lower_uniform_bfx : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx; +def lower_uniform_ubfx : lower_uniform_bfx; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion. 
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Ta
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 62031c0316c73a3650223721347854fd0c45e730 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX6-NEXT:v_lshl_b64 
v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b3
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From efa6a12fedf3c87678a1df1e5d03ff1e58531625 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allow the combiner +// to potentially fold these. +class lower_uniform_bfx : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx; +def lower_uniform_ubfx : lower_uniform_bfx; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion. 
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Ta
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From 62031c0316c73a3650223721347854fd0c45e730 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX6-NEXT:v_lshl_b64 
v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_and_b3
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141589 >From e253bde72750576cab699ad1b6b872fbf60dffe9 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 11:16:16 +0200 Subject: [PATCH 1/2] [AMDGPU] Move S_BFE lowering into RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 14 +- .../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 51 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 125 -- 3 files changed, 119 insertions(+), 71 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 9587fad1ecd63..94e1175b06b14 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -151,6 +151,17 @@ def zext_of_shift_amount_combines : GICombineGroup<[ canonicalize_zext_lshr, canonicalize_zext_ashr, canonicalize_zext_shl ]>; +// Early select of uniform BFX into S_BFE instructions. +// These instructions encode the offset/width in a way that requires using +// bitwise operations. Selecting these instructions early allow the combiner +// to potentially fold these. +class lower_uniform_bfx : GICombineRule< + (defs root:$bfx), + (combine (bfx $dst, $src, $o, $w):$bfx, [{ return lowerUniformBFX(*${bfx}); }])>; + +def lower_uniform_sbfx : lower_uniform_bfx; +def lower_uniform_ubfx : lower_uniform_bfx; + let Predicates = [Has16BitInsts, NotHasMed3_16] in { // For gfx8, expand f16-fmed3-as-f32 into a min/max f16 sequence. This // saves one instruction compared to the promotion. 
@@ -198,5 +209,6 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, + lower_uniform_sbfx, lower_uniform_ubfx]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index ee324a5e93f0f..2100900bb8eb2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,6 +89,8 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; + bool lowerUniformBFX(MachineInstr &MI) const; + private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcode::G_SBFX); + const bool Signed = (MI.getOpcode() == TargetOpcode::G_SBFX); + + Register DstReg = MI.getOperand(0).getReg(); + const RegisterBank *RB = RBI.getRegBank(DstReg, MRI, TRI); + assert(RB && "No RB?"); + if (RB->getID() != AMDGPU::SGPRRegBankID) +return false; + + Register SrcReg = MI.getOperand(1).getReg(); + Register OffsetReg = MI.getOperand(2).getReg(); + Register WidthReg = MI.getOperand(3).getReg(); + + const LLT S32 = LLT::scalar(32); + LLT Ty = MRI.getType(DstReg); + + const unsigned Opc = (Ty == S32) + ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) + : (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64); + + // Ensure the high bits are clear to insert the offset. 
+ auto OffsetMask = B.buildConstant(S32, maskTrailingOnes(6)); + auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask); + + // Zeros out the low bits, so don't bother clamping the input value. + auto ShiftAmt = B.buildConstant(S32, 16); + auto ShiftWidth = B.buildShl(S32, WidthReg, ShiftAmt); + + // Transformation function, pack the offset and width of a BFE into + // the format expected by the S_BFE_I32 / S_BFE_U32. In the second + // source, bits [5:0] contain the offset and bits [22:16] the width. + auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth); + + MRI.setRegBank(OffsetMask.getReg(0), *RB); + MRI.setRegBank(ClampOffset.getReg(0), *RB); + MRI.setRegBank(ShiftAmt.getReg(0), *RB); + MRI.setRegBank(ShiftWidth.getReg(0), *RB); + MRI.setRegBank(MergedInputs.getReg(0), *RB); + + auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs}); + if (!constrainSelectedInstRegOperands(*MIB, TII, TRI, RBI)) +llvm_unreachable("failed to constrain BFE"); + + MI.eraseFromParent(); + return true; +} + SIModeRegisterDefaults AMDGPURegBankCombinerImpl::getMode() const { return MF.getInfo()->getMode(); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index dd7aef8f0c583..0b7d64ee67c34 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/li
[llvm-branch-commits] [llvm] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner (PR #141591)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/141591 >From b249611564844064031ca7be93aeda517fad37ea Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 27 May 2025 12:29:02 +0200 Subject: [PATCH 1/2] [AMDGPU] Add KnownBits simplification combines to RegBankCombiner --- llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 3 +- llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll | 59 - .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 61 +++--- .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 63 +++ llvm/test/CodeGen/AMDGPU/div_i128.ll | 30 - llvm/test/CodeGen/AMDGPU/itofp.i128.ll| 11 ++-- llvm/test/CodeGen/AMDGPU/lround.ll| 18 +++--- llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll | 16 + 8 files changed, 104 insertions(+), 157 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index 96be17c487130..df867aaa204b1 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -210,5 +210,6 @@ def AMDGPURegBankCombiner : GICombiner< fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract]> { + lower_uniform_sbfx, lower_uniform_ubfx, form_bitfield_extract, + known_bits_simplifications]> { } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll index 6baa10bb48621..cc0f45681a3e2 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll @@ -1744,63 +1744,64 @@ define i65 @v_lshr_i65_33(i65 %value) { ; GFX6-LABEL: v_lshr_i65_33: ; GFX6: ; %bb.0: ; GFX6-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX6-NEXT:v_mov_b32_e32 v3, v1 -; GFX6-NEXT:v_mov_b32_e32 v0, 1 +; GFX6-NEXT:v_mov_b32_e32 v3, 1 +; GFX6-NEXT:v_mov_b32_e32 v4, 0 +; GFX6-NEXT:v_and_b32_e32 v3, 1, v2 +; 
GFX6-NEXT:v_lshl_b64 v[2:3], v[3:4], 31 +; GFX6-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX6-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX6-NEXT:v_mov_b32_e32 v1, 0 -; GFX6-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX6-NEXT:v_lshl_b64 v[0:1], v[0:1], 31 -; GFX6-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX6-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX6-NEXT:v_mov_b32_e32 v2, 0 ; GFX6-NEXT:s_setpc_b64 s[30:31] ; ; GFX8-LABEL: v_lshr_i65_33: ; GFX8: ; %bb.0: ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v3, v1 -; GFX8-NEXT:v_mov_b32_e32 v0, 1 +; GFX8-NEXT:v_mov_b32_e32 v3, 1 +; GFX8-NEXT:v_mov_b32_e32 v4, 0 +; GFX8-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX8-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX8-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX8-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX8-NEXT:v_mov_b32_e32 v1, 0 -; GFX8-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX8-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX8-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX8-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX8-NEXT:v_mov_b32_e32 v2, 0 ; GFX8-NEXT:s_setpc_b64 s[30:31] ; ; GFX9-LABEL: v_lshr_i65_33: ; GFX9: ; %bb.0: ; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v3, v1 -; GFX9-NEXT:v_mov_b32_e32 v0, 1 +; GFX9-NEXT:v_mov_b32_e32 v3, 1 +; GFX9-NEXT:v_mov_b32_e32 v4, 0 +; GFX9-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX9-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX9-NEXT:v_lshrrev_b32_e32 v0, 1, v1 +; GFX9-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX9-NEXT:v_mov_b32_e32 v1, 0 -; GFX9-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX9-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX9-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX9-NEXT:v_or_b32_e32 v0, v2, v0 ; GFX9-NEXT:v_mov_b32_e32 v2, 0 ; GFX9-NEXT:s_setpc_b64 s[30:31] ; ; GFX10-LABEL: v_lshr_i65_33: ; GFX10: ; %bb.0: ; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX10-NEXT:v_mov_b32_e32 v3, v1 -; GFX10-NEXT:v_mov_b32_e32 v0, 1 +; GFX10-NEXT:v_mov_b32_e32 v3, 1 +; GFX10-NEXT:v_mov_b32_e32 v4, 0 +; GFX10-NEXT:v_and_b32_e32 v3, 1, v2 +; GFX10-NEXT:v_lshrrev_b32_e32 v0, 1, v1 ; 
GFX10-NEXT:v_mov_b32_e32 v1, 0 -; GFX10-NEXT:v_and_b32_e32 v0, 1, v2 -; GFX10-NEXT:v_lshrrev_b32_e32 v2, 1, v3 -; GFX10-NEXT:v_lshlrev_b64 v[0:1], 31, v[0:1] -; GFX10-NEXT:v_or_b32_e32 v0, v2, v0 +; GFX10-NEXT:v_lshlrev_b64 v[2:3], 31, v[3:4] +; GFX10-NEXT:v_or_b32_e32 v0, v0, v2 ; GFX10-NEXT:v_mov_b32_e32 v2, 0 ; GFX10-NEXT:s_setpc_b64 s[30:31] ; ; GFX11-LABEL: v_lshr_i65_33: ; GFX11: ; %bb.0: ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX11-NEXT:v_dual_mov_b32 v3, v1 :: v_dual_mov_b32 v0, 1 -; GFX11-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_an
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/maksfb approved this pull request. As an NFC, this change looks good to me. I've left a few comments for a follow-up. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -1682,48 +1648,66 @@ void Analysis::runOnFunction(BinaryFunction &BF, } } +// Compute the instruction address for printing (may be slow). +static uint64_t getAddress(const MCInstReference &Inst) { + const BinaryFunction *BF = Inst.getFunction(); + + if (Inst.hasCFG()) { +const BinaryBasicBlock *BB = Inst.getBasicBlock(); + +auto It = static_cast(&Inst.getMCInst()); +unsigned IndexInBB = std::distance(BB->begin(), It); + +// FIXME: this assumes all instructions are 4 bytes in size. This is true maksfb wrote: We have `BinaryContext::computeCodeSize()`. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). +class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { maksfb wrote: Similar concern regarding the `BinaryFunction *` order. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). +class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { maksfb wrote: What are expected uses for this comparison? I'm concerned about non-deterministic order of `BinaryBasicBlock *`. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/maksfb edited https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + maksfb wrote: nit: drop empty line. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
@@ -0,0 +1,57 @@ +//===- bolt/Passes/MCInstUtils.cpp ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "bolt/Core/MCInstUtils.h" + maksfb wrote: nit: empty line. https://github.com/llvm/llvm-project/pull/138655 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing of address params in StaticSampler (PR #140293)
https://github.com/inbelic edited https://github.com/llvm/llvm-project/pull/140293 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DirectX] Improve error message when a binding cannot be found for a resource (PR #140642)
https://github.com/hekota closed https://github.com/llvm/llvm-project/pull/140642 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
https://github.com/atrosinenko created https://github.com/llvm/llvm-project/pull/141824 After a label in a function without CFG information, use a reasonably pessimistic estimation of register state (assume that any register that can be clobbered in this function was actually clobbered) instead of the most pessimistic "all registers are unsafe". This is the same estimation as used by the dataflow variant of the analysis when the preceding instruction is not known for sure. Without this, leaf functions without CFG information are likely to have false positive reports about non-protected return instructions, as 1) LR is unlikely to be signed and authenticated in a leaf function and 2) LR is likely to be used by a return instruction near the end of the function and 3) the register state is likely to be reset at least once during the linear scan through the function >From 7d38c3ebb3dd7f67f87b494e2dfe6e6c4ca29787 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 14 May 2025 23:12:13 +0300 Subject: [PATCH] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG After a label in a function without CFG information, use a reasonably pessimistic estimation of register state (assume that any register that can be clobbered in this function was actually clobbered) instead of the most pessimistic "all registers are unsafe". This is the same estimation as used by the dataflow variant of the analysis when the preceding instruction is not known for sure. 
Without this, leaf functions without CFG information are likely to have false positive reports about non-protected return instructions, as 1) LR is unlikely to be signed and authenticated in a leaf function and 2) LR is likely to be used by a return instruction near the end of the function and 3) the register state is likely to be reset at least once during the linear scan through the function --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 14 +++-- .../AArch64/gs-pacret-autiasp.s | 31 +-- .../AArch64/gs-pauth-authentication-oracles.s | 20 .../AArch64/gs-pauth-debug-output.s | 30 ++ .../AArch64/gs-pauth-signing-oracles.s| 27 5 files changed, 29 insertions(+), 93 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 2aacb38ee19a9..6327a2da54d5b 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -737,19 +737,14 @@ template class CFGUnawareAnalysis { // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. +// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, public CFGUnawareAnalysis { using SrcSafetyAnalysis::BC; BinaryFunction &BF; - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). 
- SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -759,6 +754,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -773,7 +769,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Attach the state *before* this instruction executes. diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index df0a83be00986..627f8eb20ab9c 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -224,20 +224,33 @@ f_unreachable_instruction: ret .size f_unreachable_instruction, .-f_unreachable_instruction -// Expected false positive: without CFG, the state is reset to all-unsafe -// after an unconditional branch. - -.globl state_is_reset_after_indirect_branch_nocfg -.type state_is_reset_after_indirect_branch_nocfg,@function -state_is_reset_after_indirect_branch_nocfg: -// CHECK-LABEL: GS
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
atrosinenko wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. * **#141824** (this PR, view in Graphite) * **#136183**: 1 other dependent PR ([#137224](https://github.com/llvm/llvm-project/pull/137224)) * **#136151** * **#135663** * **#136147** * **#135662** * **#135661** * **#134146** * **#133461** * **#135073** * `main` This stack of pull requests is managed by [Graphite](https://graphite.dev). Learn more about [stacking](https://stacking.dev/). https://github.com/llvm/llvm-project/pull/141824 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
https://github.com/atrosinenko ready_for_review https://github.com/llvm/llvm-project/pull/141824 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Anatoly Trosinenko (atrosinenko) Changes After a label in a function without CFG information, use a reasonably pessimistic estimation of register state (assume that any register that can be clobbered in this function was actually clobbered) instead of the most pessimistic "all registers are unsafe". This is the same estimation as used by the dataflow variant of the analysis when the preceding instruction is not known for sure. Without this, leaf functions without CFG information are likely to have false positive reports about non-protected return instructions, as 1) LR is unlikely to be signed and authenticated in a leaf function and 2) LR is likely to be used by a return instruction near the end of the function and 3) the register state is likely to be reset at least once during the linear scan through the function --- Full diff: https://github.com/llvm/llvm-project/pull/141824.diff 5 Files Affected: - (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+5-9) - (modified) bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s (+22-9) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s (-20) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s (+2-28) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-signing-oracles.s (-27) ``diff diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 2aacb38ee19a9..6327a2da54d5b 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -737,19 +737,14 @@ template class CFGUnawareAnalysis { // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. 
+// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, public CFGUnawareAnalysis { using SrcSafetyAnalysis::BC; BinaryFunction &BF; - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -759,6 +754,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -773,7 +769,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Attach the state *before* this instruction executes. diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index df0a83be00986..627f8eb20ab9c 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -224,20 +224,33 @@ f_unreachable_instruction: ret .size f_unreachable_instruction, .-f_unreachable_instruction -// Expected false positive: without CFG, the state is reset to all-unsafe -// after an unconditional branch. 
- -.globl state_is_reset_after_indirect_branch_nocfg -.type state_is_reset_after_indirect_branch_nocfg,@function -state_is_reset_after_indirect_branch_nocfg: -// CHECK-LABEL: GS-PAUTH: non-protected ret found in function state_is_reset_after_indirect_branch_nocfg, at address -// CHECK-NEXT: The instruction is {{[0-9a-f]+}}: ret +// Without CFG, the state is reset at labels, assuming every register that can +// be clobbered in the function was actually clobbered. + +.globl lr_untouched_nocfg +.type lr_untouched_nocfg,@function +lr_untouched_nocfg: +// CHECK-NOT: lr_untouched_nocfg +adr x2, 1f +br x2 +1: +ret +.size lr_untouched_nocfg, .-lr_untouched_nocfg + +.globl lr_clobbered_nocfg +.type lr_clobbered_nocfg,@function +lr_clobbered_nocfg: +// CHECK-LABEL: GS-PAUTH: non-protected ret found in function lr_clobbered_nocfg, at address +// CHECK-NEXT: The instruction is {{[0-9a-f]+}}: ret // CHECK-NEXT: The 0 instructions that write to the affected registers after any authent
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko edited https://github.com/llvm/llvm-project/pull/137224 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From d3742598bbf2a248124fe1b297d1447c52e40be1 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 34b5b1d51de4e..dac274c0f4130 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : 
llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -858,9 +858,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1025,8 +1025,7 @@ class DstSafetyAnalysis { // ... an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. 
if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1065,7 +1064,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1243,7 +1242,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From 5b9848cf82a1f047d90c1482404ac60f730892cf Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 28 Apr 2025 18:35:48 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 237 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). 
+class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 74bbe1e6f6e759c369ecf517dbfa6f98c40e9ffb Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index b233452985502..c8cbcaf33f4b5 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. 
+ /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4c7ae3c880db4..11db51f6c6dd1 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1066,6 +1066,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1173,8 +1182,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 9d5a578cfbdff..b669d32cc2032 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From b0eeddba47f56f0b917c4a43a744f120ea8e1d6e Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, the AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by the -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with an unsafe link register (as the detection of whether the branch instruction is a tail call is a heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. 
+ /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c8cbcaf33f4b5..3abf4d18e94da 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 762c08ffd933e..e9ed44a47bf6f 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1351,6 +1351,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +LL
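To illustrate the "only some specific well-known sequences" policy described above, here is a standalone toy sketch of matching a contiguous jump-table dispatch ending in a branch. The `Op` names and the three-instruction sequence are purely illustrative assumptions, not BOLT's `MCInst` API or the real AArch64 hardened dispatch sequence:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model -- not BOLT's MCInst and not
// the real AArch64 sequence; only the shape of the check is shown.
enum class Op { LoadJTEntry, AddJTBase, Br, Other };
struct Inst {
  Op Opcode;
};

// Walk backwards from the branch and require the exact contiguous
// sequence "LoadJTEntry; AddJTBase; Br". Any unrelated instruction in
// between makes the dispatch unsafe in this toy model, mirroring the
// "contiguous sequence without unsafe instructions" requirement.
bool isSafeJumpTableDispatch(const std::vector<Inst> &BB, size_t BrIdx) {
  if (BrIdx >= BB.size() || BB[BrIdx].Opcode != Op::Br || BrIdx < 2)
    return false;
  return BB[BrIdx - 1].Opcode == Op::AddJTBase &&
         BB[BrIdx - 2].Opcode == Op::LoadJTEntry;
}
```

The real callback additionally has to follow register dataflow across the sequence; this sketch only captures the contiguity requirement.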
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From b096c6ba85935f7a090031eb693612d1a110d965 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index e9ed44a47bf6f..34b5b1d51de4e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), 
+cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This function cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
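The commit message above says the option "essentially makes 'safe-to-dereference' and 'trusted' register properties identical". A toy sketch of that collapse, with illustrative names and register numbering that are not BOLT's actual types:

```cpp
#include <bitset>
#include <cassert>

// Toy model of the two per-register properties the scanner tracks.
struct ToyRegState {
  std::bitset<32> SafeToDeref;
  std::bitset<32> Trusted;
};

// With --auth-traps-on-failure (i.e. assuming FEAT_FPAC), surviving an
// authentication instruction implies it did not fail, so the result can
// be marked trusted immediately; otherwise it is only safe-to-dereference
// until an explicit checker sequence runs.
void applyAuthInst(ToyRegState &S, unsigned Reg, bool AuthTrapsOnFailure) {
  S.SafeToDeref.set(Reg);
  if (AuthTrapsOnFailure)
    S.Trusted.set(Reg); // the two properties coincide under this option
}
```

With the option enabled, authentication-oracle scanning becomes unnecessary: there is no window in which a register is safe-to-dereference but not yet trusted.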
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From af01b4e2be6387240a8cbac90d937e37a3413148 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. 
+/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. 
+bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. +Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
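The capture-or-match pattern documented above can be sketched in a stripped-down, standalone form. This is an analogue for illustration only, not the BOLT API: operands are plain integers, and unlike the real helpers this sketch does not restore previously captured values when a later operand fails to match:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal stand-in for the Reg/Imm matchers: a wrapper that captures a
// value on first use and matches against the captured value afterwards.
struct Capture {
  std::optional<int64_t> Value;
  bool match(int64_t V) {
    if (Value && *Value != V)
      return false;
    Value = V;
    return true;
  }
  int64_t get() const { return *Value; }
};

struct ToyInst {
  unsigned Opcode;
  std::vector<int64_t> Ops;
};

// Matches the opcode and an operand-list prefix, one instruction at a time.
bool matchInst(const ToyInst &I, unsigned Opcode,
               const std::vector<Capture *> &Matchers) {
  if (I.Opcode != Opcode || I.Ops.size() < Matchers.size())
    return false;
  for (size_t K = 0; K < Matchers.size(); ++K)
    if (!Matchers[K]->match(I.Ops[K]))
      return false;
  return true;
}
```

Usage mirrors the doc comment: the same `Capture` passed twice first records an operand and then constrains later operands (or later instructions) to that recorded value.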
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From d3742598bbf2a248124fe1b297d1447c52e40be1 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 34b5b1d51de4e..dac274c0f4130 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : 
llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -858,9 +858,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1025,8 +1025,7 @@ class DstSafetyAnalysis { // ... an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. 
if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1065,7 +1064,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1243,7 +1242,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
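The hunks above repeatedly replace index-based nested loops with `llvm::zip_equal` plus a structured binding. A plain-C++ sketch of the same element-wise set merge, without the LLVM helpers (the equal-size assertion stands in for what `zip_equal` checks internally):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// The patch rewrites loops of the shape
//   for (unsigned I = 0; I < A.size(); ++I)
//     for (const MCInst *J : B[I])
//       A[I].insert(J);
// as: for (auto [ThisSet, OtherSet] : llvm::zip_equal(A, B))
//       ThisSet.insert_range(OtherSet);
// The equivalent element-wise merge in plain C++:
void mergeEach(std::vector<std::set<int>> &A,
               const std::vector<std::set<int>> &B) {
  assert(A.size() == B.size() && "zip_equal requires equal range sizes");
  for (size_t I = 0; I < A.size(); ++I)
    A[I].insert(B[I].begin(), B[I].end());
}
```

The refactor is behavior-preserving; the gain is that the pairing of the two ranges (and the equal-size invariant) is stated once instead of being implicit in the index arithmetic.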
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 74bbe1e6f6e759c369ecf517dbfa6f98c40e9ffb Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such a BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index b233452985502..c8cbcaf33f4b5 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. 
+ /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4c7ae3c880db4..11db51f6c6dd1 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1066,6 +1066,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1173,8 +1182,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 9d5a578cfbdff..b669d32cc2032 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
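The commit message's comment "for x2 to be checked here, it must be checked in both successor basic blocks" describes a meet over successors in which a trapping successor is neutral. A toy version of that intersection (register numbering and types are illustrative, not the pass's actual state representation):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// A register is considered checked after a basic block only if it is
// checked along every successor. A successor that immediately traps
// (e.g. ends in BRK) checks every register, so it never weakens the
// intersection.
std::bitset<32>
checkedAfterBlock(const std::vector<std::bitset<32>> &SuccChecked,
                  const std::vector<bool> &SuccIsTrap) {
  std::bitset<32> Result;
  Result.set(); // neutral element of the intersection
  for (size_t I = 0; I < SuccChecked.size(); ++I)
    if (!SuccIsTrap[I])
      Result &= SuccChecked[I];
  return Result;
}
```

This is why treating BRK as "checks everything" removes the false positive: the on_failure successor no longer drags registers like x2 out of the checked set.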
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137224 >From a75cab7070e2167a4be39a4467895a2d1622c4e8 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 80 +++ .../AArch64/gs-pauth-tail-calls.s | 597 ++ 2 files changed, 677 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 6327a2da54d5b..4c7ae3c880db4 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1307,6 +1307,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). 
+ if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return true; + + // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the + // below is a simplified condition from BinaryContext::printInstruction. + bool IsUnknownControlFlow = + BC.MIB->isIndirectBranch(Inst) && !BC.MIB->getJumpTable(Inst); + + if (BF.hasCFG() && IsUnknownControlFlow) +return true; + + return false; +} + +static std::optional> +shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, + const MCInstReference &Inst, const SrcState &S) { + static const GadgetKind UntrustedLRKind( + "untrusted link register found before tail call"); + + if (!shouldAnalyzeTailCallInst(BC, BF, Inst)) +return std::nullopt; + + // Not only the set of registers returned by getTrustedLiveInRegs() can be + // seen as a reasonable target-independent _approximation_ of "the LR", these + // are *exactly* those registers used by SrcSafetyAnalysis to initialize the + // set of trusted registers on function entry. + // Thus, this function basically checks that the precondition expected to be + // imposed by a function call instruction (which is hardcoded into the target- + // specific getTrustedLiveInRegs() function) is also respected on tail calls. + SmallVector RegsToCheck = BC.MIB->getTrustedLiveInRegs(); + LLVM_DEBUG({ +traceInst(BC, "Found tail call inst", Inst); +traceRegMask(BC, "Trusted regs", S.TrustedRegs); + }); + + // In musl on AArch64, the _start function sets LR to zero and calls the next + // stage initialization function at the end, something along these lines: + // + // _start: + // mov x30, #0 + // ; ... other initialization ... + // b _start_c ; performs "exit" system call at some point + // + // As this would produce a false positive for every executable linked with + // such libc, ignore tail calls performed by ELF entry function. 
+ if (BC.StartFunctionAddress && + *BC.StartFunctionAddress == Inst.getFunction()->getAddress()) { +LLVM_DEBUG({ dbgs() << " Skipping tail call in ELF entry function.\n"; }); +return std::nullopt; + } + + // Returns at most one report per instruction - this is probably OK... + for (auto Reg : RegsToCheck) +if (!S.TrustedRegs[Reg]) + return make_gadget_report(UntrustedLRKind, Inst, Reg); + + return std::nullopt; +} + static std::optional> shouldReportCallGadget(const BinaryContext &BC, const MCInstReference &Inst, const SrcState &S) { @@ -1462,6 +1539,9 @@ void FunctionAnalysisContext::findUnsafeUses( if (PacRetGadgetsOnly) return; +if (auto Report = shouldReportUnsafeTailCall(BC, BF, Inst, S)) + Reports.push_back(*Report); + if (auto Report = shouldReportCallGadget(BC, Inst, S))
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From af01b4e2be6387240a8cbac90d937e37a3413148 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. 
+/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. 
+bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. +Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From b0eeddba47f56f0b917c4a43a744f120ea8e1d6e Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. 
+ /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c8cbcaf33f4b5..3abf4d18e94da 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 762c08ffd933e..e9ed44a47bf6f 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1351,6 +1351,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +LL
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From 5b9848cf82a1f047d90c1482404ac60f730892cf Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 28 Apr 2025 18:35:48 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 237 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). 
+class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: fix LR to be safe in leaf functions without CFG (PR #141824)
atrosinenko wrote: Factored this out of #137224. https://github.com/llvm/llvm-project/pull/141824
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
arsenm wrote: ### Merge activity * **May 28, 7:25 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/141804). https://github.com/llvm/llvm-project/pull/141804
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
arsenm wrote: ### Merge activity * **May 28, 7:25 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/141803). https://github.com/llvm/llvm-project/pull/141803
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/141766 >From 2ef30aacee4d80c0e4a925aa5ba9416423d10b1b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Tue, 27 May 2025 07:55:04 -0500 Subject: [PATCH 1/3] [utils][TableGen] Handle versions on clause/directive spellings In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontent was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. 
For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../llvm/Frontend/Directive/Spelling.h| 39 + llvm/include/llvm/TableGen/DirectiveEmitter.h | 25 +-- llvm/lib/Frontend/CMakeLists.txt | 1 + llvm/lib/Frontend/Directive/CMakeLists.txt| 6 + llvm/lib/Frontend/Directive/Spelling.cpp | 31 llvm/lib/Frontend/OpenACC/CMakeLists.txt | 2 +- llvm/lib/Frontend/OpenMP/CMakeLists.txt | 1 + llvm/test/TableGen/directive1.td | 34 ++-- llvm/test/TableGen/directive2.td | 24 +-- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 146 +++--- 10 files changed, 212 insertions(+), 97 deletions(-) create mode 100644 llvm/include/llvm/Frontend/Directive/Spelling.h create mode 100644 llvm/lib/Frontend/Directive/CMakeLists.txt create mode 100644 llvm/lib/Frontend/Directive/Spelling.cpp diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same structure as the one emitted into the generated sources. 
-#define STRUCT_VERSION_RANGE \ - struct VersionRange { \ -int Min = 1; \ -int Max = INT_MAX; \ - } - -STRUCT_VERSION_RANGE; - class Spelling : public Versioned { public: - using Value = std::pair; + using Value = llvm::directive::Spelling; Spelling(const Record *Def) : Def(Def) {} StringRef getText() const { return Def->getValueAsString("spelling"); } - VersionRange getVersions() const { -return VersionRange{getMinVersion(Def), getMaxVersion(Def)}; + llvm::directive::VersionRange getVersions() const { +return llvm::directive::VersionRange{getMinVersion(Def), +
[llvm-branch-commits] [llvm] [utils][TableGen] Unify converting names to upper-camel case (PR #141762)
llvmbot wrote: @llvm/pr-subscribers-tablegen Author: Krzysztof Parzyszek (kparzysz) Changes There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators. --- Full diff: https://github.com/llvm/llvm-project/pull/141762.diff 2 Files Affected: - (modified) llvm/include/llvm/TableGen/DirectiveEmitter.h (+32-44) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+1-1) ``diff diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 8615442ebff9f..48e18de0904c0 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -113,14 +113,39 @@ class BaseRecord { // Returns the name of the directive formatted for output. Whitespace are // replaced with underscores. - static std::string formatName(StringRef Name) { + static std::string getSnakeName(StringRef Name) { std::string N = Name.str(); llvm::replace(N, ' ', '_'); return N; } + static std::string getUpperCamelName(StringRef Name, StringRef Sep) { +std::string Camel = Name.str(); +// Convert to uppercase +bool Cap = true; +llvm::transform(Camel, Camel.begin(), [&](unsigned char C) { + if (Sep.contains(C)) { +assert(!Cap && "No initial or repeated separators"); +Cap = true; + } else if (Cap) { +C = llvm::toUpper(C); +Cap = false; + } + return C; +}); +size_t Out = 0; +// Remove separators +for (size_t In = 0, End = Camel.size(); In != End; ++In) { + unsigned char C = Camel[In]; + if (!Sep.contains(C)) +Camel[Out++] = C; +} +Camel.resize(Out); +return Camel; + } + std::string getFormattedName() const { -return formatName(Def->getValueAsString("name")); +return getSnakeName(Def->getValueAsString("name")); } bool isDefault() const { return Def->getValueAsBit("isDefault"); } @@ -172,26 +197,13 @@ class Directive : 
public BaseRecord { // Clang uses a different format for names of its directives enum. std::string getClangAccSpelling() const { -std::string Name = Def->getValueAsString("name").str(); +StringRef Name = Def->getValueAsString("name"); // Clang calls the 'unknown' value 'invalid'. if (Name == "unknown") return "Invalid"; -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, spaces/underscores are handled by capitalizing the next -// letter of the name and removing the space/underscore. -for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == ' ' || Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != ' ' && Name[I] != '_' && - "No double spaces/underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +return BaseRecord::getUpperCamelName(Name, " _"); } }; @@ -218,19 +230,7 @@ class Clause : public BaseRecord { // num_threads -> NumThreads std::string getFormattedParserClassName() const { StringRef Name = Def->getValueAsString("name"); -std::string N = Name.str(); -bool Cap = true; -llvm::transform(N, N.begin(), [&Cap](unsigned char C) { - if (Cap == true) { -C = toUpper(C); -Cap = false; - } else if (C == '_') { -Cap = true; - } - return C; -}); -erase(N, '_'); -return N; +return BaseRecord::getUpperCamelName(Name, "_"); } // Clang uses a different format for names of its clause enum, which can be @@ -241,20 +241,8 @@ class Clause : public BaseRecord { !ClangSpelling.empty()) return ClangSpelling.str(); -std::string Name = Def->getValueAsString("name").str(); -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, underscores are handled by capitalizing the next letter of -// the name and removing the underscore. 
-for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != '_' && "No double underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +StringRef Name = Def->getValueAsString("name"); +return BaseRecord::getUpperCamelName(Name, "_"); } // Optional field. diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp index f459e7c98ebc1..9e79a83ed6e18 100644 --- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp @@ -839,7 +839,7 @@ static void generateGetDirectiveLanguages(const DirectiveLanguage &DirLang, D.getSourceLangu
[llvm-branch-commits] [llvm] [utils][TableGen] Unify converting names to upper-camel case (PR #141762)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/141762 There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators. >From 78d1f1b2344ab48902b44afd7fb84649b46d6749 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 21 May 2025 11:26:33 -0500 Subject: [PATCH] [utils][TableGen] Unify converting names to upper-camel case There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators. --- llvm/include/llvm/TableGen/DirectiveEmitter.h | 76 --- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 2 +- 2 files changed, 33 insertions(+), 45 deletions(-) diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 8615442ebff9f..48e18de0904c0 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -113,14 +113,39 @@ class BaseRecord { // Returns the name of the directive formatted for output. Whitespace are // replaced with underscores. 
- static std::string formatName(StringRef Name) { + static std::string getSnakeName(StringRef Name) { std::string N = Name.str(); llvm::replace(N, ' ', '_'); return N; } + static std::string getUpperCamelName(StringRef Name, StringRef Sep) { +std::string Camel = Name.str(); +// Convert to uppercase +bool Cap = true; +llvm::transform(Camel, Camel.begin(), [&](unsigned char C) { + if (Sep.contains(C)) { +assert(!Cap && "No initial or repeated separators"); +Cap = true; + } else if (Cap) { +C = llvm::toUpper(C); +Cap = false; + } + return C; +}); +size_t Out = 0; +// Remove separators +for (size_t In = 0, End = Camel.size(); In != End; ++In) { + unsigned char C = Camel[In]; + if (!Sep.contains(C)) +Camel[Out++] = C; +} +Camel.resize(Out); +return Camel; + } + std::string getFormattedName() const { -return formatName(Def->getValueAsString("name")); +return getSnakeName(Def->getValueAsString("name")); } bool isDefault() const { return Def->getValueAsBit("isDefault"); } @@ -172,26 +197,13 @@ class Directive : public BaseRecord { // Clang uses a different format for names of its directives enum. std::string getClangAccSpelling() const { -std::string Name = Def->getValueAsString("name").str(); +StringRef Name = Def->getValueAsString("name"); // Clang calls the 'unknown' value 'invalid'. if (Name == "unknown") return "Invalid"; -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, spaces/underscores are handled by capitalizing the next -// letter of the name and removing the space/underscore. 
-for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == ' ' || Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != ' ' && Name[I] != '_' && - "No double spaces/underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +return BaseRecord::getUpperCamelName(Name, " _"); } }; @@ -218,19 +230,7 @@ class Clause : public BaseRecord { // num_threads -> NumThreads std::string getFormattedParserClassName() const { StringRef Name = Def->getValueAsString("name"); -std::string N = Name.str(); -bool Cap = true; -llvm::transform(N, N.begin(), [&Cap](unsigned char C) { - if (Cap == true) { -C = toUpper(C); -Cap = false; - } else if (C == '_') { -Cap = true; - } - return C; -}); -erase(N, '_'); -return N; +return BaseRecord::getUpperCamelName(Name, "_"); } // Clang uses a different format for names of its clause enum, which can be @@ -241,20 +241,8 @@ class Clause : public BaseRecord { !ClangSpelling.empty()) return ClangSpelling.str(); -std::string Name = Def->getValueAsString("name").str(); -// Clang entries all start with a capital letter, so apply that. -Name[0] = std::toupper(Name[0]); -// Additionally, underscores are handled by capitalizing the next letter of -// the name and removing the underscore. -for (unsigned I = 0; I < Name.size(); ++I) { - if (Name[I] == '_') { -Name.erase(I, 1); -assert(Name[I] != '_' && "No double underscores"); -Name[I] = std::toupper(Name[I]); - } -} - -return Name; +StringRef Name = Def->getValueAsString("name"); +return BaseRecord::getUpperCamelName(Name, "_"); }
[llvm-branch-commits] [llvm] [utils][TableGen] Treat clause aliases equally with names (PR #141763)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/141763 The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be ``` ("fred" || "f") >> construct{} || "foo" << construct{} ``` is now ``` "fred" >> construct{} || "foo" >> construct{} || "f" >> construct{} ``` >From e7d2e0b40eae0bf37f76d0aa8a59520b529c760c Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 21 May 2025 14:23:38 -0500 Subject: [PATCH] [utils][TableGen] Treat clause aliases equally with names The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. 
What used to be ``` ("fred" || "f") >> construct{} || "foo" << construct{} ``` is now ``` "fred" >> construct{} || "foo" >> construct{} || "f" >> construct{} ``` --- llvm/test/TableGen/directive1.td | 4 +- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 75 ++- 2 files changed, 42 insertions(+), 37 deletions(-) diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 74091edfa2a66..f756f54c03bfb 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -34,6 +34,7 @@ def TDLC_ClauseB : Clause<"clauseb"> { } def TDLC_ClauseC : Clause<"clausec"> { + let aliases = ["ccc"]; let flangClass = "IntExpr"; let isValueList = 1; } @@ -260,7 +261,8 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: TYPE_PARSER( // IMPL-NEXT:"clausec" >> construct(construct(parenthesized(nonemptyList(Parser{} || // IMPL-NEXT:"clauseb" >> construct(construct(maybe(parenthesized(Parser{} || -// IMPL-NEXT:"clausea" >> construct(construct()) +// IMPL-NEXT:"clausea" >> construct(construct()) || +// IMPL-NEXT:"ccc" >> construct(construct(parenthesized(nonemptyList(Parser{} // IMPL-NEXT: ) // IMPL-EMPTY: // IMPL-NEXT: #endif // GEN_FLANG_CLAUSES_PARSER diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp index 9e79a83ed6e18..bd6c543e1741a 100644 --- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp @@ -608,7 +608,7 @@ static void emitLeafTable(const DirectiveLanguage &DirLang, raw_ostream &OS, std::vector Ordering(Directives.size()); std::iota(Ordering.begin(), Ordering.end(), 0); - sort(Ordering, [&](int A, int B) { + llvm::sort(Ordering, [&](int A, int B) { auto &LeavesA = LeafTable[A]; auto &LeavesB = LeafTable[B]; int DirA = LeavesA[0], DirB = LeavesB[0]; @@ -1113,59 +1113,63 @@ static void generateFlangClauseParserKindMap(const DirectiveLanguage &DirLang, << " Parser clause\");\n"; } -static bool compareClauseName(const Record *R1, 
const Record *R2) { - Clause C1(R1); - Clause C2(R2); - return (C1.getName() > C2.getName()); +using RecordWithText = std::pair; + +static bool compareRecordText(const RecordWithText &A, + const RecordWithText &B) { + return A.second > B.second; +} + +static std::vector +getSpellingTexts(ArrayRef Records) { + std::vector List; + for (const Record *R : Records) { +Clause C(R); +List.push_back(std::make_pair(R, C.getName())); +llvm::transform(C.getAliases(), std::back_inserter(List), +[R](StringRef S) { return std::make_pair(R, S); }); + } + return List; } // Generate the parser for the clauses. static void generateFlangClausesParser(const DirectiveLanguage &DirLang, raw_ostream &OS) { std::vector Clauses = DirLang.getClauses(); - // Sort clauses in reverse alphabetical order so with clauses with same - // beginning, the longer option is tried before. - sort(Clauses, compareClauseName); + /
[llvm-branch-commits] [llvm] [mlir] [utils][TableGen] Implement clause aliases as alternative spellings (PR #141765)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp Author: Krzysztof Parzyszek (kparzysz) Changes Use the spellings in the generated clause parser. The functions `getClauseKind` and `getClauseName` are not yet updated. The definitions of both clauses and directives now take a list of "Spelling"s instead of a single string. For example ``` def ACCC_Copyin : Clause<[Spelling<"copyin">, Spelling<"present_or_copyin">, Spelling<"pcopyin">]> { ... } ``` A "Spelling" is a versioned string, defaulting to "all versions". For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- Patch is 106.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141765.diff 9 Files Affected: - (modified) llvm/include/llvm/Frontend/Directive/DirectiveBase.td (+23-18) - (modified) llvm/include/llvm/Frontend/OpenACC/ACC.td (+73-73) - (modified) llvm/include/llvm/Frontend/OpenMP/OMP.td (+252-244) - (modified) llvm/include/llvm/TableGen/DirectiveEmitter.h (+80-5) - (modified) llvm/test/TableGen/directive1.td (+36-24) - (modified) llvm/test/TableGen/directive2.td (+38-25) - (modified) llvm/test/TableGen/directive3.td (+5-5) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+74-41) - (modified) mlir/test/mlir-tblgen/directive-common.td (+1-1) ``diff diff --git a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td index 582da20083aee..142ba0423f251 100644 --- a/llvm/include/llvm/Frontend/Directive/DirectiveBase.td +++ b/llvm/include/llvm/Frontend/Directive/DirectiveBase.td @@ -51,6 +51,20 @@ class DirectiveLanguage { string flangClauseBaseClass = ""; } +// Base class for versioned entities. +class Versioned { + // Mininum version number where this object is valid. + int minVersion = min; + + // Maximum version number where this object is valid. 
+ int maxVersion = max; +} + +class Spelling +: Versioned { + string spelling = s; +} + // Some clauses take an argument from a predefined list of allowed keyword // values. For example, assume a clause "someclause" with an argument from // the list "foo", "bar", "baz". In the user source code this would look @@ -81,12 +95,9 @@ class EnumVal { } // Information about a specific clause. -class Clause { - // Name of the clause. - string name = c; - - // Define aliases used in the parser. - list aliases = []; +class Clause ss> { + // Spellings of the clause. + list spellings = ss; // Optional class holding value of the clause in clang AST. string clangClass = ""; @@ -134,15 +145,9 @@ class Clause { } // Hold information about clause validity by version. -class VersionedClause { - // Actual clause. +class VersionedClause +: Versioned { Clause clause = c; - - // Mininum version number where this clause is valid. - int minVersion = min; - - // Maximum version number where this clause is valid. - int maxVersion = max; } // Kinds of directive associations. @@ -190,15 +195,15 @@ class SourceLanguage { string name = n; // Name of the enum value in enum class Association. } -// The C languages also implies C++ until there is a reason to add C++ +// The C language also implies C++ until there is a reason to add C++ // separately. def L_C : SourceLanguage<"C"> {} def L_Fortran : SourceLanguage<"Fortran"> {} // Information about a specific directive. -class Directive { - // Name of the directive. Can be composite directive sepearted by whitespace. - string name = d; +class Directive ss> { + // Spellings of the directive. + list spellings = ss; // Clauses cannot appear twice in the three allowed lists below. 
Also, since // required implies allowed, the same clause cannot appear in both the diff --git a/llvm/include/llvm/Frontend/OpenACC/ACC.td b/llvm/include/llvm/Frontend/OpenACC/ACC.td index b74cd6e5642ec..65751839ceb09 100644 --- a/llvm/include/llvm/Frontend/OpenACC/ACC.td +++ b/llvm/include/llvm/Frontend/OpenACC/ACC.td @@ -32,64 +32,65 @@ def OpenACC : DirectiveLanguage { //===--===// // 2.16.1 -def ACCC_Async : Clause<"async"> { +def ACCC_Async : Clause<[Spelling<"async">]> { let flangClass = "ScalarIntExpr"; let isValueOptional = true; } // 2.9.7 -def ACCC_Auto : Clause<"auto"> {} +def ACCC_Auto : Clause<[Spelling<"auto">]> {} // 2.7.12 -def ACCC_Attach : Clause<"attach"> { +def ACCC_Attach : Clause<[Spelling<"attach">]> { let flangClass = "AccObjectList"; } // 2.15.1 -def ACCC_Bind : Clause<"bind"> { +def ACCC_Bind : Clause<[Spelling<"bind">]> { let flangClass = "AccBindClause"; } // 2.12 -def ACCC_Capture : Clause<"capture"> { +def ACCC_Capture : Clause<[Spelling<"capture">]> { } // 2.9.1 -def ACCC_Collapse : Clause<"collapse"> { +def ACCC_Collapse : Clause<[Spel
[llvm-branch-commits] [llvm] [utils][TableGen] Treat clause aliases equally with names (PR #141763)
llvmbot wrote: @llvm/pr-subscribers-tablegen Author: Krzysztof Parzyszek (kparzysz) Changes The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be ``` ("fred" || "f") >> construct{} || "foo" << construct {} ``` is now ``` "fred" >> construct {} || "foo" >> construct {} || "f" >> construct {} ``` --- Full diff: https://github.com/llvm/llvm-project/pull/141763.diff 2 Files Affected: - (modified) llvm/test/TableGen/directive1.td (+3-1) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+39-36) ``diff diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td index 74091edfa2a66..f756f54c03bfb 100644 --- a/llvm/test/TableGen/directive1.td +++ b/llvm/test/TableGen/directive1.td @@ -34,6 +34,7 @@ def TDLC_ClauseB : Clause<"clauseb"> { } def TDLC_ClauseC : Clause<"clausec"> { + let aliases = ["ccc"]; let flangClass = "IntExpr"; let isValueList = 1; } @@ -260,7 +261,8 @@ def TDL_DirA : Directive<"dira"> { // IMPL-NEXT: TYPE_PARSER( // IMPL-NEXT:"clausec" >> construct(construct(parenthesized(nonemptyList(Parser{} || // IMPL-NEXT:"clauseb" >> construct(construct(maybe(parenthesized(Parser{} || -// IMPL-NEXT:"clausea" >> construct(construct()) +// IMPL-NEXT:"clausea" >> construct(construct()) || +// IMPL-NEXT:"ccc" >> construct(construct(parenthesized(nonemptyList(Parser{} // IMPL-NEXT: ) // IMPL-EMPTY: // IMPL-NEXT: #endif 
// GEN_FLANG_CLAUSES_PARSER diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp index 9e79a83ed6e18..bd6c543e1741a 100644 --- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp +++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp @@ -608,7 +608,7 @@ static void emitLeafTable(const DirectiveLanguage &DirLang, raw_ostream &OS, std::vector Ordering(Directives.size()); std::iota(Ordering.begin(), Ordering.end(), 0); - sort(Ordering, [&](int A, int B) { + llvm::sort(Ordering, [&](int A, int B) { auto &LeavesA = LeafTable[A]; auto &LeavesB = LeafTable[B]; int DirA = LeavesA[0], DirB = LeavesB[0]; @@ -1113,59 +1113,63 @@ static void generateFlangClauseParserKindMap(const DirectiveLanguage &DirLang, << " Parser clause\");\n"; } -static bool compareClauseName(const Record *R1, const Record *R2) { - Clause C1(R1); - Clause C2(R2); - return (C1.getName() > C2.getName()); +using RecordWithText = std::pair; + +static bool compareRecordText(const RecordWithText &A, + const RecordWithText &B) { + return A.second > B.second; +} + +static std::vector +getSpellingTexts(ArrayRef Records) { + std::vector List; + for (const Record *R : Records) { +Clause C(R); +List.push_back(std::make_pair(R, C.getName())); +llvm::transform(C.getAliases(), std::back_inserter(List), +[R](StringRef S) { return std::make_pair(R, S); }); + } + return List; } // Generate the parser for the clauses. static void generateFlangClausesParser(const DirectiveLanguage &DirLang, raw_ostream &OS) { std::vector Clauses = DirLang.getClauses(); - // Sort clauses in reverse alphabetical order so with clauses with same - // beginning, the longer option is tried before. - sort(Clauses, compareClauseName); + // Sort clauses in the reverse alphabetical order with respect to their + // names and aliases, so that longer names are tried before shorter ones. 
+ std::vector> Names = + getSpellingTexts(Clauses); + llvm::sort(Names, compareRecordText); IfDefScope Scope("GEN_FLANG_CLAUSES_PARSER", OS); StringRef Base = DirLang.getFlangClauseBaseClass(); + unsigned LastIndex = Names.size() - 1; OS << "\n"; - unsigned Index = 0; - unsigned LastClauseIndex = Clauses.size() - 1; OS << "TYPE_PARSER(\n"; - for (const Clause Clause : Clauses) { -const std::vector &Aliases = Clause.getAliases(); -if (Aliases.empty()) { - OS << " \"" << Clause.getName() << "\""; -} else { - OS << " (" - << "\"" << Clause.getName() << "\"_tok"; - for (StringRef Alias : Aliases) { -OS << " || \"" << Alias << "\"_tok"; - } - OS << ")"; -} + for (auto [Index, RecTxt] : llvm::enumera
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/141766 In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontent was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 >From 2ef30aacee4d80c0e4a925aa5ba9416423d10b1b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Tue, 27 May 2025 07:55:04 -0500 Subject: [PATCH] [utils][TableGen] Handle versions on clause/directive spellings In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontent was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. 
For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../llvm/Frontend/Directive/Spelling.h| 39 + llvm/include/llvm/TableGen/DirectiveEmitter.h | 25 +-- llvm/lib/Frontend/CMakeLists.txt | 1 + llvm/lib/Frontend/Directive/CMakeLists.txt| 6 + llvm/lib/Frontend/Directive/Spelling.cpp | 31 llvm/lib/Frontend/OpenACC/CMakeLists.txt | 2 +- llvm/lib/Frontend/OpenMP/CMakeLists.txt | 1 + llvm/test/TableGen/directive1.td | 34 ++-- llvm/test/TableGen/directive2.td | 24 +-- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 146 +++--- 10 files changed, 212 insertions(+), 97 deletions(-) create mode 100644 llvm/include/llvm/Frontend/Directive/Spelling.h create mode 100644 llvm/lib/Frontend/Directive/CMakeLists.txt create mode 100644 llvm/lib/Frontend/Directive/Spelling.cpp diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
llvmbot wrote: @llvm/pr-subscribers-openacc Author: Krzysztof Parzyszek (kparzysz) Changes In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontend was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- Patch is 26.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141766.diff 10 Files Affected: - (added) llvm/include/llvm/Frontend/Directive/Spelling.h (+39) - (modified) llvm/include/llvm/TableGen/DirectiveEmitter.h (+8-17) - (modified) llvm/lib/Frontend/CMakeLists.txt (+1) - (added) llvm/lib/Frontend/Directive/CMakeLists.txt (+6) - (added) llvm/lib/Frontend/Directive/Spelling.cpp (+31) - (modified) llvm/lib/Frontend/OpenACC/CMakeLists.txt (+1-1) - (modified) llvm/lib/Frontend/OpenMP/CMakeLists.txt (+1) - (modified) llvm/test/TableGen/directive1.td (+20-14) - (modified) llvm/test/TableGen/directive2.td (+12-12) - (modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+93-53) ``diff diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ 
-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same structure as the one emitted into the generated sources. 
-#define STRUCT_VERSION_RANGE \ - struct VersionRange { \ -int Min = 1; \ -int Max = INT_MAX; \ - } - -STRUCT_VERSION_RANGE; - class Spelling : public Versioned { public: - using Value = std::pair; + using Value = llvm::directive::Spelling; Spelling(const Record *Def) : Def(Def) {} StringRef getText() const { return Def->getValueAsString("spelling"); } - VersionRange getVersions() const { -return VersionRange{getMinVersion(Def), getMaxVersion(Def)}; + llvm::directive::VersionRange getVersions() const { +return llvm::directive::VersionRange{getMinVersion(Def), + getMaxVersion(Def)}; } - Value get() const { return std::make_pair(getText(), getVersions()); } + Value get() const { return Value{getText(), getVersions()}; } private: const Record *Def; @@ -177,11 +168,11 @@ class BaseRe
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing of floats for StaticSampler (PR #140181)
@@ -711,6 +734,35 @@ std::optional RootSignatureParser::parseRegister() { return Reg; } +std::optional RootSignatureParser::parseFloatParam() { + assert(CurToken.TokKind == TokenKind::pu_equal && + "Expects to only be invoked starting at given keyword"); + // Consume sign modifier + bool Signed = + tryConsumeExpectedToken({TokenKind::pu_plus, TokenKind::pu_minus}); + bool Negated = Signed && CurToken.TokKind == TokenKind::pu_minus; + + // DXC will treat a postive signed integer as unsigned + if (!Negated && tryConsumeExpectedToken(TokenKind::int_literal)) { +auto UInt = handleUIntLiteral(); +if (!UInt.has_value()) + return std::nullopt; +return (float)UInt.value(); + } else if (tryConsumeExpectedToken(TokenKind::int_literal)) { llvm-beanz wrote: Flyby style nit: Don't use `else` after a `return` https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return https://github.com/llvm/llvm-project/pull/140181 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -2478,3 +2479,76 @@ PreservedAnalyses LowerTypeTestsPass::run(Module &M, return PreservedAnalyses::all(); return PreservedAnalyses::none(); } + +PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, + ModuleAnalysisManager &AM) { + bool Changed = false; + // Figure out whether inlining has exposed a constant address to a lowered + // type test, and remove the test if so and the address is known to pass the + // test. Unfortunately this pass ends up needing to reverse engineer what + // LowerTypeTests did; this is currently inherent to the design of ThinLTO + // importing where LowerTypeTests needs to run at the start. + for (auto &GV : M.globals()) { +if (!GV.getName().starts_with("__typeid_") || +!GV.getName().ends_with("_global_addr")) + continue; +auto *MD = MDString::get(M.getContext(), + GV.getName().substr(9, GV.getName().size() - 21)); +auto MaySimplifyPtr = [&](Value *Ptr) { + if (auto *GV = dyn_cast(Ptr)) +if (auto *CFIGV = M.getNamedValue((GV->getName() + ".cfi").str())) + Ptr = CFIGV; + return isKnownTypeIdMember(MD, M.getDataLayout(), Ptr, 0); teresajohnson wrote: Are there cases where the GV will not have a ".cfi" extension? I notice the test has that extension. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
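To make the lookup under discussion concrete: the quoted `MaySimplifyPtr` prefers a "<name>.cfi" twin when one exists and otherwise tests the original pointer. A minimal sketch, with `std::set` standing in for `Module::getNamedValue` (the real code operates on LLVM IR values, not strings; only the ".cfi" suffix convention comes from the patch):

```cpp
#include <cassert>
#include <set>
#include <string>

// Hedged sketch of the symbol redirection in MaySimplifyPtr.
std::string resolveCfiTwin(const std::set<std::string> &ModuleSymbols,
                           const std::string &Name) {
  std::string Twin = Name + ".cfi";
  if (ModuleSymbols.count(Twin))
    return Twin; // LowerTypeTests produced a renamed CFI version
  return Name;   // no twin: the membership test runs on the original
}
```

teresajohnson's question amounts to asking whether the second return is ever taken, i.e. whether a global can reach this point without a ".cfi" twin.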
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -2478,3 +2479,76 @@ PreservedAnalyses LowerTypeTestsPass::run(Module &M, return PreservedAnalyses::all(); return PreservedAnalyses::none(); } + +PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, + ModuleAnalysisManager &AM) { + bool Changed = false; + // Figure out whether inlining has exposed a constant address to a lowered + // type test, and remove the test if so and the address is known to pass the + // test. Unfortunately this pass ends up needing to reverse engineer what + // LowerTypeTests did; this is currently inherent to the design of ThinLTO teresajohnson wrote: Can you add a more extensive comment with what this is looking for and why? I don't look at lower type test output often so I don't recall offhand what e.g. it would have looked like without inlining vs with. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -0,0 +1,40 @@ +; RUN: opt -S %s -passes=simplify-type-tests | FileCheck %s teresajohnson wrote: Add a comment about what this is testing https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add SimplifyTypeTests pass. (PR #141327)
@@ -2478,3 +2479,76 @@ PreservedAnalyses LowerTypeTestsPass::run(Module &M, return PreservedAnalyses::all(); return PreservedAnalyses::none(); } + +PreservedAnalyses SimplifyTypeTestsPass::run(Module &M, + ModuleAnalysisManager &AM) { + bool Changed = false; + // Figure out whether inlining has exposed a constant address to a lowered + // type test, and remove the test if so and the address is known to pass the + // test. Unfortunately this pass ends up needing to reverse engineer what + // LowerTypeTests did; this is currently inherent to the design of ThinLTO + // importing where LowerTypeTests needs to run at the start. + for (auto &GV : M.globals()) { +if (!GV.getName().starts_with("__typeid_") || +!GV.getName().ends_with("_global_addr")) + continue; +auto *MD = MDString::get(M.getContext(), teresajohnson wrote: Can you add a comment on this conversion? Figured it out by adding up the chars myself but it would be good to make it explicit. https://github.com/llvm/llvm-project/pull/141327 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
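The `substr(9, GV.getName().size() - 21)` that teresajohnson flags is pure prefix/suffix arithmetic: "__typeid_" is 9 characters, "_global_addr" is 12, and 9 + 12 = 21. A sketch of the same computation with named lengths (`std::string` here rather than `StringRef`):

```cpp
#include <cassert>
#include <string>

// Sketch: recover the type-id name from a "__typeid_<name>_global_addr"
// symbol by stripping the known prefix and suffix.
std::string stripTypeIdGlobalAddr(const std::string &SymbolName) {
  const std::string Prefix = "__typeid_";    // 9 characters
  const std::string Suffix = "_global_addr"; // 12 characters
  // Equivalent to the quoted substr(9, size() - 21): 21 == 9 + 12.
  return SymbolName.substr(Prefix.size(),
                           SymbolName.size() - Prefix.size() - Suffix.size());
}
```

Spelling the lengths as `Prefix.size()`/`Suffix.size()`, as above, is one way to make the requested comment unnecessary.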
[llvm-branch-commits] [llvm] [HLSL] Diagnose overlapping resource bindings (PR #140982)
@@ -50,15 +51,55 @@ static void reportInvalidDirection(Module &M, DXILResourceMap &DRM) { } } -} // namespace +static void reportOverlappingError(Module &M, ResourceInfo R1, + ResourceInfo R2) { + SmallString<64> Message; + raw_svector_ostream OS(Message); + OS << "resource " << R1.getName() << " at register " + << R1.getBinding().LowerBound << " overlaps with resource " << R2.getName() + << " at register " << R2.getBinding().LowerBound << ", space " + << R2.getBinding().Space; + M.getContext().diagnose(DiagnosticInfoGeneric(Message)); +} -PreservedAnalyses -DXILPostOptimizationValidation::run(Module &M, ModuleAnalysisManager &MAM) { - DXILResourceMap &DRM = MAM.getResult(M); +static void reportOverlappingBinding(Module &M, DXILResourceMap &DRM) { + if (DRM.empty()) +return; + for (auto ResList : + {DRM.srvs(), DRM.uavs(), DRM.cbuffers(), DRM.samplers()}) { +if (ResList.empty()) + continue; +const ResourceInfo *PrevRI = &*ResList.begin(); +for (auto *I = ResList.begin() + 1; I != ResList.end(); ++I) { + const ResourceInfo *RI = &*I; + if (PrevRI->getBinding().overlapsWith(RI->getBinding())) { inbelic wrote: Ah I see. Yep, then my issues are resolved and this LGTM https://github.com/llvm/llvm-project/pull/140982 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
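Reduced to bare intervals, the neighbour comparison in the quoted loop looks roughly like this (a sketch only: the real pass walks DXILResourceMap's per-class lists, whose bindings also carry a register space):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Binding = std::pair<unsigned, unsigned>; // {LowerBound, UpperBound}

// Sketch of the adjacent-pair overlap walk: after sorting by lower
// bound, two bindings overlap exactly when an earlier entry's upper
// bound reaches the next entry's lower bound.
bool hasOverlap(std::vector<Binding> Bindings) {
  std::sort(Bindings.begin(), Bindings.end());
  for (size_t I = 1; I < Bindings.size(); ++I)
    if (Bindings[I - 1].second >= Bindings[I].first)
      return true;
  return false;
}
```

Comparing only adjacent entries is sufficient once the list is sorted, which is why the pass can scan each resource class linearly.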
[llvm-branch-commits] [llvm] [utils][TableGen] Handle versions on clause/directive spellings (PR #141766)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/141766 >From 2ef30aacee4d80c0e4a925aa5ba9416423d10b1b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Tue, 27 May 2025 07:55:04 -0500 Subject: [PATCH 1/4] [utils][TableGen] Handle versions on clause/directive spellings In "getDirectiveName(Kind, Version)", return the spelling that corresponds to Version, and in "getDirectiveKindAndVersions(Name)" return the pair {Kind, VersionRange}, where VersionRange contains the minimum and the maximum versions that allow "Name" as a spelling. This applies to clauses as well. In general it applies to classes that have spellings (defined via TableGen class "Spelling"). Given a Kind and a Version, getting the corresponding spelling requires a runtime search (which can fail in a general case). To avoid generating the search function inline, a small additional component of llvm/Frontend was added: LLVMFrontendDirective. The corresponding header file also defines C++ classes "Spelling" and "VersionRange", which are used in TableGen/DirectiveEmitter as well. 
For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507 --- .../llvm/Frontend/Directive/Spelling.h| 39 + llvm/include/llvm/TableGen/DirectiveEmitter.h | 25 +-- llvm/lib/Frontend/CMakeLists.txt | 1 + llvm/lib/Frontend/Directive/CMakeLists.txt| 6 + llvm/lib/Frontend/Directive/Spelling.cpp | 31 llvm/lib/Frontend/OpenACC/CMakeLists.txt | 2 +- llvm/lib/Frontend/OpenMP/CMakeLists.txt | 1 + llvm/test/TableGen/directive1.td | 34 ++-- llvm/test/TableGen/directive2.td | 24 +-- .../utils/TableGen/Basic/DirectiveEmitter.cpp | 146 +++--- 10 files changed, 212 insertions(+), 97 deletions(-) create mode 100644 llvm/include/llvm/Frontend/Directive/Spelling.h create mode 100644 llvm/lib/Frontend/Directive/CMakeLists.txt create mode 100644 llvm/lib/Frontend/Directive/Spelling.cpp diff --git a/llvm/include/llvm/Frontend/Directive/Spelling.h b/llvm/include/llvm/Frontend/Directive/Spelling.h new file mode 100644 index 0..3ba0ae2296535 --- /dev/null +++ b/llvm/include/llvm/Frontend/Directive/Spelling.h @@ -0,0 +1,39 @@ +//===-- Spelling.h C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +#ifndef LLVM_FRONTEND_DIRECTIVE_SPELLING_H +#define LLVM_FRONTEND_DIRECTIVE_SPELLING_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace llvm::directive { + +struct VersionRange { + static constexpr int MaxValue = std::numeric_limits::max(); + int Min = 1; + int Max = MaxValue; +}; + +inline bool operator<(const VersionRange &A, const VersionRange &B) { + if (A.Min != B.Min) +return A.Min < B.Min; + return A.Max < B.Max; +} + +struct Spelling { + StringRef Name; + VersionRange Versions; +}; + +StringRef FindName(llvm::iterator_range, unsigned Version); + +} // namespace llvm::directive + +#endif // LLVM_FRONTEND_DIRECTIVE_SPELLING_H diff --git a/llvm/include/llvm/TableGen/DirectiveEmitter.h b/llvm/include/llvm/TableGen/DirectiveEmitter.h index 1235b7638e761..c7d7460087723 100644 --- a/llvm/include/llvm/TableGen/DirectiveEmitter.h +++ b/llvm/include/llvm/TableGen/DirectiveEmitter.h @@ -17,6 +17,7 @@ #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Frontend/Directive/Spelling.h" #include "llvm/Support/MathExtras.h" #include "llvm/TableGen/Record.h" #include @@ -113,29 +114,19 @@ class Versioned { constexpr static int IntWidth = 8 * sizeof(int); }; -// Range of specification versions: [Min, Max] -// Default value: all possible versions. -// This is the same structure as the one emitted into the generated sources. 
-#define STRUCT_VERSION_RANGE \ - struct VersionRange { \ -int Min = 1; \ -int Max = INT_MAX; \ - } - -STRUCT_VERSION_RANGE; - class Spelling : public Versioned { public: - using Value = std::pair; + using Value = llvm::directive::Spelling; Spelling(const Record *Def) : Def(Def) {} StringRef getText() const { return Def->getValueAsString("spelling"); } - VersionRange getVersions() const { -return VersionRange{getMinVersion(Def), getMaxVersion(Def)}; + llvm::directive::VersionRange getVersions() const { +return llvm::directive::VersionRange{getMinVersion(Def), +
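The runtime search the commit message describes (pick the spelling whose [Min, Max] range admits a requested version) can be sketched with simplified stand-ins for the new llvm::directive types. Only the field names mirror the patch; the rest is illustrative, not the actual LLVMFrontendDirective implementation:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for llvm::directive::VersionRange / Spelling.
struct VersionRange {
  int Min = 1;
  int Max = 2147483647; // all possible versions by default
};
struct Spelling {
  std::string Name;
  VersionRange Versions;
};

// Sketch of the FindName lookup: return the first spelling whose
// version range contains Version, or an empty string when none does.
std::string findName(const std::vector<Spelling> &Spellings, int Version) {
  for (const Spelling &S : Spellings)
    if (S.Versions.Min <= Version && Version <= S.Versions.Max)
      return S.Name;
  return "";
}
```

Keeping this search out-of-line in a small library component, rather than emitting it into every generated header, is the stated motivation for adding LLVMFrontendDirective.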
[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138403 >From 5b59eb6176ee2790e7b31e99ae7f7769bf630b1a Mon Sep 17 00:00:00 2001 From: Koakuma Date: Thu, 29 May 2025 11:04:46 +0700 Subject: [PATCH] Apply feedback Created using spr 1.3.5 --- .../Sparc/MCTargetDesc/SparcAsmBackend.cpp| 6 + .../Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp | 9 +- llvm/lib/Target/Sparc/SparcInstrAliases.td| 18 +- llvm/lib/Target/Sparc/SparcInstrFormats.td| 4 +- llvm/test/MC/Sparc/Relocations/expr.s | 16 +- llvm/test/MC/Sparc/sparc64-branch-offset.s| 508 +- 6 files changed, 289 insertions(+), 272 deletions(-) diff --git a/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp b/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp index c74f24d95523e..743752ad2c107 100644 --- a/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp +++ b/llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp @@ -51,6 +51,9 @@ static unsigned adjustFixupValue(unsigned Kind, uint64_t Value) { } case ELF::R_SPARC_WDISP10: { +// FIXME this really should be an error reporting check. 
+assert((Value & 0x3) == 0); + // 7.17 Compare and Branch // Inst{20-19} = d10hi; // Inst{12-5} = d10lo; @@ -70,6 +73,9 @@ static unsigned adjustFixupValue(unsigned Kind, uint64_t Value) { case Sparc::fixup_sparc_13: return Value & 0x1fff; + case ELF::R_SPARC_5: +return Value & 0x1f; + case ELF::R_SPARC_LOX10: return (Value & 0x3ff) | 0x1c00; diff --git a/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp b/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp index b44d4361dacdb..2c8dbaa5aba60 100644 --- a/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp +++ b/llvm/lib/Target/Sparc/MCTargetDesc/SparcMCCodeEmitter.cpp @@ -164,7 +164,12 @@ unsigned SparcMCCodeEmitter::getSImm5OpValue(const MCInst &MI, unsigned OpNo, if (const MCConstantExpr *CE = dyn_cast(Expr)) return CE->getValue(); - llvm_unreachable("simm5 operands can only be used with constants!"); + if (const SparcMCExpr *SExpr = dyn_cast(Expr)) { +Fixups.push_back(MCFixup::create(0, Expr, SExpr->getFixupKind())); +return 0; + } + Fixups.push_back(MCFixup::create(0, Expr, ELF::R_SPARC_5)); + return 0; } unsigned @@ -247,7 +252,7 @@ unsigned SparcMCCodeEmitter::getCompareAndBranchTargetOpValue( const MCInst &MI, unsigned OpNo, SmallVectorImpl &Fixups, const MCSubtargetInfo &STI) const { const MCOperand &MO = MI.getOperand(OpNo); - if (MO.isReg() || MO.isImm()) + if (MO.isImm()) return getMachineOpValue(MI, MO, Fixups, STI); Fixups.push_back(MCFixup::create(0, MO.getExpr(), ELF::R_SPARC_WDISP10)); diff --git a/llvm/lib/Target/Sparc/SparcInstrAliases.td b/llvm/lib/Target/Sparc/SparcInstrAliases.td index fa2c62101d30e..459fd193db0ed 100644 --- a/llvm/lib/Target/Sparc/SparcInstrAliases.td +++ b/llvm/lib/Target/Sparc/SparcInstrAliases.td @@ -333,19 +333,19 @@ multiclass reg_cond_alias { // Instruction aliases for compare-and-branch. 
multiclass cwb_cond_alias { - def : InstAlias, Requires<[HasOSA2011]>; - def : InstAlias, Requires<[HasOSA2011]>; } multiclass cxb_cond_alias { - def : InstAlias, Requires<[HasOSA2011]>; - def : InstAlias, Requires<[HasOSA2011]>; } @@ -441,8 +441,7 @@ defm : cwb_cond_alias<"pos", 0b1110>; defm : cwb_cond_alias<"neg", 0b0110>; defm : cwb_cond_alias<"vc", 0b>; defm : cwb_cond_alias<"vs", 0b0111>; -let EmitPriority = 0 in -{ +let EmitPriority = 0 in { defm : cwb_cond_alias<"geu", 0b1101>; // same as cc defm : cwb_cond_alias<"lu", 0b0101>; // same as cs } @@ -461,8 +460,7 @@ defm : cxb_cond_alias<"pos", 0b1110>; defm : cxb_cond_alias<"neg", 0b0110>; defm : cxb_cond_alias<"vc", 0b>; defm : cxb_cond_alias<"vs", 0b0111>; -let EmitPriority = 0 in -{ +let EmitPriority = 0 in { defm : cxb_cond_alias<"geu", 0b1101>; // same as cc defm : cxb_cond_alias<"lu", 0b0101>; // same as cs } @@ -727,6 +725,6 @@ def : InstAlias<"sir", (SIR 0), 0>; // pause reg_or_imm -> wrasr %g0, reg_or_imm, %asr27 let Predicates = [HasOSA2011] in { -def : InstAlias<"pause $rs2", (WRASRrr ASR27, G0, IntRegs:$rs2), 1>; -def : InstAlias<"pause $simm13", (WRASRri ASR27, G0, simm13Op:$simm13), 1>; + def : InstAlias<"pause $rs2", (WRASRrr ASR27, G0, IntRegs:$rs2), 1>; + def : InstAlias<"pause $simm13", (WRASRri ASR27, G0, simm13Op:$simm13), 1>; } // Predicates = [HasOSA2011] diff --git a/llvm/lib/Target/Sparc/SparcInstrFormats.td b/llvm/lib/Target/Sparc/SparcInstrFormats.td index fe10bb443348a..79c4cb2128a0f 100644 --- a/llvm/lib/Target/Sparc/SparcInstrFormats.td +++ b/llvm/lib/Target/Sparc/SparcInstrFormats.td @@ -104,7 +104,7 @@ class F2_4 pattern = [], InstrItinClass itin = NoItinerary> - : InstSP { +: InstSP { bits<10> imm10; bits<5> rs1; bits<5> rs2; @@ -1
[llvm-branch-commits] [clang] Implement src:*=sanitize for UBSan. (PR #140489)
https://github.com/qinkunbao edited https://github.com/llvm/llvm-project/pull/140489 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#141803** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/141803)
* **#141801**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/141804 No change in the net output since these ultimately expand to setcc, but saves a step in the DAG. >From 6967e6460456e755ce0767243834847cabcfbc06 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 28 May 2025 18:37:25 +0200 Subject: [PATCH] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR No change in the net output since these ultimately expand to setcc, but saves a step in the DAG. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 12 + .../CodeGen/AMDGPU/combine-cond-add-sub.ll| 48 +++ 2 files changed, 60 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 7ad10454e7931..b124f02d32a8a 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11922,6 +11922,18 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::SMULO: case ISD::UMULO: return V.getResNo() == 1; + case ISD::INTRINSIC_WO_CHAIN: { +unsigned IntrinsicID = V.getConstantOperandVal(0); +switch (IntrinsicID) { +case Intrinsic::amdgcn_is_shared: +case Intrinsic::amdgcn_is_private: + return true; +default: + return false; +} + +return false; + } } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll index 1778fa42fbf7e..ba8abdc17fb05 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll @@ -740,6 +740,54 @@ bb: ret void } +define i32 @add_sext_bool_is_shared(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_shared: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe8 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_shared: +; GFX9: ; 
%bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_shared_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; GFX9-NEXT:s_setpc_b64 s[30:31] + %is.shared = call i1 @llvm.amdgcn.is.shared(ptr %ptr) + %sext = sext i1 %is.shared to i32 + %add = add i32 %sext, %y + ret i32 %add +} + +define i32 @add_sext_bool_is_private(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_private: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe4 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_private: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_private_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; GFX9-NEXT:s_setpc_b64 s[30:31] + %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr) + %sext = sext i1 %is.private to i32 + %add = add i32 %sext, %y + ret i32 %add +} + declare i1 @llvm.amdgcn.class.f32(float, i32) #0 declare i32 @llvm.amdgcn.workitem.id.x() #0 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes The particular use in the test doesn't seem to do anything for the expanded cases (i.e. the signed add/sub or multiplies). --- Full diff: https://github.com/llvm/llvm-project/pull/141803.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7) - (modified) llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll (+89) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index c9fd2948d669f..7ad10454e7931 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11915,6 +11915,13 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::OR: case ISD::XOR: return isBoolSGPR(V.getOperand(0)) && isBoolSGPR(V.getOperand(1)); + case ISD::SADDO: + case ISD::UADDO: + case ISD::SSUBO: + case ISD::USUBO: + case ISD::SMULO: + case ISD::UMULO: +return V.getResNo() == 1; } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll index bdad6f40480d3..b98c81db5da99 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll @@ -45,6 +45,95 @@ define i32 @and_sext_bool_fpclass(float %x, i32 %y) { ret i32 %and } +; GCN-LABEL: {{^}}and_sext_bool_uadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_uadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_usub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, 
vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_usub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_sadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_sadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_ssub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_ssub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_smul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_i32 v2, v0, v1 +; GCN-NEXT: v_mul_lo_u32 v0, v0, v1 +; GCN-NEXT: v_ashrrev_i32_e32 v0, 31, v0 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v2, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_smul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = 
sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_umul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_u32 v0, v0, v1 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_umul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + + declare i32 @llvm.amdgcn.workitem.id.x() #0 declare i32 @llvm.amdgcn.workitem.id.y() #0 `` https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commi
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/141803 The particular use in the test doesn't seem to do anything for the expanded cases (i.e. the signed add/sub or multiplies). >From 20482481b443e2d3422be8baa779498bb5c54574 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 28 May 2025 18:06:03 +0200 Subject: [PATCH] AMDGPU: Add overflow operations to isBoolSGPR The particular use in the test doesn't seem to do anything for the expanded cases (i.e. the signed add/sub or multiplies). --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 7 ++ .../CodeGen/AMDGPU/combine-and-sext-bool.ll | 89 +++ 2 files changed, 96 insertions(+) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index c9fd2948d669f..7ad10454e7931 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11915,6 +11915,13 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::OR: case ISD::XOR: return isBoolSGPR(V.getOperand(0)) && isBoolSGPR(V.getOperand(1)); + case ISD::SADDO: + case ISD::UADDO: + case ISD::SSUBO: + case ISD::USUBO: + case ISD::SMULO: + case ISD::UMULO: +return V.getResNo() == 1; } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll index bdad6f40480d3..b98c81db5da99 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-and-sext-bool.ll @@ -45,6 +45,95 @@ define i32 @and_sext_bool_fpclass(float %x, i32 %y) { ret i32 %and } +; GCN-LABEL: {{^}}and_sext_bool_uadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_add_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_uadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + 
%and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_usub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_sub_i32_e32 v0, vcc, v0, v1 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_usub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_sadd_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_sadd_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_ssub_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 0, v1 +; GCN-NEXT: v_add_i32_e64 v2, s[4:5], v0, v1 +; GCN-NEXT: v_cmp_lt_i32_e64 s[4:5], v2, v0 +; GCN-NEXT: s_xor_b64 vcc, vcc, s[4:5] +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_ssub_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_smul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_i32 v2, v0, v1 +; GCN-NEXT: v_mul_lo_u32 v0, v0, v1 +; GCN-NEXT: v_ashrrev_i32_e32 v0, 31, v0 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v2, v0 +; GCN-NEXT: v_cndmask_b32_e32 
v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_smul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + +; GCN-LABEL: {{^}}and_sext_bool_umul_w_overflow: +; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT: v_mul_hi_u32 v0, v0, v1 +; GCN-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0 +; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc +; GCN-NEXT: s_setpc_b64 +define i32 @and_sext_bool_umul_w_overflow(i32 %x, i32 %y) { + %uadd = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) + %carry = extractvalue { i32, i1 } %uadd, 1 + %sext = sext i1 %carry to i32 + %and = and i32 %sext, %y + ret i32 %and +} + + declare i32 @llvm.amdgcn.workitem.id.x() #0 declare i32 @llvm.amdgcn.workitem.id.y() #0 ___
[llvm-branch-commits] [llvm] AMDGPU: Add overflow operations to isBoolSGPR (PR #141803)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/141803 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/141804 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#141804** (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/141804)
* **#141803**
* **#141801**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking (https://stacking.dev).

https://github.com/llvm/llvm-project/pull/141804 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes No change in the net output since these ultimately expand to setcc, but saves a step in the DAG. --- Full diff: https://github.com/llvm/llvm-project/pull/141804.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+12) - (modified) llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll (+48) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 7ad10454e7931..b124f02d32a8a 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -11922,6 +11922,18 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::SMULO: case ISD::UMULO: return V.getResNo() == 1; + case ISD::INTRINSIC_WO_CHAIN: { +unsigned IntrinsicID = V.getConstantOperandVal(0); +switch (IntrinsicID) { +case Intrinsic::amdgcn_is_shared: +case Intrinsic::amdgcn_is_private: + return true; +default: + return false; +} + +return false; + } } return false; } diff --git a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll index 1778fa42fbf7e..ba8abdc17fb05 100644 --- a/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll +++ b/llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll @@ -740,6 +740,54 @@ bb: ret void } +define i32 @add_sext_bool_is_shared(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_shared: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe8 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_shared: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_shared_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; 
GFX9-NEXT:s_setpc_b64 s[30:31] + %is.shared = call i1 @llvm.amdgcn.is.shared(ptr %ptr) + %sext = sext i1 %is.shared to i32 + %add = add i32 %sext, %y + ret i32 %add +} + +define i32 @add_sext_bool_is_private(ptr %ptr, i32 %y) { +; GCN-LABEL: add_sext_bool_is_private: +; GCN: ; %bb.0: +; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GCN-NEXT:s_mov_b64 s[4:5], 0xe4 +; GCN-NEXT:s_load_dword s4, s[4:5], 0x0 +; GCN-NEXT:s_waitcnt lgkmcnt(0) +; GCN-NEXT:v_cmp_eq_u32_e32 vcc, s4, v1 +; GCN-NEXT:v_subbrev_u32_e32 v0, vcc, 0, v2, vcc +; GCN-NEXT:s_setpc_b64 s[30:31] +; +; GFX9-LABEL: add_sext_bool_is_private: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], src_private_base +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, s5, v1 +; GFX9-NEXT:v_subbrev_co_u32_e32 v0, vcc, 0, v2, vcc +; GFX9-NEXT:s_setpc_b64 s[30:31] + %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr) + %sext = sext i1 %is.private to i32 + %add = add i32 %sext, %y + ret i32 %add +} + declare i1 @llvm.amdgcn.class.f32(float, i32) #0 declare i32 @llvm.amdgcn.workitem.id.x() #0 `` https://github.com/llvm/llvm-project/pull/141804 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
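The tests above rely on the identity that sign-extending an `i1` yields either 0 or -1 (all ones), so `add (sext i1 %c), %y` conditionally subtracts 1 from `%y` — which is why, once `isBoolSGPR` recognizes the intrinsic result, the backend can emit a single `v_subbrev` with the compare result in `vcc`. A standalone C++ model of that identity (an illustration, not LLVM code):

```cpp
#include <cstdint>

// Sign-extending an i1: 1 -> -1 (all ones), 0 -> 0.
int32_t sextI1(bool Flag) { return Flag ? -1 : 0; }

// add (sext i1 %c), %y  ==  %y - 1 when %c is set, %y otherwise.
int32_t addSextBool(bool Flag, int32_t Y) { return sextI1(Flag) + Y; }
```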
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/141804
[llvm-branch-commits] [llvm] AMDGPU: Add is.shared/is.private intrinsics to isBoolSGPR (PR #141804)
@@ -11922,6 +11922,18 @@ bool llvm::isBoolSGPR(SDValue V) { case ISD::SMULO: case ISD::UMULO: return V.getResNo() == 1; + case ISD::INTRINSIC_WO_CHAIN: { +unsigned IntrinsicID = V.getConstantOperandVal(0); +switch (IntrinsicID) { +case Intrinsic::amdgcn_is_shared: +case Intrinsic::amdgcn_is_private: + return true; +default: + return false; +} + +return false; shiltian wrote: nit: llvm_unreachable? https://github.com/llvm/llvm-project/pull/141804
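The trailing `return false;` after the inner switch is dead code, since every path through the switch already returns; the reviewer's nit is to mark it `llvm_unreachable` instead. A standalone sketch of the shape being discussed, with a stand-in macro and made-up intrinsic IDs (not the actual patch):

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for llvm_unreachable: aborts if control ever reaches it.
#define UNREACHABLE(Msg) (assert(false && Msg), std::abort())

enum IntrinsicID { IsShared, IsPrivate, Other };

bool isBoolIntrinsic(IntrinsicID ID) {
  switch (ID) {
  case IsShared:
  case IsPrivate:
    return true;
  default:
    return false;
  }
  UNREACHABLE("covered switch"); // documents that the switch is exhaustive
}
```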
[llvm-branch-commits] [llvm] [utils][TableGen] Treat clause aliases equally with names (PR #141763)
https://github.com/clementval approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/141763
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From 7a71b56676323327d012a9500f3e107d9b16d83c Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 34b5b1d51de4e..dac274c0f4130 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : 
llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -858,9 +858,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1025,8 +1025,7 @@ class DstSafetyAnalysis { // ... an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. 
if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1065,7 +1064,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1243,7 +1242,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
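The cleanup above swaps index loops for `llvm::enumerate`/`llvm::zip_equal` and replaces `std::tie` with structured bindings. The same idioms in portable C++17, using a hypothetical plain loop in place of the LLVM range helpers (a sketch, not BOLT code):

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Structured binding instead of: std::tie(DstReg, SrcReg) = *DstAndSrc;
std::pair<int, int> splitExample() {
  std::pair<int, int> DstAndSrc{1, 2};
  auto [DstReg, SrcReg] = DstAndSrc;
  return {DstReg, SrcReg};
}

// Elementwise merge in the spirit of llvm::zip_equal + insert_range:
// the sets are merged pairwise, as in SrcState::merge above.
void mergeSets(std::vector<std::set<int>> &Dst,
               const std::vector<std::set<int>> &Src) {
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I].insert(Src[I].begin(), Src[I].end());
}
```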
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From 4e08d36fcde69e0c9eebbac4ab2261e8db797393 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. 
+/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. 
+bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. +Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
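The core idea of the patch's `OpMatcher` — a wrapper that captures a value on first use and only matches that same value afterwards — can be sketched in a few lines of portable C++, operating on plain ints rather than `MCOperand`s (a simplification, not the patch's code):

```cpp
#include <cassert>
#include <optional>

// Captures the first value it sees; afterwards only matches equal values.
class CaptureOrMatch {
  mutable std::optional<int> Value;

public:
  CaptureOrMatch() = default;                     // capture mode
  explicit CaptureOrMatch(int V) : Value(V) {}    // match a literal

  bool match(int V) const {
    if (Value && *Value != V)
      return false; // contradicts the previously captured value
    Value = V;
    return true;
  }

  int get() const {
    assert(Value.has_value());
    return *Value;
  }
};
```

Passing the same wrapper for several operands forces them to be equal, which is how the example in the comment reuses `Xm` captured from `ADDXrs` when matching the subsequent `BR`.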
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From 9ef4b06a50605ecb15d4d8ffacd39a835e7d43ff Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index e9ed44a47bf6f..34b5b1d51de4e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), 
+cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From d7167e871fbde24246f71ec1553c3b22d30ad526 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. 
+ /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c8cbcaf33f4b5..3abf4d18e94da 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 762c08ffd933e..e9ed44a47bf6f 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1351,6 +1351,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +LL
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From ff3dc1d1dce6b7ec9ef9fb5a103455b0e946aca0 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index b233452985502..c8cbcaf33f4b5 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. 
+ /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 4c7ae3c880db4..11db51f6c6dd1 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1066,6 +1066,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1173,8 +1182,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 9d5a578cfbdff..b669d32cc2032 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1751,6 +1750,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From 9ef4b06a50605ecb15d4d8ffacd39a835e7d43ff Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index e9ed44a47bf6f..34b5b1d51de4e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), 
+cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From c41022206fbb32d177b2712f2a80d481e05735c8 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 28 Apr 2025 18:35:48 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 178 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 237 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). 
+class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136183 >From c63cd7528660a41bf95821648defc6cdb0e09d0a Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 17 Apr 2025 20:51:16 +0300 Subject: [PATCH 1/3] [BOLT] Gadget scanner: improve handling of unreachable basic blocks Instead of refusing to analyze an instruction completely, when it is unreachable according to the CFG reconstructed by BOLT, pessimistically assume all registers to be unsafe at the start of basic blocks without any predecessors. Nevertheless, unreachable basic blocks found in optimized code likely means imprecise CFG reconstruction, thus report a warning once per basic block without predecessors. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++- .../AArch64/gs-pacret-autiasp.s | 7 ++- .../binary-analysis/AArch64/gs-pauth-calls.s | 57 +++ 3 files changed, 95 insertions(+), 15 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 25be23d64463e..c20c0921d4a17 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -342,6 +342,12 @@ class SrcSafetyAnalysis { return S; } + /// Creates a state with all registers marked unsafe (not to be confused + /// with empty state). + SrcState createUnsafeState() const { +return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + } + BitVector getClobberedRegs(const MCInst &Point) const { BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved @@ -585,6 +591,13 @@ class DataflowSrcSafetyAnalysis if (BB.isEntryPoint()) return createEntryState(); +// If a basic block without any predecessors is found in an optimized code, +// this likely means that some CFG edges were not detected. Pessimistically +// assume all registers to be unsafe before this basic block and warn about +// this fact in FunctionAnalysis::findUnsafeUses(). 
+if (BB.pred_empty()) + return createUnsafeState(); + return SrcState(); } @@ -689,12 +702,6 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, using SrcSafetyAnalysis::BC; BinaryFunction &BF; - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -1364,19 +1371,30 @@ void FunctionAnalysisContext::findUnsafeUses( BF.dump(); }); + if (BF.hasCFG()) { +// Warn on basic blocks being unreachable according to BOLT, as this +// likely means CFG is imprecise. +for (BinaryBasicBlock &BB : BF) { + if (!BB.pred_empty() || BB.isEntryPoint()) +continue; + // Arbitrarily attach the report to the first instruction of BB. + MCInst *InstToReport = BB.getFirstNonPseudoInstr(); + if (!InstToReport) +continue; // BB has no real instructions + + Reports.push_back( + make_generic_report(MCInstReference::get(InstToReport, BF), + "Warning: no predecessor basic blocks detected " + "(possibly incomplete CFG)")); +} + } + iterateOverInstrs(BF, [&](MCInstReference Inst) { if (BC.MIB->isCFI(Inst)) return; const SrcState &S = Analysis->getStateBefore(Inst); - -// If non-empty state was never propagated from the entry basic block -// to Inst, assume it to be unreachable and report a warning. 
-if (S.empty()) { - Reports.push_back( - make_generic_report(Inst, "Warning: unreachable instruction found")); - return; -} +assert(!S.empty() && "Instruction has no associated state"); if (auto Report = shouldReportReturnGadget(BC, Inst, S)) Reports.push_back(*Report); diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index 284f0bea607a5..6559ba336e8de 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -215,12 +215,17 @@ f_callclobbered_calleesaved: .globl f_unreachable_instruction .type f_unreachable_instruction,@function f_unreachable_instruction: -// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address +// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected (possibly incomplete CFG) in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address // CHECK-NEXT:The instruction is {{[0-9a-f]+}}: add x0, x1, x2 // CHECK-NOT: instructions that write t
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136151 >From 9d8fedf678fe91ca1d7ac3334747227df335ff2c Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 15 Apr 2025 21:47:18 +0300 Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions Some instruction-printing code used under LLVM_DEBUG does not handle CFI instructions well. While CFI instructions seem to be harmless for the correctness of the analysis results, they do not convey any useful information to the analysis either, so skip them early. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++ .../AArch64/gs-pauth-debug-output.s | 32 +++ 2 files changed, 48 insertions(+) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 345af32650624..25be23d64463e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -430,6 +430,9 @@ class SrcSafetyAnalysis { } SrcState computeNext(const MCInst &Point, const SrcState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + SrcStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " SrcSafetyAnalysis::ComputeNext("; @@ -704,6 +707,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis, SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If there is a label before this instruction, it is possible that it // can be jumped-to, thus conservatively resetting S. 
As an exception, @@ -998,6 +1003,9 @@ class DstSafetyAnalysis { } DstState computeNext(const MCInst &Point, const DstState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + DstStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " DstSafetyAnalysis::ComputeNext("; @@ -1165,6 +1173,8 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, DstState S = createUnsafeState(); for (auto &I : llvm::reverse(BF.instrs())) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If Inst can change the control flow, we cannot be sure that the next // instruction (to be executed in analyzed program) is the one processed @@ -1355,6 +1365,9 @@ void FunctionAnalysisContext::findUnsafeUses( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const SrcState &S = Analysis->getStateBefore(Inst); // If non-empty state was never propagated from the entry basic block @@ -1418,6 +1431,9 @@ void FunctionAnalysisContext::findUnsafeDefs( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const DstState &S = Analysis->getStateAfter(Inst); if (auto Report = shouldReportAuthOracle(BC, Inst, S)) diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s index 61aa84377b88e..5aec945621987 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s +++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s @@ -329,6 +329,38 @@ auth_oracle: // PAUTH-EMPTY: // PAUTH-NEXT: Attaching leakage info to: : autia x0, x1 # DataflowDstSafetyAnalysis: dst-state +// Gadget scanner should not crash on CFI instructions, including when debug-printing them. +// Note that the particular debug output is not checked, but BOLT should be +// compiled with assertions enabled to support -debug-only argument. 
+ +.globl cfi_inst_df +.type cfi_inst_df,@function +cfi_inst_df: +.cfi_startproc +sub sp, sp, #16 +.cfi_def_cfa_offset 16 +add sp, sp, #16 +.cfi_def_cfa_offset 0 +ret +.size cfi_inst_df, .-cfi_inst_df +.cfi_endproc + +.globl cfi_inst_nocfg +.type cfi_inst_nocfg,@function +cfi_inst_nocfg: +.cfi_startproc +sub sp, sp, #16 +.cfi_def_cfa_offset 16 + +adr x0, 1f +br x0 +1: +add sp, sp, #16 +.cfi_def_cfa_offset 0 +ret +.size cfi_inst_nocfg, .-cfi_inst_nocfg +.cfi_endproc + // CHECK-LABEL:Analyzing function main, AllocatorId = 1 .globl main .type main,@function ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits