[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From 9745bc91f93b047958c35c98499fa4e0f81a5113 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 19 Jun 2025 14:03:59 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 176 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 235 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). +class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant<RefInBB, RefInBF> Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if<RefInBB>(&Reference) || + std::get_if<RefInBF>(&Reference)); +return std::get_if<RefInBB>(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if<RefInBF>(&Reference)); +return *std::get_if<RefInBF>(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I
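The two-case design above is easier to see in isolation. Below is a stripped-down, self-contained sketch of the same idea; all names are simplified stand-ins for MCInst, BinaryBasicBlock and BinaryFunction, not the actual BOLT classes:

#include <map>
#include <variant>
#include <vector>

struct Inst {};                                          // stand-in for MCInst
struct Block { std::vector<Inst> Insts; };               // CFG reconstructed
struct Func { std::map<unsigned, Inst> InstByOffset; };  // CFG not available

class InstRef {
  // Case 1: the direct parent is a basic block, iterator into its vector.
  struct RefInBlock {
    const Block *BB;
    std::vector<Inst>::const_iterator It;
  };
  // Case 2: the direct parent is the function, iterator into the offset map.
  struct RefInFunc {
    const Func *BF;
    std::map<unsigned, Inst>::const_iterator It;
  };
  std::variant<RefInBlock, RefInFunc> Ref;

public:
  InstRef(const Block *BB, std::vector<Inst>::const_iterator It)
      : Ref(RefInBlock{BB, It}) {}
  InstRef(const Func *BF, std::map<unsigned, Inst>::const_iterator It)
      : Ref(RefInFunc{BF, It}) {}

  // Whichever case is active, the reference resolves to a constant Inst.
  const Inst &operator*() const {
    if (const auto *R = std::get_if<RefInBlock>(&Ref))
      return *R->It;
    return std::get_if<RefInFunc>(&Ref)->It->second;
  }
};

Storing the direct parent alongside the iterator is what lets client code such as the gadget scanner report an instruction's location uniformly, whether or not a CFG was reconstructed.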
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From c44a8d1cc66cb215492838aec4ba5d58b33d0570 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index d218ff622b75c..b540a8d0b7ee7 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), +cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
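The effect of the new option on the analysis state can be summarized with a toy model; this is an illustration of the semantics described in the commit message, not the patch's actual data structures:

#include <bitset>

struct ToyRegState {
  std::bitset<32> SafeToDeref; // written by an authentication instruction
  std::bitset<32> Trusted;     // additionally checked after authentication

  // Effect of an auth instruction writing register R:
  void onAuth(unsigned R, bool AuthTrapsOnFailure) {
    SafeToDeref.set(R);
    // With FEAT_FPAC-like behaviour, a failed authentication traps instead
    // of producing an invalid pointer, so an auth instruction that executed
    // at all already proves the pointer valid: no separate checker sequence
    // is needed, and the two properties can never diverge.
    if (AuthTrapsOnFailure)
      Trusted.set(R);
  }
};

With the two properties identical, scanning for authentication oracles has nothing left to find, which is why the patch disables it under this option.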
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137224 >From 27d75c4248864d12381aac765674106f573de923 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 80 +++ .../AArch64/gs-pauth-tail-calls.s | 597 ++ 2 files changed, 677 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 2eadaf15d3a65..0a3948e2e278e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1319,6 +1319,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). + if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return true; + + // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the + // below is a simplified condition from BinaryContext::printInstruction. + bool IsUnknownControlFlow = + BC.MIB->isIndirectBranch(Inst) && !BC.MIB->getJumpTable(Inst); + + if (BF.hasCFG() && IsUnknownControlFlow) +return true; + + return false; +} + +static std::optional> +shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, + const MCInstReference &Inst, const SrcState &S) { + static const GadgetKind UntrustedLRKind( + "untrusted link register found before tail call"); + + if (!shouldAnalyzeTailCallInst(BC, BF, Inst)) +return std::nullopt; + + // Not only the set of registers returned by getTrustedLiveInRegs() can be + // seen as a reasonable target-independent _approximation_ of "the LR", these + // are *exactly* those registers used by SrcSafetyAnalysis to initialize the + // set of trusted registers on function entry. + // Thus, this function basically checks that the precondition expected to be + // imposed by a function call instruction (which is hardcoded into the target- + // specific getTrustedLiveInRegs() function) is also respected on tail calls. 
+ SmallVector RegsToCheck = BC.MIB->getTrustedLiveInRegs(); + LLVM_DEBUG({ +traceInst(BC, "Found tail call inst", Inst); +traceRegMask(BC, "Trusted regs", S.TrustedRegs); + }); + + // In musl on AArch64, the _start function sets LR to zero and calls the next + // stage initialization function at the end, something along these lines: + // + // _start: + // mov x30, #0 + // ; ... other initialization ... + // b _start_c ; performs "exit" system call at some point + // + // As this would produce a false positive for every executable linked with + // such libc, ignore tail calls performed by ELF entry function. + if (BC.StartFunctionAddress && + *BC.StartFunctionAddress == Inst.getFunction()->getAddress()) { +LLVM_DEBUG({ dbgs() << " Skipping tail call in ELF entry function.\n"; }); +return std::nullopt; + } + + // Returns at most one report per instruction - this is probably OK... + for (auto Reg : RegsToCheck) +if (!S.TrustedRegs[Reg]) + return make_gadget_report(UntrustedLRKind, Inst, Reg); + + return std::nullopt; +} + static std::optional> shouldReportCallGadget(const BinaryContext &BC, const MCInstReference &Inst, const SrcState &S) { @@ -1478,6 +1555,9 @@ void FunctionAnalysisContext::findUnsafeUses( if (PacRetGadgetsOnly) return; +if (auto Report = shouldReportUnsafeTailCall(BC, BF, Inst, S)) + Reports.push_back(*Report); + if (auto Report = shouldReportCallGadget(BC, Inst, S))
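Stripped of the BOLT plumbing, the tail-call detection above reduces to a small predicate. The sketch below flattens the instruction queries into booleans but follows the logic of shouldAnalyzeTailCallInst() from the patch:

// Decides whether a branch should be analyzed as a potential tail call.
bool shouldAnalyzeAsTailCall(bool IsBranch, bool MarkedAsTailCall, bool HasCFG,
                             bool IsIndirectBranch, bool HasJumpTable) {
  if (!IsBranch)
    return false; // a tail call is necessarily a branch
  if (MarkedAsTailCall)
    return true;  // trust BOLT's own tail-call marking
  // Otherwise only consider "UNKNOWN CONTROL FLOW" branches: indirect
  // branches that could not be matched to a jump table.
  return HasCFG && IsIndirectBranch && !HasJumpTable;
}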
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From cf77e61451a082bcdcc34cb35639436cccf6f1c6 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. +/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. +bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. 
+Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
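The definition of matchInst() itself is cut off above. Going by the documented contract (match the opcode, then match a prefix of the operand list, and roll every matcher back on failure), a plausible C++17 implementation could look roughly like this; a sketch, not necessarily the patch's exact code:

template <typename... OpMatchers>
bool matchInst(const MCInst &Inst, unsigned Opcode, const OpMatchers &...Ops) {
  if (Inst.getOpcode() != Opcode)
    return false;
  assert(sizeof...(Ops) <= Inst.getNumOperands() &&
         "Too many operand matchers for this instruction");

  // Snapshot the previously captured values so that a partial match can be
  // rolled back, leaving every matcher exactly as it was before the call.
  (Ops.remember(), ...);

  unsigned I = 0;
  if ((Ops.matches(Inst.getOperand(I++)) && ...))
    return true;

  (Ops.restore(), ...);
  return false;
}

The fold over && evaluates the matchers left to right and short-circuits on the first mismatch, after which the restore() calls undo any captures made by the earlier, successful matchers.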
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 51b4bbced62522ffbf421f454e5f091e9e38af8f Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index 804100db80793..c31c9984ed43e 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. + /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 0a3948e2e278e..16fd24745bc53 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1078,6 +1078,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1185,8 +1194,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. 
At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 612c1304efd60..68820f86fc47f 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -383,10 +383,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1748,6 +1747,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
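The reason the trap successor needs special handling follows from how the analysis merges states at control-flow forks: a register only counts as checked if it is checked along every successor. A minimal sketch of that meet operation, for illustration only:

#include <bitset>
#include <vector>

using RegSet = std::bitset<32>;

// Merge the "checked registers" sets propagated from all successors.
RegSet meetOverSuccessors(const std::vector<RegSet> &CheckedInSucc) {
  RegSet Result;
  Result.set(); // start from "every register checked" and intersect
  for (const RegSet &S : CheckedInSucc)
    Result &= S;
  return Result;
}

With the patch, a successor that ends in BRK contributes an all-ones set, because execution never continues past the trap, so the failure path no longer masks registers that are properly checked along the success path.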
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin exp10 case (PR #145638)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/145638 >From 56a5cab5a8fb95735202f864bb93031f2de5f555 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 25 Jun 2025 14:23:21 +0900 Subject: [PATCH] RuntimeLibcalls: Cleanup darwin exp10 case Add a predicate function following the example of __sincos_stret --- llvm/include/llvm/IR/RuntimeLibcalls.h | 2 ++ llvm/lib/IR/RuntimeLibcalls.cpp| 48 -- 2 files changed, 25 insertions(+), 25 deletions(-) diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h index 2a095be58a49e..5bd5fd1ce8d3f 100644 --- a/llvm/include/llvm/IR/RuntimeLibcalls.h +++ b/llvm/include/llvm/IR/RuntimeLibcalls.h @@ -149,6 +149,8 @@ struct RuntimeLibcallsInfo { return true; } + static bool darwinHasExp10(const Triple &TT); + /// Return true if the target has sincosf/sincos/sincosl functions static bool hasSinCos(const Triple &TT) { return TT.isGNUEnvironment() || TT.isOSFuchsia() || diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp index e9cb970f804ca..cb8c8457f5a47 100644 --- a/llvm/lib/IR/RuntimeLibcalls.cpp +++ b/llvm/lib/IR/RuntimeLibcalls.cpp @@ -457,33 +457,12 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT, } } -switch (TT.getOS()) { -case Triple::MacOSX: - if (TT.isMacOSXVersionLT(10, 9)) { -setLibcallName(RTLIB::EXP10_F32, nullptr); -setLibcallName(RTLIB::EXP10_F64, nullptr); - } else { -setLibcallName(RTLIB::EXP10_F32, "__exp10f"); -setLibcallName(RTLIB::EXP10_F64, "__exp10"); - } - break; -case Triple::IOS: - if (TT.isOSVersionLT(7, 0)) { -setLibcallName(RTLIB::EXP10_F32, nullptr); -setLibcallName(RTLIB::EXP10_F64, nullptr); -break; - } - [[fallthrough]]; -case Triple::DriverKit: -case Triple::TvOS: -case Triple::WatchOS: -case Triple::XROS: -case Triple::BridgeOS: +if (darwinHasExp10(TT)) { setLibcallName(RTLIB::EXP10_F32, "__exp10f"); setLibcallName(RTLIB::EXP10_F64, "__exp10"); - break; -default: - break; +} else { + setLibcallName(RTLIB::EXP10_F32, nullptr); + setLibcallName(RTLIB::EXP10_F64, nullptr); } } @@ -662,3 +641,22 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT, if (TT.getArch() == Triple::ArchType::msp430) setMSP430Libcalls(*this, TT); } + +bool RuntimeLibcallsInfo::darwinHasExp10(const Triple &TT) { + assert(TT.isOSDarwin() && "should be called with darwin triple"); + + switch (TT.getOS()) { + case Triple::MacOSX: +return !TT.isMacOSXVersionLT(10, 9); + case Triple::IOS: +return !TT.isOSVersionLT(7, 0); + case Triple::DriverKit: + case Triple::TvOS: + case Triple::WatchOS: + case Triple::XROS: + case Triple::BridgeOS: +return true; + default: +return false; + } +}
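For reference, the expected behaviour of the new predicate on a few example triples; a hedged sketch, since the exact namespace and headers may differ between LLVM revisions, and the triple strings are only illustrative:

#include "llvm/IR/RuntimeLibcalls.h"
#include "llvm/TargetParser/Triple.h"
#include <cassert>

void exp10PredicateExamples() {
  using llvm::RTLIB::RuntimeLibcallsInfo; // namespace is an assumption
  using llvm::Triple;
  // __exp10/__exp10f exist from macOS 10.9 and iOS 7.0 onwards, and
  // unconditionally on the newer Darwin platforms:
  assert(RuntimeLibcallsInfo::darwinHasExp10(Triple("arm64-apple-macosx11.0")));
  assert(!RuntimeLibcallsInfo::darwinHasExp10(Triple("x86_64-apple-macosx10.8")));
  assert(!RuntimeLibcallsInfo::darwinHasExp10(Triple("armv7-apple-ios6.0")));
  assert(RuntimeLibcallsInfo::darwinHasExp10(Triple("arm64_32-apple-watchos5.0")));
}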
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin bzero configuration (PR #145639)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/145639 >From 9eebced4e8423aeaa8aeb8c13e13e2483c7a3988 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 25 Jun 2025 14:42:24 +0900 Subject: [PATCH] RuntimeLibcalls: Cleanup darwin bzero configuration Write this in a more predicate-apply style instead of the switch. --- llvm/lib/IR/RuntimeLibcalls.cpp | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp index cb8c8457f5a47..5c01d8595d0f9 100644 --- a/llvm/lib/IR/RuntimeLibcalls.cpp +++ b/llvm/lib/IR/RuntimeLibcalls.cpp @@ -432,19 +432,11 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT, setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); // Some darwins have an optimized __bzero/bzero function. -switch (TT.getArch()) { -case Triple::x86: -case Triple::x86_64: +if (TT.isX86()) { if (TT.isMacOSX() && !TT.isMacOSXVersionLT(10, 6)) setLibcallName(RTLIB::BZERO, "__bzero"); - break; -case Triple::aarch64: -case Triple::aarch64_32: +} else if (TT.isAArch64()) setLibcallName(RTLIB::BZERO, "bzero"); - break; -default: - break; -} if (darwinHasSinCosStret(TT)) { setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret");
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin bzero configuration (PR #145639)
https://github.com/fhahn approved this pull request. LGTM, thanks https://github.com/llvm/llvm-project/pull/145639
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From 7f3d14f3b173b0425c7624e8d62f0188081ef691 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. + /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c31c9984ed43e..b6f70fc831fca 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index b9fdf6d655d4c..d218ff622b75c 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1363,6 +1363,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +
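getSinglePredecessor() is what lets target code walk a candidate dispatch sequence backwards from the branch, one instruction at a time, in both the CFG and no-CFG cases. Combined with the matchInst() helpers from PR #138883, a matcher could be structured as below. This is a deliberately short two-instruction sketch; the actual sequence matched for the BR_JumpTable expansion is longer and stricter, and the implicit MCInstReference-to-MCInst conversion plus the ADDXrs operand order are assumptions:

bool looksLikeSafeDispatch(MCInstReference Branch) {
  using namespace MCInstMatcher;
  Reg Xd, Xn, Xm;

  // The candidate sequence must end in "br Xd"...
  if (!matchInst(Branch, AArch64::BR, Xd))
    return false;

  // ...and the branch must have exactly one possible predecessor;
  // otherwise conservatively refuse to treat the dispatch as safe.
  std::optional<MCInstReference> Pred = Branch.getSinglePredecessor();
  if (!Pred)
    return false;

  // Xd was captured above, so here it matches rather than captures: the
  // preceding ADD must write the very register the branch jumps through.
  return matchInst(*Pred, AArch64::ADDXrs, Xd, Xn, Xm, Imm(0));
}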
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 150ff318699edf41906df90ad2729a2acf5d3833 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index 804100db80793..c31c9984ed43e 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. + /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 0a3948e2e278e..16fd24745bc53 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1078,6 +1078,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1185,8 +1194,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. 
At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 612c1304efd60..68820f86fc47f 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -383,10 +383,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1748,6 +1747,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
[llvm-branch-commits] [mlir] [MLIR][AArch64] Add integration test for lowering of `vector.contract` to Neon FEAT_I8MM (PR #144699)
@@ -0,0 +1,336 @@ +// REQUIRES: arm-emulator + +// DEFINE: %{compile} = mlir-opt %s \ +// DEFINE: --convert-vector-to-scf --convert-scf-to-cf --convert-vector-to-llvm='enable-arm-neon enable-arm-i8mm' \ +// DEFINE: --expand-strided-metadata --convert-to-llvm --finalize-memref-to-llvm \ +// DEFINE: --lower-affine --convert-arith-to-llvm --reconcile-unrealized-casts \ +// DEFINE: -o %t + +// DEFINE: %{entry_point} = main + +// DEFINE: %{run} = %mcr_aarch64_cmd %t -e %{entry_point} -entry-point-result=void --march=aarch64 --mattr="+neon,+i8mm" \ +// DEFINE: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils,%native_mlir_arm_runner_utils + +// RUN: rm -f %t && %{compile} && FileCheck %s --input-file=%t -check-prefix CHECK-IR && %{run} | FileCheck %s + +#packed_maps = [ + affine_map<(m, n, k) -> (m, k)>, + affine_map<(m, n, k) -> (n, k)>, + affine_map<(m, n, k) -> (m, n)> +] + +// +// Test the lowering of `vector.contract` using the `LowerContractionToNeonI8MMPattern` +// +// The operation that the `vector.contract` in this test performs is matrix +// multiplication with accumulate +// OUT = ACC + LHS * RHS +// of two 8-bit integer matrices LHS and RHS, and a 32-bit integer matrix ACC +// into a 32-bit integer matrix OUT. The LHS and RHS can be sign- or zero- extended, +// this test covers all the possible variants. +// +// Tested are calculations as well as that the relevant `ArmNeon` dialect +// operations (`arm_neon.smmla`, `arm_neon.ummla`, etc) are emitted. +// +// That pattern above handles (therefore this test prepares) input/output vectors with +// specific shapes: +// * LHS: vector<MxKxi8> +// * RHS: vector<NxKxi8> +// * ACC, OUT: vector<MxNxi32> +// where the M and N are even and K is divisible by 8. +// Note that the RHS is transposed. +// This data layout makes it efficient to load data into SIMD +// registers in the layout expected by FEAT_I8MM instructions. +// Such a `vector.contract` is representative of the code we aim to generate +// by vectorisation of `linalg.mmt4d`. +// +// In this specific test we use M == 4, N == 4, and K == 8. momchil-velikov wrote: Ah, yes it is. Comment fixed. https://github.com/llvm/llvm-project/pull/144699
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From 6ba3d538efb8a31e38ffe96c7b4229d27ace93ae Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index d218ff622b75c..b540a8d0b7ee7 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), +cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From a5d94c6935e2ab9eb60258ca3e38b4979cd40d0d Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. + /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c31c9984ed43e..b6f70fc831fca 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index b9fdf6d655d4c..d218ff622b75c 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1363,6 +1363,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +
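To make the new hook concrete, here is a minimal hedged sketch of how a target could combine `getSinglePredecessor()` with the `MCInstMatcher` helpers (introduced in PR #138883) to validate the tail of a dispatch sequence. This is not the actual AArch64 implementation: `getMCInst()` stands for whatever accessor exposes the underlying `MCInst`, and the `ADDXrs`/shift shape of the dispatch tail is an illustrative assumption.

```cpp
// Hedged sketch only: a real implementation must match the exact sequence
// expanded from BR_JumpTable and verify it is contiguous and call-free.
bool isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const {
  using namespace MCInstMatcher;
  Reg Target;
  // The sequence is assumed to end in `br Target`.
  if (!matchInst(BranchInst.getMCInst(), AArch64::BR, Target))
    return false;
  // Walk to the only possible preceding instruction; give up if the
  // predecessor is absent or ambiguous (multiple CFG predecessors).
  std::optional<MCInstReference> Pred = BranchInst.getSinglePredecessor();
  if (!Pred)
    return false;
  // Require the branch target to be computed as table base plus scaled
  // index, mirroring one step of the expanded dispatch (illustrative).
  Reg Base, Index;
  return matchInst(Pred->getMCInst(), AArch64::ADDXrs, Target, Base, Index,
                   Imm(2));
}
```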
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From fb028a66bed548835cb6fb34a59bfb43eba1f454 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. +/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. +bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. 
+Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
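As a usage illustration beyond the doc comment above, the capture semantics make it easy to verify paired instructions such as the `mov x16, x30` / `cmp x30, x16` fragment of the checker sequences discussed earlier. The sketch below is not code from the patch; it assumes the usual MC encodings (`mov` as `ORRXrs` with XZR, `cmp` as `SUBSXrs` writing XZR) and that the two candidate instructions have already been located.

```cpp
// Hedged sketch: returns the register verified by a `mov Tmp, Checked;
// cmp Checked, Tmp` pair, or NoRegister on a failed match.
MCPhysReg getCheckedReg(const MCInst &MaybeMov, const MCInst &MaybeCmp) {
  using namespace MCInstMatcher;
  Reg Tmp, Checked;
  // Capture both registers from the first instruction
  // (mov Tmp, Checked is assumed to encode as ORR Tmp, XZR, Checked, lsl #0).
  if (!matchInst(MaybeMov, AArch64::ORRXrs, Tmp, Reg(AArch64::XZR), Checked,
                 Imm(0)))
    return AArch64::NoRegister;
  // The second instruction must reuse exactly the captured registers; on a
  // failed match all captures are rolled back to their previous values.
  if (!matchInst(MaybeCmp, AArch64::SUBSXrs, Reg(AArch64::XZR), Checked, Tmp,
                 Imm(0)))
    return AArch64::NoRegister;
  return Checked.get();
}
```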
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From bac57af9732865c18057a6d034925ec8227f1a78 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index b540a8d0b7ee7..94cbea5b6344a 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -868,9 +868,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1036,8 +1036,7 @@ class DstSafetyAnalysis { // ... 
an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1077,7 +1076,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1255,7 +1254,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
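For readers less familiar with the helpers this cleanup switches to, here is a small self-contained example of `llvm::enumerate` and `llvm::zip_equal`; it shows generic usage and is not code from the patch.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

void demo() {
  llvm::SmallVector<unsigned, 4> RegsToTrack = {3, 7, 11};
  llvm::SmallVector<unsigned, 16> RegToIndexMapping(16, ~0u);
  // enumerate() yields (index, element) pairs, replacing the manual
  // `for (unsigned I = 0; I < N; ++I)` indexing idiom.
  for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack))
    RegToIndexMapping[Reg] = MappedIndex;

  llvm::SmallVector<int, 4> A = {1, 2, 3}, B = {10, 20, 30};
  // zip_equal() iterates two ranges in lockstep and asserts their lengths
  // match, unlike plain zip() which silently stops at the shorter range.
  for (auto [X, Y] : llvm::zip_equal(A, B))
    X += Y;
}
```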
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137975 >From 150ff318699edf41906df90ad2729a2acf5d3833 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index 804100db80793..c31c9984ed43e 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -707,6 +707,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. + /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 0a3948e2e278e..16fd24745bc53 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1078,6 +1078,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1185,8 +1194,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. 
At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index 612c1304efd60..68820f86fc47f 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -383,10 +383,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1748,6 +1747,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); }
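A minimal sketch of what the corresponding target-side override could look like, given the description above. The exact opcode set accepted by the real AArch64MCPlusBuilder may be broader; note the hook documentation deliberately excludes calls to library functions such as abort().

```cpp
// Hedged sketch: treat any BRK (with any immediate operand) as an
// instruction that immediately terminates the program abnormally.
bool isTrap(const MCInst &Inst) const override {
  return Inst.getOpcode() == AArch64::BRK;
}
```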
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
https://github.com/MacDue updated https://github.com/llvm/llvm-project/pull/142391 >From 55bd461f342d5dcca49b2bac2f2142be9214823a Mon Sep 17 00:00:00 2001 From: Benjamin Maxwell Date: Thu, 8 May 2025 17:38:27 + Subject: [PATCH] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) This patch attempts to refactor AArch64FrameLowering to allow the size of the ZPR and PPR areas to be calculated separately. This will be used by a subsequent patch to support allocating ZPRs and PPRs to separate areas. This patch should be an NFC and is split out to make later functional changes easier to spot. --- .../Target/AArch64/AArch64FrameLowering.cpp | 310 -- .../lib/Target/AArch64/AArch64FrameLowering.h | 12 +- .../AArch64/AArch64MachineFunctionInfo.h | 49 +-- .../Target/AArch64/AArch64RegisterInfo.cpp| 7 +- 4 files changed, 249 insertions(+), 129 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 2a9ef38a11cdc..7c2b68c87775b 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -330,10 +330,13 @@ static int64_t getArgumentStackToRestore(MachineFunction &MF, static bool produceCompactUnwindFrame(MachineFunction &MF); static bool needsWinCFI(const MachineFunction &MF); +static StackOffset getZPRStackSize(const MachineFunction &MF); +static StackOffset getPPRStackSize(const MachineFunction &MF); static StackOffset getSVEStackSize(const MachineFunction &MF); static Register findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB, bool HasCall = false); static bool requiresSaveVG(const MachineFunction &MF); +static bool hasSVEStackSize(const MachineFunction &MF); /// Returns true if a homogeneous prolog or epilog code can be emitted /// for the size optimization. If possible, a frame helper call is injected. @@ -351,7 +354,7 @@ bool AArch64FrameLowering::homogeneousPrologEpilog( if (needsWinCFI(MF)) return false; // TODO: SVE is not supported yet. - if (getSVEStackSize(MF)) + if (hasSVEStackSize(MF)) return false; // Bail on stack adjustment needed on return for simplicity. @@ -451,10 +454,36 @@ static unsigned getFixedObjectSize(const MachineFunction &MF, } } -/// Returns the size of the entire SVE stackframe (calleesaves + spills). +static unsigned getStackHazardSize(const MachineFunction &MF) { + return MF.getSubtarget().getStreamingHazardSize(); +} + +/// Returns the size of the entire ZPR stackframe (calleesaves + spills). +static StackOffset getZPRStackSize(const MachineFunction &MF) { + const AArch64FunctionInfo *AFI = MF.getInfo(); + return StackOffset::getScalable(AFI->getStackSizeZPR()); +} + +/// Returns the size of the entire PPR stackframe (calleesaves + spills). +static StackOffset getPPRStackSize(const MachineFunction &MF) { + const AArch64FunctionInfo *AFI = MF.getInfo(); + return StackOffset::getScalable(AFI->getStackSizePPR()); +} + +/// Returns the size of the entire SVE stackframe (PPRs + ZPRs). static StackOffset getSVEStackSize(const MachineFunction &MF) { + return getZPRStackSize(MF) + getPPRStackSize(MF); +} + +static bool hasSVEStackSize(const MachineFunction &MF) { const AArch64FunctionInfo *AFI = MF.getInfo(); - return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE()); + return AFI->getStackSizeZPR() > 0 || AFI->getStackSizePPR() > 0; +} + +/// Returns true if PPRs are spilled as ZPRs. 
+static bool arePPRsSpilledAsZPR(const MachineFunction &MF) { + return MF.getSubtarget().getRegisterInfo()->getSpillSize( + AArch64::PPRRegClass) == 16; } bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const { @@ -482,7 +511,7 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const { !Subtarget.hasSVE(); return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize || - getSVEStackSize(MF) || LowerQRegCopyThroughMem); + hasSVEStackSize(MF) || LowerQRegCopyThroughMem); } /// hasFPImpl - Return true if the specified function should have a dedicated @@ -1161,7 +1190,7 @@ bool AArch64FrameLowering::shouldCombineCSRLocalStackBump( // When there is an SVE area on the stack, always allocate the // callee-saves and spills/locals separately. - if (getSVEStackSize(MF)) + if (hasSVEStackSize(MF)) return false; return true; @@ -1605,25 +1634,19 @@ static bool isTargetWindows(const MachineFunction &MF) { return MF.getSubtarget().isTargetWindows(); } -static unsigned getStackHazardSize(const MachineFunction &MF) { - return MF.getSubtarget().getStreamingHazardSize(); -} - // Convenience function to determine whether I is an SVE callee save. -static bool IsSVECalleeSave(MachineBasicBlock::iterator I) { +static bool IsZPRCalleeSave(MachineBasicBlock::iterator I) { switch
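The refactoring leans on `StackOffset` keeping its fixed and scalable components separate, so the ZPR and PPR areas can be sized independently and still summed, as `getSVEStackSize()` now does. A tiny illustration follows; the sizes are made up.

```cpp
#include "llvm/Support/TypeSize.h"
using llvm::StackOffset;

StackOffset demo() {
  // Hypothetical per-area sizes, in scalable bytes (multiples of VG).
  StackOffset ZPRSize = StackOffset::getScalable(32); // e.g. two Z registers
  StackOffset PPRSize = StackOffset::getScalable(4);  // e.g. two P registers
  // Mirrors getSVEStackSize() in the patch: areas add component-wise.
  return ZPRSize + PPRSize;
}
```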
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin exp10 case (PR #145638)
arsenm wrote: ### Merge activity * **Jun 25, 11:01 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/145638). https://github.com/llvm/llvm-project/pull/145638
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin bzero configuration (PR #145639)
arsenm wrote: ### Merge activity * **Jun 25, 11:01 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/145639). https://github.com/llvm/llvm-project/pull/145639
[llvm-branch-commits] [llvm] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (PR #138883)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138883 >From cf77e61451a082bcdcc34cb35639436cccf6f1c6 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 7 May 2025 16:42:00 +0300 Subject: [PATCH] [BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) Introduce matchInst helper function to capture and/or match the operands of MCInst. Unlike the existing `MCPlusBuilder::MCInstMatcher` machinery, matchInst is intended for the use cases when precise control over the instruction order is required. For example, when validating PtrAuth hardening, all registers are usually considered unsafe after a function call, even though callee-saved registers should preserve their old values *under normal operation*. --- bolt/include/bolt/Core/MCInstUtils.h | 128 ++ .../Target/AArch64/AArch64MCPlusBuilder.cpp | 90 +--- 2 files changed, 162 insertions(+), 56 deletions(-) diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 69bf5e6159b74..50b7d56470c99 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -162,6 +162,134 @@ static inline raw_ostream &operator<<(raw_ostream &OS, return Ref.print(OS); } +/// Instruction-matching helpers operating on a single instruction at a time. +/// +/// Unlike MCPlusBuilder::MCInstMatcher, this matchInst() function focuses on +/// the cases where a precise control over the instruction order is important: +/// +/// // Bring the short names into the local scope: +/// using namespace MCInstMatcher; +/// // Declare the registers to capture: +/// Reg Xn, Xm; +/// // Capture the 0th and 1st operands, match the 2nd operand against the +/// // just captured Xm register, match the 3rd operand against literal 0: +/// if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)) +/// return AArch64::NoRegister; +/// // Match the 0th operand against Xm: +/// if (!matchInst(MaybeBr, AArch64::BR, Xm)) +/// return AArch64::NoRegister; +/// // Return the matched register: +/// return Xm.get(); +namespace MCInstMatcher { + +// The base class to match an operand of type T. +// +// The subclasses of OpMatcher are intended to be allocated on the stack and +// to only be used by passing them to matchInst() and by calling their get() +// function, thus the peculiar `mutable` specifiers: to make the calling code +// compact and readable, the templated matchInst() function has to accept both +// long-lived Imm/Reg wrappers declared as local variables (intended to capture +// the first operand's value and match the subsequent operands, whether inside +// a single instruction or across multiple instructions), as well as temporary +// wrappers around literal values to match, f.e. Imm(42) or Reg(AArch64::XZR). +template class OpMatcher { + mutable std::optional Value; + mutable std::optional SavedValue; + + // Remember/restore the last Value - to be called by matchInst. + void remember() const { SavedValue = Value; } + void restore() const { Value = SavedValue; } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +protected: + OpMatcher(std::optional ValueToMatch) : Value(ValueToMatch) {} + + bool matchValue(T OpValue) const { +// Check that OpValue does not contradict the existing Value. +bool MatchResult = !Value || *Value == OpValue; +// If MatchResult is false, all matchers will be reset before returning from +// matchInst, including this one, thus no need to assign conditionally. 
+Value = OpValue; + +return MatchResult; + } + +public: + /// Returns the captured value. + T get() const { +assert(Value.has_value()); +return *Value; + } +}; + +class Reg : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isReg()) + return false; + +return matchValue(Op.getReg()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Reg(std::optional RegToMatch = std::nullopt) + : OpMatcher(RegToMatch) {} +}; + +class Imm : public OpMatcher { + bool matches(const MCOperand &Op) const { +if (!Op.isImm()) + return false; + +return matchValue(Op.getImm()); + } + + template + friend bool matchInst(const MCInst &, unsigned, const OpMatchers &...); + +public: + Imm(std::optional ImmToMatch = std::nullopt) + : OpMatcher(ImmToMatch) {} +}; + +/// Tries to match Inst and updates Ops on success. +/// +/// If Inst has the specified Opcode and its operand list prefix matches Ops, +/// this function returns true and updates Ops, otherwise false is returned and +/// values of Ops are kept as before matchInst was called. +/// +/// Please note that while Ops are technically passed by a const reference to +/// make invocations like `matchInst(MI, Opcode, Imm(42))` possible, all their +/// fields are marked mut
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (PR #141665)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/141665 >From 0e0a677194db58183c6167f347d6a47fe3106672 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 27 May 2025 21:06:03 +0300 Subject: [PATCH] [BOLT] Gadget scanner: make use of C++17 features and LLVM helpers Perform trivial syntactical cleanups: * make use of structured binding declarations * use LLVM utility functions when appropriate * omit braces around single expression inside single-line LLVM_DEBUG() This patch is NFC aside from minor debug output changes. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 67 +-- .../AArch64/gs-pauth-debug-output.s | 14 ++-- 2 files changed, 38 insertions(+), 43 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index b540a8d0b7ee7..94cbea5b6344a 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -88,8 +88,8 @@ class TrackedRegisters { TrackedRegisters(ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { -for (unsigned I = 0; I < RegsToTrack.size(); ++I) - RegToIndexMapping[RegsToTrack[I]] = I; +for (auto [MappedIndex, Reg] : llvm::enumerate(RegsToTrack)) + RegToIndexMapping[Reg] = MappedIndex; } ArrayRef getRegisters() const { return Registers; } @@ -203,9 +203,9 @@ struct SrcState { SafeToDerefRegs &= StateIn.SafeToDerefRegs; TrustedRegs &= StateIn.TrustedRegs; -for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) - for (const MCInst *J : StateIn.LastInstWritingReg[I]) -LastInstWritingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(LastInstWritingReg, StateIn.LastInstWritingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -224,11 +224,9 @@ struct SrcState { static void printInstsShort(raw_ostream &OS, ArrayRef Insts) { OS << "Insts: "; - for (unsigned I = 0; I < Insts.size(); ++I) { -auto &Set = Insts[I]; + for (auto [I, PtrSet] : llvm::enumerate(Insts)) { OS << "[" << I << "]("; -for (const MCInst *MCInstP : Set) - OS << MCInstP << " "; +interleave(PtrSet, OS, " "); OS << ")"; } } @@ -416,8 +414,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.SafeToDerefRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.SafeToDerefRegs[SrcReg]) +Regs.push_back(DstReg); } // Make sure explicit checker sequence keeps register safe-to-dereference @@ -469,8 +468,9 @@ class SrcSafetyAnalysis { // ... an address can be updated in a safe manner, producing the result // which is as trusted as the input address. if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) { - if (Cur.TrustedRegs[DstAndSrc->second]) -Regs.push_back(DstAndSrc->first); + auto [DstReg, SrcReg] = *DstAndSrc; + if (Cur.TrustedRegs[SrcReg]) +Regs.push_back(DstReg); } return Regs; @@ -868,9 +868,9 @@ struct DstState { return (*this = StateIn); CannotEscapeUnchecked &= StateIn.CannotEscapeUnchecked; -for (unsigned I = 0; I < FirstInstLeakingReg.size(); ++I) - for (const MCInst *J : StateIn.FirstInstLeakingReg[I]) -FirstInstLeakingReg[I].insert(J); +for (auto [ThisSet, OtherSet] : + llvm::zip_equal(FirstInstLeakingReg, StateIn.FirstInstLeakingReg)) + ThisSet.insert_range(OtherSet); return *this; } @@ -1036,8 +1036,7 @@ class DstSafetyAnalysis { // ... 
an address can be updated in a safe manner, or if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Inst)) { - MCPhysReg DstReg, SrcReg; - std::tie(DstReg, SrcReg) = *DstAndSrc; + auto [DstReg, SrcReg] = *DstAndSrc; // Note that *all* registers containing the derived values must be safe, // both source and destination ones. No temporaries are supported at now. if (Cur.CannotEscapeUnchecked[SrcReg] && @@ -1077,7 +1076,7 @@ class DstSafetyAnalysis { // If this instruction terminates the program immediately, no // authentication oracles are possible past this point. if (BC.MIB->isTrap(Point)) { - LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + LLVM_DEBUG(traceInst(BC, "Trap instruction found", Point)); DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); Next.CannotEscapeUnchecked.set(); return Next; @@ -1255,7 +1254,7 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis, // starting to analyze Inst.
[llvm-branch-commits] [llvm] AMDGPU: Handle folding vector splats of inline split f64 inline immediates (PR #140878)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/140878 >From da0e7f3709e013348cde68701e8739c2912a4e4f Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 19 May 2025 21:51:06 +0200 Subject: [PATCH] AMDGPU: Handle folding vector splats of inline split f64 inline immediates Recognize a reg_sequence with 32-bit elements that produce a 64-bit splat value. This enables folding f64 constants into mfma operands --- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | 103 -- .../CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll | 41 +-- 2 files changed, 76 insertions(+), 68 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp index 8d7b68dbd5682..0ed06c37507af 100644 --- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp +++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp @@ -227,12 +227,12 @@ class SIFoldOperandsImpl { getRegSeqInit(SmallVectorImpl> &Defs, Register UseReg) const; - std::pair + std::pair isRegSeqSplat(MachineInstr &RegSeg) const; - MachineOperand *tryFoldRegSeqSplat(MachineInstr *UseMI, unsigned UseOpIdx, - MachineOperand *SplatVal, - const TargetRegisterClass *SplatRC) const; + bool tryFoldRegSeqSplat(MachineInstr *UseMI, unsigned UseOpIdx, + int64_t SplatVal, + const TargetRegisterClass *SplatRC) const; bool tryToFoldACImm(const FoldableDef &OpToFold, MachineInstr *UseMI, unsigned UseOpIdx, @@ -966,15 +966,15 @@ const TargetRegisterClass *SIFoldOperandsImpl::getRegSeqInit( return getRegSeqInit(*Def, Defs); } -std::pair +std::pair SIFoldOperandsImpl::isRegSeqSplat(MachineInstr &RegSeq) const { SmallVector, 32> Defs; const TargetRegisterClass *SrcRC = getRegSeqInit(RegSeq, Defs); if (!SrcRC) return {}; - // TODO: Recognize 64-bit splats broken into 32-bit pieces (i.e. recognize - // every other other element is 0 for 64-bit immediates) + bool TryToMatchSplat64 = false; + int64_t Imm; for (unsigned I = 0, E = Defs.size(); I != E; ++I) { const MachineOperand *Op = Defs[I].first; @@ -986,38 +986,75 @@ SIFoldOperandsImpl::isRegSeqSplat(MachineInstr &RegSeq) const { Imm = SubImm; continue; } -if (Imm != SubImm) + +if (Imm != SubImm) { + if (I == 1 && (E & 1) == 0) { +// If we have an even number of inputs, there's a chance this is a +// 64-bit element splat broken into 32-bit pieces. +TryToMatchSplat64 = true; +break; + } + return {}; // Can only fold splat constants +} + } + + if (!TryToMatchSplat64) +return {Defs[0].first->getImm(), SrcRC}; + + // Fallback to recognizing 64-bit splats broken into 32-bit pieces + // (i.e. recognize every other other element is 0 for 64-bit immediates) + int64_t SplatVal64; + for (unsigned I = 0, E = Defs.size(); I != E; I += 2) { +const MachineOperand *Op0 = Defs[I].first; +const MachineOperand *Op1 = Defs[I + 1].first; + +if (!Op0->isImm() || !Op1->isImm()) + return {}; + +unsigned SubReg0 = Defs[I].second; +unsigned SubReg1 = Defs[I + 1].second; + +// Assume we're going to generally encounter reg_sequences with sorted +// subreg indexes, so reject any that aren't consecutive. 
+if (TRI->getChannelFromSubReg(SubReg0) + 1 != +TRI->getChannelFromSubReg(SubReg1)) + return {}; + +int64_t MergedVal = Make_64(Op1->getImm(), Op0->getImm()); +if (I == 0) + SplatVal64 = MergedVal; +else if (SplatVal64 != MergedVal) + return {}; } - return {Defs[0].first, SrcRC}; + const TargetRegisterClass *RC64 = TRI->getSubRegisterClass( + MRI->getRegClass(RegSeq.getOperand(0).getReg()), AMDGPU::sub0_sub1); + + return {SplatVal64, RC64}; } -MachineOperand *SIFoldOperandsImpl::tryFoldRegSeqSplat( -MachineInstr *UseMI, unsigned UseOpIdx, MachineOperand *SplatVal, +bool SIFoldOperandsImpl::tryFoldRegSeqSplat( +MachineInstr *UseMI, unsigned UseOpIdx, int64_t SplatVal, const TargetRegisterClass *SplatRC) const { const MCInstrDesc &Desc = UseMI->getDesc(); if (UseOpIdx >= Desc.getNumOperands()) -return nullptr; +return false; // Filter out unhandled pseudos. if (!AMDGPU::isSISrcOperand(Desc, UseOpIdx)) -return nullptr; +return false; int16_t RCID = Desc.operands()[UseOpIdx].RegClass; if (RCID == -1) -return nullptr; +return false; + + const TargetRegisterClass *OpRC = TRI->getRegClass(RCID); // Special case 0/-1, since when interpreted as a 64-bit element both halves - // have the same bits. Effectively this code does not handle 64-bit element - // operands correctly, as the incoming 64-bit constants are already split into - // 32-bit sequence elements. - // - // TODO: We should try to figure out how to interpret the reg_sequence as a - /
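The core of the new recognition is the pairing step: adjacent 32-bit `reg_sequence` elements (low channel first) are merged with `Make_64`, and every merged pair must agree for the value to be a 64-bit splat. A self-contained distillation of that check, with plain arrays standing in for the operand and subregister bookkeeping:

```cpp
#include "llvm/Support/MathExtras.h"
#include <cstdint>

// Returns true if the 32-bit pieces form a splat of one 64-bit element.
bool isSplat64(const uint32_t *Pieces, unsigned NumPieces) {
  // An odd number of 32-bit pieces cannot form whole 64-bit elements.
  if (NumPieces < 2 || NumPieces % 2 != 0)
    return false;
  // Make_64(High, Low): element I is built from pieces 2*I (lo), 2*I+1 (hi).
  uint64_t Splat = llvm::Make_64(Pieces[1], Pieces[0]);
  for (unsigned I = 2; I != NumPieces; I += 2)
    if (llvm::Make_64(Pieces[I + 1], Pieces[I]) != Splat)
      return false;
  return true;
}
```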
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin exp10 case (PR #145638)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/145638 >From 56a5cab5a8fb95735202f864bb93031f2de5f555 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 25 Jun 2025 14:23:21 +0900 Subject: [PATCH] RuntimeLibcalls: Cleanup darwin exp10 case Add a predicate function following the example of __sincos_stret --- llvm/include/llvm/IR/RuntimeLibcalls.h | 2 ++ llvm/lib/IR/RuntimeLibcalls.cpp| 48 -- 2 files changed, 25 insertions(+), 25 deletions(-) diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h index 2a095be58a49e..5bd5fd1ce8d3f 100644 --- a/llvm/include/llvm/IR/RuntimeLibcalls.h +++ b/llvm/include/llvm/IR/RuntimeLibcalls.h @@ -149,6 +149,8 @@ struct RuntimeLibcallsInfo { return true; } + static bool darwinHasExp10(const Triple &TT); + /// Return true if the target has sincosf/sincos/sincosl functions static bool hasSinCos(const Triple &TT) { return TT.isGNUEnvironment() || TT.isOSFuchsia() || diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp index e9cb970f804ca..cb8c8457f5a47 100644 --- a/llvm/lib/IR/RuntimeLibcalls.cpp +++ b/llvm/lib/IR/RuntimeLibcalls.cpp @@ -457,33 +457,12 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT, } } -switch (TT.getOS()) { -case Triple::MacOSX: - if (TT.isMacOSXVersionLT(10, 9)) { -setLibcallName(RTLIB::EXP10_F32, nullptr); -setLibcallName(RTLIB::EXP10_F64, nullptr); - } else { -setLibcallName(RTLIB::EXP10_F32, "__exp10f"); -setLibcallName(RTLIB::EXP10_F64, "__exp10"); - } - break; -case Triple::IOS: - if (TT.isOSVersionLT(7, 0)) { -setLibcallName(RTLIB::EXP10_F32, nullptr); -setLibcallName(RTLIB::EXP10_F64, nullptr); -break; - } - [[fallthrough]]; -case Triple::DriverKit: -case Triple::TvOS: -case Triple::WatchOS: -case Triple::XROS: -case Triple::BridgeOS: +if (darwinHasExp10(TT)) { setLibcallName(RTLIB::EXP10_F32, "__exp10f"); setLibcallName(RTLIB::EXP10_F64, "__exp10"); - break; -default: - break; +} else { + setLibcallName(RTLIB::EXP10_F32, nullptr); + setLibcallName(RTLIB::EXP10_F64, nullptr); } } @@ -662,3 +641,22 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT, if (TT.getArch() == Triple::ArchType::msp430) setMSP430Libcalls(*this, TT); } + +bool RuntimeLibcallsInfo::darwinHasExp10(const Triple &TT) { + assert(TT.isOSDarwin() && "should be called with darwin triple"); + + switch (TT.getOS()) { + case Triple::MacOSX: +return !TT.isMacOSXVersionLT(10, 9); + case Triple::IOS: +return !TT.isOSVersionLT(7, 0); + case Triple::DriverKit: + case Triple::TvOS: + case Triple::WatchOS: + case Triple::XROS: + case Triple::BridgeOS: +return true; + default: +return false; + } +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
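For reference, the predicate's effect can be queried directly from a triple. The snippet below is an illustrative driver, not part of the patch; the header paths and the `llvm::RTLIB` qualification are assumptions about where `RuntimeLibcallsInfo` lives.

```cpp
#include "llvm/IR/RuntimeLibcalls.h"
#include "llvm/TargetParser/Triple.h"

bool hasOptimizedExp10(const llvm::Triple &TT) {
  // darwinHasExp10() asserts a Darwin triple, so guard the call. Per the
  // predicate: true on macOS >= 10.9, iOS >= 7, and newer Darwin platforms.
  return TT.isOSDarwin() &&
         llvm::RTLIB::RuntimeLibcallsInfo::darwinHasExp10(TT);
}
```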
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin bzero configuration (PR #145639)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/145639 >From 9eebced4e8423aeaa8aeb8c13e13e2483c7a3988 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 25 Jun 2025 14:42:24 +0900 Subject: [PATCH] RuntimeLibcalls: Cleanup darwin bzero configuration Write this in a more predicate-apply style instead of the switch. --- llvm/lib/IR/RuntimeLibcalls.cpp | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp index cb8c8457f5a47..5c01d8595d0f9 100644 --- a/llvm/lib/IR/RuntimeLibcalls.cpp +++ b/llvm/lib/IR/RuntimeLibcalls.cpp @@ -432,19 +432,11 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT, setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); // Some darwins have an optimized __bzero/bzero function. -switch (TT.getArch()) { -case Triple::x86: -case Triple::x86_64: +if (TT.isX86()) { if (TT.isMacOSX() && !TT.isMacOSXVersionLT(10, 6)) setLibcallName(RTLIB::BZERO, "__bzero"); - break; -case Triple::aarch64: -case Triple::aarch64_32: +} else if (TT.isAArch64()) setLibcallName(RTLIB::BZERO, "bzero"); - break; -default: - break; -} if (darwinHasSinCosStret(TT)) { setLibcallName(RTLIB::SINCOS_STRET_F32, "__sincosf_stret"); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
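A hypothetical continuation of the same predicate-apply style, shown only to illustrate the pattern; no such helper exists in the patch:

```cpp
#include "llvm/TargetParser/Triple.h"

// Illustrative only: folding the remaining branching into a name-returning
// predicate, mirroring the darwinHasSinCosStret()/darwinHasExp10() shape.
static const char *darwinBzeroName(const llvm::Triple &TT) {
  if (TT.isX86())
    return (TT.isMacOSX() && !TT.isMacOSXVersionLT(10, 6)) ? "__bzero"
                                                           : nullptr;
  if (TT.isAArch64()) // covers both aarch64 and aarch64_32, as in the patch
    return "bzero";
  return nullptr;
}
```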
[llvm-branch-commits] [llvm] AMDGPU: Handle folding vector splats of inline split f64 inline immediates (PR #140878)
https://github.com/kosarev approved this pull request. https://github.com/llvm/llvm-project/pull/140878 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AsmPrinter] Always emit global equivalents if there is non-global uses (#145648) (PR #145690)
llvmbot wrote: @llvm/pr-subscribers-mc Author: None (llvmbot) Changes Backport 630d55cce45f8b409367914ef372047c8c43c511 Requested by: @dianqk --- Full diff: https://github.com/llvm/llvm-project/pull/145690.diff 2 Files Affected: - (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+17-7) - (added) llvm/test/MC/X86/gotpcrel-non-globals.ll (+36) ``diff diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp index e77abf429e6b4..c8f567e5f4195 100644 --- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp @@ -2139,16 +2139,20 @@ void AsmPrinter::emitFunctionBody() { } /// Compute the number of Global Variables that uses a Constant. -static unsigned getNumGlobalVariableUses(const Constant *C) { - if (!C) +static unsigned getNumGlobalVariableUses(const Constant *C, + bool &HasNonGlobalUsers) { + if (!C) { +HasNonGlobalUsers = true; return 0; + } if (isa(C)) return 1; unsigned NumUses = 0; for (const auto *CU : C->users()) -NumUses += getNumGlobalVariableUses(dyn_cast(CU)); +NumUses += +getNumGlobalVariableUses(dyn_cast(CU), HasNonGlobalUsers); return NumUses; } @@ -2159,7 +2163,8 @@ static unsigned getNumGlobalVariableUses(const Constant *C) { /// candidates are skipped and are emitted later in case at least one cstexpr /// isn't replaced by a PC relative GOT entry access. static bool isGOTEquivalentCandidate(const GlobalVariable *GV, - unsigned &NumGOTEquivUsers) { + unsigned &NumGOTEquivUsers, + bool &HasNonGlobalUsers) { // Global GOT equivalents are unnamed private globals with a constant // pointer initializer to another global symbol. They must point to a // GlobalVariable or Function, i.e., as GlobalValue. @@ -2171,7 +2176,8 @@ static bool isGOTEquivalentCandidate(const GlobalVariable *GV, // To be a got equivalent, at least one of its users need to be a constant // expression used by another global variable. for (const auto *U : GV->users()) -NumGOTEquivUsers += getNumGlobalVariableUses(dyn_cast(U)); +NumGOTEquivUsers += +getNumGlobalVariableUses(dyn_cast(U), HasNonGlobalUsers); return NumGOTEquivUsers > 0; } @@ -2189,9 +2195,13 @@ void AsmPrinter::computeGlobalGOTEquivs(Module &M) { for (const auto &G : M.globals()) { unsigned NumGOTEquivUsers = 0; -if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers)) +bool HasNonGlobalUsers = false; +if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers, HasNonGlobalUsers)) continue; - +// If non-global variables use it, we still need to emit it. +// Add 1 here, then emit it in `emitGlobalGOTEquivs`. +if (HasNonGlobalUsers) + NumGOTEquivUsers += 1; const MCSymbol *GOTEquivSym = getSymbol(&G); GlobalGOTEquivs[GOTEquivSym] = std::make_pair(&G, NumGOTEquivUsers); } diff --git a/llvm/test/MC/X86/gotpcrel-non-globals.ll b/llvm/test/MC/X86/gotpcrel-non-globals.ll new file mode 100644 index 0..222d2d73ff728 --- /dev/null +++ b/llvm/test/MC/X86/gotpcrel-non-globals.ll @@ -0,0 +1,36 @@ +; RUN: llc < %s | FileCheck %s + +target triple = "x86_64-unknown-linux-gnu" + +; Check that we emit the `@bar_*` symbols, and that we don't emit multiple symbols. 
+ +; CHECK-LABEL: .Lrel_0: +; CHECK: .long foo_0@GOTPCREL+0 +; CHECK-LABEL: .Lrel_1_failed: +; CHECK: .long bar_1-foo_0 +; CHECK-LABEL: .Lrel_2: +; CHECK: .long foo_2@GOTPCREL+0 + +; CHECK: bar_0: +; CHECK: bar_1: +; CHECK: bar_2_indirect: + +@rel_0 = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_0 to i64), i64 ptrtoint (ptr @rel_0 to i64)) to i32)] +@rel_1_failed = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_1 to i64), i64 ptrtoint (ptr @foo_0 to i64)) to i32)] +@rel_2 = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_2_indirect to i64), i64 ptrtoint (ptr @rel_2 to i64)) to i32)] +@bar_0 = internal unnamed_addr constant ptr @foo_0, align 8 +@bar_1 = internal unnamed_addr constant ptr @foo_1, align 8 +@bar_2_indirect = internal unnamed_addr constant ptr @foo_2, align 8 +@foo_0 = external global ptr, align 8 +@foo_1 = external global ptr, align 8 +@foo_2 = external global ptr, align 8 + +define void @foo(ptr %arg0, ptr %arg1) { + store ptr @bar_0, ptr %arg0, align 8 + store ptr @bar_1, ptr %arg1, align 8 + store ptr getelementptr (i8, ptr @bar_2_indirect, i32 1), ptr %arg1, align 8 + ret void +} `` https://github.com/llvm/llvm-project/pull/145690 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AsmPrinter] Always emit global equivalents if there is non-global uses (#145648) (PR #145690)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/145690 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AsmPrinter] Always emit global equivalents if there is non-global uses (#145648) (PR #145690)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/145690 Backport 630d55cce45f8b409367914ef372047c8c43c511 Requested by: @dianqk >From eacbb28486642aebd28d826094301cb4ff330287 Mon Sep 17 00:00:00 2001 From: dianqk Date: Wed, 25 Jun 2025 18:39:36 +0800 Subject: [PATCH] [AsmPrinter] Always emit global equivalents if there is non-global uses (#145648) A case found from https://github.com/rust-lang/rust/issues/142752: https://llvm.godbolt.org/z/T7ce9saWh. We should emit `@bar_0` for the following code: ```llvm target triple = "x86_64-unknown-linux-gnu" @rel_0 = private unnamed_addr constant [1 x i32] [ i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_0 to i64), i64 ptrtoint (ptr @rel_0 to i64)) to i32)] @bar_0 = internal unnamed_addr constant ptr @foo_0, align 8 @foo_0 = external global ptr, align 8 define void @foo(ptr %arg0) { store ptr @bar_0, ptr %arg0, align 8 ret void } ``` (cherry picked from commit 630d55cce45f8b409367914ef372047c8c43c511) --- llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | 24 ++- llvm/test/MC/X86/gotpcrel-non-globals.ll | 36 ++ 2 files changed, 53 insertions(+), 7 deletions(-) create mode 100644 llvm/test/MC/X86/gotpcrel-non-globals.ll diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp index e77abf429e6b4..c8f567e5f4195 100644 --- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp @@ -2139,16 +2139,20 @@ void AsmPrinter::emitFunctionBody() { } /// Compute the number of Global Variables that uses a Constant. -static unsigned getNumGlobalVariableUses(const Constant *C) { - if (!C) +static unsigned getNumGlobalVariableUses(const Constant *C, + bool &HasNonGlobalUsers) { + if (!C) { +HasNonGlobalUsers = true; return 0; + } if (isa(C)) return 1; unsigned NumUses = 0; for (const auto *CU : C->users()) -NumUses += getNumGlobalVariableUses(dyn_cast(CU)); +NumUses += +getNumGlobalVariableUses(dyn_cast(CU), HasNonGlobalUsers); return NumUses; } @@ -2159,7 +2163,8 @@ static unsigned getNumGlobalVariableUses(const Constant *C) { /// candidates are skipped and are emitted later in case at least one cstexpr /// isn't replaced by a PC relative GOT entry access. static bool isGOTEquivalentCandidate(const GlobalVariable *GV, - unsigned &NumGOTEquivUsers) { + unsigned &NumGOTEquivUsers, + bool &HasNonGlobalUsers) { // Global GOT equivalents are unnamed private globals with a constant // pointer initializer to another global symbol. They must point to a // GlobalVariable or Function, i.e., as GlobalValue. @@ -2171,7 +2176,8 @@ static bool isGOTEquivalentCandidate(const GlobalVariable *GV, // To be a got equivalent, at least one of its users need to be a constant // expression used by another global variable. for (const auto *U : GV->users()) -NumGOTEquivUsers += getNumGlobalVariableUses(dyn_cast(U)); +NumGOTEquivUsers += +getNumGlobalVariableUses(dyn_cast(U), HasNonGlobalUsers); return NumGOTEquivUsers > 0; } @@ -2189,9 +2195,13 @@ void AsmPrinter::computeGlobalGOTEquivs(Module &M) { for (const auto &G : M.globals()) { unsigned NumGOTEquivUsers = 0; -if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers)) +bool HasNonGlobalUsers = false; +if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers, HasNonGlobalUsers)) continue; - +// If non-global variables use it, we still need to emit it. +// Add 1 here, then emit it in `emitGlobalGOTEquivs`. 
+if (HasNonGlobalUsers) + NumGOTEquivUsers += 1; const MCSymbol *GOTEquivSym = getSymbol(&G); GlobalGOTEquivs[GOTEquivSym] = std::make_pair(&G, NumGOTEquivUsers); } diff --git a/llvm/test/MC/X86/gotpcrel-non-globals.ll b/llvm/test/MC/X86/gotpcrel-non-globals.ll new file mode 100644 index 0..222d2d73ff728 --- /dev/null +++ b/llvm/test/MC/X86/gotpcrel-non-globals.ll @@ -0,0 +1,36 @@ +; RUN: llc < %s | FileCheck %s + +target triple = "x86_64-unknown-linux-gnu" + +; Check that we emit the `@bar_*` symbols, and that we don't emit multiple symbols. + +; CHECK-LABEL: .Lrel_0: +; CHECK: .long foo_0@GOTPCREL+0 +; CHECK-LABEL: .Lrel_1_failed: +; CHECK: .long bar_1-foo_0 +; CHECK-LABEL: .Lrel_2: +; CHECK: .long foo_2@GOTPCREL+0 + +; CHECK: bar_0: +; CHECK: bar_1: +; CHECK: bar_2_indirect: + +@rel_0 = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_0 to i64), i64 ptrtoint (ptr @rel_0 to i64)) to i32)] +@rel_1_failed = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_1 to i64), i64 ptrtoint (ptr @foo_0 to i64)) to i32)] +@rel_2 = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_2_
[llvm-branch-commits] [llvm] release/20.x: [AsmPrinter] Always emit global equivalents if there is non-global uses (#145648) (PR #145690)
llvmbot wrote: @llvm/pr-subscribers-backend-x86 Author: None (llvmbot) Changes Backport 630d55cce45f8b409367914ef372047c8c43c511 Requested by: @dianqk --- Full diff: https://github.com/llvm/llvm-project/pull/145690.diff 2 Files Affected: - (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+17-7) - (added) llvm/test/MC/X86/gotpcrel-non-globals.ll (+36) ``diff diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp index e77abf429e6b4..c8f567e5f4195 100644 --- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp @@ -2139,16 +2139,20 @@ void AsmPrinter::emitFunctionBody() { } /// Compute the number of Global Variables that uses a Constant. -static unsigned getNumGlobalVariableUses(const Constant *C) { - if (!C) +static unsigned getNumGlobalVariableUses(const Constant *C, + bool &HasNonGlobalUsers) { + if (!C) { +HasNonGlobalUsers = true; return 0; + } if (isa(C)) return 1; unsigned NumUses = 0; for (const auto *CU : C->users()) -NumUses += getNumGlobalVariableUses(dyn_cast(CU)); +NumUses += +getNumGlobalVariableUses(dyn_cast(CU), HasNonGlobalUsers); return NumUses; } @@ -2159,7 +2163,8 @@ static unsigned getNumGlobalVariableUses(const Constant *C) { /// candidates are skipped and are emitted later in case at least one cstexpr /// isn't replaced by a PC relative GOT entry access. static bool isGOTEquivalentCandidate(const GlobalVariable *GV, - unsigned &NumGOTEquivUsers) { + unsigned &NumGOTEquivUsers, + bool &HasNonGlobalUsers) { // Global GOT equivalents are unnamed private globals with a constant // pointer initializer to another global symbol. They must point to a // GlobalVariable or Function, i.e., as GlobalValue. @@ -2171,7 +2176,8 @@ static bool isGOTEquivalentCandidate(const GlobalVariable *GV, // To be a got equivalent, at least one of its users need to be a constant // expression used by another global variable. for (const auto *U : GV->users()) -NumGOTEquivUsers += getNumGlobalVariableUses(dyn_cast(U)); +NumGOTEquivUsers += +getNumGlobalVariableUses(dyn_cast(U), HasNonGlobalUsers); return NumGOTEquivUsers > 0; } @@ -2189,9 +2195,13 @@ void AsmPrinter::computeGlobalGOTEquivs(Module &M) { for (const auto &G : M.globals()) { unsigned NumGOTEquivUsers = 0; -if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers)) +bool HasNonGlobalUsers = false; +if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers, HasNonGlobalUsers)) continue; - +// If non-global variables use it, we still need to emit it. +// Add 1 here, then emit it in `emitGlobalGOTEquivs`. +if (HasNonGlobalUsers) + NumGOTEquivUsers += 1; const MCSymbol *GOTEquivSym = getSymbol(&G); GlobalGOTEquivs[GOTEquivSym] = std::make_pair(&G, NumGOTEquivUsers); } diff --git a/llvm/test/MC/X86/gotpcrel-non-globals.ll b/llvm/test/MC/X86/gotpcrel-non-globals.ll new file mode 100644 index 0..222d2d73ff728 --- /dev/null +++ b/llvm/test/MC/X86/gotpcrel-non-globals.ll @@ -0,0 +1,36 @@ +; RUN: llc < %s | FileCheck %s + +target triple = "x86_64-unknown-linux-gnu" + +; Check that we emit the `@bar_*` symbols, and that we don't emit multiple symbols. 
+ +; CHECK-LABEL: .Lrel_0: +; CHECK: .long foo_0@GOTPCREL+0 +; CHECK-LABEL: .Lrel_1_failed: +; CHECK: .long bar_1-foo_0 +; CHECK-LABEL: .Lrel_2: +; CHECK: .long foo_2@GOTPCREL+0 + +; CHECK: bar_0: +; CHECK: bar_1: +; CHECK: bar_2_indirect: + +@rel_0 = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_0 to i64), i64 ptrtoint (ptr @rel_0 to i64)) to i32)] +@rel_1_failed = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_1 to i64), i64 ptrtoint (ptr @foo_0 to i64)) to i32)] +@rel_2 = private unnamed_addr constant [1 x i32] [ + i32 trunc (i64 sub (i64 ptrtoint (ptr @bar_2_indirect to i64), i64 ptrtoint (ptr @rel_2 to i64)) to i32)] +@bar_0 = internal unnamed_addr constant ptr @foo_0, align 8 +@bar_1 = internal unnamed_addr constant ptr @foo_1, align 8 +@bar_2_indirect = internal unnamed_addr constant ptr @foo_2, align 8 +@foo_0 = external global ptr, align 8 +@foo_1 = external global ptr, align 8 +@foo_2 = external global ptr, align 8 + +define void @foo(ptr %arg0, ptr %arg1) { + store ptr @bar_0, ptr %arg0, align 8 + store ptr @bar_1, ptr %arg1, align 8 + store ptr getelementptr (i8, ptr @bar_2_indirect, i32 1), ptr %arg1, align 8 + ret void +} `` https://github.com/llvm/llvm-project/pull/145690 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: prevent false positives due to jump tables (PR #138884)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138884 >From 7f3d14f3b173b0425c7624e8d62f0188081ef691 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 6 May 2025 11:31:03 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: prevent false positives due to jump tables As part of PAuth hardening, AArch64 LLVM backend can use a special BR_JumpTable pseudo (enabled by -faarch64-jump-table-hardening Clang option) which is expanded in the AsmPrinter into a contiguous sequence without unsafe instructions in the middle. This commit adds another target-specific callback to MCPlusBuilder to make it possible to inhibit false positives for known-safe jump table dispatch sequences. Without special handling, the branch instruction is likely to be reported as a non-protected call (as its destination is not produced by an auth instruction, PC-relative address materialization, etc.) and possibly as a tail call being performed with unsafe link register (as the detection whether the branch instruction is a tail call is an heuristic). For now, only the specific instruction sequence used by the AArch64 LLVM backend is matched. --- bolt/include/bolt/Core/MCInstUtils.h | 9 + bolt/include/bolt/Core/MCPlusBuilder.h| 14 + bolt/lib/Core/MCInstUtils.cpp | 20 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 10 + .../Target/AArch64/AArch64MCPlusBuilder.cpp | 73 ++ .../AArch64/gs-pauth-jump-table.s | 703 ++ 6 files changed, 829 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-jump-table.s diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h index 50b7d56470c99..33d36cccbcfff 100644 --- a/bolt/include/bolt/Core/MCInstUtils.h +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -154,6 +154,15 @@ class MCInstReference { return nullptr; } + /// Returns the only preceding instruction, or std::nullopt if multiple or no + /// predecessors are possible. + /// + /// If CFG information is available, basic block boundary can be crossed, + /// provided there is exactly one predecessor. If CFG is not available, the + /// preceding instruction in the offset order is returned, unless this is the + /// first instruction of the function. + std::optional getSinglePredecessor(); + raw_ostream &print(raw_ostream &OS) const; }; diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index c31c9984ed43e..b6f70fc831fca 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -14,6 +14,7 @@ #ifndef BOLT_CORE_MCPLUSBUILDER_H #define BOLT_CORE_MCPLUSBUILDER_H +#include "bolt/Core/MCInstUtils.h" #include "bolt/Core/MCPlus.h" #include "bolt/Core/Relocation.h" #include "llvm/ADT/ArrayRef.h" @@ -700,6 +701,19 @@ class MCPlusBuilder { return std::nullopt; } + /// Tests if BranchInst corresponds to an instruction sequence which is known + /// to be a safe dispatch via jump table. + /// + /// The target can decide which instruction sequences to consider "safe" from + /// the Pointer Authentication point of view, such as any jump table dispatch + /// sequence without function calls inside, any sequence which is contiguous, + /// or only some specific well-known sequences. 
+ virtual bool + isSafeJumpTableBranchForPtrAuth(MCInstReference BranchInst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isTerminator(const MCInst &Inst) const; virtual bool isNoop(const MCInst &Inst) const { diff --git a/bolt/lib/Core/MCInstUtils.cpp b/bolt/lib/Core/MCInstUtils.cpp index 40f6edd59135c..b7c6d898988af 100644 --- a/bolt/lib/Core/MCInstUtils.cpp +++ b/bolt/lib/Core/MCInstUtils.cpp @@ -55,3 +55,23 @@ raw_ostream &MCInstReference::print(raw_ostream &OS) const { OS << ">"; return OS; } + +std::optional MCInstReference::getSinglePredecessor() { + if (const RefInBB *Ref = tryGetRefInBB()) { +if (Ref->It != Ref->BB->begin()) + return MCInstReference(Ref->BB, &*std::prev(Ref->It)); + +if (Ref->BB->pred_size() != 1) + return std::nullopt; + +BinaryBasicBlock *PredBB = *Ref->BB->pred_begin(); +assert(!PredBB->empty() && "Empty basic blocks are not supported yet"); +return MCInstReference(PredBB, &*PredBB->rbegin()); + } + + const RefInBF &Ref = getRefInBF(); + if (Ref.It == Ref.BF->instrs().begin()) +return std::nullopt; + + return MCInstReference(Ref.BF, std::prev(Ref.It)); +} diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index b9fdf6d655d4c..d218ff622b75c 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1363,6 +1363,11 @@ shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, return std::nullopt; } + if (BC.MIB->isSafeJumpTableBranchForPtrAuth(Inst)) { +
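A hedged sketch of the backward-walk pattern a target-specific matcher can build on top of the new MCInstReference::getSinglePredecessor() helper; matchesDispatchStep and its declaration-only stub are invented for illustration, while the optional-returning helper is the one added above:

```cpp
#include "bolt/Core/MCInstUtils.h"

#include <optional>

using namespace llvm::bolt;

// Hypothetical per-step matcher, declared but not defined in this sketch.
bool matchesDispatchStep(MCInstReference Inst, unsigned Step);

// Walk a fixed number of instructions backwards from the branch, giving up
// conservatively whenever the predecessor is missing or ambiguous.
static bool looksLikeKnownDispatch(MCInstReference Branch, unsigned NumSteps) {
  std::optional<MCInstReference> Cur = Branch;
  for (unsigned I = 0; I != NumSteps; ++I) {
    if (!Cur)
      return false; // multiple or no predecessors: not a contiguous sequence
    if (!matchesDispatchStep(*Cur, I))
      return false;
    Cur = Cur->getSinglePredecessor();
  }
  return true;
}
```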
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: optionally assume auth traps on failure (PR #139778)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/139778 >From 6ba3d538efb8a31e38ffe96c7b4229d27ace93ae Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 13 May 2025 19:50:41 +0300 Subject: [PATCH] [BOLT] Gadget scanner: optionally assume auth traps on failure On AArch64 it is possible for an auth instruction to either return an invalid address value on failure (without FEAT_FPAC) or generate an error (with FEAT_FPAC). It thus may be possible to never emit explicit pointer checks, if the target CPU is known to support FEAT_FPAC. This commit implements an --auth-traps-on-failure command line option, which essentially makes "safe-to-dereference" and "trusted" register properties identical and disables scanning for authentication oracles completely. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++ .../binary-analysis/AArch64/cmdline-args.test | 1 + .../AArch64/gs-pauth-authentication-oracles.s | 6 +- .../binary-analysis/AArch64/gs-pauth-calls.s | 5 +- .../AArch64/gs-pauth-debug-output.s | 177 ++--- .../AArch64/gs-pauth-jump-table.s | 6 +- .../AArch64/gs-pauth-signing-oracles.s| 54 ++--- .../AArch64/gs-pauth-tail-calls.s | 184 +- 8 files changed, 318 insertions(+), 227 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index d218ff622b75c..b540a8d0b7ee7 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -14,6 +14,7 @@ #include "bolt/Passes/PAuthGadgetScanner.h" #include "bolt/Core/ParallelUtilities.h" #include "bolt/Passes/DataflowAnalysis.h" +#include "bolt/Utils/CommandLineOpts.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/MC/MCInst.h" @@ -26,6 +27,11 @@ namespace llvm { namespace bolt { namespace PAuthGadgetScanner { +static cl::opt AuthTrapsOnFailure( +"auth-traps-on-failure", +cl::desc("Assume authentication instructions always trap on failure"), +cl::cat(opts::BinaryAnalysisCategory)); + [[maybe_unused]] static void traceInst(const BinaryContext &BC, StringRef Label, const MCInst &MI) { dbgs() << " " << Label << ": "; @@ -364,6 +370,34 @@ class SrcSafetyAnalysis { return Clobbered; } + std::optional getRegMadeTrustedByChecking(const MCInst &Inst, + SrcState Cur) const { +// This functions cannot return multiple registers. This is never the case +// on AArch64. +std::optional RegCheckedByInst = +BC.MIB->getAuthCheckedReg(Inst, /*MayOverwrite=*/false); +if (RegCheckedByInst && Cur.SafeToDerefRegs[*RegCheckedByInst]) + return *RegCheckedByInst; + +auto It = CheckerSequenceInfo.find(&Inst); +if (It == CheckerSequenceInfo.end()) + return std::nullopt; + +MCPhysReg RegCheckedBySequence = It->second.first; +const MCInst *FirstCheckerInst = It->second.second; + +// FirstCheckerInst should belong to the same basic block (see the +// assertion in DataflowSrcSafetyAnalysis::run()), meaning it was +// deterministically processed a few steps before this instruction. +const SrcState &StateBeforeChecker = getStateBefore(*FirstCheckerInst); + +// The sequence checks the register, but it should be authenticated before. +if (!StateBeforeChecker.SafeToDerefRegs[RegCheckedBySequence]) + return std::nullopt; + +return RegCheckedBySequence; + } + // Returns all registers that can be treated as if they are written by an // authentication instruction. 
SmallVector getRegsMadeSafeToDeref(const MCInst &Point, @@ -386,18 +420,38 @@ class SrcSafetyAnalysis { Regs.push_back(DstAndSrc->first); } +// Make sure explicit checker sequence keeps register safe-to-dereference +// when the register would be clobbered according to the regular rules: +// +//; LR is safe to dereference here +//mov x16, x30 ; start of the sequence, LR is s-t-d right before +//xpaclri ; clobbers LR, LR is not safe anymore +//cmp x30, x16 +//b.eq 1f; end of the sequence: LR is marked as trusted +//brk 0x1234 +// 1: +//; at this point LR would be marked as trusted, +//; but not safe-to-dereference +// +// or even just +// +//; X1 is safe to dereference here +//ldr x0, [x1, #8]! +//; X1 is trusted here, but it was clobbered due to address write-back +if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur)) + Regs.push_back(*CheckedReg); + return Regs; } // Returns all registers made trusted by this instruction. SmallVector getRegsMadeTrusted(const MCInst &Point, const SrcState &Cur) const { +assert(!AuthTrapsOnFailure &&
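Conceptually, the new option collapses the scanner's two register properties into one. The struct and function below are illustrative pseudo-logic only, not the real SrcSafetyAnalysis code:

```cpp
// Illustrative only.
struct RegProps {
  bool SafeToDeref; // produced or validated by an authentication step
  bool Trusted;     // may be re-signed or used as a jump target
};

static RegProps afterSuccessfulAuth(bool AuthTrapsOnFailure) {
  RegProps P;
  P.SafeToDeref = true;
  // Without FEAT_FPAC a failed auth yields an invalid pointer that still
  // needs an explicit check before it is trusted; with trapping auth, the
  // fact that execution continued past the instruction is the check.
  P.Trusted = AuthTrapsOnFailure;
  return P;
}
```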
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137224 >From 27d75c4248864d12381aac765674106f573de923 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 80 +++ .../AArch64/gs-pauth-tail-calls.s | 597 ++ 2 files changed, 677 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 2eadaf15d3a65..0a3948e2e278e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1319,6 +1319,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). + if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return true; + + // Try to also check the branches marked as "UNKNOWN CONTROL FLOW" - the + // below is a simplified condition from BinaryContext::printInstruction. + bool IsUnknownControlFlow = + BC.MIB->isIndirectBranch(Inst) && !BC.MIB->getJumpTable(Inst); + + if (BF.hasCFG() && IsUnknownControlFlow) +return true; + + return false; +} + +static std::optional> +shouldReportUnsafeTailCall(const BinaryContext &BC, const BinaryFunction &BF, + const MCInstReference &Inst, const SrcState &S) { + static const GadgetKind UntrustedLRKind( + "untrusted link register found before tail call"); + + if (!shouldAnalyzeTailCallInst(BC, BF, Inst)) +return std::nullopt; + + // Not only the set of registers returned by getTrustedLiveInRegs() can be + // seen as a reasonable target-independent _approximation_ of "the LR", these + // are *exactly* those registers used by SrcSafetyAnalysis to initialize the + // set of trusted registers on function entry. + // Thus, this function basically checks that the precondition expected to be + // imposed by a function call instruction (which is hardcoded into the target- + // specific getTrustedLiveInRegs() function) is also respected on tail calls. 
+ SmallVector RegsToCheck = BC.MIB->getTrustedLiveInRegs(); + LLVM_DEBUG({ +traceInst(BC, "Found tail call inst", Inst); +traceRegMask(BC, "Trusted regs", S.TrustedRegs); + }); + + // In musl on AArch64, the _start function sets LR to zero and calls the next + // stage initialization function at the end, something along these lines: + // + // _start: + // mov x30, #0 + // ; ... other initialization ... + // b _start_c ; performs "exit" system call at some point + // + // As this would produce a false positive for every executable linked with + // such libc, ignore tail calls performed by ELF entry function. + if (BC.StartFunctionAddress && + *BC.StartFunctionAddress == Inst.getFunction()->getAddress()) { +LLVM_DEBUG({ dbgs() << " Skipping tail call in ELF entry function.\n"; }); +return std::nullopt; + } + + // Returns at most one report per instruction - this is probably OK... + for (auto Reg : RegsToCheck) +if (!S.TrustedRegs[Reg]) + return make_gadget_report(UntrustedLRKind, Inst, Reg); + + return std::nullopt; +} + static std::optional> shouldReportCallGadget(const BinaryContext &BC, const MCInstReference &Inst, const SrcState &S) { @@ -1478,6 +1555,9 @@ void FunctionAnalysisContext::findUnsafeUses( if (PacRetGadgetsOnly) return; +if (auto Report = shouldReportUnsafeTailCall(BC, BF, Inst, S)) + Reports.push_back(*Report); + if (auto Report = shouldReportCallGadget(BC, Inst, S))
[llvm-branch-commits] [clang-tools-extra] [clang-doc] refactor BitcodeReader::readSubBlock (PR #145835)
https://github.com/evelez7 created https://github.com/llvm/llvm-project/pull/145835 None >From c15962ea2a37f3d9896fa77914aee67ccf3b5d3f Mon Sep 17 00:00:00 2001 From: Erick Velez Date: Tue, 24 Jun 2025 21:08:36 -0700 Subject: [PATCH] [clang-doc] refactor BitcodeReader::readSubBlock --- clang-tools-extra/clang-doc/BitcodeReader.cpp | 129 -- clang-tools-extra/clang-doc/BitcodeReader.h | 7 + 2 files changed, 61 insertions(+), 75 deletions(-) diff --git a/clang-tools-extra/clang-doc/BitcodeReader.cpp b/clang-tools-extra/clang-doc/BitcodeReader.cpp index cbdd5d245b8de..437599b879748 100644 --- a/clang-tools-extra/clang-doc/BitcodeReader.cpp +++ b/clang-tools-extra/clang-doc/BitcodeReader.cpp @@ -800,11 +800,37 @@ llvm::Error ClangDocBitcodeReader::readBlock(unsigned ID, T I) { } } -// TODO: Create a helper that can receive a function to reduce repetition for -// most blocks. +template +llvm::Error ClangDocBitcodeReader::handleSubBlock(unsigned ID, T Parent, + Callback Function) { + InfoType Info; + if (auto Err = readBlock(ID, &Info)) +return Err; + Function(Parent, std::move(Info)); + return llvm::Error::success(); +} + +template +llvm::Error ClangDocBitcodeReader::handleTypeSubBlock(unsigned ID, T Parent, + Callback Function) { + InfoType Info; + if (auto Err = readBlock(ID, &Info)) +return Err; + if (auto Err = Function(Parent, std::move(Info))) +return Err; + return llvm::Error::success(); +} + template llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) { llvm::TimeTraceScope("Reducing infos", "readSubBlock"); + + static auto CreateAddFunc = [](auto AddFunc) { +return [AddFunc](auto Parent, auto Child) { + return AddFunc(Parent, std::move(Child)); +}; + }; + switch (ID) { // Blocks can only have certain types of sub blocks. case BI_COMMENT_BLOCK_ID: { @@ -816,28 +842,16 @@ llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) { return llvm::Error::success(); } case BI_TYPE_BLOCK_ID: { -TypeInfo TI; -if (auto Err = readBlock(ID, &TI)) - return Err; -if (auto Err = addTypeInfo(I, std::move(TI))) - return Err; -return llvm::Error::success(); +return handleTypeSubBlock( +ID, I, CreateAddFunc(addTypeInfo)); } case BI_FIELD_TYPE_BLOCK_ID: { -FieldTypeInfo TI; -if (auto Err = readBlock(ID, &TI)) - return Err; -if (auto Err = addTypeInfo(I, std::move(TI))) - return Err; -return llvm::Error::success(); +return handleTypeSubBlock( +ID, I, CreateAddFunc(addTypeInfo)); } case BI_MEMBER_TYPE_BLOCK_ID: { -MemberTypeInfo TI; -if (auto Err = readBlock(ID, &TI)) - return Err; -if (auto Err = addTypeInfo(I, std::move(TI))) - return Err; -return llvm::Error::success(); +return handleTypeSubBlock( +ID, I, CreateAddFunc(addTypeInfo)); } case BI_REFERENCE_BLOCK_ID: { Reference R; @@ -848,81 +862,46 @@ llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) { return llvm::Error::success(); } case BI_FUNCTION_BLOCK_ID: { -FunctionInfo F; -if (auto Err = readBlock(ID, &F)) - return Err; -addChild(I, std::move(F)); -return llvm::Error::success(); +return handleSubBlock( +ID, I, CreateAddFunc(addChild)); } case BI_BASE_RECORD_BLOCK_ID: { -BaseRecordInfo BR; -if (auto Err = readBlock(ID, &BR)) - return Err; -addChild(I, std::move(BR)); -return llvm::Error::success(); +return handleSubBlock( +ID, I, CreateAddFunc(addChild)); } case BI_ENUM_BLOCK_ID: { -EnumInfo E; -if (auto Err = readBlock(ID, &E)) - return Err; -addChild(I, std::move(E)); -return llvm::Error::success(); +return handleSubBlock(ID, I, +CreateAddFunc(addChild)); } case BI_ENUM_VALUE_BLOCK_ID: { -EnumValueInfo EV; -if 
(auto Err = readBlock(ID, &EV)) - return Err; -addChild(I, std::move(EV)); -return llvm::Error::success(); +return handleSubBlock( +ID, I, CreateAddFunc(addChild)); } case BI_TEMPLATE_BLOCK_ID: { -TemplateInfo TI; -if (auto Err = readBlock(ID, &TI)) - return Err; -addTemplate(I, std::move(TI)); -return llvm::Error::success(); +return handleSubBlock(ID, I, CreateAddFunc(addTemplate)); } case BI_TEMPLATE_SPECIALIZATION_BLOCK_ID: { -TemplateSpecializationInfo TSI; -if (auto Err = readBlock(ID, &TSI)) - return Err; -addTemplateSpecialization(I, std::move(TSI)); -return llvm::Error::success(); +return handleSubBlock( +ID, I, CreateAddFunc(addTemplateSpecialization)); } case BI_TEMPLATE_PARAM_BLOCK_ID: { -TemplateParamInfo TPI; -if (auto Err = readBlock(ID, &TPI)) - return Err; -addTemplateParam(I, std::move(TPI));
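The CreateAddFunc lambda above adapts plain "add" functions into the uniform callback shape the generic handlers expect. A standalone, compilable miniature of the same idiom, with all names invented for illustration:

```cpp
#include <string>
#include <utility>
#include <vector>

struct RecordLike {
  std::vector<std::string> Functions;
};

static void addFunction(RecordLike *I, std::string Name) {
  I->Functions.push_back(std::move(Name));
}

// Factory wrapping a callable so a single generic handler can be reused for
// every block kind.
static auto CreateAddFunc = [](auto AddFunc) {
  return [AddFunc](auto Parent, auto Child) {
    return AddFunc(Parent, std::move(Child));
  };
};

// Stands in for handleSubBlock(): "read" a child, then hand it to the
// callback instead of open-coding each case.
template <typename ChildT, typename ParentT, typename Callback>
static void handleSubBlock(ParentT Parent, Callback Function) {
  ChildT Child{}; // stands in for readBlock(ID, &Child)
  Function(Parent, std::move(Child));
}

int main() {
  RecordLike RI;
  handleSubBlock<std::string>(&RI, CreateAddFunc(addFunction));
}
```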
[llvm-branch-commits] [clang-tools-extra] [clang-doc] refactor BitcodeReader::readSubBlock (PR #145835)
evelez7 wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#145835** 👈 (this PR, view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/145835)
* **#145595** (https://app.graphite.dev/github/pr/llvm/llvm-project/145595)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/145835 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang-tools-extra] [clang-doc] refactor BitcodeReader::readSubBlock (PR #145835)
https://github.com/evelez7 edited https://github.com/llvm/llvm-project/pull/145835 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) { return LLT::scalar(32); } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI); +using ReadLaneFnTy = +function_ref; + +static Register buildReadLane(MachineIRBuilder &, Register, + const RegisterBankInfo &, ReadLaneFnTy); static void unmergeReadAnyLane(MachineIRBuilder &B, SmallVectorImpl &SgprDstParts, LLT UnmergeTy, Register VgprSrc, - const RegisterBankInfo &RBI) { + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID); auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc); for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) { -SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI)); +SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL)); } } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI) { +static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc, + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { LLT Ty = B.getMRI()->getType(VgprSrc); const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID); if (Ty.getSizeInBits() == 32) { -return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc}) -.getReg(0); +Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty}); +return BuildRL(B, SgprDst, VgprSrc).getReg(0); } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildRL); return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0); } -void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, - Register VgprSrc, const RegisterBankInfo &RBI) { +static void buildReadLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI, + ReadLaneFnTy BuildReadLane) { arsenm wrote: Make the function a template argument? https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -205,7 +207,14 @@ class AMDGPURegBankLegalizeCombiner { bool tryEliminateReadAnyLane(MachineInstr &Copy) { Register Dst = Copy.getOperand(0).getReg(); Register Src = Copy.getOperand(1).getReg(); -if (!Src.isVirtual()) + +// Skip non-vgpr Dst +if (Dst.isVirtual() ? (MRI.getRegBankOrNull(Dst) != VgprRB) +: !TRI.isVGPR(MRI, Dst)) + return false; + +// Skip physical source registers and source registers with register class arsenm wrote: This shouldn't happen? https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py -; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -stop-after=regbankselect -regbankselect-fast -o - %s | FileCheck %s -; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -stop-after=regbankselect -regbankselect-greedy -o - %s | FileCheck %s +; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-mesa-mesa3d -stop-after=amdgpu-regbanklegalize -regbankselect-fast -o - %s | FileCheck %s arsenm wrote: I guess, how much longer until that happens? https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [pgo] add means to specify "unknown" MD_prof (PR #145578)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/145578 >From 1b048fff9f025e595f6c202a569cd1248010d9b8 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 24 Jun 2025 09:50:40 -0700 Subject: [PATCH] [pgo] add means to specify "unknown" MD_prof --- llvm/include/llvm/IR/ProfDataUtils.h | 12 +++ llvm/lib/IR/ProfDataUtils.cpp| 22 + llvm/lib/IR/Verifier.cpp | 46 +++--- llvm/test/Verifier/branch-weight.ll | 128 +-- 4 files changed, 186 insertions(+), 22 deletions(-) diff --git a/llvm/include/llvm/IR/ProfDataUtils.h b/llvm/include/llvm/IR/ProfDataUtils.h index 8e8d069b836f1..89fa7f735f5d4 100644 --- a/llvm/include/llvm/IR/ProfDataUtils.h +++ b/llvm/include/llvm/IR/ProfDataUtils.h @@ -133,6 +133,18 @@ LLVM_ABI bool extractProfTotalWeight(const Instruction &I, LLVM_ABI void setBranchWeights(Instruction &I, ArrayRef Weights, bool IsExpected); +/// Specify that the branch weights for this terminator cannot be known at +/// compile time. This should only be called by passes, and never as a default +/// behavior in e.g. MDBuilder. The goal is to use this info to validate passes +/// do not accidentally drop profile info, and this API is called in cases where +/// the pass explicitly cannot provide that info. Defaulting it in would hide +/// bugs where the pass forgets to transfer over or otherwise specify profile +/// info. +LLVM_ABI void setExplicitlyUnknownBranchWeights(Instruction &I); + +LLVM_ABI bool isExplicitlyUnknownBranchWeightsMetadata(const MDNode &MD); +LLVM_ABI bool hasExplicitlyUnknownBranchWeights(const Instruction &I); + /// Scaling the profile data attached to 'I' using the ratio of S/T. LLVM_ABI void scaleProfData(Instruction &I, uint64_t S, uint64_t T); diff --git a/llvm/lib/IR/ProfDataUtils.cpp b/llvm/lib/IR/ProfDataUtils.cpp index 21524eb840539..1585771c0d0ae 100644 --- a/llvm/lib/IR/ProfDataUtils.cpp +++ b/llvm/lib/IR/ProfDataUtils.cpp @@ -44,6 +44,8 @@ constexpr unsigned MinBWOps = 3; // the minimum number of operands for MD_prof nodes with value profiles constexpr unsigned MinVPOps = 5; +const char *UnknownBranchWeightsMarker = "unknown"; + // We may want to add support for other MD_prof types, so provide an abstraction // for checking the metadata type. 
bool isTargetMD(const MDNode *ProfData, const char *Name, unsigned MinOps) { @@ -232,6 +234,26 @@ bool extractProfTotalWeight(const Instruction &I, uint64_t &TotalVal) { return extractProfTotalWeight(I.getMetadata(LLVMContext::MD_prof), TotalVal); } +void setExplicitlyUnknownBranchWeights(Instruction &I) { + MDBuilder MDB(I.getContext()); + I.setMetadata(LLVMContext::MD_prof, +MDNode::get(I.getContext(), +MDB.createString(UnknownBranchWeightsMarker))); +} + +bool isExplicitlyUnknownBranchWeightsMetadata(const MDNode &MD) { + if (MD.getNumOperands() != 1) +return false; + return MD.getOperand(0).equalsStr(UnknownBranchWeightsMarker); +} + +bool hasExplicitlyUnknownBranchWeights(const Instruction &I) { + auto *MD = I.getMetadata(LLVMContext::MD_prof); + if (!MD) +return false; + return isExplicitlyUnknownBranchWeightsMetadata(*MD); +} + void setBranchWeights(Instruction &I, ArrayRef Weights, bool IsExpected) { MDBuilder MDB(I.getContext()); diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index ae95e3e2bff8d..7045e49e72b42 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -2508,6 +2508,12 @@ void Verifier::verifyFunctionMetadata( for (const auto &Pair : MDs) { if (Pair.first == LLVMContext::MD_prof) { MDNode *MD = Pair.second; + if (isExplicitlyUnknownBranchWeightsMetadata(*MD)) { +CheckFailed("'unknown' !prof metadata should appear only on " +"instructions supporting the 'branch_weights' metadata", +MD); +continue; + } Check(MD->getNumOperands() >= 2, "!prof annotations should have no less than 2 operands", MD); @@ -4964,6 +4970,30 @@ void Verifier::visitDereferenceableMetadata(Instruction& I, MDNode* MD) { } void Verifier::visitProfMetadata(Instruction &I, MDNode *MD) { + auto GetBranchingTerminatorNumOperands = [&]() { +unsigned ExpectedNumOperands = 0; +if (BranchInst *BI = dyn_cast(&I)) + ExpectedNumOperands = BI->getNumSuccessors(); +else if (SwitchInst *SI = dyn_cast(&I)) + ExpectedNumOperands = SI->getNumSuccessors(); +else if (isa(&I)) + ExpectedNumOperands = 1; +else if (IndirectBrInst *IBI = dyn_cast(&I)) + ExpectedNumOperands = IBI->getNumDestinations(); +else if (isa(&I)) + ExpectedNumOperands = 2; +else if (CallBrInst *CI = dyn_cast(&I)) + ExpectedNumOperands = CI->getNumSuccessors(); +return ExpectedNumOperands; + }; + if (isExplicitlyUnknownBranchWeightsMetadata(*MD)) { +Check(GetBran
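A hedged sketch of the intended producer-side usage; the function below is invented, while the two ProfDataUtils entry points are the ones added in this patch. The resulting IR metadata is the single-operand node !prof !{!"unknown"}.

```cpp
#include <cassert>

#include "llvm/IR/Instructions.h"
#include "llvm/IR/ProfDataUtils.h"

using namespace llvm;

// Invented example pass helper, not part of the patch.
static void dropWeightsForRewrittenBranch(BranchInst &BI) {
  if (!BI.isConditional())
    return;
  // The pass rewrote the branch and cannot recompute its weights, so it
  // says so explicitly instead of silently leaving the metadata absent.
  setExplicitlyUnknownBranchWeights(BI);
  assert(hasExplicitlyUnknownBranchWeights(BI));
}
```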
[llvm-branch-commits] [llvm] [mlir] [llvm-objdump] Support --symbolize-operand on AArch64 (PR #145009)
https://github.com/aengelke updated https://github.com/llvm/llvm-project/pull/145009 >From 87858653bb4c9e3911479f139ca0f1b093e94280 Mon Sep 17 00:00:00 2001 From: Matthias Springer Date: Fri, 20 Jun 2025 10:18:23 + Subject: [PATCH 1/6] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20ch?= =?UTF-8?q?anges=20to=20main=20this=20commit=20is=20based=20on?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Created using spr 1.3.5-bogner [skip ci] --- mlir/include/mlir/Dialect/Arith/IR/ArithOps.td | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td b/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td index 993f36f556e87..0518cac156eba 100644 --- a/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td +++ b/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td @@ -1271,7 +1271,7 @@ def Arith_ScalingExtFOp // TruncIOp //===--===// -def Arith_TruncIOp : Op, DeclareOpInterfaceMethods, >From db5463b1af5c1c425866979dcf85ee5919c8a75d Mon Sep 17 00:00:00 2001 From: Alexis Engelke Date: Mon, 23 Jun 2025 08:50:34 + Subject: [PATCH 2/6] address comments + add reloctable test Created using spr 1.3.5-bogner --- ...=> elf-executable-symbolize-operands.yaml} | 31 +++- .../elf-relocatable-symbolize-operands.s | 77 +++ 2 files changed, 105 insertions(+), 3 deletions(-) rename llvm/test/tools/llvm-objdump/AArch64/{elf-disassemble-symbololize-operands.yaml => elf-executable-symbolize-operands.yaml} (64%) create mode 100644 llvm/test/tools/llvm-objdump/AArch64/elf-relocatable-symbolize-operands.s diff --git a/llvm/test/tools/llvm-objdump/AArch64/elf-disassemble-symbololize-operands.yaml b/llvm/test/tools/llvm-objdump/AArch64/elf-executable-symbolize-operands.yaml similarity index 64% rename from llvm/test/tools/llvm-objdump/AArch64/elf-disassemble-symbololize-operands.yaml rename to llvm/test/tools/llvm-objdump/AArch64/elf-executable-symbolize-operands.yaml index 3f3c6f33e620f..d318ea01b4c30 100644 --- a/llvm/test/tools/llvm-objdump/AArch64/elf-disassemble-symbololize-operands.yaml +++ b/llvm/test/tools/llvm-objdump/AArch64/elf-executable-symbolize-operands.yaml @@ -1,14 +1,14 @@ # RUN: yaml2obj %s -o %t # RUN: llvm-objdump %t -d --symbolize-operands --no-show-raw-insn --no-leading-addr | \ -# RUN: FileCheck %s --match-full-lines +# RUN: FileCheck %s --match-full-lines -DABS_ADRP_VAL=0x6000 # RUN: llvm-objdump %t -d --symbolize-operands --no-show-raw-insn --no-leading-addr --adjust-vma=0x2000 | \ -# RUN: FileCheck %s --match-full-lines +# RUN: FileCheck %s --match-full-lines -DABS_ADRP_VAL=0x8000 ## Expect to find the branch labels and global variable name. 
# CHECK: <_start>: # CHECK-NEXT: ldr x0, # CHECK-NEXT: : -# CHECK-NEXT: adrp x1, 0x{{[68]}}000 +# CHECK-NEXT: adrp x1, [[ABS_ADRP_VAL]] # CHECK-NEXT: adr x2, # CHECK-NEXT: cmp x1, x2 # CHECK-NEXT: b.eq @@ -17,6 +17,31 @@ # CHECK-NEXT: cbz x2, # CHECK-NEXT: ret +## Machine code generated with: +# llvm-mc --arch=aarch64 --filetype=obj -o tmp.o <: +# CHECK-NEXT: b +# CHECK-NEXT: tbz x0, #0x2c, +# CHECK-NEXT: : +# CHECK-NEXT: b.eq +# CHECK-NEXT: : +# CHECK-NEXT: cbz x1, +# CHECK-NEXT: : +# CHECK-NEXT: nop +# CHECK-NEXT: : +# CHECK-NEXT: bl +# CHECK-NEXT: R_AARCH64_CALL26 fn2 +# CHECK-NEXT: bl +# CHECK-NEXT: adr x0, +# CHECK-NEXT: : +# CHECK-NEXT: adr x1, +# CHECK-NEXT: R_AARCH64_ADR_PREL_LO21 fn2 +# CHECK-NEXT: adr x2, +# CHECK-NEXT: ldr w0, +# CHECK-NEXT: : +# CHECK-NEXT: ldr w0, +# CHECK-NEXT: R_AARCH64_LD_PREL_LO19 fn2 +# CHECK-NEXT: ret +# CHECK-NEXT: nop +# CHECK-NEXT: nop +# CHECK-NEXT: nop +# CHECK-EMPTY: +# CHECK-NEXT: : +# CHECK-NEXT: bl +# CHECK-NEXT: adrp x3, 0x0 +# CHECK-NEXT: R_AARCH64_ADR_PREL_PG_HI21 fn2 +# CHECK-NEXT: add x3, x3, #0x0 +# CHECK-NEXT: R_AARCH64_ADD_ABS_LO12_NC fn2 +# CHECK-NEXT: adrp x3, 0x0 +# CHECK-NEXT: R_AARCH64_ADR_PREL_PG_HI21 fn2 +# CHECK-NEXT: ldr x0, [x3] +# CHECK-NEXT: R_AARCH64_LDST64_ABS_LO12_NC fn2 +# CHECK-NEXT: ret +# CHECK-NEXT: nop +# CHECK-NEXT: nop +# CHECK-NEXT: : +# CHECK-NEXT: ret + +.p2align 4 +.global fn1 +fn1: +b 0f +tbz x0, 44, 2f +0: b.eq 1f +1: cbz x1, 0b +2: nop +bl fn2 +bl .Lfn2 +adr x0, 2b +adr x1, fn2 +adr x2, .Lfn2 +ldr w0, 2b +ldr w0, fn2 +ret + +.p2align 4 +.global fn2 +fn2: +.Lfn2: # local label for non-interposable call +bl .Lfn3 +# In future, we might identify the pairs and symbolize the operands properly +adrp x3, fn2 +add x3, x3, :lo12:fn2 +adrp x3, fn2 +ldr x0, [x3, :lo12:fn2] +ret + +.p2align 4 +.Lfn3: # private function +ret >From 1abf014077dd0e7f5592651a51484a544cad1e49 Mon Sep 17 00:00:00 2001 From: Alexis Engelke Date: Mon, 23 Jun 2025 09:24:47 +00
[llvm-branch-commits] [mlir] [MLIR][AArch64] Add integration test for lowering of `vector.contract` to Neon FEAT_I8MM (PR #144699)
https://github.com/momchil-velikov updated https://github.com/llvm/llvm-project/pull/144699 >From 3240b5d582f6fe5d430f792f0f7d208d80bf7f48 Mon Sep 17 00:00:00 2001 From: Momchil Velikov Date: Wed, 18 Jun 2025 13:04:45 + Subject: [PATCH 1/3] [MLIR][AArch64] Add integration test for lowering of `vector.contract` to Neon FEAT_I8MM --- .../CPU/ArmNeon/vector-contract-i8mm.mlir | 336 ++ 1 file changed, 336 insertions(+) create mode 100644 mlir/test/Integration/Dialect/Vector/CPU/ArmNeon/vector-contract-i8mm.mlir diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmNeon/vector-contract-i8mm.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmNeon/vector-contract-i8mm.mlir new file mode 100644 index 0..337429e0a9688 --- /dev/null +++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmNeon/vector-contract-i8mm.mlir @@ -0,0 +1,336 @@ +// REQUIRES: arm-emulator + +// DEFINE: %{compile} = mlir-opt %s \ +// DEFINE: --convert-vector-to-scf --convert-scf-to-cf --convert-vector-to-llvm='enable-arm-neon enable-arm-i8mm' \ +// DEFINE: --expand-strided-metadata --convert-to-llvm --finalize-memref-to-llvm \ +// DEFINE: --lower-affine --convert-arith-to-llvm --reconcile-unrealized-casts \ +// DEFINE: -o %t + +// DEFINE: %{entry_point} = main + +// DEFINE: %{run} = %mcr_aarch64_cmd %t -e %{entry_point} -entry-point-result=void --march=aarch64 --mattr="+neon,+i8mm" \ +// DEFINE: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils,%native_mlir_arm_runner_utils + +// RUN: rm -f %t && %{compile} && FileCheck %s --input-file=%t -check-prefix CHECK-IR && %{run} | FileCheck %s + +#packed_maps = [ + affine_map<(m, n, k) -> (m, k)>, + affine_map<(m, n, k) -> (n, k)>, + affine_map<(m, n, k) -> (m, n)> +] + +// +// Test the lowering of `vector.contract` using the `LowerContractionToSMMLAPattern` +// +// The operation that the `vector.contract` in this test performs is matrix +// multiplication with accumulate +// OUT = ACC + LHS * RHS +// of two 8-bit integer matrices LHS and RHS, and a 32-bit integer matrix ACC +// into a 32-bit integer matrix OUT. The LHS and RHS can be sign- or zero-extended; +// this test covers all the possible variants. +// +// Tested are calculations as well as that the relevant `ArmNeon` dialect +// operations (`arm_neon.smmla`, `arm_neon.ummla`, etc) are emitted. +// +// That pattern above handles (therefore this test prepares) input/output vectors with +// specific shapes: +// * LHS: vector<MxKxi8> +// * RHS: vector<NxKxi8> +// * ACC, OUT: vector<MxNxi32> +// where the M and N are even and K is divisible by 8. +// Note that the RHS is transposed. +// This data layout makes it efficient to load data into SIMD +// registers in the layout expected by FEAT_I8MM instructions. +// Such a `vector.contract` is representative of the code we aim to generate +// by vectorisation of `linalg.mmt4d`. +// +// In this specific test we use M == 4, N == 4, and K == 16. +// + +// Test the operation where both LHS and RHS are interpreted as signed, hence +// we ultimately emit and execute the `smmla` instruction.
+ +// CHECK-IR-LABEL: llvm.func @test_smmla +// CHECK-IR-COUNT-4: arm_neon.intr.smmla +func.func @test_smmla() { + + %c0 = arith.constant 0 : index + %c0_i32 = arith.constant 0 : i32 + %c0_i8 = arith.constant 0 : i8 + + // Accumulator test data + %acc_cst = arith.constant dense<[[ -1, -9, -4, 0], + [ 6, 5, 7, 2], + [ -8, -7, 9, -10], + [ 9, 4, -4, 0]]> : vector<4x4xi32> + + %acc_mem = memref.alloca() : memref<4x4xi32> + vector.transfer_write %acc_cst, %acc_mem[%c0, %c0] : vector<4x4xi32>, memref<4x4xi32> + %acc = vector.transfer_read %acc_mem[%c0, %c0], %c0_i32 {in_bounds = [true, true]} : memref<4x4xi32>, vector<4x4xi32> + + // LHS test data + %lhs_cst = arith.constant dense<[[ -4, -4, -4, -6, 0, 1, 6, 2, -1, 4, 5, -8, 9, 5, 4, 9], + [ -1, 6, 0, 7, -7, 8, 5, 8, -7, 6, -2, 1, 1, 5, -4, -4], + [ 4, -10, 10, -3, 5, 3, 2, 3, -7, 9, -9, -10, 7, -8, -5, -2], + [ 9, 5, 8, 9, 6, -3, -9, 7, -4, -7, -2, 7, -8, 2, 8, 7]]> : vector<4x16xi8> + + %lhs_mem = memref.alloca() : memref<4x16xi8> + vector.transfer_write %lhs_cst, %lhs_mem[%c0, %c0] : vector<4x16xi8>, memref<4x16xi8> + %lhs = vector.transfer_read %lhs_mem[%c0, %c0], %c0_i8 {in_bounds = [true, true]} : memref<4x16xi8>, vector<4x16xi8> + + // RHS test data + %rhs_cst = arith.constant dense<[[ 1, 2, -3, 5, 10, 8, 10, -2, 1, 10, -5, 2, 4, 3, -9, 4], + [ -3, -3, -3, 4, 6, -1, 0, -5, 6, 3, -1, 9, -3, 3, -2, 4], + [ 1, 9, -1, 1, -5, 4, 9, -10, -1, -7,
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
@@ -57,6 +57,226 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +bool RegBankLegalizeHelper::executeInWaterfallLoop( +MachineIRBuilder &B, iterator_range Range, +SmallSet &SGPROperandRegs) { + // Track use registers which have already been expanded with a readfirstlane + // sequence. This may have multiple uses if moving a sequence. + DenseMap WaterfalledRegMap; + + MachineBasicBlock &MBB = B.getMBB(); + MachineFunction &MF = B.getMF(); + + const SIRegisterInfo *TRI = ST.getRegisterInfo(); + const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass(); + unsigned MovExecOpc, MovExecTermOpc, XorTermOpc, AndSaveExecOpc, ExecReg; + if (ST.isWave32()) { petar-avramovic wrote: Yes, changed to field https://github.com/llvm/llvm-project/pull/142790 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
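For readers following the review thread: the "changed to field" resolution refers to hoisting the wave-size-dependent opcode selection out of the waterfall helper into class fields computed once. A rough sketch of what the selection inside the truncated `if (ST.isWave32())` above typically looks like; the exact opcodes are an assumption based on existing AMDGPU waterfall code, not quoted from this PR.

```cpp
// Computed once (e.g. in the helper's constructor) rather than per call.
if (ST.isWave32()) {
  MovExecOpc = AMDGPU::S_MOV_B32;
  MovExecTermOpc = AMDGPU::S_MOV_B32_term;
  XorTermOpc = AMDGPU::S_XOR_B32_term;
  AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B32;
  ExecReg = AMDGPU::EXEC_LO;
} else {
  MovExecOpc = AMDGPU::S_MOV_B64;
  MovExecTermOpc = AMDGPU::S_MOV_B64_term;
  XorTermOpc = AMDGPU::S_XOR_B64_term;
  AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B64;
  ExecReg = AMDGPU::EXEC;
}
```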
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #142790)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/142790 >From d5bdc951f61533379fed9a86ed6c0eab18b7893c Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 5 Jun 2025 12:43:04 +0200 Subject: [PATCH] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize Add rules for G_AMDGPU_BUFFER_LOAD and implement waterfall lowering for divergent operands that must be sgpr. --- .../Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | 53 +++- .../lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h | 2 + .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 17 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 239 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 22 +- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 6 +- .../AMDGPU/GlobalISel/buffer-schedule.ll | 2 +- .../llvm.amdgcn.make.buffer.rsrc.ll | 2 +- .../regbankselect-amdgcn.raw.buffer.load.ll | 59 ++--- ...egbankselect-amdgcn.raw.ptr.buffer.load.ll | 59 ++--- ...regbankselect-amdgcn.struct.buffer.load.ll | 59 ++--- ...ankselect-amdgcn.struct.ptr.buffer.load.ll | 59 ++--- .../llvm.amdgcn.buffer.load-last-use.ll | 2 +- .../llvm.amdgcn.raw.atomic.buffer.load.ll | 42 +-- .../llvm.amdgcn.raw.ptr.atomic.buffer.load.ll | 42 +-- .../llvm.amdgcn.struct.atomic.buffer.load.ll | 48 ++-- ...vm.amdgcn.struct.ptr.atomic.buffer.load.ll | 48 ++-- .../CodeGen/AMDGPU/swizzle.bit.extract.ll | 4 +- 19 files changed, 523 insertions(+), 243 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp index 00979f44f9d34..d8be3aee1f410 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp @@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) { return LLT::scalar(32); } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI); +using ReadLaneFnTy = +function_ref; + +static Register buildReadLane(MachineIRBuilder &, Register, + const RegisterBankInfo &, ReadLaneFnTy); static void unmergeReadAnyLane(MachineIRBuilder &B, SmallVectorImpl &SgprDstParts, LLT UnmergeTy, Register VgprSrc, - const RegisterBankInfo &RBI) { + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID); auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc); for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) { -SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI)); +SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL)); } } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI) { +static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc, + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { LLT Ty = B.getMRI()->getType(VgprSrc); const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID); if (Ty.getSizeInBits() == 32) { -return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc}) -.getReg(0); +Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty}); +return BuildRL(B, SgprDst, VgprSrc).getReg(0); } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildRL); return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0); } -void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, - Register VgprSrc, const 
RegisterBankInfo &RBI) { +static void buildReadLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI, + ReadLaneFnTy BuildReadLane) { LLT Ty = B.getMRI()->getType(VgprSrc); if (Ty.getSizeInBits() == 32) { -B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc}); +BuildReadLane(B, SgprDst, VgprSrc); return; } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildReadLane); B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0); } + +void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI) { + return bu
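The net effect of this refactor is that the unmerge/merge machinery becomes parameterized over how a single 32-bit lane read is emitted. The `function_ref` signature was partially lost in the archive, so the sketch below reconstructs it from the call sites above (`BuildRL(B, SgprDst, VgprSrc).getReg(0)`); treat the exact signature as an assumption.

```cpp
// One possible ReadLaneFnTy instance: emit a plain readanylane for a single
// 32-bit SGPR destination, matching how the existing helper builds it.
auto BuildRAL = [](MachineIRBuilder &B, Register SgprDst,
                   Register VgprSrc) -> MachineInstrBuilder {
  return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc});
};
```

A waterfall lowering can then pass a readfirstlane-building callback through the same entry point, which is the point of the indirection.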
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #142789)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/142789 >From fada12c02954dd1c244c944fa37dbae674284923 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 5 Jun 2025 12:17:13 +0200 Subject: [PATCH] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 124 +++--- .../AMDGPU/GlobalISel/readanylane-combines.ll | 25 +--- .../GlobalISel/readanylane-combines.mir | 78 +++ 3 files changed, 127 insertions(+), 100 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index ba661348ca5b5..b38dacfe9958d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -23,6 +23,7 @@ #include "GCNSubtarget.h" #include "llvm/CodeGen/GlobalISel/CSEInfo.h" #include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h" +#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineUniformityAnalysis.h" #include "llvm/CodeGen/TargetPassConfig.h" @@ -137,7 +138,111 @@ class AMDGPURegBankLegalizeCombiner { return {MatchMI, MatchMI->getOperand(1).getReg()}; } + std::pair tryMatchRALFromUnmerge(Register Src) { +MachineInstr *ReadAnyLane = MRI.getVRegDef(Src); +if (ReadAnyLane->getOpcode() != AMDGPU::G_AMDGPU_READANYLANE) + return {nullptr, -1}; + +Register RALSrc = ReadAnyLane->getOperand(1).getReg(); +if (auto *UnMerge = getOpcodeDef(RALSrc, MRI)) + return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; + +return {nullptr, -1}; + } + + Register getReadAnyLaneSrc(Register Src) { +// Src = G_AMDGPU_READANYLANE RALSrc +auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + +// LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc +// LoSgpr = G_AMDGPU_READANYLANE LoVgpr +// HiSgpr = G_AMDGPU_READANYLANE HiVgpr +// Src G_MERGE_VALUES LoSgpr, HiSgpr +auto *Merge = getOpcodeDef(Src, MRI); +if (Merge) { + unsigned NumElts = Merge->getNumSources(); + auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); + if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) +return {}; + + // check if all elements are from same unmerge and there is no shuffling + for (unsigned i = 1; i < NumElts; ++i) { +auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); +if (UnmergeI != Unmerge || (unsigned)IdxI != i) + return {}; + } + return Unmerge->getSourceReg(); +} + +// ..., VgprI, ... = G_UNMERGE_VALUES VgprLarge +// SgprI = G_AMDGPU_READANYLANE VgprI +// SgprLarge G_MERGE_VALUES ..., SgprI, ... +// ..., Src, ... 
= G_UNMERGE_VALUES SgprLarge +auto *UnMerge = getOpcodeDef(Src, MRI); +if (UnMerge) { + int Idx = UnMerge->findRegisterDefOperandIdx(Src, nullptr); + auto *Merge = getOpcodeDef(UnMerge->getSourceReg(), MRI); + if (Merge) { +auto [RAL, RALSrc] = +tryMatch(Merge->getSourceReg(Idx), AMDGPU::G_AMDGPU_READANYLANE); +if (RAL) + return RALSrc; + } +} + +return {}; + } + + void replaceRegWithOrBuildCopy(Register Dst, Register Src) { +if (Dst.isVirtual()) + MRI.replaceRegWith(Dst, Src); +else + B.buildCopy(Dst, Src); + } + + bool tryEliminateReadAnyLane(MachineInstr &Copy) { +Register Dst = Copy.getOperand(0).getReg(); +Register Src = Copy.getOperand(1).getReg(); +if (!Src.isVirtual()) + return false; + +Register RALDst = Src; +MachineInstr &SrcMI = *MRI.getVRegDef(Src); +if (SrcMI.getOpcode() == AMDGPU::G_BITCAST) + RALDst = SrcMI.getOperand(1).getReg(); + +Register RALSrc = getReadAnyLaneSrc(RALDst); +if (!RALSrc) + return false; + +B.setInstr(Copy); +if (SrcMI.getOpcode() != AMDGPU::G_BITCAST) { + // Src = READANYLANE RALSrc Src = READANYLANE RALSrc + // Dst = Copy Src $Dst = Copy Src + // -> -> + // Dst = RALSrc $Dst = Copy RALSrc + replaceRegWithOrBuildCopy(Dst, RALSrc); +} else { + // RALDst = READANYLANE RALSrc RALDst = READANYLANE RALSrc + // Src = G_BITCAST RALDst Src = G_BITCAST RALDst + // Dst = Copy Src Dst = Copy Src + // -> -> + // NewVgpr = G_BITCAST RALDst NewVgpr = G_BITCAST RALDst + // Dst = NewVgpr$Dst = Copy NewVgpr + auto Bitcast = B.buildBitcast({VgprRB, MRI.getType(Src)}, RALSrc); + replaceRegWithOrBuildCopy(Dst, Bitcast.getReg(0)); +} + +eraseInstr(Copy, MRI, nullptr); +return true; + } + void tryCombineCopy(MachineInstr &MI) { +if (tryEliminateReadAnyLane(MI)) + return; + Reg
[llvm-branch-commits] [llvm] [BOLT] Factor out MCInstReference from gadget scanner (NFC) (PR #138655)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/138655 >From 55df5f56bcb3a7a1e61bcb88a4f43ec725fdb4bd Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 19 Jun 2025 14:03:59 +0300 Subject: [PATCH] [BOLT] Factor out MCInstReference from gadget scanner (NFC) Move MCInstReference representing a constant reference to an instruction inside a parent entity - either inside a basic block (which has a reference to its parent function) or directly to the function (when CFG information is not available). --- bolt/include/bolt/Core/MCInstUtils.h | 168 + bolt/include/bolt/Passes/PAuthGadgetScanner.h | 176 +- bolt/lib/Core/CMakeLists.txt | 1 + bolt/lib/Core/MCInstUtils.cpp | 57 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 102 +- 5 files changed, 269 insertions(+), 235 deletions(-) create mode 100644 bolt/include/bolt/Core/MCInstUtils.h create mode 100644 bolt/lib/Core/MCInstUtils.cpp diff --git a/bolt/include/bolt/Core/MCInstUtils.h b/bolt/include/bolt/Core/MCInstUtils.h new file mode 100644 index 0..69bf5e6159b74 --- /dev/null +++ b/bolt/include/bolt/Core/MCInstUtils.h @@ -0,0 +1,168 @@ +//===- bolt/Core/MCInstUtils.h --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef BOLT_CORE_MCINSTUTILS_H +#define BOLT_CORE_MCINSTUTILS_H + +#include "bolt/Core/BinaryBasicBlock.h" + +#include +#include +#include + +namespace llvm { +namespace bolt { + +class BinaryFunction; + +/// MCInstReference represents a reference to a constant MCInst as stored either +/// in a BinaryFunction (i.e. before a CFG is created), or in a BinaryBasicBlock +/// (after a CFG is created). +class MCInstReference { + using nocfg_const_iterator = std::map::const_iterator; + + // Two cases are possible: + // * functions with CFG reconstructed - a function stores a collection of + // basic blocks, each basic block stores a contiguous vector of MCInst + // * functions without CFG - there are no basic blocks created, + // the instructions are directly stored in std::map in BinaryFunction + // + // In both cases, the direct parent of MCInst is stored together with an + // iterator pointing to the instruction. + + // Helper struct: CFG is available, the direct parent is a basic block, + // iterator's type is `MCInst *`. + struct RefInBB { +RefInBB(const BinaryBasicBlock *BB, const MCInst *Inst) +: BB(BB), It(Inst) {} +RefInBB(const RefInBB &Other) = default; +RefInBB &operator=(const RefInBB &Other) = default; + +const BinaryBasicBlock *BB; +BinaryBasicBlock::const_iterator It; + +bool operator<(const RefInBB &Other) const { + return std::tie(BB, It) < std::tie(Other.BB, Other.It); +} + +bool operator==(const RefInBB &Other) const { + return BB == Other.BB && It == Other.It; +} + }; + + // Helper struct: CFG is *not* available, the direct parent is a function, + // iterator's type is std::map::iterator (the mapped value + // is an instruction's offset). 
+ struct RefInBF { +RefInBF(const BinaryFunction *BF, nocfg_const_iterator It) +: BF(BF), It(It) {} +RefInBF(const RefInBF &Other) = default; +RefInBF &operator=(const RefInBF &Other) = default; + +const BinaryFunction *BF; +nocfg_const_iterator It; + +bool operator<(const RefInBF &Other) const { + return std::tie(BF, It->first) < std::tie(Other.BF, Other.It->first); +} + +bool operator==(const RefInBF &Other) const { + return BF == Other.BF && It->first == Other.It->first; +} + }; + + std::variant Reference; + + // Utility methods to be used like this: + // + // if (auto *Ref = tryGetRefInBB()) + // return Ref->doSomething(...); + // return getRefInBF().doSomethingElse(...); + const RefInBB *tryGetRefInBB() const { +assert(std::get_if(&Reference) || + std::get_if(&Reference)); +return std::get_if(&Reference); + } + const RefInBF &getRefInBF() const { +assert(std::get_if(&Reference)); +return *std::get_if(&Reference); + } + +public: + /// Constructs an empty reference. + MCInstReference() : Reference(RefInBB(nullptr, nullptr)) {} + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, const MCInst *Inst) + : Reference(RefInBB(BB, Inst)) { +assert(BB && Inst && "Neither BB nor Inst should be nullptr"); + } + /// Constructs a reference to the instruction inside the basic block. + MCInstReference(const BinaryBasicBlock *BB, unsigned Index) + : Reference(RefInBB(BB, &BB->getInstructionAtIndex(I
[llvm-branch-commits] [clang] 5ef48b3 - Revert "[clang][dataflow] Expose simple access to child StorageLocation prese…"
Author: Samira Bakon Date: 2025-06-25T10:17:15-04:00 New Revision: 5ef48b36e5ebc5e0cd1058a502c7ef0cd9347d15 URL: https://github.com/llvm/llvm-project/commit/5ef48b36e5ebc5e0cd1058a502c7ef0cd9347d15 DIFF: https://github.com/llvm/llvm-project/commit/5ef48b36e5ebc5e0cd1058a502c7ef0cd9347d15.diff LOG: Revert "[clang][dataflow] Expose simple access to child StorageLocation prese…" This reverts commit 09b43a5a862f453aabd95ec01e0b53c46ca3e340. Added: Modified: clang/include/clang/Analysis/FlowSensitive/StorageLocation.h Removed: diff --git a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h index 8b263b16d5b1e..8fcc6a44027a0 100644 --- a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h +++ b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h @@ -168,8 +168,6 @@ class RecordStorageLocation final : public StorageLocation { return {Children.begin(), Children.end()}; } - bool hasChild(const ValueDecl &D) const { return Children.contains(&D); } - private: FieldToLoc Children; SyntheticFieldMap SyntheticFields; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NFC][PGO] Use constants rather than free strings for metadata labels (PR #145721)
https://github.com/kazutakahirata approved this pull request. LGTM. Thanks! https://github.com/llvm/llvm-project/pull/145721 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
   return Changed;
 }

+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+    Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+    function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+  if (AllocasToPromote.empty())
+    return;
+
+  SmallVector<AllocaInst *> PromotableAllocas;
+  llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+                isAllocaPromotable);
+  if (PromotableAllocas.empty())
+    return;
+
+  DominatorTree DT(*Caller);
+  AssumptionCache &AC = GetAssumptionCache(*Caller);
+  PromoteMemToReg(PromotableAllocas, DT, &AC);
+  NumAllocasPromoted += PromotableAllocas.size();
+  // Emit a remark for the promotion.
+  OptimizationRemarkEmitter ORE(Caller);
+  DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+  ORE.emit([&]() {
+    return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+                              &Caller->getEntryBlock())
+           << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+           << " allocas to SSA registers in function '"
+           << ore::NV("Function", Caller) << "'";
+  });
+  LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+                    << " allocas to registers in function " << Caller->getName()
+                    << "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee that
+/// are now promotable to registers, using SROA/mem2reg. However if we just let
+/// the AlwaysInliner continue inlining everything at once, the later SROA pass
+/// in the pipeline will end up placing phis for these allocas into blocks along
+/// the dominance frontier which may extend further than desired (e.g. loop
+/// headers). This can happen when the caller is then inlined into another
+/// caller, and the allocas end up hoisted further before SROA is run.
+///
+/// Instead, what we want to do, as best as we can, is inline leaf functions
+/// into callers, and then run PromoteMemToReg() on the allocas that were
+/// passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends too
+/// far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+///   for (...) {
+///     middle_caller();
+///   }
+/// }
+///
+/// middle_caller() {
+///   int stack_var;
+///   inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+///   // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller() into
+/// outermost_caller(). The regular always_inliner would inline everything at
+/// once, and then SROA/mem2reg would promote stack_var to a register but in
+/// the context of outermost_caller() which is not what we want.

mtrofin wrote:

Thanks. We have been experimenting with other traversal orders (hence `ModuleInliner.cpp`) and this aspect is good to keep in mind. In that context, could the problem addressed here be decoupled from inlining order? It seems like it'd result in a more robust system.
(I'm not trying to scope-creep; rather, I want to understand what options we have, and that doesn't have to impact what we do right now.)

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] RuntimeLibcalls: Cleanup darwin exp10 case (PR #145638)
https://github.com/fhahn approved this pull request. LGTM, thanks https://github.com/llvm/llvm-project/pull/145638 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [IR2Vec] Overloading `operator+` for `Embeddings` (PR #145118)
@@ -106,6 +106,7 @@ struct Embedding {
   const std::vector<double> &getData() const { return Data; }

   /// Arithmetic operators
+  Embedding operator+(const Embedding &RHS) const;

svkeerthy wrote:

Sure. Will add them too!

https://github.com/llvm/llvm-project/pull/145118
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
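For context on what the declaration implies, here is a minimal sketch of an implementation consistent with the accessor above. It assumes `Data` is the underlying `std::vector<double>`; the actual patch may implement it differently (e.g. in terms of `operator+=`).

```cpp
// Element-wise sum returning a new Embedding; sizes must match.
Embedding Embedding::operator+(const Embedding &RHS) const {
  assert(Data.size() == RHS.Data.size() && "embedding size mismatch");
  Embedding Result(*this); // copy, then accumulate element-wise
  for (size_t I = 0, E = Data.size(); I != E; ++I)
    Result.Data[I] += RHS.Data[I];
  return Result;
}
```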
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (PR #145753)
shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#145753** 👈 (this PR)
* **#145747**
* **#145632**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/145753
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)
@@ -0,0 +1,113 @@
+//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines the various attribute collections defining GOFF symbols.
+//
+//===--===//
+
+#ifndef LLVM_MC_MCGOFFATTRIBUTES_H
+#define LLVM_MC_MCGOFFATTRIBUTES_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/GOFF.h"
+
+namespace llvm {
+namespace GOFF {
+// An "External Symbol Definition" in the GOFF file has a type, and depending on
+// the type a different subset of the fields is used.
+//
+// Unlike other formats, a 2-dimensional structure is used to define the
+// location of data. For example, the equivalent of the ELF .text section is
+// made up of a Section Definition (SD) and a class (Element Definition; ED).
+// The name of the SD symbol depends on the application, while the class has the
+// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.
+//
+// Data can be placed into this structure in 2 ways. First, the data (in a text
+// record) can be associated with an ED symbol. To refer to data, a Label
+// Definition (LD) is used to give an offset into the data a name. When binding,
+// the whole data is pulled into the resulting executable, and the addresses
+// given by the LD symbols are resolved.
+//
+// The alternative is to use a Part Definition (PR). In this case, the data (in
+// a text record) is associated with the part. When binding, only the data of
+// referenced PRs is pulled into the resulting binary.
+//
+// Both approaches are used, which means that the equivalent of a section in ELF
+// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain
+// sections are fine with just defining SD/ED symbols. The SymbolMapper takes
+// care of all those details.

+// Attributes for SD symbols.
+struct SDAttr {
+  GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for ED symbols.
+struct EDAttr {
+  bool IsReadOnly = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDAmode Amode;

uweigand wrote:

OK, thanks for checking. Right now, Amode is no longer emitted to asm output at all, but I guess that's because there is no asm output for LD symbols (function labels). I assume this will be added later?

https://github.com/llvm/llvm-project/pull/133799
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
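To illustrate the SD/ED split described in the header comment, here is a rough example of filling in the attribute structs for a read-only code element. The enumerator choices are assumptions based on the GOFF constants in `llvm/BinaryFormat/GOFF.h`, not values taken from this patch.

```cpp
// A code section: the SD carries the binding scope, while the ED carries the
// class properties (read-only, executable, AMODE).
GOFF::SDAttr SD;
SD.BindingScope = GOFF::ESD_BSC_Section;

GOFF::EDAttr ED;
ED.IsReadOnly = true;
ED.Executable = GOFF::ESD_EXE_CODE;
ED.Amode = GOFF::ESD_AMODE_64;
```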
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (PR #145753)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/145753 >From 5d44b53a20029b6f216bd18f47f49a9e873613e7 Mon Sep 17 00:00:00 2001 From: "Mekhanoshin, Stanislav" Date: Wed, 25 Jun 2025 13:56:12 -0400 Subject: [PATCH] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 Co-authored-by: Shilei Tian --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl | 20 +++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 1 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 4 +++ .../CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll | 35 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s | 9 + llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s| 9 + .../MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s | 4 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s | 8 + .../MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s | 4 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s | 8 + llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s| 10 ++ .../gfx1250_asm_vop3_from_vop1-fake16.s | 12 +++ .../MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s| 12 +++ .../gfx1250_asm_vop3_from_vop1_dpp16-fake16.s | 8 + .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s | 8 + .../gfx1250_asm_vop3_from_vop1_dpp8-fake16.s | 8 + .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s | 8 + .../Disassembler/AMDGPU/gfx1250_dasm_vop1.txt | 10 ++ .../AMDGPU/gfx1250_dasm_vop1_dpp16.txt| 8 + .../AMDGPU/gfx1250_dasm_vop1_dpp8.txt | 8 + .../AMDGPU/gfx1250_dasm_vop3_from_vop1.txt| 15 .../gfx1250_dasm_vop3_from_vop1_dpp16.txt | 8 + .../gfx1250_dasm_vop3_from_vop1_dpp8.txt | 8 + 25 files changed, 230 insertions(+) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 94fa3e9b74c46..1d1f5a4ee3f9f 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -643,6 +643,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_bf8, "V2hs", "nc", "gfx1250-insts") #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3f7a2d8649740..e2c6a4a3f15f3 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -30,3 +30,23 @@ void test_cvt_pk_f16_fp8(global half2* out, short a) { out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); } + +// CHECK-LABEL: @test_cvt_pk_f16_bf8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.bf8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// 
CHECK-NEXT:store <2 x half> [[TMP1]], ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_bf8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_bf8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 72b0aa01f71aa..6f974c97361de 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -592,6 +592,10 @@ def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; +def int_amdgcn_cvt_pk_f16_bf8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_bf8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_"#name>; diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index 50d297cd096a6..b20760c356263 100644 --- a/llvm/lib/Tar
[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Pierre van Houtryve (Pierre-vh) Changes In order to make this easier, I also removed all "removeFromParent" calls from the visitors, instead adding instructions to a set of instructions to delete once the function has been visited. This avoids crashes due to functions deleting their operands. In theory we could allow functions to delete the instruction they visited (and only that one) but I think having one idiom for everything is less error-prone. Fixes #140219 --- Patch is 48.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145484.diff 5 Files Affected: - (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (+29-53) - (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-break-large-phis-heuristics.ll (+21-21) - (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll (+70-40) - (modified) llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll (+33-105) - (modified) llvm/test/CodeGen/AMDGPU/uniform-select.ll (+32-32) ``diff diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp index 5f1983791cfae..2a3aa1ac672b6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp @@ -15,6 +15,7 @@ #include "AMDGPU.h" #include "AMDGPUTargetMachine.h" #include "SIModeRegisterDefaults.h" +#include "llvm/ADT/SetVector.h" #include "llvm/Analysis/AssumptionCache.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/TargetLibraryInfo.h" @@ -109,6 +110,7 @@ class AMDGPUCodeGenPrepareImpl bool FlowChanged = false; mutable Function *SqrtF32 = nullptr; mutable Function *LdexpF32 = nullptr; + mutable SetVector DeadVals; DenseMap BreakPhiNodesCache; @@ -285,28 +287,19 @@ bool AMDGPUCodeGenPrepareImpl::run() { BreakPhiNodesCache.clear(); bool MadeChange = false; - Function::iterator NextBB; - for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; FI = NextBB) { -BasicBlock *BB = &*FI; -NextBB = std::next(FI); - -BasicBlock::iterator Next; -for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; - I = Next) { - Next = std::next(I); - - MadeChange |= visit(*I); - - if (Next != E) { // Control flow changed -BasicBlock *NextInstBB = Next->getParent(); -if (NextInstBB != BB) { - BB = NextInstBB; - E = BB->end(); - FE = F.end(); -} - } + for (BasicBlock &BB : reverse(F)) { +for (Instruction &I : make_early_inc_range(reverse(BB))) { + if (!DeadVals.contains(&I)) +MadeChange |= visit(I); } } + + while (!DeadVals.empty()) { +RecursivelyDeleteTriviallyDeadInstructions( +DeadVals.pop_back_val(), TLI, /*MSSAU*/ nullptr, +[&](Value *V) { DeadVals.remove(V); }); + } + return MadeChange; } @@ -426,7 +419,7 @@ bool AMDGPUCodeGenPrepareImpl::replaceMulWithMul24(BinaryOperator &I) const { Value *NewVal = insertValues(Builder, Ty, ResultVals); NewVal->takeName(&I); I.replaceAllUsesWith(NewVal); - I.eraseFromParent(); + DeadVals.insert(&I); return true; } @@ -500,10 +493,10 @@ bool AMDGPUCodeGenPrepareImpl::foldBinOpIntoSelect(BinaryOperator &BO) const { FoldedT, FoldedF); NewSelect->takeName(&BO); BO.replaceAllUsesWith(NewSelect); - BO.eraseFromParent(); + DeadVals.insert(&BO); if (CastOp) -CastOp->eraseFromParent(); - Sel->eraseFromParent(); +DeadVals.insert(CastOp); + DeadVals.insert(Sel); return true; } @@ -900,7 +893,7 @@ bool AMDGPUCodeGenPrepareImpl::visitFDiv(BinaryOperator &FDiv) { if (NewVal) { FDiv.replaceAllUsesWith(NewVal); NewVal->takeName(&FDiv); 
-RecursivelyDeleteTriviallyDeadInstructions(&FDiv, TLI); +DeadVals.insert(&FDiv); } return true; @@ -1310,7 +1303,8 @@ within the byte are all 0. static bool tryNarrowMathIfNoOverflow(Instruction *I, const SITargetLowering *TLI, const TargetTransformInfo &TTI, - const DataLayout &DL) { + const DataLayout &DL, + SetVector &DeadVals) { unsigned Opc = I->getOpcode(); Type *OldType = I->getType(); @@ -1365,7 +1359,7 @@ static bool tryNarrowMathIfNoOverflow(Instruction *I, Value *Zext = Builder.CreateZExt(Arith, OldType); I->replaceAllUsesWith(Zext); - I->eraseFromParent(); + DeadVals.insert(I); return true; } @@ -1376,7 +1370,7 @@ bool AMDGPUCodeGenPrepareImpl::visitBinaryOperator(BinaryOperator &I) { if (UseMul24Intrin && replaceMulWithMul24(I)) return true; if (tryNarrowMathIfNoOverflow(&I, ST.getTargetLowering(), -TM.getTargetTransformInfo(F), DL)) +
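The core of the change, extracted from the diff above into a standalone shape: visitors only record dead values, and erasure happens once per function after the walk, so no visitor can delete an instruction another visitor still references. A condensed sketch; the `Visit` callback stands in for the pass's `InstVisitor` dispatch, and is given the worklist so it can queue dead values.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;

static bool visitWithDeferredDeletion(
    Function &F, const TargetLibraryInfo *TLI,
    function_ref<bool(Instruction &, SetVector<Value *> &)> Visit) {
  bool MadeChange = false;
  SetVector<Value *> DeadVals;
  // Reverse iteration, skipping anything already queued as dead.
  for (BasicBlock &BB : reverse(F))
    for (Instruction &I : make_early_inc_range(reverse(BB)))
      if (!DeadVals.contains(&I))
        MadeChange |= Visit(I, DeadVals);
  // Drain the worklist; the callback keeps the set consistent when the
  // recursive deletion removes values that were also queued.
  while (!DeadVals.empty())
    RecursivelyDeleteTriviallyDeadInstructions(
        DeadVals.pop_back_val(), TLI, /*MSSAU=*/nullptr,
        [&](Value *V) { DeadVals.remove(V); });
  return MadeChange;
}
```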
[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)
arsenm wrote:

This is not a simplifying pass; it is making the IR more complicated. We have to do hacks like this to prevent later, more profitable combines from needing to parse out expanded IR:

https://github.com/llvm/llvm-project/blob/fa5d7c926f5f571397eb1981649198136f1d6a92/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp#L2332

https://github.com/llvm/llvm-project/pull/145484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/145613 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
   return Changed;
 }

+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+    Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+    function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+  if (AllocasToPromote.empty())
+    return;
+
+  SmallVector<AllocaInst *> PromotableAllocas;
+  llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+                isAllocaPromotable);
+  if (PromotableAllocas.empty())
+    return;
+
+  DominatorTree DT(*Caller);
+  AssumptionCache &AC = GetAssumptionCache(*Caller);
+  PromoteMemToReg(PromotableAllocas, DT, &AC);
+  NumAllocasPromoted += PromotableAllocas.size();
+  // Emit a remark for the promotion.
+  OptimizationRemarkEmitter ORE(Caller);
+  DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+  ORE.emit([&]() {
+    return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+                              &Caller->getEntryBlock())
+           << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+           << " allocas to SSA registers in function '"
+           << ore::NV("Function", Caller) << "'";
+  });
+  LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+                    << " allocas to registers in function " << Caller->getName()
+                    << "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee that
+/// are now promotable to registers, using SROA/mem2reg. However if we just let
+/// the AlwaysInliner continue inlining everything at once, the later SROA pass
+/// in the pipeline will end up placing phis for these allocas into blocks along
+/// the dominance frontier which may extend further than desired (e.g. loop
+/// headers). This can happen when the caller is then inlined into another
+/// caller, and the allocas end up hoisted further before SROA is run.
+///
+/// Instead, what we want to do, as best as we can, is inline leaf functions
+/// into callers, and then run PromoteMemToReg() on the allocas that were
+/// passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends too
+/// far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+///   for (...) {
+///     middle_caller();
+///   }
+/// }
+///
+/// middle_caller() {
+///   int stack_var;
+///   inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+///   // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller() into
+/// outermost_caller(). The regular always_inliner would inline everything at
+/// once, and then SROA/mem2reg would promote stack_var to a register but in
+/// the context of outermost_caller() which is not what we want.

aemerson wrote:

Yes, the traversal order matters here, because for optimal codegen we want mem2reg to happen between the inner->middle and middle->outer inlines. If you do it the other way around, mem2reg can't do anything until the final inner->outer inline, and by that point it's too late.

For now I think only this promotion is a known issue; I don't know of general issues with simplification.
https://github.com/llvm/llvm-project/pull/145613 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (PR #145753)
llvmbot wrote: @llvm/pr-subscribers-mc @llvm/pr-subscribers-llvm-ir Author: Shilei Tian (shiltian) Changes Co-authored-by: Shilei Tian --- Patch is 31.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145753.diff 25 Files Affected: - (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1) - (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl (+20) - (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+4) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+1) - (modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+4) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll (+35) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s (+9) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s (+9) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s (+4) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s (+4) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s (+10) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1-fake16.s (+12) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s (+12) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16-fake16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8-fake16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1.txt (+10) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp16.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp8.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1.txt (+15) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp16.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp8.txt (+8) ``diff diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 94fa3e9b74c46..1d1f5a4ee3f9f 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -643,6 +643,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_bf8, "V2hs", "nc", "gfx1250-insts") #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3f7a2d8649740..e2c6a4a3f15f3 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -30,3 +30,23 @@ void test_cvt_pk_f16_fp8(global half2* out, short a) { out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); } + +// CHECK-LABEL: @test_cvt_pk_f16_bf8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr 
[[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.bf8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// CHECK-NEXT:store <2 x half> [[TMP1]], ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_bf8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_bf8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 72b0aa01f71aa..6f974c97361de 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -592,6 +592,10 @@ def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; +def int_amdgcn_cvt_pk_f16_bf8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_bf8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIn
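Since the llvmbot dump above is truncated, a compact usage sketch may help. Per the `BuiltinsAMDGPU.def` hunk, the new builtin has signature `half2(short)` and is gated on the `gfx1250-insts` feature. The kernel below is illustrative only — the kernel name and indexing are not from the patch — and mirrors the pattern in the patch's own `builtins-amdgcn-gfx1250.cl` test:

```c
// Sketch, assuming an OpenCL target with gfx1250-insts enabled
// (e.g. -triple amdgcn -target-cpu gfx1250): each short holds two
// packed bf8 values, which the builtin widens to a half2.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

kernel void unpack_bf8(global half2 *out, global const short *in) {
  size_t gid = get_global_id(0);
  out[gid] = __builtin_amdgcn_cvt_pk_f16_bf8(in[gid]);
}
```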
[llvm-branch-commits] [llvm] [IR][PGO] Verify the structure of `VP` metadata. (PR #145584)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/145584 >From 591bfc27c2de4a140301100afbbdea1d5a14e39c Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 24 Jun 2025 13:14:09 -0700 Subject: [PATCH] [IR][PGO] Verify the structure of `VP` metadata. --- llvm/include/llvm/IR/ProfDataUtils.h | 3 +++ llvm/lib/IR/ProfDataUtils.cpp| 2 +- llvm/lib/IR/Verifier.cpp | 9 +++-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/llvm/include/llvm/IR/ProfDataUtils.h b/llvm/include/llvm/IR/ProfDataUtils.h index 8e8d069b836f1..55b84abbbeb23 100644 --- a/llvm/include/llvm/IR/ProfDataUtils.h +++ b/llvm/include/llvm/IR/ProfDataUtils.h @@ -28,6 +28,9 @@ LLVM_ABI bool hasProfMD(const Instruction &I); /// Checks if an MDNode contains Branch Weight Metadata LLVM_ABI bool isBranchWeightMD(const MDNode *ProfileData); +/// Checks if an MDNode contains value profiling Metadata +LLVM_ABI bool isValueProfileMD(const MDNode *ProfileData); + /// Checks if an instructions has Branch Weight Metadata /// /// \param I The instruction to check diff --git a/llvm/lib/IR/ProfDataUtils.cpp b/llvm/lib/IR/ProfDataUtils.cpp index 21524eb840539..914535b599586 100644 --- a/llvm/lib/IR/ProfDataUtils.cpp +++ b/llvm/lib/IR/ProfDataUtils.cpp @@ -96,7 +96,7 @@ bool isBranchWeightMD(const MDNode *ProfileData) { return isTargetMD(ProfileData, "branch_weights", MinBWOps); } -static bool isValueProfileMD(const MDNode *ProfileData) { +bool isValueProfileMD(const MDNode *ProfileData) { return isTargetMD(ProfileData, "VP", MinVPOps); } diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index ae95e3e2bff8d..c6f8ea78c04d5 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -5008,9 +5008,14 @@ void Verifier::visitProfMetadata(Instruction &I, MDNode *MD) { Check(mdconst::dyn_extract(MDO), "!prof brunch_weights operand is not a const int"); } - } else { -Check(ProfName == "VP", "expected either branch_weights or VP profile name", + } else if (ProfName == "VP") { +Check(isValueProfileMD(MD),"invalid value profiling metadata",MD); +Check(isa(I), + "value profiling !prof metadata is expected to be placed on call " + "instructions (which may be memops)", MD); + } else { +CheckFailed("expected either branch_weights or VP profile name", MD); } } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
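For context on what the tightened verifier checks: value-profiling metadata is a `!prof` node whose first operand is the string `"VP"`, and after this patch it must both be structurally well-formed and sit on a call instruction. A hedged IR sketch — the target hash and counts are invented, but the layout (kind, total count, then value/count pairs) follows LLVM's documented VP format:

```llvm
define void @caller(ptr %fp) {
  ; Kind 0 = indirect call targets; total count 100; one
  ; (target-hash, count) pair. Attaching !0 to a non-call
  ; instruction would now be a verifier error.
  call void %fp(), !prof !0
  ret void
}

!0 = !{!"VP", i32 0, i64 100, i64 -6113873708731741129, i64 100}
```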
[llvm-branch-commits] [mlir] b8672c3 - Revert "[mlir][mesh] adding option for traversal order in sharding propagatio…"
Author: Qinkun Bao Date: 2025-06-24T10:09:20-04:00 New Revision: b8672c3278bf3ee83e8c44053d03558632ba46e0 URL: https://github.com/llvm/llvm-project/commit/b8672c3278bf3ee83e8c44053d03558632ba46e0 DIFF: https://github.com/llvm/llvm-project/commit/b8672c3278bf3ee83e8c44053d03558632ba46e0.diff LOG: Revert "[mlir][mesh] adding option for traversal order in sharding propagatio…" This reverts commit 43e1a5a411d972fe06a1afb86ffd5ba21fd2a376. Added: Modified: mlir/include/mlir/Dialect/Mesh/IR/MeshOps.h mlir/include/mlir/Dialect/Mesh/Transforms/Passes.h mlir/include/mlir/Dialect/Mesh/Transforms/Passes.td mlir/lib/Dialect/Mesh/IR/MeshOps.cpp mlir/lib/Dialect/Mesh/Transforms/ShardingPropagation.cpp Removed: mlir/test/Dialect/Mesh/backward-sharding-propagation.mlir mlir/test/Dialect/Mesh/forward-backward-sharding-propagation.mlir mlir/test/Dialect/Mesh/forward-sharding-propagation.mlir diff --git a/mlir/include/mlir/Dialect/Mesh/IR/MeshOps.h b/mlir/include/mlir/Dialect/Mesh/IR/MeshOps.h index c4d512b60bc51..1dc178586e918 100644 --- a/mlir/include/mlir/Dialect/Mesh/IR/MeshOps.h +++ b/mlir/include/mlir/Dialect/Mesh/IR/MeshOps.h @@ -206,6 +206,9 @@ Type shardType(Type type, MeshOp mesh, MeshSharding sharding); // Use newShardOp if it is not null. Otherwise create a new one. // May insert resharding if required. // Potentially updates newShardOp. +void maybeInsertTargetShardingAnnotation(MeshSharding sharding, + OpOperand &operand, OpBuilder &builder, + ShardOp &newShardOp); void maybeInsertTargetShardingAnnotation(MeshSharding sharding, OpResult result, OpBuilder &builder); void maybeInsertSourceShardingAnnotation(MeshSharding sharding, diff --git a/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.h b/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.h index a2424d43a8ba9..83399d10beaae 100644 --- a/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.h +++ b/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.h @@ -19,18 +19,6 @@ class FuncOp; namespace mesh { -/// This enum controls the traversal order for the sharding propagation. -enum class TraversalOrder { - /// Forward traversal. - Forward, - /// Backward traversal. - Backward, - /// Forward then backward traversal. - ForwardBackward, - /// Backward then forward traversal. - BackwardForward -}; - //===--===// // Passes //===--===// diff --git a/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.td b/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.td index 11ec7e78cd5e6..06ebf151e7d64 100644 --- a/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.td +++ b/mlir/include/mlir/Dialect/Mesh/Transforms/Passes.td @@ -24,21 +24,6 @@ def ShardingPropagation : InterfacePass<"sharding-propagation", "mlir::FunctionO operation, and the operations themselves are added with sharding option attributes. 
}]; - let options = [ -Option<"traversal", "traversal", - "mlir::mesh::TraversalOrder", /*default=*/"mlir::mesh::TraversalOrder::BackwardForward", - "Traversal order to use for sharding propagation:", -[{::llvm::cl::values( - clEnumValN(mlir::mesh::TraversalOrder::Forward, "forward", - "Forward only traversal."), - clEnumValN(mlir::mesh::TraversalOrder::Backward, "backward", - "backward only traversal."), - clEnumValN(mlir::mesh::TraversalOrder::ForwardBackward, "forward-backward", - "forward-backward traversal."), - clEnumValN(mlir::mesh::TraversalOrder::BackwardForward, "backward-forward", - "backward-forward traversal.") -)}]>, - ]; let dependentDialects = [ "mesh::MeshDialect" ]; diff --git a/mlir/lib/Dialect/Mesh/IR/MeshOps.cpp b/mlir/lib/Dialect/Mesh/IR/MeshOps.cpp index b8cc91da722f0..0a01aaf776e7d 100644 --- a/mlir/lib/Dialect/Mesh/IR/MeshOps.cpp +++ b/mlir/lib/Dialect/Mesh/IR/MeshOps.cpp @@ -298,12 +298,13 @@ Type mesh::shardType(Type type, MeshOp mesh, MeshSharding sharding) { return type; } -static void maybeInsertTargetShardingAnnotationImpl(MeshSharding sharding, -Value &operandValue, -Operation *operandOp, -OpBuilder &builder, -ShardOp &newShardOp) { +void mlir::mesh::maybeInsertTargetShardingAnnotation(MeshSharding sharding, + OpOperand &operand, + OpBuilder &builder
[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)
arsenm wrote: > I don't understand the high level motivation here. "Normal" > combining/simplification order is to visit the operands of an instruction > before you visit the instruction itself. Pattern matching is bottom up. This is essentially a selection problem, and selection is done bottom up. > That way the "visit" function can assume that the operands have already been > simplified. This is one of the problems: you want to see the original value before it's been dirtied up by the transformations. > GlobalISel combines already work this way, The GlobalISel combiner iterates in reverse order. Doing it in reverse also largely avoids the problem of stale uniformity info. All newly created values are falsely seen as uniform. If you do it in reverse, you don't encounter those new incorrect values. https://github.com/llvm/llvm-project/pull/145484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
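To make the order being advocated here concrete: the PR (quoted in full further below) walks blocks and instructions bottom-up, so each visit sees its operands before any transformation has rewritten them. A minimal sketch of the loop structure, where `visit` stands in for the pass's InstVisitor dispatch and `DeadVals` for its deferred-deletion set:

```cpp
// Sketch of the traversal from the PR: bottom-up over blocks and
// instructions. make_early_inc_range keeps iteration valid even if
// a visit queues the current instruction for deferred deletion.
bool runReverse(Function &F, SetVector<Value *> &DeadVals) {
  bool Changed = false;
  for (BasicBlock &BB : reverse(F))
    for (Instruction &I : make_early_inc_range(reverse(BB)))
      if (!DeadVals.contains(&I))
        Changed |= visit(I); // InstVisitor dispatch of the pass
  return Changed;
}
```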
[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)
jayfoad wrote: > This is not a simplifying pass, it is making the IR more complicated. We have > to do hacks like this to prevent later more profitable combines from needing > to parse out expanded IR: Fair enough, makes sense. I just want to make sure the justification is properly understood and explained, preferably in the commit message. > https://github.com/llvm/llvm-project/blob/fa5d7c926f5f571397eb1981649198136f1d6a92/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp#L2332 I only disagree with the "like a normal combiner" part of that comment :) https://github.com/llvm/llvm-project/pull/145484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/145484 In order to make this easier, I also removed all "removeFromParent" calls from the visitors, instead adding instructions to a set of instructions to delete once the function has been visited. This avoids crashes due to functions deleting their operands. In theory we could allow functions to delete the instruction they visited (and only that one) but I think having one idiom for everything is less error-prone. Fixes #140219 >From b031681978e2b356c2ae8e65d6e08515c0044ac1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 24 Jun 2025 11:35:58 +0200 Subject: [PATCH] [AMDGPU] Use reverse iteration in CodeGenPrepare In order to make this easier, I also removed all "removeFromParent" calls from the visitors, instead adding instructions to a set of instructions to delete once the function has been visited. This avoids crashes due to functions deleting their operands. In theory we could allow functions to delete the instruction they visited (and only that one) but I think having one idiom for everything is less error-prone. Fixes #140219 --- .../Target/AMDGPU/AMDGPUCodeGenPrepare.cpp| 82 --- ...egenprepare-break-large-phis-heuristics.ll | 42 +++--- .../AMDGPU/amdgpu-codegenprepare-fdiv.ll | 110 +- llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll| 138 +- llvm/test/CodeGen/AMDGPU/uniform-select.ll| 64 5 files changed, 185 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp index 5f1983791cfae..2a3aa1ac672b6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp @@ -15,6 +15,7 @@ #include "AMDGPU.h" #include "AMDGPUTargetMachine.h" #include "SIModeRegisterDefaults.h" +#include "llvm/ADT/SetVector.h" #include "llvm/Analysis/AssumptionCache.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/TargetLibraryInfo.h" @@ -109,6 +110,7 @@ class AMDGPUCodeGenPrepareImpl bool FlowChanged = false; mutable Function *SqrtF32 = nullptr; mutable Function *LdexpF32 = nullptr; + mutable SetVector DeadVals; DenseMap BreakPhiNodesCache; @@ -285,28 +287,19 @@ bool AMDGPUCodeGenPrepareImpl::run() { BreakPhiNodesCache.clear(); bool MadeChange = false; - Function::iterator NextBB; - for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; FI = NextBB) { -BasicBlock *BB = &*FI; -NextBB = std::next(FI); - -BasicBlock::iterator Next; -for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; - I = Next) { - Next = std::next(I); - - MadeChange |= visit(*I); - - if (Next != E) { // Control flow changed -BasicBlock *NextInstBB = Next->getParent(); -if (NextInstBB != BB) { - BB = NextInstBB; - E = BB->end(); - FE = F.end(); -} - } + for (BasicBlock &BB : reverse(F)) { +for (Instruction &I : make_early_inc_range(reverse(BB))) { + if (!DeadVals.contains(&I)) +MadeChange |= visit(I); } } + + while (!DeadVals.empty()) { +RecursivelyDeleteTriviallyDeadInstructions( +DeadVals.pop_back_val(), TLI, /*MSSAU*/ nullptr, +[&](Value *V) { DeadVals.remove(V); }); + } + return MadeChange; } @@ -426,7 +419,7 @@ bool AMDGPUCodeGenPrepareImpl::replaceMulWithMul24(BinaryOperator &I) const { Value *NewVal = insertValues(Builder, Ty, ResultVals); NewVal->takeName(&I); I.replaceAllUsesWith(NewVal); - I.eraseFromParent(); + DeadVals.insert(&I); return true; } @@ -500,10 +493,10 @@ bool AMDGPUCodeGenPrepareImpl::foldBinOpIntoSelect(BinaryOperator &BO) const { FoldedT, FoldedF); 
NewSelect->takeName(&BO); BO.replaceAllUsesWith(NewSelect); - BO.eraseFromParent(); + DeadVals.insert(&BO); if (CastOp) -CastOp->eraseFromParent(); - Sel->eraseFromParent(); +DeadVals.insert(CastOp); + DeadVals.insert(Sel); return true; } @@ -900,7 +893,7 @@ bool AMDGPUCodeGenPrepareImpl::visitFDiv(BinaryOperator &FDiv) { if (NewVal) { FDiv.replaceAllUsesWith(NewVal); NewVal->takeName(&FDiv); -RecursivelyDeleteTriviallyDeadInstructions(&FDiv, TLI); +DeadVals.insert(&FDiv); } return true; @@ -1310,7 +1303,8 @@ within the byte are all 0. static bool tryNarrowMathIfNoOverflow(Instruction *I, const SITargetLowering *TLI, const TargetTransformInfo &TTI, - const DataLayout &DL) { + const DataLayout &DL, + SetVector &DeadVals) { unsigned Opc = I->getOpcode(); Type *OldType = I->getType(); @@ -1365,7 +1359,7 @@ static bool tryNarrowMathIfNoOverflow(Instruction *I, Va
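The other half of the idiom in the patch above is the cleanup loop, worth calling out because the last argument is easy to miss: `RecursivelyDeleteTriviallyDeadInstructions` takes a callback invoked for every value it erases, which the patch uses to drop entries from `DeadVals` so the worklist never holds a dangling pointer. Shown in isolation (a sketch; assumes `DeadVals` is the pass's `SetVector` and `TLI` its TargetLibraryInfo):

```cpp
// Drain the deferred-deletion worklist once the whole function
// has been visited.
while (!DeadVals.empty()) {
  RecursivelyDeleteTriviallyDeadInstructions(
      DeadVals.pop_back_val(), TLI, /*MSSAU=*/nullptr,
      [&](Value *V) { DeadVals.remove(V); }); // keep worklist consistent
}
```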
[llvm-branch-commits] [llvm] 661839d - Revert "Add support for Windows Secure Hot-Patching (#138972)"
Author: Qinkun Bao Date: 2025-06-24T13:11:32-04:00 New Revision: 661839d189f825fbb6305e6ac5d8d1cc19ccc42e URL: https://github.com/llvm/llvm-project/commit/661839d189f825fbb6305e6ac5d8d1cc19ccc42e DIFF: https://github.com/llvm/llvm-project/commit/661839d189f825fbb6305e6ac5d8d1cc19ccc42e.diff LOG: Revert "Add support for Windows Secure Hot-Patching (#138972)" This reverts commit 26d318e4a9437f95b6a2e7abace5f2b867c88a3e. Added: Modified: clang/include/clang/Basic/CodeGenOptions.h clang/include/clang/Driver/Options.td clang/lib/CodeGen/CGCall.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.h clang/lib/Driver/ToolChains/Clang.cpp llvm/include/llvm/CodeGen/Passes.h llvm/include/llvm/DebugInfo/CodeView/CodeViewSymbols.def llvm/include/llvm/DebugInfo/CodeView/SymbolRecord.h llvm/include/llvm/IR/Attributes.td llvm/include/llvm/InitializePasses.h llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.h llvm/lib/CodeGen/CMakeLists.txt llvm/lib/CodeGen/TargetPassConfig.cpp llvm/lib/DebugInfo/CodeView/SymbolDumper.cpp llvm/lib/DebugInfo/CodeView/SymbolRecordMapping.cpp llvm/lib/ObjectYAML/CodeViewYAMLSymbols.cpp llvm/tools/llvm-pdbutil/MinimalSymbolDumper.cpp Removed: clang/test/CodeGen/X86/ms-secure-hotpatch-bad-file.c clang/test/CodeGen/X86/ms-secure-hotpatch-cpp.cpp clang/test/CodeGen/X86/ms-secure-hotpatch-eh.cpp clang/test/CodeGen/X86/ms-secure-hotpatch-globals.c clang/test/CodeGen/X86/ms-secure-hotpatch-lto.c clang/test/CodeGen/X86/ms-secure-hotpatch.c llvm/lib/CodeGen/WindowsSecureHotPatching.cpp llvm/test/CodeGen/X86/ms-secure-hotpatch-attr.ll llvm/test/CodeGen/X86/ms-secure-hotpatch-bad-file.ll llvm/test/CodeGen/X86/ms-secure-hotpatch-direct-global-access.ll llvm/test/CodeGen/X86/ms-secure-hotpatch-functions-file.ll llvm/test/CodeGen/X86/ms-secure-hotpatch-functions-list.ll diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h index 77a0c559f7689..7ba21fca6dd6b 100644 --- a/clang/include/clang/Basic/CodeGenOptions.h +++ b/clang/include/clang/Basic/CodeGenOptions.h @@ -495,13 +495,6 @@ class CodeGenOptions : public CodeGenOptionsBase { /// A list of functions that are replacable by the loader. std::vector LoaderReplaceableFunctionNames; - /// The name of a file that contains functions which will be compiled for - /// hotpatching. See -fms-secure-hotpatch-functions-file. - std::string MSSecureHotPatchFunctionsFile; - - /// A list of functions which will be compiled for hotpatching. - /// See -fms-secure-hotpatch-functions-list. - std::vector MSSecureHotPatchFunctionsList; public: // Define accessors/mutators for code generation options of enumeration type. 
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 26e953f7ac613..4f91b82a3bfa6 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3838,24 +3838,6 @@ def fms_hotpatch : Flag<["-"], "fms-hotpatch">, Group, Visibility<[ClangOption, CC1Option, CLOption]>, HelpText<"Ensure that all functions can be hotpatched at runtime">, MarshallingInfoFlag>; - -// See llvm/lib/CodeGen/WindowsSecureHotPatching.cpp -def fms_secure_hotpatch_functions_file -: Joined<["-"], "fms-secure-hotpatch-functions-file=">, - Group, - Visibility<[ClangOption, CC1Option, CLOption]>, - MarshallingInfoString>, - HelpText<"Path to a file that contains a list of mangled names of " - "functions that should be hot-patched for Windows Secure " - "Hot-Patching">; -def fms_secure_hotpatch_functions_list -: CommaJoined<["-"], "fms-secure-hotpatch-functions-list=">, - Group, - Visibility<[ClangOption, CC1Option, CLOption]>, - MarshallingInfoStringVector>, - HelpText<"List of mangled symbol names of functions that should be " - "hot-patched for Windows Secure Hot-Patching">; - def fpcc_struct_return : Flag<["-"], "fpcc-struct-return">, Group, Visibility<[ClangOption, CC1Option]>, HelpText<"Override the default ABI to return all structs on the stack">; diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index c8c3d6b20c496..fd75de42515da 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -2660,13 +2660,6 @@ void CodeGenModule::ConstructAttributeList(StringRef Name, // CPU/feature overrides. addDefaultFunctionDefinitionAttributes // handles these separately to set them based on the global defaults. GetCPUAndFeaturesAttributes(CalleeInfo.getCalleeDecl(), FuncAttrs); - -// Windows hotpatching support -if (!MSHotPatchFunctions.empty()) { - bool IsHotPatched = llvm::binary_search(MSHotPa
[llvm-branch-commits] [llvm] release/20.x: [AsmPrinter] Always emit global equivalents if there is non-global uses (#145648) (PR #145690)
llvmbot wrote: @nikic What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/145690 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/145747 >From 5e439f780f3eab0a75e68d2bac9c85892c9f34c2 Mon Sep 17 00:00:00 2001 From: "Mekhanoshin, Stanislav" Date: Wed, 25 Jun 2025 13:27:57 -0400 Subject: [PATCH] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 Co-authored-by: Shilei Tian --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 2 + .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl | 20 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 ++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 1 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 16 + .../CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll | 72 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s | 9 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s| 9 +++ .../MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s | 4 ++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s | 8 +++ .../MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s | 4 ++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s | 8 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s| 10 +++ .../gfx1250_asm_vop3_from_vop1-fake16.s | 12 .../MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s| 12 .../gfx1250_asm_vop3_from_vop1_dpp16-fake16.s | 8 +++ .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s | 8 +++ .../gfx1250_asm_vop3_from_vop1_dpp8-fake16.s | 8 +++ .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s | 8 +++ .../Disassembler/AMDGPU/gfx1250_dasm_vop1.txt | 10 +++ .../AMDGPU/gfx1250_dasm_vop1_dpp16.txt| 8 +++ .../AMDGPU/gfx1250_dasm_vop1_dpp8.txt | 8 +++ .../AMDGPU/gfx1250_dasm_vop3_from_vop1.txt| 15 .../gfx1250_dasm_vop3_from_vop1_dpp16.txt | 8 +++ .../gfx1250_dasm_vop3_from_vop1_dpp8.txt | 8 +++ 25 files changed, 280 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index edb3a17ac07c6..94fa3e9b74c46 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -642,5 +642,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3709b1ff52f35..3f7a2d8649740 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -10,3 +10,23 @@ void test_setprio_inc_wg() { __builtin_amdgcn_s_setprio_inc_wg(10); } + +// CHECK-LABEL: @test_cvt_pk_f16_fp8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.fp8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// CHECK-NEXT:store <2 x half> [[TMP1]], 
ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_fp8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index e6f0bf6276086..72b0aa01f71aa 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -588,6 +588,10 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic; def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic; def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic; +def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_"#name>; diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index a7b08794fdf1b..50d297cd096a6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURe
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/145747 >From cd383faea1421c6b048fc709685d56e3483c72f5 Mon Sep 17 00:00:00 2001 From: "Mekhanoshin, Stanislav" Date: Wed, 25 Jun 2025 13:27:57 -0400 Subject: [PATCH] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 Co-authored-by: Shilei Tian --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 2 + .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl | 20 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 ++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 1 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 16 + .../CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll | 72 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s | 9 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s| 9 +++ .../MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s | 4 ++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s | 8 +++ .../MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s | 4 ++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s | 8 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s| 10 +++ .../gfx1250_asm_vop3_from_vop1-fake16.s | 12 .../MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s| 12 .../gfx1250_asm_vop3_from_vop1_dpp16-fake16.s | 8 +++ .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s | 8 +++ .../gfx1250_asm_vop3_from_vop1_dpp8-fake16.s | 8 +++ .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s | 8 +++ .../Disassembler/AMDGPU/gfx1250_dasm_vop1.txt | 10 +++ .../AMDGPU/gfx1250_dasm_vop1_dpp16.txt| 8 +++ .../AMDGPU/gfx1250_dasm_vop1_dpp8.txt | 8 +++ .../AMDGPU/gfx1250_dasm_vop3_from_vop1.txt| 15 .../gfx1250_dasm_vop3_from_vop1_dpp16.txt | 8 +++ .../gfx1250_dasm_vop3_from_vop1_dpp8.txt | 8 +++ 25 files changed, 280 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index edb3a17ac07c6..94fa3e9b74c46 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -642,5 +642,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3709b1ff52f35..3f7a2d8649740 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -10,3 +10,23 @@ void test_setprio_inc_wg() { __builtin_amdgcn_s_setprio_inc_wg(10); } + +// CHECK-LABEL: @test_cvt_pk_f16_fp8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.fp8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// CHECK-NEXT:store <2 x half> [[TMP1]], 
ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_fp8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index e6f0bf6276086..72b0aa01f71aa 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -588,6 +588,10 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic; def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic; def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic; +def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_"#name>; diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index a7b08794fdf1b..50d297cd096a6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURe
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
@@ -129,6 +147,245 @@ bool AlwaysInlineImpl( return Changed; } +/// Promote allocas to registers if possible. +static void promoteAllocas( +Function *Caller, SmallPtrSetImpl &AllocasToPromote, +function_ref &GetAssumptionCache) { + if (AllocasToPromote.empty()) +return; + + SmallVector PromotableAllocas; + llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas), +isAllocaPromotable); + if (PromotableAllocas.empty()) +return; + + DominatorTree DT(*Caller); + AssumptionCache &AC = GetAssumptionCache(*Caller); + PromoteMemToReg(PromotableAllocas, DT, &AC); + NumAllocasPromoted += PromotableAllocas.size(); + // Emit a remark for the promotion. + OptimizationRemarkEmitter ORE(Caller); + DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc(); + ORE.emit([&]() { +return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc, + &Caller->getEntryBlock()) + << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size()) + << " allocas to SSA registers in function '" + << ore::NV("Function", Caller) << "'"; + }); + LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size() +<< " allocas to registers in function " << Caller->getName() +<< "\n"); +} + +/// We use a different visitation order of functions here to solve a phase +/// ordering problem. After inlining, a caller function may have allocas that +/// were previously used for passing reference arguments to the callee that +/// are now promotable to registers, using SROA/mem2reg. However if we just let +/// the AlwaysInliner continue inlining everything at once, the later SROA pass +/// in the pipeline will end up placing phis for these allocas into blocks along +/// the dominance frontier which may extend further than desired (e.g. loop +/// headers). This can happen when the caller is then inlined into another +/// caller, and the allocas end up hoisted further before SROA is run. +/// +/// Instead what we want is to try to do, as best as we can, is to inline leaf +/// functions into callers, and then run PromoteMemToReg() on the allocas that +/// were passed into the callee before it was inlined. +/// +/// We want to do this *before* the caller is inlined into another caller +/// because we want the alloca promotion to happen before its scope extends too +/// far because of further inlining. +/// +/// Here's a simple pseudo-example: +/// outermost_caller() { +/// for (...) { +/// middle_caller(); +/// } +/// } +/// +/// middle_caller() { +/// int stack_var; +/// inner_callee(&stack_var); +/// } +/// +/// inner_callee(int *x) { +/// // Do something with x. +/// } +/// +/// In this case, we want to inline inner_callee() into middle_caller() and +/// then promote stack_var to a register before we inline middle_caller() into +/// outermost_caller(). The regular always_inliner would inline everything at +/// once, and then SROA/mem2reg would promote stack_var to a register but in +/// the context of outermost_caller() which is not what we want. mtrofin wrote: There's no plan yet with the ModuleInliner, currently it lets us experiment with alternative traversals, and some of them have been showing promise. I'm mainly trying to understand if: - the order of traversal matters (for this problem here) - do all the function simplification passes need to be run after some inlining or just some? I'm guessing it's really "just a specific subset", correct? 
https://github.com/llvm/llvm-project/pull/145613 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
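To make the phase-ordering pseudo-example from the quoted comment concrete, here it is as compilable code — the function names come from that comment, while the bodies and attribute spelling are illustrative additions. Inlining `inner_callee` first lets mem2reg promote `stack_var` while its scope is still one small function, before `middle_caller` itself is inlined into the loop:

```cpp
// Illustrative expansion of the pseudo-example in the patch comment.
__attribute__((always_inline)) inline void inner_callee(int *x) {
  *x += 1; // invented body: the use of *x keeps stack_var addressable
}

__attribute__((always_inline)) inline void middle_caller() {
  int stack_var = 0;        // promotable to a register once inner_callee
  inner_callee(&stack_var); // is inlined and the address no longer escapes
}

void outermost_caller() {
  for (int i = 0; i < 16; ++i) // inlining middle_caller here *before* the
    middle_caller();           // promotion would widen stack_var's scope
}
```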
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (PR #145753)
https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/145753 Co-authored-by: Shilei Tian >From 76ed9609ab498504f7bd557d9703cb5d5f06b043 Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Wed, 25 Jun 2025 13:56:12 -0400 Subject: [PATCH] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 Co-authored-by: Shilei Tian --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl | 20 +++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 +++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 1 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 4 +++ .../CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll | 35 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s | 9 + llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s| 9 + .../MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s | 4 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s | 8 + .../MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s | 4 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s | 8 + llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s| 10 ++ .../gfx1250_asm_vop3_from_vop1-fake16.s | 12 +++ .../MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s| 12 +++ .../gfx1250_asm_vop3_from_vop1_dpp16-fake16.s | 8 + .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s | 8 + .../gfx1250_asm_vop3_from_vop1_dpp8-fake16.s | 8 + .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s | 8 + .../Disassembler/AMDGPU/gfx1250_dasm_vop1.txt | 10 ++ .../AMDGPU/gfx1250_dasm_vop1_dpp16.txt| 8 + .../AMDGPU/gfx1250_dasm_vop1_dpp8.txt | 8 + .../AMDGPU/gfx1250_dasm_vop3_from_vop1.txt| 15 .../gfx1250_dasm_vop3_from_vop1_dpp16.txt | 8 + .../gfx1250_dasm_vop3_from_vop1_dpp8.txt | 8 + 25 files changed, 230 insertions(+) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 94fa3e9b74c46..1d1f5a4ee3f9f 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -643,6 +643,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_bf8, "V2hs", "nc", "gfx1250-insts") #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3f7a2d8649740..e2c6a4a3f15f3 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -30,3 +30,23 @@ void test_cvt_pk_f16_fp8(global half2* out, short a) { out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); } + +// CHECK-LABEL: @test_cvt_pk_f16_bf8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.bf8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 
+// CHECK-NEXT:store <2 x half> [[TMP1]], ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_bf8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_bf8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 72b0aa01f71aa..6f974c97361de 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -592,6 +592,10 @@ def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; +def int_amdgcn_cvt_pk_f16_bf8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_bf8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_"#name>; diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index 50d297cd096a6..b20760c356263 100644 -
[llvm-branch-commits] [mlir] f7f43d7 - Revert "[mlir][OpenMP] Use correct debug location with link clause. (#145026)"
Author: Abid Qadeer Date: 2025-06-25T20:04:40+01:00 New Revision: f7f43d738e2f5054c604ec337c0c4f03315ed910 URL: https://github.com/llvm/llvm-project/commit/f7f43d738e2f5054c604ec337c0c4f03315ed910 DIFF: https://github.com/llvm/llvm-project/commit/f7f43d738e2f5054c604ec337c0c4f03315ed910.diff LOG: Revert "[mlir][OpenMP] Use correct debug location with link clause. (#145026)" This reverts commit 006037675c10b20d33a2a3c273bf3cdb8b0a252c. Added: Modified: mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp Removed: mlir/test/Target/LLVMIR/omptarget-debug-map-link-loc.mlir diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 3806db3ceab25..23140f22555a5 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -4831,7 +4831,6 @@ handleDeclareTargetMapVar(MapInfoData &mapData, llvm::IRBuilderBase &builder, llvm::Function *func) { assert(moduleTranslation.getOpenMPBuilder()->Config.isTargetDevice() && "function only supported for target device codegen"); - llvm::IRBuilderBase::InsertPointGuard guard(builder); for (size_t i = 0; i < mapData.MapClause.size(); ++i) { // In the case of declare target mapped variables, the basePointer is // the reference pointer generated by the convertDeclareTargetAttr @@ -4866,7 +4865,6 @@ handleDeclareTargetMapVar(MapInfoData &mapData, for (llvm::User *user : userVec) { if (auto *insn = dyn_cast(user)) { if (insn->getFunction() == func) { -builder.SetCurrentDebugLocation(insn->getDebugLoc()); auto *load = builder.CreateLoad(mapData.BasePointers[i]->getType(), mapData.BasePointers[i]); load->moveBefore(insn->getIterator()); diff --git a/mlir/test/Target/LLVMIR/omptarget-debug-map-link-loc.mlir b/mlir/test/Target/LLVMIR/omptarget-debug-map-link-loc.mlir deleted file mode 100644 index 89fc1dde4b6cb..0 --- a/mlir/test/Target/LLVMIR/omptarget-debug-map-link-loc.mlir +++ /dev/null @@ -1,40 +0,0 @@ -// RUN: mlir-translate -mlir-to-llvmir %s - -module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_target_device = true} { - llvm.mlir.global external @_QMtest_0Esp() {addr_space = 1 : i32, omp.declare_target = #omp.declaretarget} : i32 { -%0 = llvm.mlir.constant(1 : i32) : i32 loc(#loc1) -llvm.return %0 : i32 loc(#loc1) - } loc(#loc1) - llvm.func @_QQmain() { -%0 = llvm.mlir.constant(1 : i64) : i64 -%1 = llvm.alloca %0 x i32 : (i64) -> !llvm.ptr<5> loc(#loc2) -%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr loc(#loc2) -%5 = llvm.mlir.addressof @_QMtest_0Esp : !llvm.ptr<1> loc(#loc1) -%6 = llvm.addrspacecast %5 : !llvm.ptr<1> to !llvm.ptr loc(#loc1) -%7 = omp.map.info var_ptr(%2 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr loc(#loc3) -%8 = omp.map.info var_ptr(%6 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr loc(#loc3) -omp.target map_entries(%7 -> %arg0, %8 -> %arg1 : !llvm.ptr, !llvm.ptr) { - %16 = llvm.load %arg1 : !llvm.ptr -> i32 loc(#loc5) - llvm.store %16, %arg0 : i32, !llvm.ptr loc(#loc5) - omp.terminator loc(#loc5) -} loc(#loc5) -llvm.return loc(#loc6) - } loc(#loc15) -} -#di_file = #llvm.di_file<"target7.f90" in ""> -#di_null_type = #llvm.di_null_type -#di_compile_unit = #llvm.di_compile_unit, - sourceLanguage = DW_LANG_Fortran95, file = #di_file, producer = "flang", - isOptimized = false, emissionKind = 
LineTablesOnly> -#di_subroutine_type = #llvm.di_subroutine_type< - callingConvention = DW_CC_program, types = #di_null_type> -#di_subprogram = #llvm.di_subprogram, - compileUnit = #di_compile_unit, scope = #di_file, name = "main", - file = #di_file, subprogramFlags = "Definition|MainSubprogram", - type = #di_subroutine_type> -#loc1 = loc("test.f90":3:18) -#loc2 = loc("test.f90":7:7) -#loc3 = loc("test.f90":9:18) -#loc5 = loc("test.f90":11:7) -#loc6 = loc("test.f90":12:7) -#loc15 = loc(fused<#di_subprogram>[#loc2]) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
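Context for what got reverted: the removed lines wrapped the rewrite in an `IRBuilderBase::InsertPointGuard` and borrowed each user instruction's debug location before creating the replacement load. The guard is the relevant idiom — it saves and restores both the insertion point and the current debug location. A generic sketch of that idiom (not the OpenMP translation code; the function and parameter names are invented):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Create a load just before User, attributed to User's source location,
// without disturbing the builder's state for the surrounding codegen.
static LoadInst *loadBefore(IRBuilderBase &Builder, Instruction *User,
                            Type *Ty, Value *Ptr) {
  IRBuilderBase::InsertPointGuard Guard(Builder); // saves IP and DebugLoc
  Builder.SetInsertPoint(User);
  Builder.SetCurrentDebugLocation(User->getDebugLoc());
  return Builder.CreateLoad(Ty, Ptr);
} // Guard's destructor restores the prior insertion point and DebugLoc
```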
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/kbeyls edited https://github.com/llvm/llvm-project/pull/137224 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
llvmbot wrote: @llvm/pr-subscribers-mc Author: Shilei Tian (shiltian) Changes Co-authored-by: Shilei Tian --- Patch is 30.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145747.diff 25 Files Affected: - (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+2) - (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl (+20) - (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+4) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+1) - (modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+16) - (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll (+72) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s (+9) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s (+9) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s (+5) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s (+4) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s (+10) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1-fake16.s (+12) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s (+12) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16-fake16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8-fake16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1.txt (+10) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp16.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp8.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1.txt (+15) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp16.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp8.txt (+8) ``diff diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index edb3a17ac07c6..94fa3e9b74c46 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -642,5 +642,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3709b1ff52f35..3f7a2d8649740 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -10,3 +10,23 @@ void test_setprio_inc_wg() { __builtin_amdgcn_s_setprio_inc_wg(10); } + +// CHECK-LABEL: @test_cvt_pk_f16_fp8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr 
[[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.fp8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// CHECK-NEXT:store <2 x half> [[TMP1]], ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_fp8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index e6f0bf6276086..72b0aa01f71aa 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -588,6 +588,10 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic; def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic; def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic; +def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_"#name>; diff --git a
[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)
@@ -0,0 +1,106 @@ +//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Defines the various attribute collections defining GOFF symbols. +// +//===--===// + +#ifndef LLVM_MC_MCGOFFATTRIBUTES_H +#define LLVM_MC_MCGOFFATTRIBUTES_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/BinaryFormat/GOFF.h" +#include + +namespace llvm { +namespace GOFF { +// An "External Symbol Definition" in the GOFF file has a type, and depending on +// the type a different subset of the fields is used. +// +// Unlike other formats, a 2 dimensional structure is used to define the +// location of data. For example, the equivalent of the ELF .text section is +// made up of a Section Definition (SD) and a class (Element Definition; ED). +// The name of the SD symbol depends on the application, while the class has the +// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively. +// +// Data can be placed into this structure in 2 ways. First, the data (in a text +// record) can be associated with an ED symbol. To refer to data, a Label +// Definition (LD) is used to give an offset into the data a name. When binding, +// the whole data is pulled into the resulting executable, and the addresses +// given by the LD symbols are resolved. +// +// The alternative is to use a Part Definition (PR). In this case, the data (in +// a text record) is associated with the part. When binding, only the data of +// referenced PRs is pulled into the resulting binary. +// +// Both approaches are used, which means that the equivalent of a section in ELF +// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain +// sections are fine with just defining SD/ED symbols. The SymbolMapper takes +// care of all those details. + +// Attributes for SD symbols. +struct SDAttr { + GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified; + GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified; +}; + +// Attributes for ED symbols. +struct EDAttr { + bool IsReadOnly = false; + GOFF::ESDRmode Rmode; + GOFF::ESDNameSpaceId NameSpace = GOFF::ESD_NS_NormalName; + GOFF::ESDTextStyle TextStyle = GOFF::ESD_TS_ByteOriented; + GOFF::ESDBindingAlgorithm BindAlgorithm = GOFF::ESD_BA_Concatenate; + GOFF::ESDLoadingBehavior LoadBehavior = GOFF::ESD_LB_Initial; + GOFF::ESDReserveQwords ReservedQwords = GOFF::ESD_RQ_0; + GOFF::ESDAlignment Alignment = GOFF::ESD_ALIGN_Doubleword; + uint8_t FillByteValue = 0; +}; + +// Attributes for LD symbols. +struct LDAttr { + bool IsRenamable = false; + GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified; + GOFF::ESDBindingStrength BindingStrength = GOFF::ESD_BST_Strong; + GOFF::ESDLinkageType Linkage = GOFF::ESD_LT_XPLink; + GOFF::ESDAmode Amode; + GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified; +}; + +// Attributes for PR symbols. +struct PRAttr { + bool IsRenamable = false; + GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified; + GOFF::ESDLinkageType Linkage = GOFF::ESD_LT_XPLink; + GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified; + uint32_t SortKey = 0; +}; + +// Class names and other values depending on AMODE64 or AMODE31, and other +// environment properties. For now, only the 64 bit XPLINK case is defined. + +// GOFF classes. 
+constexpr StringLiteral CLASS_CODE = "C_CODE64"; +constexpr StringLiteral CLASS_WSA = "C_WSA64"; +constexpr StringLiteral CLASS_DATA = "C_DATA64"; +constexpr StringLiteral CLASS_PPA2 = "C_@@QPPA2"; + +// Address and residency mode. +constexpr GOFF::ESDAmode AMODE = GOFF::ESD_AMODE_64; +constexpr GOFF::ESDRmode RMODE = GOFF::ESD_RMODE_64; + +// Linkage. +constexpr GOFF::ESDLinkageType LINKAGE = GOFF::ESD_LT_XPLink; + +// Loading behavior. +constexpr GOFF::ESDLoadingBehavior LOADBEHAVIOR = GOFF::ESD_LB_Initial; uweigand wrote: I'm wondering about the above four constants - it's a bit unclear how they will help (in the future) to possibly extend the implementation to 31-bit or non-XPLINK variants, since you'd have to get the mode input from somewhere. Also, these constants aren't used systematically, e.g. a few places use LINKAGE but others hard-code ESD_LT_XPLink anyway. https://github.com/llvm/llvm-project/pull/133799 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
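One possible shape for the direction this review comment points at — purely a sketch of the reviewer's point, not code from the PR: bundle the mode-dependent values so a future AMODE31 or non-XPLINK configuration becomes a second instance rather than a second set of free constants, and so call sites cannot use them only partially.

```cpp
// Illustrative only — not part of the PR. The field values below are the
// XPLINK/AMODE64 constants already defined in MCGOFFAttributes.h.
struct GOFFTargetEnv {
  StringLiteral CodeClass;
  GOFF::ESDAmode Amode;
  GOFF::ESDRmode Rmode;
  GOFF::ESDLinkageType Linkage;
  GOFF::ESDLoadingBehavior LoadBehavior;
};

constexpr GOFFTargetEnv XPLink64Env = {GOFF::CLASS_CODE, GOFF::AMODE,
                                       GOFF::RMODE, GOFF::LINKAGE,
                                       GOFF::LOADBEHAVIOR};
```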
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (PR #145753)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/145753 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/145747 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
shiltian wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/145747). Learn more: https://graphite.dev/docs/merge-pull-requests * **#145747** 👈 (this PR) * **#145632** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/145747 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
@@ -1319,6 +1319,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do so without introducing too many false positives. kbeyls wrote: Do you happen to know whether it would be a good idea to adapt what BOLT overall considers as tail calls to also include the cases that this function adds? Basically, does there need to be a separate "definition" of what is considered a tail call, only for the pauth analysis, versus the "definition" of a tail call in all other places in BOLT? If there is a good reason why there has to be a difference, maybe it makes sense to explain in this comment why that is the case? https://github.com/llvm/llvm-project/pull/137224 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
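For readers outside the thread: a tail call is a call in tail position that the compiler may lower as a branch instead of a call, so control never returns to the caller. A minimal C illustration of the general idea (background only, not code from the patch):

```c
int callee(int v);

int caller(int x) {
  // Tail position: nothing happens after the call, so the compiler may
  // lower this as a direct branch to callee rather than a call, reusing
  // the caller's frame and return address (LR on AArch64).
  return callee(x + 1);
}
```

The scanner's concern, as the quoted comment suggests, is exactly that reused return address: if LR has not been authenticated before such a branch, the gadget scanner wants to flag it even in cases BOLT did not originally mark as tail calls.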
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (PR #145753)
shiltian wrote: ### Merge activity * **Jun 25, 8:56 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/145753). https://github.com/llvm/llvm-project/pull/145753 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/145747 Co-authored-by: Shilei Tian >From 86417c4382640e179277338a9040be9b6579dec9 Mon Sep 17 00:00:00 2001 From: "Mekhanoshin, Stanislav" Date: Wed, 25 Jun 2025 13:27:57 -0400 Subject: [PATCH] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 Co-authored-by: Shilei Tian --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 2 + .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl | 20 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 ++ .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 1 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 16 + .../CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll | 72 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s | 9 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s| 9 +++ .../MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s | 5 ++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s | 8 +++ .../MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s | 4 ++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s | 8 +++ llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s| 10 +++ .../gfx1250_asm_vop3_from_vop1-fake16.s | 12 .../MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s| 12 .../gfx1250_asm_vop3_from_vop1_dpp16-fake16.s | 8 +++ .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s | 8 +++ .../gfx1250_asm_vop3_from_vop1_dpp8-fake16.s | 8 +++ .../AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s | 8 +++ .../Disassembler/AMDGPU/gfx1250_dasm_vop1.txt | 10 +++ .../AMDGPU/gfx1250_dasm_vop1_dpp16.txt| 8 +++ .../AMDGPU/gfx1250_dasm_vop1_dpp8.txt | 8 +++ .../AMDGPU/gfx1250_dasm_vop3_from_vop1.txt| 15 .../gfx1250_dasm_vop3_from_vop1_dpp16.txt | 8 +++ .../gfx1250_dasm_vop3_from_vop1_dpp8.txt | 8 +++ 25 files changed, 281 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index edb3a17ac07c6..94fa3e9b74c46 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -642,5 +642,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3709b1ff52f35..3f7a2d8649740 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -10,3 +10,23 @@ void test_setprio_inc_wg() { __builtin_amdgcn_s_setprio_inc_wg(10); } + +// CHECK-LABEL: @test_cvt_pk_f16_fp8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.fp8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// 
CHECK-NEXT:store <2 x half> [[TMP1]], ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_fp8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index e6f0bf6276086..72b0aa01f71aa 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -588,6 +588,10 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic; def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic; def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic; +def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__builtin_amdgcn_"#name>; diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp index a7b08794fdf1b..50d297cd096a6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp +++ b/l
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
https://github.com/shiltian ready_for_review https://github.com/llvm/llvm-project/pull/145747 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (PR #145747)
llvmbot wrote: @llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang Author: Shilei Tian (shiltian) Changes Co-authored-by: Shilei Tian --- Patch is 30.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145747.diff 25 Files Affected: - (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+2) - (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl (+20) - (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+4) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+1) - (modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+16) - (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.f16.fp8.ll (+72) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s (+9) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s (+9) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s (+5) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s (+4) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s (+10) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1-fake16.s (+12) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s (+12) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16-fake16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8-fake16.s (+8) - (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1.txt (+10) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp16.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp8.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1.txt (+15) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp16.txt (+8) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp8.txt (+8) ``diff diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index edb3a17ac07c6..94fa3e9b74c46 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -642,5 +642,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_f16_f32, "V2hV2hfUiIb", "nc", "f32-to-f16 TARGET_BUILTIN(__builtin_amdgcn_s_setprio_inc_wg, "vIs", "n", "setprio-inc-wg-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_f16_fp8, "V2hs", "nc", "gfx1250-insts") + #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl index 3709b1ff52f35..3f7a2d8649740 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl @@ -10,3 +10,23 @@ void test_setprio_inc_wg() { __builtin_amdgcn_s_setprio_inc_wg(10); } + +// CHECK-LABEL: @test_cvt_pk_f16_fp8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) +// CHECK-NEXT:[[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5) +// CHECK-NEXT:[[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr +// CHECK-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr +// CHECK-NEXT:store ptr addrspace(1) [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:store i16 [[A:%.*]], ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP0:%.*]] = 
load i16, ptr [[A_ADDR_ASCAST]], align 2 +// CHECK-NEXT:[[TMP1:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pk.f16.fp8(i16 [[TMP0]]) +// CHECK-NEXT:[[TMP2:%.*]] = load ptr addrspace(1), ptr [[OUT_ADDR_ASCAST]], align 8 +// CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds <2 x half>, ptr addrspace(1) [[TMP2]], i64 0 +// CHECK-NEXT:store <2 x half> [[TMP1]], ptr addrspace(1) [[ARRAYIDX]], align 4 +// CHECK-NEXT:ret void +// +void test_cvt_pk_f16_fp8(global half2* out, short a) +{ + out[0] = __builtin_amdgcn_cvt_pk_f16_fp8(a); +} diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index e6f0bf6276086..72b0aa01f71aa 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -588,6 +588,10 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic; def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic; def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic; +def int_amdgcn_cvt_pk_f16_fp8 : DefaultAttrsIntrinsic< + [llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrSpeculatable] +>, ClangBuiltin<"__builtin_amdgcn_cvt_pk_f16_fp8">; + class AMDGPUCvtScaleF32Intrinsic : DefaultAttrsIntrinsic< [DstTy], [Src0Ty, llvm_float_ty], [IntrNoMem, IntrSpeculatable] >, ClangBuiltin<"__buil
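For readers skimming the digest: the builtin's type string `V2hs` above decodes to `half2 __builtin_amdgcn_cvt_pk_f16_fp8(short)` — the 16-bit operand packs two FP8 values, and the result is that pair converted to half precision. Below is a minimal OpenCL sketch of buffer-wide usage; the kernel name and buffer layout are illustrative and not part of the patch (which only adds the scalar test shown above), and a gfx1250 target is assumed.

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel: convert one packed FP8 pair per work item.
kernel void cvt_fp8_buffer(global half2 *out, global const short *in) {
  size_t gid = get_global_id(0);
  // Each short holds two FP8 bytes; the builtin converts both to f16.
  out[gid] = __builtin_amdgcn_cvt_pk_f16_fp8(in[gid]);
}
```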
[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)
@@ -0,0 +1,106 @@ +//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Defines the various attribute collections defining GOFF symbols. +// +//===--===// + +#ifndef LLVM_MC_MCGOFFATTRIBUTES_H +#define LLVM_MC_MCGOFFATTRIBUTES_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/BinaryFormat/GOFF.h" +#include + +namespace llvm { +namespace GOFF { +// An "External Symbol Definition" in the GOFF file has a type, and depending on +// the type a different subset of the fields is used. +// +// Unlike other formats, a 2 dimensional structure is used to define the +// location of data. For example, the equivalent of the ELF .text section is +// made up of a Section Definition (SD) and a class (Element Definition; ED). +// The name of the SD symbol depends on the application, while the class has the +// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively. +// +// Data can be placed into this structure in 2 ways. First, the data (in a text +// record) can be associated with an ED symbol. To refer to data, a Label +// Definition (LD) is used to give an offset into the data a name. When binding, +// the whole data is pulled into the resulting executable, and the addresses +// given by the LD symbols are resolved. +// +// The alternative is to use a Part Definition (PR). In this case, the data (in +// a text record) is associated with the part. When binding, only the data of +// referenced PRs is pulled into the resulting binary. +// +// Both approaches are used, which means that the equivalent of a section in ELF +// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain +// sections are fine with just defining SD/ED symbols. The SymbolMapper takes +// care of all those details. redstar wrote: True, changed. https://github.com/llvm/llvm-project/pull/133799 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)
@@ -0,0 +1,106 @@ +//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Defines the various attribute collections defining GOFF symbols. +// +//===--===// + +#ifndef LLVM_MC_MCGOFFATTRIBUTES_H +#define LLVM_MC_MCGOFFATTRIBUTES_H + +#include "llvm/ADT/StringRef.h" +#include "llvm/BinaryFormat/GOFF.h" +#include + +namespace llvm { +namespace GOFF { +// An "External Symbol Definition" in the GOFF file has a type, and depending on +// the type a different subset of the fields is used. +// +// Unlike other formats, a 2 dimensional structure is used to define the +// location of data. For example, the equivalent of the ELF .text section is +// made up of a Section Definition (SD) and a class (Element Definition; ED). +// The name of the SD symbol depends on the application, while the class has the +// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively. +// +// Data can be placed into this structure in 2 ways. First, the data (in a text +// record) can be associated with an ED symbol. To refer to data, a Label +// Definition (LD) is used to give an offset into the data a name. When binding, +// the whole data is pulled into the resulting executable, and the addresses +// given by the LD symbols are resolved. +// +// The alternative is to use a Part Definition (PR). In this case, the data (in +// a text record) is associated with the part. When binding, only the data of +// referenced PRs is pulled into the resulting binary. +// +// Both approaches are used, which means that the equivalent of a section in ELF +// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain +// sections are fine with just defining SD/ED symbols. The SymbolMapper takes +// care of all those details. + +// Attributes for SD symbols. +struct SDAttr { + GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified; + GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified; +}; + +// Attributes for ED symbols. +struct EDAttr { + bool IsReadOnly = false; + GOFF::ESDRmode Rmode; + GOFF::ESDNameSpaceId NameSpace = GOFF::ESD_NS_NormalName; + GOFF::ESDTextStyle TextStyle = GOFF::ESD_TS_ByteOriented; + GOFF::ESDBindingAlgorithm BindAlgorithm = GOFF::ESD_BA_Concatenate; + GOFF::ESDLoadingBehavior LoadBehavior = GOFF::ESD_LB_Initial; + GOFF::ESDReserveQwords ReservedQwords = GOFF::ESD_RQ_0; + GOFF::ESDAlignment Alignment = GOFF::ESD_ALIGN_Doubleword; + uint8_t FillByteValue = 0; +}; + +// Attributes for LD symbols. +struct LDAttr { + bool IsRenamable = false; + GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified; + GOFF::ESDBindingStrength BindingStrength = GOFF::ESD_BST_Strong; + GOFF::ESDLinkageType Linkage = GOFF::ESD_LT_XPLink; + GOFF::ESDAmode Amode; + GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified; +}; + +// Attributes for PR symbols. +struct PRAttr { + bool IsRenamable = false; + GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified; + GOFF::ESDLinkageType Linkage = GOFF::ESD_LT_XPLink; + GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified; + uint32_t SortKey = 0; +}; + +// Class names and other values depending on AMODE64 or AMODE31, and other +// environment properties. For now, only the 64 bit XPLINK case is defined. + +// GOFF classes. 
+constexpr StringLiteral CLASS_CODE = "C_CODE64"; +constexpr StringLiteral CLASS_WSA = "C_WSA64"; +constexpr StringLiteral CLASS_DATA = "C_DATA64"; +constexpr StringLiteral CLASS_PPA2 = "C_@@QPPA2"; + +// Address and residency mode. +constexpr GOFF::ESDAmode AMODE = GOFF::ESD_AMODE_64; +constexpr GOFF::ESDRmode RMODE = GOFF::ESD_RMODE_64; + +// Linkage. +constexpr GOFF::ESDLinkageType LINKAGE = GOFF::ESD_LT_XPLink; + +// Loading behavior. +constexpr GOFF::ESDLoadingBehavior LOADBEHAVIOR = GOFF::ESD_LB_Initial; redstar wrote: My idea with the constants was to mark the places that need to be changed for either 31-bit XPLINK or 31-bit StdLink. The root cause is that not all places need modifications, which may make it difficult to spot those places. On the other hand, AMODE and RMODE are used systematically, but this means I can ditch them because every occurrence needs to be changed. The use is very limited right now, so I'll remove them. https://github.com/llvm/llvm-project/pull/133799 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
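To make the SD/ED/LD mapping from this review concrete, here is a short C++ sketch that fills one attribute struct per symbol of a code section. It is illustrative only: the include path is assumed from the header guard, only types and enumerators quoted in this review are used, and the field values are plausible defaults rather than what the patch's symbol mapper actually emits.

```cpp
#include "llvm/MC/MCGOFFAttributes.h" // assumed path

using namespace llvm;

static void describeCodeSection() {
  // SD: the section itself; its name is chosen by the application.
  GOFF::SDAttr SD;
  SD.TaskingBehavior = GOFF::ESD_TA_Unspecified;

  // ED: the class element; 64-bit code uses the predefined class C_CODE64
  // (the CLASS_CODE constant above).
  GOFF::EDAttr ED;
  ED.IsReadOnly = true;
  ED.Rmode = GOFF::ESD_RMODE_64;
  ED.Alignment = GOFF::ESD_ALIGN_Doubleword;

  // LD: gives a name to an offset into the ED's text, e.g. a function
  // entry point; strong binding makes duplicate definitions a bind error.
  GOFF::LDAttr LD;
  LD.BindingStrength = GOFF::ESD_BST_Strong;
  LD.Amode = GOFF::ESD_AMODE_64;
  LD.Linkage = GOFF::ESD_LT_XPLink;

  (void)SD; (void)ED; (void)LD; // A real writer would emit ESD records here.
}
```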
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
@@ -129,6 +147,245 @@ bool AlwaysInlineImpl( return Changed; } +/// Promote allocas to registers if possible. +static void promoteAllocas( +Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote, +function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) { + if (AllocasToPromote.empty()) +return; + + SmallVector<AllocaInst *> PromotableAllocas; + llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas), +isAllocaPromotable); + if (PromotableAllocas.empty()) +return; + + DominatorTree DT(*Caller); + AssumptionCache &AC = GetAssumptionCache(*Caller); + PromoteMemToReg(PromotableAllocas, DT, &AC); + NumAllocasPromoted += PromotableAllocas.size(); + // Emit a remark for the promotion. + OptimizationRemarkEmitter ORE(Caller); + DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc(); + ORE.emit([&]() { +return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc, + &Caller->getEntryBlock()) + << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size()) + << " allocas to SSA registers in function '" + << ore::NV("Function", Caller) << "'"; + }); + LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size() +<< " allocas to registers in function " << Caller->getName() +<< "\n"); +} + +/// We use a different visitation order of functions here to solve a phase +/// ordering problem. After inlining, a caller function may have allocas that +/// were previously used for passing reference arguments to the callee that +/// are now promotable to registers, using SROA/mem2reg. However, if we just let +/// the AlwaysInliner continue inlining everything at once, the later SROA pass +/// in the pipeline will end up placing phis for these allocas into blocks along +/// the dominance frontier, which may extend further than desired (e.g. loop +/// headers). This can happen when the caller is then inlined into another +/// caller, and the allocas end up hoisted further before SROA is run. +/// +/// Instead, what we want to do, as best as we can, is to inline leaf +/// functions into callers, and then run PromoteMemToReg() on the allocas that +/// were passed into the callee before it was inlined. +/// +/// We want to do this *before* the caller is inlined into another caller +/// because we want the alloca promotion to happen before its scope extends too +/// far because of further inlining. +/// +/// Here's a simple pseudo-example: +/// outermost_caller() { +/// for (...) { +/// middle_caller(); +/// } +/// } +/// +/// middle_caller() { +/// int stack_var; +/// inner_callee(&stack_var); +/// } +/// +/// inner_callee(int *x) { +/// // Do something with x. +/// } +/// +/// In this case, we want to inline inner_callee() into middle_caller() and +/// then promote stack_var to a register before we inline middle_caller() into +/// outermost_caller(). The regular always_inliner would inline everything at +/// once, and then SROA/mem2reg would promote stack_var to a register but in +/// the context of outermost_caller(), which is not what we want. aemerson wrote: > In that context, could the problem addressed here be decoupled from inlining > order? It seems like it'd result in a more robust system. I don't *think* so, unless there's something I've missed. Before doing this I tried other approaches, such as: - Trying to detect these over-extended PHIs and then demoting them back to allocas. Didn't work, as we ended up pessimizing codegen. - Avoiding hoisting large vector allocas to the entry block, in order to block mem2reg. This works but is conceptually the wrong place to do it (no other heuristics code exists there).
I wasn't aware of ModuleInliner. Is the long-term plan for it to replace the existing inliner? If so, we could in future merge it with AlwaysInliner, and if we interleave optimization as the current SCC manager does, then this should fix the problem. https://github.com/llvm/llvm-project/pull/145613 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
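To close the loop on the discussion, the pseudo-example from the quoted comment can be spelled out in plain C; this is just the comment's sketch made compilable, not code from the patch:

```c
void inner_callee(int *x) {
  *x += 1; /* do something with x */
}

void middle_caller(void) {
  int stack_var = 0;        /* reference argument to the callee */
  inner_callee(&stack_var); /* after inlining, stack_var is promotable */
}

void outermost_caller(void) {
  for (int i = 0; i < 100; ++i)
    middle_caller();
}
```

The intended order is: inline inner_callee() into middle_caller(), run PromoteMemToReg() on stack_var there, and only then inline middle_caller() into outermost_caller(). If everything is inlined first, mem2reg runs with stack_var's scope already spanning the loop in outermost_caller(), and the resulting phis land on a wider dominance frontier than necessary.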