[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)
https://github.com/shiltian closed https://github.com/llvm/llvm-project/pull/137655
[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)
shiltian wrote: Will close this for now. I'll revisit this if we can handle the case under discussion properly. https://github.com/llvm/llvm-project/pull/137655
[llvm-branch-commits] [clang] [llvm] [AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` (PR #123995)
@@ -1425,8 +1453,14 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
     }
   }

-  ChangeStatus Change = A.run();
-  return Change == ChangeStatus::CHANGED;
+  bool Changed = A.run() == ChangeStatus::CHANGED;
+
+  if (Changed && (LTOPhase == ThinOrFullLTOPhase::None ||
+                  LTOPhase == ThinOrFullLTOPhase::FullLTOPostLink ||
+                  LTOPhase == ThinOrFullLTOPhase::ThinLTOPostLink))
+    checkWavesPerEU(M, TM);

shiltian wrote: gentle ping

https://github.com/llvm/llvm-project/pull/123995
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
redstar wrote: I had to force-push to get the fixed test cases. https://github.com/llvm/llvm-project/pull/137235
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135662 >From 0d477fa179d365147d2cd809724b07f050603ff6 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 14 Apr 2025 15:08:54 +0300 Subject: [PATCH 1/5] [BOLT] Gadget scanner: refactor issue reporting Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from the base `Report` class. Instead, make `Report` always represent the brief version of the report. When an issue is detected on the first run of the analysis, return an optional request for extra details to attach to the report on the second run. --- bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++--- bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++ .../AArch64/gs-pauth-debug-output.s | 8 +- 3 files changed, 187 insertions(+), 123 deletions(-) diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h index 4c1bef3d2265f..3b6c1f6af94a0 100644 --- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h +++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h @@ -219,11 +219,6 @@ struct Report { virtual void generateReport(raw_ostream &OS, const BinaryContext &BC) const = 0; - // The two methods below are called by Analysis::computeDetailedInfo when - // iterating over the reports. - virtual ArrayRef getAffectedRegisters() const { return {}; } - virtual void setOverwritingInstrs(ArrayRef Instrs) {} - void printBasicInfo(raw_ostream &OS, const BinaryContext &BC, StringRef IssueKind) const; }; @@ -231,27 +226,11 @@ struct Report { struct GadgetReport : public Report { // The particular kind of gadget that is detected. const GadgetKind &Kind; - // The set of registers related to this gadget report (possibly empty). - SmallVector AffectedRegisters; - // The instructions that clobber the affected registers. - // There is no one-to-one correspondence with AffectedRegisters: for example, - // the same register can be overwritten by different instructions in different - // preceding basic blocks. - SmallVector OverwritingInstrs; - - GadgetReport(const GadgetKind &Kind, MCInstReference Location, - MCPhysReg AffectedRegister) - : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {} - - void generateReport(raw_ostream &OS, const BinaryContext &BC) const override; - ArrayRef getAffectedRegisters() const override { -return AffectedRegisters; - } + GadgetReport(const GadgetKind &Kind, MCInstReference Location) + : Report(Location), Kind(Kind) {} - void setOverwritingInstrs(ArrayRef Instrs) override { -OverwritingInstrs.assign(Instrs.begin(), Instrs.end()); - } + void generateReport(raw_ostream &OS, const BinaryContext &BC) const override; }; /// Report with a free-form message attached. @@ -263,8 +242,75 @@ struct GenericReport : public Report { const BinaryContext &BC) const override; }; +/// An information about an issue collected on the slower, detailed, +/// run of an analysis. +class ExtraInfo { +public: + virtual void print(raw_ostream &OS, const MCInstReference Location) const = 0; + + virtual ~ExtraInfo() {} +}; + +class ClobberingInfo : public ExtraInfo { + SmallVector ClobberingInstrs; + +public: + ClobberingInfo(const ArrayRef Instrs) + : ClobberingInstrs(Instrs) {} + + void print(raw_ostream &OS, const MCInstReference Location) const override; +}; + +/// A brief version of a report that can be further augmented with the details. +/// +/// It is common for a particular type of gadget detector to be tied to some +/// specific kind of analysis. 
If an issue is returned by that detector, it may +/// be further augmented with the detailed info in an analysis-specific way, +/// or just be left as-is (f.e. if a free-form warning was reported). +template struct BriefReport { + BriefReport(std::shared_ptr Issue, + const std::optional RequestedDetails) + : Issue(Issue), RequestedDetails(RequestedDetails) {} + + std::shared_ptr Issue; + std::optional RequestedDetails; +}; + +/// A detailed version of a report. +struct DetailedReport { + DetailedReport(std::shared_ptr Issue, + std::shared_ptr Details) + : Issue(Issue), Details(Details) {} + + std::shared_ptr Issue; + std::shared_ptr Details; +}; + struct FunctionAnalysisResult { - std::vector> Diagnostics; + std::vector Diagnostics; +}; + +/// A helper class storing per-function context to be instantiated by Analysis. +class FunctionAnalysis { + BinaryContext &BC; + BinaryFunction &BF; + MCPlusBuilder::AllocatorIdTy AllocatorId; + FunctionAnalysisResult Result; + + bool PacRetGadgetsOnly; + + void findUnsafeUses(SmallVector> &Reports); + void augmentUnsafeUseReports(const ArrayRef> Reports); + +public: + FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocatorId, +
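A minimal, self-contained C++ sketch of the two-run flow described in this commit message (the fast run produces brief reports, each optionally requesting extra details; a second, slower run fulfils those requests). All names below are simplified stand-ins for illustration, not the actual BOLT declarations:

#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <vector>

struct Report { std::string Text; };       // brief issue description
struct ExtraInfo { std::string Details; }; // computed on the slow run only

struct BriefReport {
  std::shared_ptr<Report> Issue;
  std::optional<unsigned> RequestedDetails; // e.g. a register to trace
};

struct DetailedReport {
  std::shared_ptr<Report> Issue;
  std::shared_ptr<ExtraInfo> Details; // stays null for simple reports
};

int main() {
  // First (fast) run: detect issues cheaply.
  std::vector<BriefReport> Brief = {
      {std::make_shared<Report>(Report{"non-protected ret"}), 30u},
      {std::make_shared<Report>(Report{"free-form warning"}), std::nullopt}};

  // Second (slow) run: only reports that asked for details get them attached.
  std::vector<DetailedReport> Final;
  for (const BriefReport &BR : Brief) {
    std::shared_ptr<ExtraInfo> EI;
    if (BR.RequestedDetails)
      EI = std::make_shared<ExtraInfo>(
          ExtraInfo{"clobbered by instruction at +0x10"});
    Final.push_back({BR.Issue, EI});
  }

  for (const DetailedReport &DR : Final)
    std::cout << DR.Issue->Text
              << (DR.Details ? " (" + DR.Details->Details + ")" : "") << "\n";
}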
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137224 >From 33158bcada96af1d201a5cb5342f0256f30894cd Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++- .../AArch64/gs-pacret-autiasp.s | 31 +- .../AArch64/gs-pauth-debug-output.s | 30 +- .../AArch64/gs-pauth-tail-calls.s | 597 ++ 4 files changed, 730 insertions(+), 40 deletions(-) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index d4282daad648b..fab92947f6d67 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -660,8 +660,9 @@ class DataflowSrcSafetyAnalysis // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. +// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BinaryFunction &BF; MCPlusBuilder::AllocatorIdTy AllocId; @@ -672,6 +673,30 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } + /// Compute a reasonably pessimistic estimation of the register state when + /// the previous instruction is not known for sure. Take the set of registers + /// which are trusted at function entry and remove all registers that can be + /// clobbered inside this function. + SrcState computePessimisticState(BinaryFunction &BF) { +BitVector ClobberedRegs(NumRegs); +for (auto &I : BF.instrs()) { + MCInst &Inst = I.second; + BC.MIB->getClobberedRegs(Inst, ClobberedRegs); + + // If this is a call instruction, no register is safe anymore, unless + // it is a tail call. Ignore tail calls for the purpose of estimating the + // worst-case scenario, assuming no instructions are executed in the + // caller after this point anyway. 
+ if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst)) +ClobberedRegs.set(); +} + +SrcState S = createEntryState(); +S.SafeToDerefRegs.reset(ClobberedRegs); +S.TrustedRegs.reset(ClobberedRegs); +return S; + } + public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -682,6 +707,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -696,7 +722,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Check if we need to remove an old annotation (this is the case if @@ -1233,6 +1259,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). + if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return t
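The computePessimisticState logic above boils down to: take the registers trusted at function entry and drop everything the function can clobber, treating non-tail calls as clobbering all registers. A simplified, self-contained model of that computation, with illustrative types (the real code uses BitVector and MCPlusBuilder callbacks):

#include <bitset>
#include <vector>

constexpr unsigned NumRegs = 32;
using RegSet = std::bitset<NumRegs>;

struct Inst {
  RegSet Clobbers; // registers this instruction may overwrite
  bool IsCall = false;
  bool IsTailCall = false;
};

struct SrcState { RegSet SafeToDerefRegs, TrustedRegs; };

SrcState computePessimisticState(const std::vector<Inst> &Body,
                                 SrcState Entry) {
  RegSet Clobbered;
  for (const Inst &I : Body) {
    Clobbered |= I.Clobbers;
    // A non-tail call may clobber any register; tail calls are ignored
    // because nothing in this function runs after them anyway.
    if (I.IsCall && !I.IsTailCall)
      Clobbered.set();
  }
  Entry.SafeToDerefRegs &= ~Clobbered;
  Entry.TrustedRegs &= ~Clobbered;
  return Entry;
}

int main() {
  SrcState Entry{RegSet().set(), RegSet().set()}; // all trusted at entry
  std::vector<Inst> Body(1);
  Body[0].Clobbers.set(0); // one instruction clobbering register 0
  SrcState S = computePessimisticState(Body, Entry);
  return S.TrustedRegs.test(0) ? 1 : 0; // register 0 is no longer trusted
}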
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (PR #136147)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136147 >From 59afabd2a0b89f6e529f58cd17ac56673e2c0599 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 17 Apr 2025 15:40:05 +0300 Subject: [PATCH] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface Clarify the semantics of `getAuthenticatedReg` and remove a redundant `isAuthenticationOfReg` method, as combined auth+something instructions (such as `retaa` on AArch64) should be handled carefully, especially when searching for authentication oracles: usually, such instructions cannot be authentication oracles and only some of them actually write an authenticated pointer to a register (such as "ldra x0, [x1]!"). Use `std::optional` returned type instead of plain MCPhysReg and returning `getNoRegister()` as a "not applicable" indication. Document a few existing methods, add information about preconditions. --- bolt/include/bolt/Core/MCPlusBuilder.h| 61 ++- bolt/lib/Passes/PAuthGadgetScanner.cpp| 64 +--- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 76 --- .../AArch64/gs-pauth-debug-output.s | 3 - .../AArch64/gs-pauth-signing-oracles.s| 20 + 5 files changed, 130 insertions(+), 94 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index 132d58f3f9f79..83ad70ea97076 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -562,30 +562,50 @@ class MCPlusBuilder { return {}; } - virtual ErrorOr getAuthenticatedReg(const MCInst &Inst) const { -llvm_unreachable("not implemented"); -return getNoRegister(); - } - - virtual bool isAuthenticationOfReg(const MCInst &Inst, - MCPhysReg AuthenticatedReg) const { + /// Returns the register where an authenticated pointer is written to by Inst, + /// or std::nullopt if not authenticating any register. + /// + /// Sets IsChecked if the instruction always checks authenticated pointer, + /// i.e. it either returns a successfully authenticated pointer or terminates + /// the program abnormally (such as "ldra x0, [x1]!" on AArch64, which crashes + /// on authentication failure even if FEAT_FPAC is not implemented). + virtual std::optional + getWrittenAuthenticatedReg(const MCInst &Inst, bool &IsChecked) const { llvm_unreachable("not implemented"); -return false; +return std::nullopt; } - virtual MCPhysReg getSignedReg(const MCInst &Inst) const { + /// Returns the register signed by Inst, or std::nullopt if not signing any + /// register. + /// + /// The returned register is assumed to be both input and output operand, + /// as it is done on AArch64. + virtual std::optional getSignedReg(const MCInst &Inst) const { llvm_unreachable("not implemented"); -return getNoRegister(); +return std::nullopt; } - virtual ErrorOr getRegUsedAsRetDest(const MCInst &Inst) const { + /// Returns the register used as a return address. Returns std::nullopt if + /// not applicable, such as reading the return address from a system register + /// or from the stack. + /// + /// Sets IsAuthenticatedInternally if the instruction accepts a signed + /// pointer as its operand and authenticates it internally. + /// + /// Should only be called when isReturn(Inst) is true. + virtual std::optional + getRegUsedAsRetDest(const MCInst &Inst, + bool &IsAuthenticatedInternally) const { llvm_unreachable("not implemented"); -return getNoRegister(); +return std::nullopt; } /// Returns the register used as the destination of an indirect branch or call /// instruction. 
Sets IsAuthenticatedInternally if the instruction accepts /// a signed pointer as its operand and authenticates it internally. + /// + /// Should only be called if isIndirectCall(Inst) or isIndirectBranch(Inst) + /// returns true. virtual MCPhysReg getRegUsedAsIndirectBranchDest(const MCInst &Inst, bool &IsAuthenticatedInternally) const { @@ -602,14 +622,14 @@ class MCPlusBuilder { ///controlled, under the Pointer Authentication threat model. /// /// If the instruction does not write to any register satisfying the above - /// two conditions, NoRegister is returned. + /// two conditions, std::nullopt is returned. /// /// The Pointer Authentication threat model assumes an attacker is able to /// modify any writable memory, but not executable code (due to W^X). - virtual MCPhysReg + virtual std::optional getMaterializedAddressRegForPtrAuth(const MCInst &Inst) const { llvm_unreachable("not implemented"); -return getNoRegister(); +return std::nullopt; } /// Analyzes if this instruction can safely perform address arithmetics @@ -622,10 +642,13 @@ class MCPlusBuilder { /// controlled, provided InReg and executa
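The interface change can be illustrated in isolation: a getNoRegister() sentinel compiles even when the caller forgets to check for it, while std::optional makes the "no register" case explicit in the type. A small sketch with a hypothetical function, not the actual MCPlusBuilder API:

#include <cassert>
#include <optional>

using MCPhysReg = unsigned; // stand-in for llvm::MCPhysReg

// Before: a plain MCPhysReg return with a getNoRegister() sentinel that
// silently flows into later computations if the caller forgets to compare.
// After: the absence of a result is part of the return type.
std::optional<MCPhysReg> getSignedReg(bool InstSignsAReg) {
  if (!InstSignsAReg)
    return std::nullopt;
  return MCPhysReg{30}; // e.g. LR on AArch64, purely for illustration
}

int main() {
  if (std::optional<MCPhysReg> R = getSignedReg(true))
    assert(*R == 30);
  assert(!getSignedReg(false).has_value());
}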
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135661 >From b942d6e9dc8858653e4669d6a1b8e50f0dc19061 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 14 Apr 2025 14:35:56 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: use more appropriate types (NFC) * use more flexible `const ArrayRef` and `StringRef` types instead of `const std::vector &` and `const std::string &`, correspondingly, for function arguments * return plain `const SrcState &` instead of `ErrorOr` from `SrcSafetyAnalysis::getStateBefore`, as absent state is not handled gracefully by any caller --- bolt/include/bolt/Passes/PAuthGadgetScanner.h | 8 +--- bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 --- 2 files changed, 19 insertions(+), 28 deletions(-) diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h index 6765e2aff414f..3e39b64e59e0f 100644 --- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h +++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h @@ -12,7 +12,6 @@ #include "bolt/Core/BinaryContext.h" #include "bolt/Core/BinaryFunction.h" #include "bolt/Passes/BinaryPasses.h" -#include "llvm/ADT/SmallSet.h" #include "llvm/Support/raw_ostream.h" #include @@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &); namespace PAuthGadgetScanner { -class SrcSafetyAnalysis; -struct SrcState; - /// Description of a gadget kind that can be detected. Intended to be /// statically allocated to be attached to reports by reference. class GadgetKind { @@ -210,7 +206,7 @@ class GadgetKind { public: GadgetKind(const char *Description) : Description(Description) {} - const StringRef getDescription() const { return Description; } + StringRef getDescription() const { return Description; } }; /// Base report located at some instruction, without any additional information. @@ -261,7 +257,7 @@ struct GadgetReport : public Report { /// Report with a free-form message attached. 
struct GenericReport : public Report { std::string Text; - GenericReport(MCInstReference Location, const std::string &Text) + GenericReport(MCInstReference Location, StringRef Text) : Report(Location), Text(Text) {} virtual void generateReport(raw_ostream &OS, const BinaryContext &BC) const override; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 12eb9c66130b9..3d723456b6ffd 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -91,14 +91,14 @@ class TrackedRegisters { const std::vector Registers; std::vector RegToIndexMapping; - static size_t getMappingSize(const std::vector &RegsToTrack) { + static size_t getMappingSize(const ArrayRef RegsToTrack) { if (RegsToTrack.empty()) return 0; return 1 + *llvm::max_element(RegsToTrack); } public: - TrackedRegisters(const std::vector &RegsToTrack) + TrackedRegisters(const ArrayRef RegsToTrack) : Registers(RegsToTrack), RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) { for (unsigned I = 0; I < RegsToTrack.size(); ++I) @@ -234,7 +234,7 @@ struct SrcState { static void printLastInsts( raw_ostream &OS, -const std::vector> &LastInstWritingReg) { +const ArrayRef> LastInstWritingReg) { OS << "Insts: "; for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) { auto &Set = LastInstWritingReg[I]; @@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState &S) const { class SrcSafetyAnalysis { public: SrcSafetyAnalysis(BinaryFunction &BF, -const std::vector &RegsToTrackInstsFor) +const ArrayRef RegsToTrackInstsFor) : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()), RegsToTrackInstsFor(RegsToTrackInstsFor) {} @@ -303,11 +303,10 @@ class SrcSafetyAnalysis { static std::shared_ptr create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, - const std::vector &RegsToTrackInstsFor); + const ArrayRef RegsToTrackInstsFor); virtual void run() = 0; - virtual ErrorOr - getStateBefore(const MCInst &Inst) const = 0; + virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0; protected: BinaryContext &BC; @@ -347,7 +346,7 @@ class SrcSafetyAnalysis { } BitVector getClobberedRegs(const MCInst &Point) const { -BitVector Clobbered(NumRegs, false); +BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved // registers. There's a good chance that callee-saved registers will be // saved on the stack at some point during execution of the callee. @@ -409,8 +408,7 @@ class SrcSafetyAnalysis { // FirstCheckerInst should belong to the same basic block (see the // assertion in DataflowSrcSafetyAnalysis::run()), meaning it was
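The type changes in this NFC patch follow a common LLVM guideline: pass ArrayRef and StringRef by value for read-only parameters, since both are cheap, non-owning views that accept many concrete containers at the call site. A minimal sketch, assuming LLVM's ADT and Support headers are available to compile against:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/raw_ostream.h"
#include <vector>

// Accepts std::vector, SmallVector, C arrays or initializer lists alike,
// without copying the elements.
static size_t countTracked(llvm::ArrayRef<unsigned> Regs) {
  return Regs.size();
}

static void note(llvm::StringRef Text) { llvm::errs() << Text << "\n"; }

int main() {
  std::vector<unsigned> V = {1, 2, 3};
  llvm::SmallVector<unsigned, 4> SV = {4, 5};
  note("tracking registers"); // a string literal binds without allocation
  return countTracked(V) + countTracked(SV) == 5 ? 0 : 1;
}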
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136183 >From 352364dfc00d23111fc44bc807d3484bc67aff00 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 17 Apr 2025 20:51:16 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: improve handling of unreachable basic blocks Instead of refusing to analyze an instruction completely, when it is unreachable according to the CFG reconstructed by BOLT, pessimistically assume all registers to be unsafe at the start of basic blocks without any predecessors. Nevertheless, unreachable basic blocks found in optimized code likely means imprecise CFG reconstruction, thus report a warning once per basic block without predecessors. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++- .../AArch64/gs-pacret-autiasp.s | 7 ++- .../binary-analysis/AArch64/gs-pauth-calls.s | 57 +++ 3 files changed, 95 insertions(+), 15 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 7a86d9c4c9a59..917649731884e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -344,6 +344,12 @@ class SrcSafetyAnalysis { return S; } + /// Creates a state with all registers marked unsafe (not to be confused + /// with empty state). + SrcState createUnsafeState() const { +return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + } + BitVector getClobberedRegs(const MCInst &Point) const { BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved @@ -586,6 +592,13 @@ class DataflowSrcSafetyAnalysis if (BB.isEntryPoint()) return createEntryState(); +// If a basic block without any predecessors is found in an optimized code, +// this likely means that some CFG edges were not detected. Pessimistically +// assume all registers to be unsafe before this basic block and warn about +// this fact in FunctionAnalysis::findUnsafeUses(). +if (BB.pred_empty()) + return createUnsafeState(); + return SrcState(); } @@ -659,12 +672,6 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -1335,19 +1342,30 @@ void FunctionAnalysisContext::findUnsafeUses( BF.dump(); }); + if (BF.hasCFG()) { +// Warn on basic blocks being unreachable according to BOLT, as this +// likely means CFG is imprecise. +for (BinaryBasicBlock &BB : BF) { + if (!BB.pred_empty() || BB.isEntryPoint()) +continue; + // Arbitrarily attach the report to the first instruction of BB. + MCInst *InstToReport = BB.getFirstNonPseudoInstr(); + if (!InstToReport) +continue; // BB has no real instructions + + Reports.push_back( + make_generic_report(MCInstReference::get(InstToReport, BF), + "Warning: no predecessor basic blocks detected " + "(possibly incomplete CFG)")); +} + } + iterateOverInstrs(BF, [&](MCInstReference Inst) { if (BC.MIB->isCFI(Inst)) return; const SrcState &S = Analysis->getStateBefore(Inst); - -// If non-empty state was never propagated from the entry basic block -// to Inst, assume it to be unreachable and report a warning. 
-if (S.empty()) { - Reports.push_back( - make_generic_report(Inst, "Warning: unreachable instruction found")); - return; -} +assert(!S.empty() && "Instruction has no associated state"); if (auto Report = shouldReportReturnGadget(BC, Inst, S)) Reports.push_back(*Report); diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index 284f0bea607a5..6559ba336e8de 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -215,12 +215,17 @@ f_callclobbered_calleesaved: .globl f_unreachable_instruction .type f_unreachable_instruction,@function f_unreachable_instruction: -// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address +// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected (possibly incomplete CFG) in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address // CHECK-NEXT:The instruction is {{[0-9a-f]+}}: add x0, x1, x2 // CHECK-NOT: instructi
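The initial-state choice made by this patch (pessimistic "all unsafe" for a non-entry block without predecessors, instead of refusing to analyze it) can be modelled in a few lines. Illustrative types only; an empty state here stands for "to be filled in by the dataflow meet":

#include <vector>

struct State {
  std::vector<bool> SafeRegs; // empty vector == uninitialized state
};

struct Block {
  bool IsEntry = false;
  unsigned NumPreds = 0;
};

State createUnsafeState(unsigned NumRegs) {
  return State{std::vector<bool>(NumRegs, false)}; // nothing is safe
}

State initialState(const Block &BB, unsigned NumRegs, const State &Entry) {
  if (BB.IsEntry)
    return Entry;
  // A non-entry block without predecessors likely means an imprecise CFG:
  // assume the worst and keep analyzing instead of bailing out.
  if (BB.NumPreds == 0)
    return createUnsafeState(NumRegs);
  return State{}; // filled in later by merging predecessor states
}

int main() {
  Block Orphan; // non-entry, no predecessors
  return initialState(Orphan, 8, State{}).SafeRegs.size() == 8 ? 0 : 1;
}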
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136151 >From a0a9cdfe1b7dacc9193317ef2ef8e340076067e7 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 15 Apr 2025 21:47:18 +0300 Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions Some instruction-printing code used under LLVM_DEBUG does not handle CFI instructions well. While CFI instructions seem to be harmless for the correctness of the analysis results, they do not convey any useful information to the analysis either, so skip them early. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++ .../AArch64/gs-pauth-debug-output.s | 32 +++ 2 files changed, 48 insertions(+) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 76070bd95cae1..7a86d9c4c9a59 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -432,6 +432,9 @@ class SrcSafetyAnalysis { } SrcState computeNext(const MCInst &Point, const SrcState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + SrcStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " SrcSafetyAnalysis::ComputeNext("; @@ -675,6 +678,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If there is a label before this instruction, it is possible that it // can be jumped-to, thus conservatively resetting S. As an exception, @@ -952,6 +957,9 @@ class DstSafetyAnalysis { } DstState computeNext(const MCInst &Point, const DstState &Cur) { +if (BC.MIB->isCFI(Point)) + return Cur; + DstStatePrinter P(BC); LLVM_DEBUG({ dbgs() << " DstSafetyAnalysis::ComputeNext("; @@ -1128,6 +1136,8 @@ class CFGUnawareDstSafetyAnalysis : public DstSafetyAnalysis { DstState S = createUnsafeState(); for (auto &I : llvm::reverse(BF.instrs())) { MCInst &Inst = I.second; + if (BC.MIB->isCFI(Inst)) +continue; // If Inst can change the control flow, we cannot be sure that the next // instruction (to be executed in analyzed program) is the one processed @@ -1326,6 +1336,9 @@ void FunctionAnalysisContext::findUnsafeUses( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const SrcState &S = Analysis->getStateBefore(Inst); // If non-empty state was never propagated from the entry basic block @@ -1389,6 +1402,9 @@ void FunctionAnalysisContext::findUnsafeDefs( }); iterateOverInstrs(BF, [&](MCInstReference Inst) { +if (BC.MIB->isCFI(Inst)) + return; + const DstState &S = Analysis->getStateAfter(Inst); if (auto Report = shouldReportAuthOracle(BC, Inst, S)) diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s index fd55880921d06..07b61bea77e94 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s +++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s @@ -329,6 +329,38 @@ auth_oracle: // PAUTH-EMPTY: // PAUTH-NEXT: Attaching leakage info to: : autia x0, x1 # DataflowDstSafetyAnalysis: dst-state +// Gadget scanner should not crash on CFI instructions, including when debug-printing them. +// Note that the particular debug output is not checked, but BOLT should be +// compiled with assertions enabled to support -debug-only argument. 
+
+.globl cfi_inst_df
+.type cfi_inst_df,@function
+cfi_inst_df:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size cfi_inst_df, .-cfi_inst_df
+.cfi_endproc
+
+.globl cfi_inst_nocfg
+.type cfi_inst_nocfg,@function
+cfi_inst_nocfg:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+
+adr x0, 1f
+br x0
+1:
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size cfi_inst_nocfg, .-cfi_inst_nocfg
+.cfi_endproc
+
 // CHECK-LABEL: Analyzing function main, AllocatorId = 1
 .globl main
 .type main,@function
[llvm-branch-commits] [llvm] [AArch64][llvm] Codegen for 16/32/64-bit floating-point atomicrmw fminimum/fmaximum (PR #137703)
https://github.com/CarolineConcatto approved this pull request. Thank you Jonathan, LGTM! https://github.com/llvm/llvm-project/pull/137703
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(

 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+    Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
     return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+    return PR_SignExtend;
+  default:
+    return PR_None;

SamTebbs33 wrote: Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(

 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

SamTebbs33 wrote: Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction
+/// to hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

SamTebbs33 wrote: At this point we've already created the partial reduction and clamped the range, so I don't think we need to do any costing (like `tryToMatchAndCreateMulAccumulateReduction` does with `getMulAccReductionCost`), since we already know it's worthwhile (see `getScaledReductions` in LoopVectorize.cpp). This part of the code just puts the partial reduction inside the abstract recipe, which shouldn't need to consider any costing.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko ready_for_review https://github.com/llvm/llvm-project/pull/137975
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/135663 >From 46e3e96a9c0ffab84ee7003621041b7bf76f3d78 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Sat, 5 Apr 2025 14:54:01 +0300 Subject: [PATCH 1/4] [BOLT] Gadget scanner: detect authentication oracles Implement the detection of authentication instructions whose results can be inspected by an attacker to know whether authentication succeeded. As the properties of output registers of authentication instructions are inspected, add a second set of analysis-related classes to iterate over the instructions in reverse order. --- bolt/include/bolt/Passes/PAuthGadgetScanner.h | 12 + bolt/lib/Passes/PAuthGadgetScanner.cpp| 543 + .../AArch64/gs-pauth-authentication-oracles.s | 723 ++ .../AArch64/gs-pauth-debug-output.s | 78 ++ 4 files changed, 1356 insertions(+) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h index a0dd33ff88d6f..73198be81f396 100644 --- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h +++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h @@ -286,6 +286,15 @@ class ClobberingInfo : public ExtraInfo { void print(raw_ostream &OS, const MCInstReference Location) const override; }; +class LeakageInfo : public ExtraInfo { + SmallVector LeakingInstrs; + +public: + LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) {} + + void print(raw_ostream &OS, const MCInstReference Location) const override; +}; + /// A brief version of a report that can be further augmented with the details. /// /// A half-baked report produced on the first run of the analysis. An extra, @@ -326,6 +335,9 @@ class FunctionAnalysisContext { void findUnsafeUses(SmallVector> &Reports); void augmentUnsafeUseReports(ArrayRef> Reports); + void findUnsafeDefs(SmallVector> &Reports); + void augmentUnsafeDefReports(ArrayRef> Reports); + /// Process the reports which do not have to be augmented, and remove them /// from Reports. void handleSimpleReports(SmallVector> &Reports); diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 24ad4844ffb7d..1dd2e3824e385 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -717,6 +717,459 @@ SrcSafetyAnalysis::create(BinaryFunction &BF, RegsToTrackInstsFor); } +/// A state representing which registers are safe to be used as the destination +/// operand of an authentication instruction. +/// +/// Similar to SrcState, it is the analysis that should take register aliasing +/// into account. +/// +/// Depending on the implementation, it may be possible that an authentication +/// instruction returns an invalid pointer on failure instead of terminating +/// the program immediately (assuming the program will crash as soon as that +/// pointer is dereferenced). To prevent brute-forcing the correct signature, +/// it should be impossible for an attacker to test if a pointer is correctly +/// signed - either the program should be terminated on authentication failure +/// or it should be impossible to tell whether authentication succeeded or not. +/// +/// For that reason, a restricted set of operations is allowed on any register +/// containing a value derived from the result of an authentication instruction +/// until that register is either wiped or checked not to contain a result of a +/// failed authentication. 
+/// +/// Specifically, the safety property for a register is computed by iterating +/// the instructions in backward order: the source register Xn of an instruction +/// Inst is safe if at least one of the following is true: +/// * Inst checks if Xn contains the result of a successful authentication and +/// terminates the program on failure. Note that Inst can either naturally +/// dereference Xn (load, branch, return, etc. instructions) or be the first +/// instruction of an explicit checking sequence. +/// * Inst performs safe address arithmetic AND both source and result +/// registers, as well as any temporary registers, must be safe after +/// execution of Inst (temporaries are not used on AArch64 and thus not +/// currently supported/allowed). +/// See MCPlusBuilder::analyzeAddressArithmeticsForPtrAuth for the details. +/// * Inst fully overwrites Xn with an unrelated value. +struct DstState { + /// The set of registers whose values cannot be inspected by an attacker in + /// a way usable as an authentication oracle. The results of authentication + /// instructions should be written to such registers. + BitVector CannotEscapeUnchecked; + + std::vector> FirstInstLeakingReg; + + /// Construct an empty state. + DstState() {} + + DstState(unsigned NumRegs, unsigned NumRegsToTrack) + : Cannot
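For readers skimming the digest: a simplified sketch of the merge behavior implied by the DstState description above (hedged: this is not the PR's code; only the intersect-at-CFG-joins semantics is taken from the patch, and the class name is invented):

```cpp
#include "llvm/ADT/BitVector.h"

// A register can only remain "cannot escape unchecked" if that property holds
// on every path, so joining states at a CFG merge point intersects the sets.
struct SimpleDstState {
  llvm::BitVector CannotEscapeUnchecked;

  SimpleDstState &merge(const SimpleDstState &Other) {
    CannotEscapeUnchecked &= Other.CannotEscapeUnchecked; // set intersection
    return *this;
  }
};
```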
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko created https://github.com/llvm/llvm-project/pull/137975 An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked >From 5d0aa042415f6b4321e5d278ce435ceef241dd3e Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Wed, 30 Apr 2025 16:08:10 +0300 Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for auth oracles An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++ bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +- .../Target/AArch64/AArch64MCPlusBuilder.cpp | 24 -- .../AArch64/gs-pauth-address-checks.s | 44 +-- .../AArch64/gs-pauth-authentication-oracles.s | 9 ++-- .../AArch64/gs-pauth-signing-oracles.s| 6 +-- 6 files changed, 75 insertions(+), 35 deletions(-) diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index 83ad70ea97076..b0484964cfbad 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -706,6 +706,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. + /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index fab92947f6d67..20d4fdf42f03a 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1003,6 +1003,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. 
+if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1108,8 +1117,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index f3c29e6ee43b9..4d11c5b206eab 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusB
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Anatoly Trosinenko (atrosinenko) Changes An authenticated pointer can be explicitly checked by the compiler via a sequence of instructions that executes BRK on failure. It is important to recognize such BRK instruction as checking every register (as it is expected to immediately trigger an abnormal program termination) to prevent false positive reports about authentication oracles: autia x2, x3 autia x0, x1 ; neither x0 nor x2 are checked at this point eor x16, x0, x0, lsl #1 tbz x16, #62, on_success ; marks x0 as checked ; end of BB: for x2 to be checked here, it must be checked in both ; successor basic blocks on_failure: brk 0xc470 on_success: ; x2 is checked ldr x1, [x2] ; marks x2 as checked --- Full diff: https://github.com/llvm/llvm-project/pull/137975.diff 6 Files Affected: - (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+14) - (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+11-2) - (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+21-3) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-address-checks.s (+22-22) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s (+4-5) - (modified) bolt/test/binary-analysis/AArch64/gs-pauth-signing-oracles.s (+3-3) ``diff diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h index 83ad70ea97076..b0484964cfbad 100644 --- a/bolt/include/bolt/Core/MCPlusBuilder.h +++ b/bolt/include/bolt/Core/MCPlusBuilder.h @@ -706,6 +706,20 @@ class MCPlusBuilder { return false; } + /// Returns true if Inst is a trap instruction. + /// + /// Tests if Inst is an instruction that immediately causes an abnormal + /// program termination, for example when a security violation is detected + /// by a compiler-inserted check. + /// + /// @note An implementation of this method should likely return false for + /// calls to library functions like abort(), as it is possible that the + /// execution state is partially attacker-controlled at this point. + virtual bool isTrap(const MCInst &Inst) const { +llvm_unreachable("not implemented"); +return false; + } + virtual bool isBreakpoint(const MCInst &Inst) const { llvm_unreachable("not implemented"); return false; diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index fab92947f6d67..20d4fdf42f03a 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -1003,6 +1003,15 @@ class DstSafetyAnalysis { dbgs() << ")\n"; }); +// If this instruction terminates the program immediately, no +// authentication oracles are possible past this point. +if (BC.MIB->isTrap(Point)) { + LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); }); + DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + Next.CannotEscapeUnchecked.set(); + return Next; +} + // If this instruction is reachable by the analysis, a non-empty state will // be propagated to it sooner or later. Until then, skip computeNext(). if (Cur.empty()) { @@ -1108,8 +1117,8 @@ class DataflowDstSafetyAnalysis // // A basic block without any successors, on the other hand, can be // pessimistically initialized to everything-is-unsafe: this will naturally -// handle both return and tail call instructions and is harmless for -// internal indirect branch instructions (such as computed gotos). +// handle return, trap and tail call instructions. At the same time, it is +// harmless for internal indirect branch instructions, like computed gotos. 
if (BB.succ_empty()) return createUnsafeState(); diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp index f3c29e6ee43b9..4d11c5b206eab 100644 --- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp +++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp @@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { // the list of successors of this basic block as appropriate. // Any of the above code sequences assume the fall-through basic block -// is a dead-end BRK instruction (any immediate operand is accepted). +// is a dead-end trap instruction. const BinaryBasicBlock *BreakBB = BB.getFallthrough(); -if (!BreakBB || BreakBB->empty() || -BreakBB->front().getOpcode() != AArch64::BRK) +if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front())) return std::nullopt; // Iterate over the instructions of BB in reverse order, matching opcodes @@ -1745,6 +1744,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder { Inst.addOperand(MCOperand::createImm(0)); } + bool isTrap(const MCInst &Inst) const override { +if (
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
atrosinenko wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/137975). [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#137975** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/137975)
* **#137224**
* **#136183**
* **#136151**
* **#135663**
* **#136147**
* **#135662**
* **#135661**
* **#134146**
* **#133461**
* **#135073**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev). Learn more about [stacking](https://stacking.dev/). https://github.com/llvm/llvm-project/pull/137975 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)
https://github.com/atrosinenko edited https://github.com/llvm/llvm-project/pull/137975 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Recommit "[ConstraintElim] Simplify cmp after uadd.sat/usub.sat (#135603)" (PR #136467)
el-ev wrote: ### Merge activity * **Apr 30, 9:40 AM EDT**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/136467). https://github.com/llvm/llvm-project/pull/136467 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Invalid;
     break;
   case 16:
-    if (AccumEVT == MVT::i64)
-      Cost *= 2;
-    else if (AccumEVT != MVT::i32)
+    if (AccumEVT != MVT::i32)
sdesmalen-arm wrote: ```llvm.partial.reduce(nxv2i64 %acc, mul(zext(nxv16i8 %a) -> nxv16i64, zext(nxv16i8 %b) -> nxv16i64))``` This case can be lowered with the correct support in SelectionDAG (see https://github.com/llvm/llvm-project/pull/130935#discussion_r2068162662). I'd argue that this function will need a bit of a rewrite anyway once we get proper support in SelectionDAG for legalising these operations, so I wouldn't bother changing that in this PR. https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };

-/// A recipe for forming partial reductions. In the loop, an accumulator and
sdesmalen-arm wrote: (We discussed this offline.) Swapping the operands in the debug-print function of the recipe is not something that really concerns me, and I think there's still value in making this a functionally NFC change (from an end-user perspective). https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(

 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+    Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
     return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+    return PR_SignExtend;
+  default:
+    return PR_None;
sdesmalen-arm wrote: I think the default case should be an `llvm_unreachable()`, because only zero/sign-extend are supported partial reduction extends. The `PR_None` is handled by the function above that takes an `Instruction*`. https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
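For concreteness, a minimal sketch of the change this review comment asks for (hedged: the enumerator names are taken from the diff above; the unreachable message text is hypothetical):

```cpp
TargetTransformInfo::PartialReductionExtendKind
TargetTransformInfo::getPartialReductionExtendKind(
    Instruction::CastOps ExtOpcode) {
  switch (ExtOpcode) {
  case Instruction::CastOps::ZExt:
    return PR_ZeroExtend;
  case Instruction::CastOps::SExt:
    return PR_SignExtend;
  default:
    // Callers are expected to filter out non-extend casts, e.g. via the
    // Instruction* overload, which returns PR_None for them.
    llvm_unreachable("unexpected extend opcode for a partial reduction");
  }
}
```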
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(

 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
sdesmalen-arm wrote:
```suggestion
  if (auto *Cast = dyn_cast<CastInst>(I))
    return getPartialReductionExtendKind(Cast->getOpcode());
  return PR_None;
```
https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction
+/// to hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {
sdesmalen-arm wrote: Does this need to be given the `Range &`, with the range clamped if it doesn't match or if the cost is higher than that of the individual operations (similar to what happens in `tryToCreateAbstractReductionRecipe`)? (Note that the cost part is still missing.) https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] release/20.x: [flang] Exempt construct entities from SAVE check for PURE (#131383) (PR #137752)
https://github.com/eugeneepshteyn approved this pull request. https://github.com/llvm/llvm-project/pull/137752 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction
+/// to hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {
+  if (PRed->getOpcode() != Instruction::Add)
+    return;
+
+  VPRecipeBase *BinOpR = PRed->getBinOp()->getDefiningRecipe();
+  auto *BinOp = dyn_cast<VPWidenRecipe>(BinOpR);
+  if (!BinOp || BinOp->getOpcode() != Instruction::Mul)
+    return;
SamTebbs33 wrote: Done :+1:. https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)
krzysz00 wrote: So p9 is quite weird. It's used, IIRC, by https://github.com/GPUOpen-Drivers/llpc internally to represent "structured" pointers, and so its interpretation is `{p8 rsrc, i32 index, i32 offset}`. This does still have a 48-bit effective address, by way of a thoroughly weird ptrtoaddr function that is, in its full generality, a function of the resource flags (see swizzles).

---

As to ptrmask on p8 ... if ptrmask is only supposed to affect the index bits, then I probably shouldn't be emitting it during the p7 => p8 process, and should only apply a mask to the offset. But that hits the same weirdness that masking for alignment doesn't work as expected if the base pointer isn't quite aligned enough. (I can't think of many instances where that base pointer will, for example, be odd, but I'm not aware of anything saying it's not allowed. Though I'd personally be OK with documenting around that.) I think most codegen - including down in LLPC - assumes the base is at least `align 4`, which is the largest value where you sometimes need to change behavior, so you can quietly figure that the alignment of the offset is *basically* the alignment of the pointer.

---

Actually, a more general question on this address width stuff: x86 thread-local storage. That's a form of pointer with an implicit base that can mess up your alignment checks - how does that work with ptrtoaddr? https://github.com/llvm/llvm-project/pull/137418 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
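To make the `{p8 rsrc, i32 index, i32 offset}` interpretation above concrete, a layout sketch (hedged: illustrative only; the actual representation lives in LLPC, and the field packing here is an assumption):

```cpp
#include <cstdint>

// Illustrative layout of a p9 "structured" buffer pointer.
struct P9StructuredPointer {
  uint32_t Rsrc[4]; // the 128-bit p8 buffer resource descriptor
  uint32_t Index;   // element index into the (possibly swizzled) buffer
  uint32_t Offset;  // byte offset within the element
};
// The 48-bit effective address is a function of all three fields (and, for
// swizzled buffers, of the resource flags), so ptrtoaddr is not a simple
// field extraction here.
```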
[llvm-branch-commits] [llvm] [AArch64][llvm] Pre-commit tests for #137703 (NFC) (PR #137702)
https://github.com/CarolineConcatto approved this pull request. I could not spot any significant difference between fmin and fminimum. So I believe this is fine. Thank you for the work, Jonathan https://github.com/llvm/llvm-project/pull/137702 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)
MacDue wrote: @sdesmalen-arm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/137683 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
https://github.com/redstar updated https://github.com/llvm/llvm-project/pull/137235 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)
yota9 wrote: @paschalis-mpeis Could you please check whether the binaries are identical and it is indeed an nm problem? E.g. with objdump, is the PLT entry there? Maybe there is a problem related to the PLT section type, e.g. one of the binaries has a .plt.sec or .plt.got section and there is some kind of bug in nm that does not list symbols from these sections. Then we can use a custom linker script with a .plt section only. My next suggestion would be to just use llvm-nm here, i.e. pass llvm-nm via the --nmtool arg to link_fdata.py, since [lit.cfg.py](https://github.com/llvm/llvm-project/blob/main/bolt/test/lit.cfg.py#L118) has it in the list of mandatory tools for BOLT testing, so we won't have environment dependencies here. Maybe even add nmtool as a link_fdata_cmd arg in lit.cfg.py, so all tests would use it by default... https://github.com/llvm/llvm-project/pull/135867 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)
arichardson wrote: @krzysz00 I tried setting the p8 index size to 0 and this almost works but there are a handful of tests that fail. The first one is: ``` define ptr addrspace(8) @v_ptrmask_buffer_resource_variable_i128_neg8(ptr addrspace(8) %ptr) { ; GCN-LABEL: v_ptrmask_buffer_resource_variable_i128_neg8: ; GCN: ; %bb.0: ; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GCN-NEXT:v_and_b32_e32 v0, -8, v0 ; GCN-NEXT:s_setpc_b64 s[30:31] ; ; GFX10PLUS-LABEL: v_ptrmask_buffer_resource_variable_i128_neg8: ; GFX10PLUS: ; %bb.0: ; GFX10PLUS-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GFX10PLUS-NEXT:v_and_b32_e32 v0, -8, v0 ; GFX10PLUS-NEXT:s_setpc_b64 s[30:31] %masked = call ptr addrspace(8) @llvm.ptrmask.p8.i128(ptr addrspace(8) %ptr, i128 -8) ret ptr addrspace(8) %masked } ``` This fails with `llvm.ptrmask intrinsic second argument bitwidth must match pointer index type size of first argument` How would you expect ptrmask to work here? I would imagine it should use the address width and not the index width since you have valid codegen for p8. I don't know enough about amdgpu assembly so I can't tell what is actually being done here. Another test that fails is llvm/test/CodeGen/AMDGPU/GlobalISel/unsupported-ptr-add.ll which now crashes with an assertion failure inside `StraightLineStrengthReduce` instead of failing with `unable to legalize instruction`. I imagine we could easily avoid the assertion here and handle that with a sensible error message in the IR verifier. llvm/test/CodeGen/AMDGPU/GlobalISel/unsupported-load.ll also asserts but should be easy to handle. https://github.com/llvm/llvm-project/pull/137418 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)
github-actions[bot] wrote: ⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo. Please turn off the [Keep my email addresses private](https://github.com/settings/emails) setting in your account. See [LLVM Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it) for more information. https://github.com/llvm/llvm-project/pull/138017 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)
https://github.com/inbelic created https://github.com/llvm/llvm-project/pull/138002 - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing #126576 >From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 17:53:11 + Subject: [PATCH] [HLSL][RootSignature] Add mandatory parameters for RootConstants - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing --- .../clang/Parse/ParseHLSLRootSignature.h | 10 ++- clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++- .../Parse/ParseHLSLRootSignatureTest.cpp | 14 +++- .../llvm/Frontend/HLSL/HLSLRootSignature.h| 5 +- 4 files changed, 89 insertions(+), 8 deletions(-) diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index efa735ea03d94..0f05b05ed4df6 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -77,8 +77,14 @@ class RootSignatureParser { parseDescriptorTableClause(); /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - /// order and only exactly once. `ParsedClauseParams` denotes the current - /// state of parsed params + /// order and only exactly once. The following methods define a + /// `Parsed.*Params` struct to denote the current state of parsed params + struct ParsedConstantParams { +std::optional Reg; +std::optional Num32BitConstants; + }; + std::optional parseRootConstantParams(); + struct ParsedClauseParams { std::optional Reg; std::optional NumDescriptors; diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 48d3e38b0519d..2ce8e6e5cca98 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -57,6 +57,27 @@ std::optional RootSignatureParser::parseRootConstants() { RootConstants Constants; + auto Params = parseRootConstantParams(); + if (!Params.has_value()) +return std::nullopt; + + // Check mandatory parameters were provided + if (!Params->Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::kw_num32BitConstants; +return std::nullopt; + } + + Constants.Num32BitConstants = Params->Num32BitConstants.value(); + + if (!Params->Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::bReg; +return std::nullopt; + } + + Constants.Reg = Params->Reg.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() { return Clause; } +// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any +// order and only exactly once. The following methods will parse through as +// many arguments as possible reporting an error if a duplicate is seen. 
+std::optional +RootSignatureParser::parseRootConstantParams() { + assert(CurToken.TokKind == TokenKind::pu_l_paren && + "Expects to only be invoked starting at given token"); + + ParsedConstantParams Params; + do { +// `num32BitConstants` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) { + if (Params.Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Num32BitConstants = parseUIntParam(); + if (!Num32BitConstants.has_value()) +return std::nullopt; + Params.Num32BitConstants = Num32BitConstants; +} + +// `b` POS_INT +if (tryConsumeExpectedToken(TokenKind::bReg)) { + if (Params.Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + auto Reg = parseRegister(); + if (!Reg.has_value()) +return std::nullopt; + Params.Reg = Reg; +} + } while (tryConsumeExpectedToken(TokenKind::pu_comma)); + + return Params; +} + std::optional RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) { assert(CurToken.TokKind == TokenKind::pu_l_paren && "Expects to only be invoked starting at given token"); - // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - // order
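To illustrate the grammar the parser above enforces, some example inputs (hedged: these strings are hypothetical and written to match the diff; the PR's actual unit tests are truncated in this digest):

```cpp
// Both mandatory parameters, in either order, each exactly once: accepted.
const char *ValidA = "RootConstants(num32BitConstants = 1, b0)";
const char *ValidB = "RootConstants(b0, num32BitConstants = 1)";

// Duplicate parameter: rejected with err_hlsl_rootsig_repeat_param.
const char *Repeated =
    "RootConstants(b0, num32BitConstants = 1, num32BitConstants = 2)";

// Missing mandatory parameter: rejected with err_hlsl_rootsig_missing_param.
const char *Missing = "RootConstants(b0)";
```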
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)
llvmbot wrote: @llvm/pr-subscribers-hlsl Author: Finn Plummer (inbelic) Changes - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing #126576 --- Full diff: https://github.com/llvm/llvm-project/pull/138002.diff 4 Files Affected: - (modified) clang/include/clang/Parse/ParseHLSLRootSignature.h (+8-2) - (modified) clang/lib/Parse/ParseHLSLRootSignature.cpp (+65-3) - (modified) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+12-2) - (modified) llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h (+4-1) ``diff diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index efa735ea03d94..0f05b05ed4df6 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -77,8 +77,14 @@ class RootSignatureParser { parseDescriptorTableClause(); /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - /// order and only exactly once. `ParsedClauseParams` denotes the current - /// state of parsed params + /// order and only exactly once. The following methods define a + /// `Parsed.*Params` struct to denote the current state of parsed params + struct ParsedConstantParams { +std::optional Reg; +std::optional Num32BitConstants; + }; + std::optional parseRootConstantParams(); + struct ParsedClauseParams { std::optional Reg; std::optional NumDescriptors; diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 48d3e38b0519d..2ce8e6e5cca98 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -57,6 +57,27 @@ std::optional RootSignatureParser::parseRootConstants() { RootConstants Constants; + auto Params = parseRootConstantParams(); + if (!Params.has_value()) +return std::nullopt; + + // Check mandatory parameters were provided + if (!Params->Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::kw_num32BitConstants; +return std::nullopt; + } + + Constants.Num32BitConstants = Params->Num32BitConstants.value(); + + if (!Params->Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::bReg; +return std::nullopt; + } + + Constants.Reg = Params->Reg.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() { return Clause; } +// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any +// order and only exactly once. The following methods will parse through as +// many arguments as possible reporting an error if a duplicate is seen. 
+std::optional +RootSignatureParser::parseRootConstantParams() { + assert(CurToken.TokKind == TokenKind::pu_l_paren && + "Expects to only be invoked starting at given token"); + + ParsedConstantParams Params; + do { +// `num32BitConstants` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) { + if (Params.Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Num32BitConstants = parseUIntParam(); + if (!Num32BitConstants.has_value()) +return std::nullopt; + Params.Num32BitConstants = Num32BitConstants; +} + +// `b` POS_INT +if (tryConsumeExpectedToken(TokenKind::bReg)) { + if (Params.Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + auto Reg = parseRegister(); + if (!Reg.has_value()) +return std::nullopt; + Params.Reg = Reg; +} + } while (tryConsumeExpectedToken(TokenKind::pu_comma)); + + return Params; +} + std::optional RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) { assert(CurToken.TokKind == TokenKind::pu_l_paren && "Expects to only be invoked starting at given token"); - // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - // order and only exactly once. Parse through as many arguments as possible - // reporting an error if a duplicate is seen. ParsedClauseParams Params; do { // ( `b` | `t` | `u` | `s`) POS_INT diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp b/clang/unittests/Parse/ParseHLSLRootSign
[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/138017 Backport f5a30f111dc4ad6422863722eb708059a68a9d5c Requested by: @tstellar >From bc9b07f09ab1c728e4dcce614eb4ccfa2e5e52c7 Mon Sep 17 00:00:00 2001 From: Losy001 <64610343+losy...@users.noreply.github.com> Date: Sat, 26 Apr 2025 18:04:12 +0200 Subject: [PATCH] [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) This pull request implements mangling for ConstantMatrixType, allowing matrices to be used on Windows. Related issues: #53158, #127127 This example code: ```cpp #include #include typedef float Matrix4 __attribute__((matrix_type(4, 4))); int main() { printf("%s\n", typeid(Matrix4).name()); } ``` Outputs this: ``` struct __clang::__matrix ``` (cherry picked from commit f5a30f111dc4ad6422863722eb708059a68a9d5c) --- clang/lib/AST/MicrosoftMangle.cpp | 17 ++- clang/test/CodeGenCXX/mangle-ms-matrix.cpp | 57 ++ 2 files changed, 73 insertions(+), 1 deletion(-) create mode 100644 clang/test/CodeGenCXX/mangle-ms-matrix.cpp diff --git a/clang/lib/AST/MicrosoftMangle.cpp b/clang/lib/AST/MicrosoftMangle.cpp index 42b735ccf4a2c..74c995f2f97f0 100644 --- a/clang/lib/AST/MicrosoftMangle.cpp +++ b/clang/lib/AST/MicrosoftMangle.cpp @@ -3552,7 +3552,22 @@ void MicrosoftCXXNameMangler::mangleType(const DependentSizedExtVectorType *T, void MicrosoftCXXNameMangler::mangleType(const ConstantMatrixType *T, Qualifiers quals, SourceRange Range) { - Error(Range.getBegin(), "matrix type") << Range; + QualType EltTy = T->getElementType(); + const BuiltinType *ET = EltTy->getAs(); + + llvm::SmallString<64> TemplateMangling; + llvm::raw_svector_ostream Stream(TemplateMangling); + MicrosoftCXXNameMangler Extra(Context, Stream); + + Stream << "?$"; + + Extra.mangleSourceName("__matrix"); + Extra.mangleType(EltTy, Range, QMM_Escape); + + Extra.mangleIntegerLiteral(llvm::APSInt::getUnsigned(T->getNumRows())); + Extra.mangleIntegerLiteral(llvm::APSInt::getUnsigned(T->getNumColumns())); + + mangleArtificialTagType(TagTypeKind::Struct, TemplateMangling, {"__clang"}); } void MicrosoftCXXNameMangler::mangleType(const DependentSizedMatrixType *T, diff --git a/clang/test/CodeGenCXX/mangle-ms-matrix.cpp b/clang/test/CodeGenCXX/mangle-ms-matrix.cpp new file mode 100644 index 0..b244aa6e33cfa --- /dev/null +++ b/clang/test/CodeGenCXX/mangle-ms-matrix.cpp @@ -0,0 +1,57 @@ +// RUN: %clang_cc1 -fenable-matrix -fms-extensions -fcxx-exceptions -ffreestanding -target-feature +avx -emit-llvm %s -o - -triple=i686-pc-win32 | FileCheck %s +// RUN: %clang_cc1 -fenable-matrix -fms-extensions -fcxx-exceptions -ffreestanding -target-feature +avx -emit-llvm %s -o - -triple=i686-pc-win32 -fexperimental-new-constant-interpreter | FileCheck %s + +typedef float __attribute__((matrix_type(4, 4))) m4x4f; +typedef float __attribute__((matrix_type(2, 2))) m2x2f; + +typedef int __attribute__((matrix_type(4, 4))) m4x4i; +typedef int __attribute__((matrix_type(2, 2))) m2x2i; + +void thow(int i) { + switch (i) { +case 0: throw m4x4f(); +// CHECK: ??_R0U?$__matrix@M$03$03@__clang@@@8 +// CHECK: _CT??_R0U?$__matrix@M$03$03@__clang@@@864 +// CHECK: _CTA1U?$__matrix@M$03$03@__clang@@ +// CHECK: _TI1U?$__matrix@M$03$03@__clang@@ +case 1: throw m2x2f(); +// CHECK: ??_R0U?$__matrix@M$01$01@__clang@@@8 +// CHECK: _CT??_R0U?$__matrix@M$01$01@__clang@@@816 +// CHECK: _CTA1U?$__matrix@M$01$01@__clang@@ +// CHECK: _TI1U?$__matrix@M$01$01@__clang@@ +case 2: throw m4x4i(); +// CHECK: ??_R0U?$__matrix@H$03$03@__clang@@@8 +// CHECK: 
_CT??_R0U?$__matrix@H$03$03@__clang@@@864 +// CHECK: _CTA1U?$__matrix@H$03$03@__clang@@ +// CHECK: _TI1U?$__matrix@H$03$03@__clang@@ +case 3: throw m2x2i(); +// CHECK: ??_R0U?$__matrix@H$01$01@__clang@@@8 +// CHECK: _CT??_R0U?$__matrix@H$01$01@__clang@@@816 +// CHECK: _CTA1U?$__matrix@H$01$01@__clang@@ +// CHECK: _TI1U?$__matrix@H$01$01@__clang@@ + } +} + +void foo44f(m4x4f) {} +// CHECK: define dso_local void @"?foo44f@@YAXU?$__matrix@M$03$03@__clang@@@Z" + +m4x4f rfoo44f() { return m4x4f(); } +// CHECK: define dso_local noundef <16 x float> @"?rfoo44f@@YAU?$__matrix@M$03$03@__clang@@XZ" + +void foo22f(m2x2f) {} +// CHECK: define dso_local void @"?foo22f@@YAXU?$__matrix@M$01$01@__clang@@@Z" + +m2x2f rfoo22f() { return m2x2f(); } +// CHECK: define dso_local noundef <4 x float> @"?rfoo22f@@YAU?$__matrix@M$01$01@__clang@@XZ" + +void foo44i(m4x4i) {} +// CHECK: define dso_local void @"?foo44i@@YAXU?$__matrix@H$03$03@__clang@@@Z" + +m4x4i rfoo44i() { return m4x4i(); } +// CHECK: define dso_local noundef <16 x i32> @"?rfoo44i@@YAU?$__matrix@H$03$03@__clang@@XZ" + +void foo22i(m2x2i) {} +// CHECK: define dso_local void @"?foo22i@@YAXU?$__matrix@H$01$01@__clang@@@Z" + +m2x2i rfoo22i() { return m2x2i(); } +// CHECK: define dso_local noundef <4 x
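The `$01`/`$03` pairs in the CHECK lines above encode the matrix dimensions 2 and 4. A small helper reproducing that encoding (hedged: assumes MSVC's usual value-minus-one digit scheme for small integer template arguments; the function name is invented):

```cpp
#include <cassert>
#include <string>

// 2 -> "$01", 4 -> "$03", matching the mangled names in the tests above.
std::string mangleSmallDim(unsigned N) {
  assert(N >= 1 && N <= 10 && "sketch covers only the single-digit encoding");
  return std::string("$0") + static_cast<char>('0' + (N - 1));
}
```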
[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/138017 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)
llvmbot wrote: @rnk What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/138017 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)
@@ -688,28 +689,32 @@ MCSectionGOFF *MCContext::getGOFFSection(SectionKind Kind, StringRef Name,
     return Iter->second;
   StringRef CachedName = StringRef(Iter->first.c_str(), Name.size());
-  MCSectionGOFF *GOFFSection = new (GOFFAllocator.Allocate()) MCSectionGOFF(
-      CachedName, Kind, Attributes, static_cast<MCSectionGOFF *>(Parent));
+  MCSectionGOFF *GOFFSection = new (GOFFAllocator.Allocate())
+      MCSectionGOFF(CachedName, Kind, IsVirtual, Attributes,
+                    static_cast<MCSectionGOFF *>(Parent));
   Iter->second = GOFFSection;
   allocInitialFragment(*GOFFSection);
   return GOFFSection;
 }

 MCSectionGOFF *MCContext::getGOFFSection(SectionKind Kind, StringRef Name,
                                          GOFF::SDAttr SDAttributes) {
-  return getGOFFSection(Kind, Name, SDAttributes, nullptr);
+  return getGOFFSection(Kind, Name, SDAttributes, nullptr,
+                        /*IsVirtual=*/true);
 }

 MCSectionGOFF *MCContext::getGOFFSection(SectionKind Kind, StringRef Name,
                                          GOFF::EDAttr EDAttributes,
-                                         MCSection *Parent) {
-  return getGOFFSection(Kind, Name, EDAttributes, Parent);
+                                         MCSection *Parent, bool IsVirtual) {
+  return getGOFFSection(Kind, Name, EDAttributes, Parent,
+                        IsVirtual);
uweigand wrote: Can we deduce the `IsVirtual` property from `EDAttributes.BindAlgorithm == GOFF::ESD_BA_Merge` here? Having to explicitly specify it with every caller seems error-prone. https://github.com/llvm/llvm-project/pull/137235 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][llvm] Codegen for 16/32/64-bit floating-point atomicrmw fminimum/fmaximum (PR #137703)
https://github.com/jthackray updated https://github.com/llvm/llvm-project/pull/137703 >From c47eaad006a6310f667ac5e12589935b5e62a322 Mon Sep 17 00:00:00 2001 From: Jonathan Thackray Date: Wed, 23 Apr 2025 21:10:31 + Subject: [PATCH] [AArch64][llvm] Codegen for 16/32/64-bit floating-point atomicrmw fminimum/fmaximum Codegen for AArch64 16/32/64-bit floating-point atomic read-modify-write operations (`atomicrmw {fmaximum,fminimum}`) using LD{B}FMAX and LD{B}FMIN atomic instructions. --- .../Target/AArch64/AArch64ISelLowering.cpp| 14 +- .../lib/Target/AArch64/AArch64InstrAtomics.td | 7 + .../AArch64/Atomics/aarch64-atomicrmw-lsfe.ll | 686 +++-- .../Atomics/aarch64_be-atomicrmw-lsfe.ll | 728 +++--- 4 files changed, 244 insertions(+), 1191 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 63924dc1b30ea..0d3cd575a517e 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -1001,6 +1001,16 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, setOperationAction(ISD::ATOMIC_LOAD_FMIN, MVT::f32, LibCall); setOperationAction(ISD::ATOMIC_LOAD_FMIN, MVT::f64, LibCall); setOperationAction(ISD::ATOMIC_LOAD_FMIN, MVT::bf16, LibCall); + +setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::f16, LibCall); +setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::f32, LibCall); +setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::f64, LibCall); +setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::bf16, LibCall); + +setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::f16, LibCall); +setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::f32, LibCall); +setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::f64, LibCall); +setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::bf16, LibCall); } if (Subtarget->hasLSE128()) { @@ -27991,7 +28001,9 @@ AArch64TargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const { // If LSFE available, use atomic FP instructions in preference to expansion if (Subtarget->hasLSFE() && (AI->getOperation() == AtomicRMWInst::FAdd || AI->getOperation() == AtomicRMWInst::FMax || - AI->getOperation() == AtomicRMWInst::FMin)) + AI->getOperation() == AtomicRMWInst::FMin || + AI->getOperation() == AtomicRMWInst::FMaximum || + AI->getOperation() == AtomicRMWInst::FMinimum)) return AtomicExpansionKind::None; // Nand is not supported in LSE. 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td index f3734e05ae667..31fcd63b9f2c8 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td +++ b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td @@ -551,14 +551,21 @@ defm atomic_load_fadd : binary_atomic_op_fp; defm atomic_load_fmin : binary_atomic_op_fp; defm atomic_load_fmax : binary_atomic_op_fp; +defm atomic_load_fminimum : binary_atomic_op_fp; +defm atomic_load_fmaximum : binary_atomic_op_fp; + let Predicates = [HasLSFE] in { defm : LDFPOPregister_patterns<"LDFADD", "atomic_load_fadd">; defm : LDFPOPregister_patterns<"LDFMAXNM", "atomic_load_fmax">; defm : LDFPOPregister_patterns<"LDFMINNM", "atomic_load_fmin">; + defm : LDFPOPregister_patterns<"LDFMAX", "atomic_load_fmaximum">; + defm : LDFPOPregister_patterns<"LDFMIN", "atomic_load_fminimum">; defm : LDBFPOPregister_patterns<"LDBFADD", "atomic_load_fadd">; defm : LDBFPOPregister_patterns<"LDBFMAXNM", "atomic_load_fmax">; defm : LDBFPOPregister_patterns<"LDBFMINNM", "atomic_load_fmin">; + defm : LDBFPOPregister_patterns<"LDBFMAX", "atomic_load_fmaximum">; + defm : LDBFPOPregister_patterns<"LDBFMIN", "atomic_load_fminimum">; } // v8.9a/v9.4a FEAT_LRCPC patterns diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll index 7ee2a0bb19c0e..d5ead2bff2f18 100644 --- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll +++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll @@ -1600,428 +1600,197 @@ define dso_local double @atomicrmw_fmin_double_unaligned_seq_cst(ptr %ptr, doubl } define dso_local half @atomicrmw_fmaximum_half_aligned_monotonic(ptr %ptr, half %value) { -; -O0-LABEL: atomicrmw_fmaximum_half_aligned_monotonic: -; -O0:ldaxrh w0, [x9] -; -O0:cmp w0, w10, uxth -; -O0:stlxrh w8, w11, [x9] -; -O0:subs w8, w8, w0, uxth -; -; -O1-LABEL: atomicrmw_fmaximum_half_aligned_monotonic: -; -O1:ldxrh w8, [x0] -; -O1:stxrh w9, w8, [x0] +; CHECK-LABEL: atomicrmw_fmaximum_half_aligned_monotonic: +; CHECK:ldfmax h0, h0, [x0] %r = atomicrmw fmaximum ptr %ptr, half %value monotonic, align 2 ret half %r }
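For context on why this patch maps `atomicrmw fminimum`/`fmaximum` to LDFMIN/LDFMAX rather than the LDFMINNM/LDFMAXNM forms used for fmin/fmax: IEEE-754 `minimum` propagates NaNs and orders -0.0 below +0.0, while `minimumNumber` prefers the numeric operand. A scalar model in standard C++ (hedged: an illustration of the semantics, not the PR's code):

```cpp
#include <cmath>
#include <limits>

// Models the comparison semantics of atomicrmw fminimum on doubles.
double fminimumModel(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN(); // NaNs propagate
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B; // -0.0 orders below +0.0
  return A < B ? A : B;
}
```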
[llvm-branch-commits] [llvm] [AArch64][llvm] Pre-commit tests for #137703 (NFC) (PR #137702)
https://github.com/jthackray updated https://github.com/llvm/llvm-project/pull/137702 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] release/20.x: [wasm-ld] Refactor WasmSym from static globals to per-link context (#134970) (PR #137620)
sbc100 wrote: > cc @tstellar > > Not sure you saw my tag above but as I mentioned here [#137620 > (comment)](https://github.com/llvm/llvm-project/pull/137620#issuecomment-2834979164) > I had to manually backport this due to some merge conflicts. This was > critical for downstream projects and I was hoping it would be picked up in > the 20.1.4 release 😬 > > EDIT: I realized 20.1.4 is out, so probably you could help me make this part > of 20.1.5? To be fair we actually track the release/20.x branch, so as soon > as this PR goes in, we can put it to use. Thanks! Do you not build llvm from source in your project? Can't you therefore build from tip-of-tree? https://github.com/llvm/llvm-project/pull/137620 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Finn Plummer (inbelic) Changes - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing #126576 --- Full diff: https://github.com/llvm/llvm-project/pull/138002.diff 4 Files Affected: - (modified) clang/include/clang/Parse/ParseHLSLRootSignature.h (+8-2) - (modified) clang/lib/Parse/ParseHLSLRootSignature.cpp (+65-3) - (modified) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+12-2) - (modified) llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h (+4-1) ``diff diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index efa735ea03d94..0f05b05ed4df6 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -77,8 +77,14 @@ class RootSignatureParser { parseDescriptorTableClause(); /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - /// order and only exactly once. `ParsedClauseParams` denotes the current - /// state of parsed params + /// order and only exactly once. The following methods define a + /// `Parsed.*Params` struct to denote the current state of parsed params + struct ParsedConstantParams { +std::optional Reg; +std::optional Num32BitConstants; + }; + std::optional parseRootConstantParams(); + struct ParsedClauseParams { std::optional Reg; std::optional NumDescriptors; diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 48d3e38b0519d..2ce8e6e5cca98 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -57,6 +57,27 @@ std::optional RootSignatureParser::parseRootConstants() { RootConstants Constants; + auto Params = parseRootConstantParams(); + if (!Params.has_value()) +return std::nullopt; + + // Check mandatory parameters were provided + if (!Params->Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::kw_num32BitConstants; +return std::nullopt; + } + + Constants.Num32BitConstants = Params->Num32BitConstants.value(); + + if (!Params->Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::bReg; +return std::nullopt; + } + + Constants.Reg = Params->Reg.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() { return Clause; } +// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any +// order and only exactly once. The following methods will parse through as +// many arguments as possible reporting an error if a duplicate is seen. 
+std::optional +RootSignatureParser::parseRootConstantParams() { + assert(CurToken.TokKind == TokenKind::pu_l_paren && + "Expects to only be invoked starting at given token"); + + ParsedConstantParams Params; + do { +// `num32BitConstants` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) { + if (Params.Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Num32BitConstants = parseUIntParam(); + if (!Num32BitConstants.has_value()) +return std::nullopt; + Params.Num32BitConstants = Num32BitConstants; +} + +// `b` POS_INT +if (tryConsumeExpectedToken(TokenKind::bReg)) { + if (Params.Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + auto Reg = parseRegister(); + if (!Reg.has_value()) +return std::nullopt; + Params.Reg = Reg; +} + } while (tryConsumeExpectedToken(TokenKind::pu_comma)); + + return Params; +} + std::optional RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) { assert(CurToken.TokKind == TokenKind::pu_l_paren && "Expects to only be invoked starting at given token"); - // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - // order and only exactly once. Parse through as many arguments as possible - // reporting an error if a duplicate is seen. ParsedClauseParams Params; do { // ( `b` | `t` | `u` | `s`) POS_INT diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp b/clang/unittests/Parse/ParseHLSLRootSig
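As an aside, the duplicate-detection scheme above generalizes nicely. The following standalone sketch is a simplified restatement with assumed names (it is not the real `RootSignatureParser`): each parameter slot is an `std::optional`, so a slot that is already filled signals a repeated parameter, and mandatory slots are checked once parsing finishes.

```cpp
// Simplified sketch of the "any order, exactly once" parameter scheme.
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct ParsedConstantParamsSketch {
  std::optional<unsigned> Num32BitConstants;
  std::optional<unsigned> Reg;
};

// KeyValues stands in for the token stream, e.g. {{"b", 0},
// {"num32BitConstants", 1}} for `RootConstants(b0, num32BitConstants = 1)`.
bool parseParamsSketch(
    const std::vector<std::pair<std::string, unsigned>> &KeyValues,
    ParsedConstantParamsSketch &Out) {
  for (const auto &[Key, Value] : KeyValues) {
    std::optional<unsigned> &Slot =
        Key == "num32BitConstants" ? Out.Num32BitConstants : Out.Reg;
    if (Slot.has_value())
      return false; // duplicate -> err_hlsl_rootsig_repeat_param
    Slot = Value;
  }
  // Both parameters are mandatory -> err_hlsl_rootsig_missing_param.
  return Out.Num32BitConstants.has_value() && Out.Reg.has_value();
}
```

The real parser additionally reports which parameter was repeated or missing through the diagnostics engine, as shown in the diff above.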
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)
https://github.com/inbelic updated https://github.com/llvm/llvm-project/pull/138002 >From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 17:53:11 + Subject: [PATCH] [HLSL][RootSignature] Add mandatory parameters for RootConstants - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing --- .../clang/Parse/ParseHLSLRootSignature.h | 10 ++- clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++- .../Parse/ParseHLSLRootSignatureTest.cpp | 14 +++- .../llvm/Frontend/HLSL/HLSLRootSignature.h| 5 +- 4 files changed, 89 insertions(+), 8 deletions(-) diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index efa735ea03d94..0f05b05ed4df6 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -77,8 +77,14 @@ class RootSignatureParser { parseDescriptorTableClause(); /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - /// order and only exactly once. `ParsedClauseParams` denotes the current - /// state of parsed params + /// order and only exactly once. The following methods define a + /// `Parsed.*Params` struct to denote the current state of parsed params + struct ParsedConstantParams { +std::optional Reg; +std::optional Num32BitConstants; + }; + std::optional parseRootConstantParams(); + struct ParsedClauseParams { std::optional Reg; std::optional NumDescriptors; diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 48d3e38b0519d..2ce8e6e5cca98 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -57,6 +57,27 @@ std::optional RootSignatureParser::parseRootConstants() { RootConstants Constants; + auto Params = parseRootConstantParams(); + if (!Params.has_value()) +return std::nullopt; + + // Check mandatory parameters were provided + if (!Params->Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::kw_num32BitConstants; +return std::nullopt; + } + + Constants.Num32BitConstants = Params->Num32BitConstants.value(); + + if (!Params->Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::bReg; +return std::nullopt; + } + + Constants.Reg = Params->Reg.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() { return Clause; } +// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any +// order and only exactly once. The following methods will parse through as +// many arguments as possible reporting an error if a duplicate is seen. 
+std::optional +RootSignatureParser::parseRootConstantParams() { + assert(CurToken.TokKind == TokenKind::pu_l_paren && + "Expects to only be invoked starting at given token"); + + ParsedConstantParams Params; + do { +// `num32BitConstants` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) { + if (Params.Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Num32BitConstants = parseUIntParam(); + if (!Num32BitConstants.has_value()) +return std::nullopt; + Params.Num32BitConstants = Num32BitConstants; +} + +// `b` POS_INT +if (tryConsumeExpectedToken(TokenKind::bReg)) { + if (Params.Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + auto Reg = parseRegister(); + if (!Reg.has_value()) +return std::nullopt; + Params.Reg = Reg; +} + } while (tryConsumeExpectedToken(TokenKind::pu_comma)); + + return Params; +} + std::optional RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) { assert(CurToken.TokKind == TokenKind::pu_l_paren && "Expects to only be invoked starting at given token"); - // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - // order and only exactly once. Parse through as many arguments as possible - // reporting an error if a duplicate is seen. ParsedClauseParams Params; do { // ( `b` | `t` | `u` | `s`) POS_INT dif
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameter for RootConstants (PR #138007)
https://github.com/inbelic created https://github.com/llvm/llvm-project/pull/138007 - extends `parseRootConstantParams` and the struct to include the optional parameters of a RootConstant - adds corresponding unit tests Part three of and resolves https://github.com/llvm/llvm-project/issues/126576 >From f8011807d233d1d0d1f9d4ce08bad0c559cde74a Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 18:08:31 + Subject: [PATCH] [HLSL][RootSignature] Add optional parameter for RootConstants - extends `parseRootConstantParams` and the struct to include the optional parameters of a RootConstant - adds corresponding unit tests --- .../clang/Parse/ParseHLSLRootSignature.h | 2 + clang/lib/Parse/ParseHLSLRootSignature.cpp| 41 +++ .../Parse/ParseHLSLRootSignatureTest.cpp | 8 +++- .../llvm/Frontend/HLSL/HLSLRootSignature.h| 2 + 4 files changed, 52 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index 0f05b05ed4df6..2ac2083983741 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -82,6 +82,8 @@ class RootSignatureParser { struct ParsedConstantParams { std::optional Reg; std::optional Num32BitConstants; +std::optional Space; +std::optional Visibility; }; std::optional parseRootConstantParams(); diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 2ce8e6e5cca98..a5006b77a6e44 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -78,6 +78,13 @@ std::optional RootSignatureParser::parseRootConstants() { Constants.Reg = Params->Reg.value(); + // Fill in optional parameters + if (Params->Visibility.has_value()) +Constants.Visibility = Params->Visibility.value(); + + if (Params->Space.has_value()) +Constants.Space = Params->Space.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -247,6 +254,40 @@ RootSignatureParser::parseRootConstantParams() { return std::nullopt; Params.Reg = Reg; } + +// `space` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_space)) { + if (Params.Space.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Space = parseUIntParam(); + if (!Space.has_value()) +return std::nullopt; + Params.Space = Space; +} + +// `visibility` `=` SHADER_VISIBILITY +if (tryConsumeExpectedToken(TokenKind::kw_visibility)) { + if (Params.Visibility.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Visibility = parseShaderVisibility(); + if (!Visibility.has_value()) +return std::nullopt; + Params.Visibility = Visibility; +} } while (tryConsumeExpectedToken(TokenKind::pu_comma)); return Params; diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp index 336868b579866..150eb3e6e54ef 100644 --- a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp +++ b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp @@ -255,7 +255,9 @@ TEST_F(ParseHLSLRootSignatureTest, ValidSamplerFlagsTest) { TEST_F(ParseHLSLRootSignatureTest, 
ValidParseRootConsantsTest) { const llvm::StringLiteral Source = R"cc( RootConstants(num32BitConstants = 1, b0), -RootConstants(b42, num32BitConstants = 4294967295) +RootConstants(b42, space = 3, num32BitConstants = 4294967295, + visibility = SHADER_VISIBILITY_HULL +) )cc"; TrivialModuleLoader ModLoader; @@ -278,12 +280,16 @@ TEST_F(ParseHLSLRootSignatureTest, ValidParseRootConsantsTest) { ASSERT_EQ(std::get(Elem).Num32BitConstants, 1u); ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg); ASSERT_EQ(std::get(Elem).Reg.Number, 0u); + ASSERT_EQ(std::get(Elem).Space, 0u); + ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::All); Elem = Elements[1]; ASSERT_TRUE(std::holds_alternative(Elem)); ASSERT_EQ(std::get(Elem).Num32BitConstants, 4294967295u); ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg); ASSERT_EQ(std::get(Elem).Reg.Number, 42u); + ASSERT_EQ(std::get(Elem).Space, 3u); + ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::Hull); ASSERT_TRUE(Consumer->isSatisfied()); } diff --git
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameter for RootConstants (PR #138007)
llvmbot wrote: @llvm/pr-subscribers-hlsl @llvm/pr-subscribers-clang Author: Finn Plummer (inbelic) Changes - extends `parseRootConstantParams` and the struct to include the optional parameters of a RootConstant - adds corresponding unit tests Part three of and resolves https://github.com/llvm/llvm-project/issues/126576 --- Full diff: https://github.com/llvm/llvm-project/pull/138007.diff 4 Files Affected: - (modified) clang/include/clang/Parse/ParseHLSLRootSignature.h (+2) - (modified) clang/lib/Parse/ParseHLSLRootSignature.cpp (+41) - (modified) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+7-1) - (modified) llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h (+2) ``diff diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index 0f05b05ed4df6..2ac2083983741 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -82,6 +82,8 @@ class RootSignatureParser { struct ParsedConstantParams { std::optional Reg; std::optional Num32BitConstants; +std::optional Space; +std::optional Visibility; }; std::optional parseRootConstantParams(); diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 2ce8e6e5cca98..a5006b77a6e44 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -78,6 +78,13 @@ std::optional RootSignatureParser::parseRootConstants() { Constants.Reg = Params->Reg.value(); + // Fill in optional parameters + if (Params->Visibility.has_value()) +Constants.Visibility = Params->Visibility.value(); + + if (Params->Space.has_value()) +Constants.Space = Params->Space.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -247,6 +254,40 @@ RootSignatureParser::parseRootConstantParams() { return std::nullopt; Params.Reg = Reg; } + +// `space` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_space)) { + if (Params.Space.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Space = parseUIntParam(); + if (!Space.has_value()) +return std::nullopt; + Params.Space = Space; +} + +// `visibility` `=` SHADER_VISIBILITY +if (tryConsumeExpectedToken(TokenKind::kw_visibility)) { + if (Params.Visibility.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Visibility = parseShaderVisibility(); + if (!Visibility.has_value()) +return std::nullopt; + Params.Visibility = Visibility; +} } while (tryConsumeExpectedToken(TokenKind::pu_comma)); return Params; diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp index 336868b579866..150eb3e6e54ef 100644 --- a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp +++ b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp @@ -255,7 +255,9 @@ TEST_F(ParseHLSLRootSignatureTest, ValidSamplerFlagsTest) { TEST_F(ParseHLSLRootSignatureTest, ValidParseRootConsantsTest) { const llvm::StringLiteral Source = R"cc( RootConstants(num32BitConstants = 1, b0), -RootConstants(b42, num32BitConstants = 4294967295) +RootConstants(b42, space = 3, num32BitConstants = 
4294967295, + visibility = SHADER_VISIBILITY_HULL +) )cc"; TrivialModuleLoader ModLoader; @@ -278,12 +280,16 @@ TEST_F(ParseHLSLRootSignatureTest, ValidParseRootConsantsTest) { ASSERT_EQ(std::get(Elem).Num32BitConstants, 1u); ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg); ASSERT_EQ(std::get(Elem).Reg.Number, 0u); + ASSERT_EQ(std::get(Elem).Space, 0u); + ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::All); Elem = Elements[1]; ASSERT_TRUE(std::holds_alternative(Elem)); ASSERT_EQ(std::get(Elem).Num32BitConstants, 4294967295u); ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg); ASSERT_EQ(std::get(Elem).Reg.Number, 42u); + ASSERT_EQ(std::get(Elem).Space, 3u); + ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::Hull); ASSERT_TRUE(Consumer->isSatisfied()); } diff --git a/llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h b/llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h index a3f98a9f1944f..8b8324df18bb3 100644 --- a/llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h +++ b/llvm/include
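To make the default behaviour exercised by these unit tests explicit, here is a simplified sketch with assumed type shapes (not the actual clang/llvm declarations): the element starts out with `Space = 0` and `SHADER_VISIBILITY_ALL`, and parsed optional parameters overwrite the defaults only when they appear in the source.

```cpp
// Simplified sketch of the optional-parameter fill-in performed above.
#include <optional>

enum class ShaderVisibilitySketch { All, Vertex, Hull /* ... */ };

struct RootConstantsSketch {
  unsigned Space = 0; // default: space 0
  ShaderVisibilitySketch Visibility = ShaderVisibilitySketch::All;
};

void applyOptionalParams(RootConstantsSketch &RC,
                         std::optional<unsigned> Space,
                         std::optional<ShaderVisibilitySketch> Visibility) {
  if (Space)
    RC.Space = *Space; // e.g. `space = 3`
  if (Visibility)
    RC.Visibility = *Visibility; // e.g. `visibility = SHADER_VISIBILITY_HULL`
}
```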
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameters for RootConstants (PR #138007)
https://github.com/inbelic edited https://github.com/llvm/llvm-project/pull/138007 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)
arichardson wrote: > So p9 is quite weird. It's used, IIRC, by > https://github.com/GPUOpen-Drivers/llpc internally to represent "structured" > pointers, and so its interpretation is `{p8 rsrc, i32 index, i32 offset}` . > This does still have a 48-bit effective address by way of a thoroughly weird > ptrtoaddr function that's, in its full generality, a function of the > resource flags (see swizzles) > > As to ptrmask on p8 ... if ptrmask is only supposed to affect the Index bits, > then I probably shouldn't be emitting it during the p7 => p8 process and only > applying a mask to the offset. But that hits the same weirdness that masking > for alignment doesn't work as expected if the base pointer isn't quite > aligned enough. > > (I can't think of many instances where that base pointer will, for example, > be odd, but I'm not aware of anything saying it's not allowed. Though I'd > personally be OK with documenting around that) > > I think most codegen - including down in LLPC - assumes the base is at least > `align 4`, which is the largest value where you sometimes need to change > behavior, so you can quietly figure that the alignment of the offset is > _basically_ the alignment of the pointer > Thanks for all the background - how do you feel about pretending that p8 has a 48-bit index width until we figure out how to do "no indexing allowed" correctly? > Actually, more general question on this address width stuff: x86 thread local > storage. That's a form of pointer with an implicit base that can mess up your > alignment checks - how's that work with ptrtoaddr? Agreed, segmentation is awkward and I think we just have to assume that the segments are sufficiently aligned. I think this is somewhat outside of the LLVM abstract machine, similar to page based memory. If we perform alignment checks on virtual addresses beyond 12/16/etc bits, there is no guarantee the final address will be sufficiently aligned, but I view that as being outside of the LLVM semantics. I think we just have to pretend that virtual addresses or segment-relative addresses are the only thing that matters. https://github.com/llvm/llvm-project/pull/137418 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
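The alignment concern in this thread can be made concrete with a small, purely illustrative sketch (the names and numbers are assumptions, not code from either project): masking the offset down to a multiple of the alignment only yields an aligned effective address when the base itself is at least that aligned.

```cpp
// Illustrative only: offset-masking for alignment assumes an aligned base.
#include <cassert>
#include <cstdint>

uint64_t effectiveAddress(uint64_t Base, uint64_t Offset, uint64_t Align) {
  uint64_t MaskedOffset = Offset & ~(Align - 1); // ptrmask-style masking of
                                                 // the offset component only
  return Base + MaskedOffset; // aligned iff Base % Align == 0
}

int main() {
  // Aligned base: masking the offset gives a 4-byte-aligned address.
  assert(effectiveAddress(/*Base=*/16, /*Offset=*/7, /*Align=*/4) % 4 == 0);
  // Odd base: the masked result is *not* 4-byte aligned.
  assert(effectiveAddress(/*Base=*/1, /*Offset=*/7, /*Align=*/4) % 4 != 0);
}
```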
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)
https://github.com/inbelic updated https://github.com/llvm/llvm-project/pull/138002 >From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001 From: Finn Plummer Date: Wed, 30 Apr 2025 17:53:11 + Subject: [PATCH 1/2] [HLSL][RootSignature] Add mandatory parameters for RootConstants - defines the `parseRootConstantParams` function and adds handling for the mandatory arguments of `num32BitConstants` and `bReg` - adds corresponding unit tests Part two of implementing --- .../clang/Parse/ParseHLSLRootSignature.h | 10 ++- clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++- .../Parse/ParseHLSLRootSignatureTest.cpp | 14 +++- .../llvm/Frontend/HLSL/HLSLRootSignature.h| 5 +- 4 files changed, 89 insertions(+), 8 deletions(-) diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h index efa735ea03d94..0f05b05ed4df6 100644 --- a/clang/include/clang/Parse/ParseHLSLRootSignature.h +++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h @@ -77,8 +77,14 @@ class RootSignatureParser { parseDescriptorTableClause(); /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - /// order and only exactly once. `ParsedClauseParams` denotes the current - /// state of parsed params + /// order and only exactly once. The following methods define a + /// `Parsed.*Params` struct to denote the current state of parsed params + struct ParsedConstantParams { +std::optional Reg; +std::optional Num32BitConstants; + }; + std::optional parseRootConstantParams(); + struct ParsedClauseParams { std::optional Reg; std::optional NumDescriptors; diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp index 48d3e38b0519d..2ce8e6e5cca98 100644 --- a/clang/lib/Parse/ParseHLSLRootSignature.cpp +++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp @@ -57,6 +57,27 @@ std::optional RootSignatureParser::parseRootConstants() { RootConstants Constants; + auto Params = parseRootConstantParams(); + if (!Params.has_value()) +return std::nullopt; + + // Check mandatory parameters were provided + if (!Params->Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::kw_num32BitConstants; +return std::nullopt; + } + + Constants.Num32BitConstants = Params->Num32BitConstants.value(); + + if (!Params->Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param) +<< TokenKind::bReg; +return std::nullopt; + } + + Constants.Reg = Params->Reg.value(); + if (consumeExpectedToken(TokenKind::pu_r_paren, diag::err_hlsl_unexpected_end_of_params, /*param of=*/TokenKind::kw_RootConstants)) @@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() { return Clause; } +// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any +// order and only exactly once. The following methods will parse through as +// many arguments as possible reporting an error if a duplicate is seen. 
+std::optional +RootSignatureParser::parseRootConstantParams() { + assert(CurToken.TokKind == TokenKind::pu_l_paren && + "Expects to only be invoked starting at given token"); + + ParsedConstantParams Params; + do { +// `num32BitConstants` `=` POS_INT +if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) { + if (Params.Num32BitConstants.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + + if (consumeExpectedToken(TokenKind::pu_equal)) +return std::nullopt; + + auto Num32BitConstants = parseUIntParam(); + if (!Num32BitConstants.has_value()) +return std::nullopt; + Params.Num32BitConstants = Num32BitConstants; +} + +// `b` POS_INT +if (tryConsumeExpectedToken(TokenKind::bReg)) { + if (Params.Reg.has_value()) { +getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param) +<< CurToken.TokKind; +return std::nullopt; + } + auto Reg = parseRegister(); + if (!Reg.has_value()) +return std::nullopt; + Params.Reg = Reg; +} + } while (tryConsumeExpectedToken(TokenKind::pu_comma)); + + return Params; +} + std::optional RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) { assert(CurToken.TokKind == TokenKind::pu_l_paren && "Expects to only be invoked starting at given token"); - // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any - // order and only exactly once. Parse through as many arguments as possible - // reporting an error if a duplicate is seen. ParsedClauseParams Params; do { // ( `b` | `t` | `u` | `s`) POS_INT
[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)
aaupov wrote: Thanks for tracking it down, looks like it's an issue with GNU nm. However, llvm-nm has no functionality equivalent to `nm --synthetic`, which prints the address of the PLT entry, and the test relies on that. Let me try to decouple this test from GNU nm. https://github.com/llvm/llvm-project/pull/135867 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)
https://github.com/sdesmalen-arm approved this pull request. https://github.com/llvm/llvm-project/pull/137683 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
@@ -3623,6 +3623,9 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder, llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); skatrak wrote: Yes, I do agree that "host device" seems like a rather contradictory term, since we generally just talk about "host" as the CPU and "device" as whatever we're offloading to. The reason for using these terms is to align with OpenMP terminology (5.2 spec, Section 1.2.1): > **host device** The _device_ on which the _OpenMP program_ begins execution. > **target device** A _device_ with respect to which the current _device_ > performs an operation, as specified by a _device construct_ or an OpenMP > device memory routine. This is also why the `-fopenmp-is-target-device` flag is called that and not `-fopenmp-is-device` (which [used to be its name](https://reviews.llvm.org/D154591)). https://github.com/llvm/llvm-project/pull/137199 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang-tools-extra] release/20.x: [clang-tidy] Do not pass any file when listing checks in run_clang_tiโฆ (#137286) (PR #137775)
https://github.com/carlosgalvezp updated https://github.com/llvm/llvm-project/pull/137775 >From 387c2886b68b1eaf68916e4b08e8d92966a1 Mon Sep 17 00:00:00 2001 From: Carlos Galvez Date: Tue, 29 Apr 2025 11:13:30 +0200 Subject: [PATCH] =?UTF-8?q?[clang-tidy]=20Do=20not=20pass=20any=20file=20w?= =?UTF-8?q?hen=20listing=20checks=20in=20run=5Fclang=5Fti=E2=80=A6=20(#137?= =?UTF-8?q?286)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit …dy.py Currently, run_clang_tidy.py does not correctly display the list of checks picked up from the top-level .clang-tidy file. The reason for that is that we are passing an empty string as input file. However, that's not how we are supposed to use clang-tidy to list checks. Per https://github.com/llvm/llvm-project/commit/65eccb463df7fe511c813ee6a1794c80d7489ff2, we simply should not pass any file at all - the internal code of clang-tidy will pass a "dummy" file if that's the case and get the .clang-tidy file from the current working directory. Fixes #136659 Co-authored-by: Carlos Gálvez (cherry picked from commit 014ab736dc741f24c007f9861e24b31faba0e1e7) --- clang-tools-extra/clang-tidy/tool/run-clang-tidy.py | 7 --- clang-tools-extra/docs/ReleaseNotes.rst | 3 +++ 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py b/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py index f1b934f7139e9..8741147a4f8a3 100755 --- a/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py +++ b/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py @@ -87,7 +87,7 @@ def find_compilation_database(path: str) -> str: def get_tidy_invocation( -f: str, +f: Optional[str], clang_tidy_binary: str, checks: str, tmpdir: Optional[str], @@ -147,7 +147,8 @@ def get_tidy_invocation( start.append(f"--warnings-as-errors={warnings_as_errors}") if allow_no_checks: start.append("--allow-no-checks") -start.append(f) +if f: +start.append(f) return start @@ -490,7 +491,7 @@ async def main() -> None: try: invocation = get_tidy_invocation( -"", +None, clang_tidy_binary, args.checks, None, diff --git a/clang-tools-extra/docs/ReleaseNotes.rst b/clang-tools-extra/docs/ReleaseNotes.rst index 2ab597eb37048..0b2e9c5fabc36 100644 --- a/clang-tools-extra/docs/ReleaseNotes.rst +++ b/clang-tools-extra/docs/ReleaseNotes.rst @@ -190,6 +190,9 @@ Improvements to clang-tidy - Fixed bug in :program:`clang-tidy` by which `HeaderFilterRegex` did not take effect when passed via the `.clang-tidy` file. +- Fixed bug in :program:`run_clang_tidy.py` where the program would not + correctly display the checks enabled by the top-level `.clang-tidy` file. + New checks ^^ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)
paschalis-mpeis wrote: Hey @yota9, thanks for the suggestions! Indeed, the PLT entries exist in both binaries. For example running: ``` build/bin/llvm-objdump -d -j .plt build/tools/bolt/test/X86/Output/callcont-fallthru.s.tmp ``` shows: ``` build/tools/bolt/test/X86/Output/callcont-fallthru.s.tmp: file format elf64-x86-64 Disassembly of section .plt: 1430 <.plt>: 1430: ff 35 f2 20 00 00 pushq 0x20f2(%rip) # 0x3528 1436: ff 25 f4 20 00 00 jmpq *0x20f4(%rip) # 0x3530 143c: 0f 1f 40 00 nopl (%rax) 1440 <puts@plt>: 1440: ff 25 f2 20 00 00 jmpq *0x20f2(%rip) # 0x3538 1446: 68 00 00 00 00 pushq $0x0 144b: e9 e0 ff ff ff jmp 0x1430 <.plt> ``` I noticed some code differences in the binaries but I haven't looked deeper into it. **It looks like it's a difference in GNU nm, though:** On my AArch64 dev-machine, `nm --synthetic` lists `puts@plt`, but when I copy that same binary over to our upcoming AArch64 buildbot, it's missing. Conversely, `nm --synthetic` on the buildbot does not list `puts@plt`, but when I copy that binary to the dev-machine it does appear. --- I too agree that relying on GNU tools is not ideal; the same goes for any binary tool that does not come from the built LLVM revision. However, `llvm-nm` does not seem to support `--synthetic`. BTW, thanks for all the help! I'm focused on AArch64, so while I may be involved to some extent with this, I'll let Amir drive the fix. That's why I'm looking for a code owner to get #137831 stamped. :) (also cc'ing: @aaupov, @maksfb) https://github.com/llvm/llvm-project/pull/135867 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/137224 >From 7d1cbb97817e615a5fa8841d72a18643b5722114 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Tue, 22 Apr 2025 21:43:14 +0300 Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call Implement the detection of tail calls performed with untrusted link register, which violates the assumption made on entry to every function. Unlike other pauth gadgets, this one involves some amount of guessing which branch instructions should be checked as tail calls. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++- .../AArch64/gs-pacret-autiasp.s | 31 +- .../AArch64/gs-pauth-debug-output.s | 30 +- .../AArch64/gs-pauth-tail-calls.s | 597 ++ 4 files changed, 730 insertions(+), 40 deletions(-) create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index d4282daad648b..fab92947f6d67 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -660,8 +660,9 @@ class DataflowSrcSafetyAnalysis // // Then, a function can be split into a number of disjoint contiguous sequences // of instructions without labels in between. These sequences can be processed -// the same way basic blocks are processed by data-flow analysis, assuming -// pessimistically that all registers are unsafe at the start of each sequence. +// the same way basic blocks are processed by data-flow analysis, with the same +// pessimistic estimation of the initial state at the start of each sequence +// (except the first instruction of the function). class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BinaryFunction &BF; MCPlusBuilder::AllocatorIdTy AllocId; @@ -672,6 +673,30 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } + /// Compute a reasonably pessimistic estimation of the register state when + /// the previous instruction is not known for sure. Take the set of registers + /// which are trusted at function entry and remove all registers that can be + /// clobbered inside this function. + SrcState computePessimisticState(BinaryFunction &BF) { +BitVector ClobberedRegs(NumRegs); +for (auto &I : BF.instrs()) { + MCInst &Inst = I.second; + BC.MIB->getClobberedRegs(Inst, ClobberedRegs); + + // If this is a call instruction, no register is safe anymore, unless + // it is a tail call. Ignore tail calls for the purpose of estimating the + // worst-case scenario, assuming no instructions are executed in the + // caller after this point anyway. 
+ if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst)) +ClobberedRegs.set(); +} + +SrcState S = createEntryState(); +S.SafeToDerefRegs.reset(ClobberedRegs); +S.TrustedRegs.reset(ClobberedRegs); +return S; + } + public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -682,6 +707,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { } void run() override { +const SrcState DefaultState = computePessimisticState(BF); SrcState S = createEntryState(); for (auto &I : BF.instrs()) { MCInst &Inst = I.second; @@ -696,7 +722,7 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { LLVM_DEBUG({ traceInst(BC, "Due to label, resetting the state before", Inst); }); -S = createUnsafeState(); +S = DefaultState; } // Check if we need to remove an old annotation (this is the case if @@ -1233,6 +1259,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const MCInstReference &Inst, return make_gadget_report(RetKind, Inst, *RetReg); } +/// While BOLT already marks some of the branch instructions as tail calls, +/// this function tries to improve the coverage by including less obvious cases +/// when it is possible to do without introducing too many false positives. +static bool shouldAnalyzeTailCallInst(const BinaryContext &BC, + const BinaryFunction &BF, + const MCInstReference &Inst) { + // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ() + // (such as isBranch at the time of writing this comment), some don't (such + // as isCall). For that reason, call MCInstrDesc's methods explicitly when + // it is important. + const MCInstrDesc &Desc = + BC.MII->get(static_cast(Inst).getOpcode()); + // Tail call should be a branch (but not necessarily an indirect one). + if (!Desc.isBranch()) +return false; + + // Always analyze the branches already marked as tail calls by BOLT. + if (BC.MIB->isTailCall(Inst)) +return t
[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/136183 >From 96e1b04ab61480c080350d9fc2c658cb0364f4da Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Thu, 17 Apr 2025 20:51:16 +0300 Subject: [PATCH 1/2] [BOLT] Gadget scanner: improve handling of unreachable basic blocks Instead of refusing to analyze an instruction completely, when it is unreachable according to the CFG reconstructed by BOLT, pessimistically assume all registers to be unsafe at the start of basic blocks without any predecessors. Nevertheless, unreachable basic blocks found in optimized code likely means imprecise CFG reconstruction, thus report a warning once per basic block without predecessors. --- bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++- .../AArch64/gs-pacret-autiasp.s | 7 ++- .../binary-analysis/AArch64/gs-pauth-calls.s | 57 +++ 3 files changed, 95 insertions(+), 15 deletions(-) diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp index 7a86d9c4c9a59..917649731884e 100644 --- a/bolt/lib/Passes/PAuthGadgetScanner.cpp +++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp @@ -344,6 +344,12 @@ class SrcSafetyAnalysis { return S; } + /// Creates a state with all registers marked unsafe (not to be confused + /// with empty state). + SrcState createUnsafeState() const { +return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); + } + BitVector getClobberedRegs(const MCInst &Point) const { BitVector Clobbered(NumRegs); // Assume a call can clobber all registers, including callee-saved @@ -586,6 +592,13 @@ class DataflowSrcSafetyAnalysis if (BB.isEntryPoint()) return createEntryState(); +// If a basic block without any predecessors is found in an optimized code, +// this likely means that some CFG edges were not detected. Pessimistically +// assume all registers to be unsafe before this basic block and warn about +// this fact in FunctionAnalysis::findUnsafeUses(). +if (BB.pred_empty()) + return createUnsafeState(); + return SrcState(); } @@ -659,12 +672,6 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis { BC.MIB->removeAnnotation(I.second, StateAnnotationIndex); } - /// Creates a state with all registers marked unsafe (not to be confused - /// with empty state). - SrcState createUnsafeState() const { -return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters()); - } - public: CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId, @@ -1335,19 +1342,30 @@ void FunctionAnalysisContext::findUnsafeUses( BF.dump(); }); + if (BF.hasCFG()) { +// Warn on basic blocks being unreachable according to BOLT, as this +// likely means CFG is imprecise. +for (BinaryBasicBlock &BB : BF) { + if (!BB.pred_empty() || BB.isEntryPoint()) +continue; + // Arbitrarily attach the report to the first instruction of BB. + MCInst *InstToReport = BB.getFirstNonPseudoInstr(); + if (!InstToReport) +continue; // BB has no real instructions + + Reports.push_back( + make_generic_report(MCInstReference::get(InstToReport, BF), + "Warning: no predecessor basic blocks detected " + "(possibly incomplete CFG)")); +} + } + iterateOverInstrs(BF, [&](MCInstReference Inst) { if (BC.MIB->isCFI(Inst)) return; const SrcState &S = Analysis->getStateBefore(Inst); - -// If non-empty state was never propagated from the entry basic block -// to Inst, assume it to be unreachable and report a warning. 
-if (S.empty()) { - Reports.push_back( - make_generic_report(Inst, "Warning: unreachable instruction found")); - return; -} +assert(!S.empty() && "Instruction has no associated state"); if (auto Report = shouldReportReturnGadget(BC, Inst, S)) Reports.push_back(*Report); diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s index 284f0bea607a5..6559ba336e8de 100644 --- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s +++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s @@ -215,12 +215,17 @@ f_callclobbered_calleesaved: .globl f_unreachable_instruction .type f_unreachable_instruction,@function f_unreachable_instruction: -// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address +// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected (possibly incomplete CFG) in function f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address // CHECK-NEXT:The instruction is {{[0-9a-f]+}}: add x0, x1, x2 // CHECK-NOT: instructi
[llvm-branch-commits] [llvm] [ConstraintElim] Simplify `usub_with_overflow` when A uge B (PR #135785)
https://github.com/el-ev updated https://github.com/llvm/llvm-project/pull/135785 >From c5328e2075b5f98311eb9c9657da79d800cf28f4 Mon Sep 17 00:00:00 2001 From: Iris Shi <0...@owo.li> Date: Tue, 15 Apr 2025 20:20:45 +0800 Subject: [PATCH 1/2] [ConstraintElim] Simplify `usub_with_overflow` when A uge B --- .../Scalar/ConstraintElimination.cpp | 10 ++ .../usub-with-overflow.ll | 33 +++ 2 files changed, 21 insertions(+), 22 deletions(-) diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp index 6d17b36b7d9ea..e030f0fac4c0d 100644 --- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp +++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp @@ -1123,6 +1123,7 @@ void State::addInfoFor(BasicBlock &BB) { // Enqueue overflow intrinsics for simplification. case Intrinsic::sadd_with_overflow: case Intrinsic::ssub_with_overflow: +case Intrinsic::usub_with_overflow: case Intrinsic::ucmp: case Intrinsic::scmp: WorkList.push_back( @@ -1785,6 +1786,15 @@ tryToSimplifyOverflowMath(IntrinsicInst *II, ConstraintInfo &Info, Changed = replaceSubOverflowUses(II, A, B, ToRemove); break; } + case Intrinsic::usub_with_overflow: { +// usub overflows iff A < B +// TODO: If the operation is guaranteed to overflow, we could +// also apply some simplifications. +if (DoesConditionHold(CmpInst::ICMP_UGE, A, B, Info)) { + Changed = replaceSubOverflowUses(II, A, B, ToRemove); +} +break; + } } return Changed; diff --git a/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll b/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll index 06bfd8269d97d..0a3c4eb00fd25 100644 --- a/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll +++ b/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll @@ -9,11 +9,9 @@ define i8 @usub_no_overflow_due_to_cmp_condition(i8 %a, i8 %b) { ; CHECK-NEXT:[[C_1:%.*]] = icmp uge i8 [[B:%.*]], [[A:%.*]] ; CHECK-NEXT:br i1 [[C_1]], label [[MATH:%.*]], label [[EXIT_FAIL:%.*]] ; CHECK: math: -; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) -; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 -; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] +; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]] +; CHECK-NEXT:br i1 false, label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] ; CHECK: exit.ok: -; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0 ; CHECK-NEXT:ret i8 [[RES]] ; CHECK: exit.fail: ; CHECK-NEXT:ret i8 0 @@ -41,11 +39,9 @@ define i8 @usub_no_overflow_due_to_cmp_condition2(i8 %a, i8 %b) { ; CHECK-NEXT:[[C_1:%.*]] = icmp ule i8 [[B:%.*]], [[A:%.*]] ; CHECK-NEXT:br i1 [[C_1]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]] ; CHECK: math: -; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) -; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 -; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] +; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]] +; CHECK-NEXT:br i1 false, label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] ; CHECK: exit.ok: -; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0 ; CHECK-NEXT:ret i8 [[RES]] ; CHECK: exit.fail: ; CHECK-NEXT:ret i8 0 @@ -75,12 +71,11 @@ define i8 @sub_no_overflow_due_to_cmp_condition_result_used(i8 %a, i8 %b) { ; CHECK-NEXT:[[C_1:%.*]] = icmp ule i8 [[B:%.*]], [[A:%.*]] ; CHECK-NEXT:br i1 [[C_1]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]] ; CHECK: math: +; CHECK-NEXT:[[RES:%.*]] = sub i8 
[[B]], [[A]] ; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) ; CHECK-NEXT:call void @use_res({ i8, i1 } [[OP]]) -; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 -; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] +; CHECK-NEXT:br i1 false, label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] ; CHECK: exit.ok: -; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0 ; CHECK-NEXT:ret i8 [[RES]] ; CHECK: exit.fail: ; CHECK-NEXT:ret i8 0 @@ -111,11 +106,9 @@ define i8 @usub_no_overflow_due_to_or_conds(i8 %a, i8 %b) { ; CHECK-NEXT:[[OR:%.*]] = or i1 [[C_2]], [[C_1]] ; CHECK-NEXT:br i1 [[OR]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]] ; CHECK: math: -; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]]) -; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1 -; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]] +; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]] +; CHECK-NEXT:br i1 false, label [[EX
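The correctness condition the transform relies on is easy to restate in plain C++; the helper below is an illustration with hypothetical names, not code from the patch.

```cpp
// For unsigned integers, `a - b` wraps exactly when a < b, so once
// ConstraintElimination proves A uge B the overflow bit of
// usub.with.overflow is statically false and the result is a plain sub.
#include <cassert>
#include <cstdint>
#include <utility>

std::pair<uint8_t, bool> usubWithOverflow(uint8_t A, uint8_t B) {
  return {static_cast<uint8_t>(A - B), A < B}; // {result, overflow}
}

int main() {
  assert(!usubWithOverflow(10, 3).second); // A uge B: no overflow
  assert(usubWithOverflow(3, 10).second);  // A < B: wraps around
}
```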
[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)
paschalis-mpeis wrote: Great. A quick way to use an llvm tool could be: ```bash llvm-objdump -d -j .plt %t | grep @plt ``` This produces output similar to what `nm --synthetic` produces (when it works): ```bash 1430 <puts@plt>: ``` You'll of course need to tweak `link_fdata` to properly parse symbol+address: https://github.com/llvm/llvm-project/blob/fb8d61d8163f1439749a62be614c09215fe65e9f/bolt/test/link_fdata.py#L96-L99 Not sure of any cleaner approach? (@yota9, @MaskRay) https://github.com/llvm/llvm-project/pull/135867 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From d382b955279cf3699599f1885e4c47af5f5bda2e Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The elements are placed back into a concat_vectors. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that its concat_vectors can be mapped to a BUILD_VECTOR and so unused elements are no longer referenced. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 29 +-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 69 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. - bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. 
Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index d9d6d43c42430..e4d7e176c51e4 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12213,7 +12213,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12906,17 +12906,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->get
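For orientation, the effect of this change can be pictured at the IR level. Below is a minimal sketch in the style of the patch's own `atomic-load-store.ll` tests (the function name is hypothetical): only lane 0 of the atomic vector load is consumed, and with `EltsFromConsecutiveLoads` taught about `AtomicSDNode`, the unused lane no longer needs to be materialized after splitting.

```llvm
; Only the first lane is used; the second lane of the atomic load should
; no longer be referenced once the concat_vectors maps to a BUILD_VECTOR.
define i32 @use_first_lane(ptr %p) {
  %v  = load atomic <2 x i32>, ptr %p acquire, align 8
  %e0 = extractelement <2 x i32> %v, i32 0
  ret i32 %e0
}
```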
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386 >From cb2e5bc0ea3c3d0ccc3f684ef9bced8b30c7e74e Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:38:23 -0500 Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG When lowering atomic <1 x T> vector types with floats, selection can fail since this pattern is unsupported. To support this, floats can be casted to an integer type of the same size. commit-id:f9d761c5 --- llvm/lib/Target/X86/X86ISelLowering.cpp| 4 +++ llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++ 2 files changed, 41 insertions(+) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 483aceb239b0c..c48344332d707 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, setOperationAction(Op, MVT::f32, Promote); } + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64); + // We have target-specific dag combine patterns for the following nodes: setTargetDAGCombine({ISD::VECTOR_SHUFFLE, ISD::SCALAR_TO_VECTOR, diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index d23cfb89f9fc8..6efcbb80c0ce6 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { %ret = load atomic <1 x i64>, ptr %x acquire, align 8 ret <1 x i64> %ret } + +define <1 x half> @atomic_vec1_half(ptr %x) { +; CHECK3-LABEL: atomic_vec1_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %cx +; CHECK0-NEXT:## implicit-def: $eax +; CHECK0-NEXT:movw %cx, %ax +; CHECK0-NEXT:## implicit-def: $xmm0 +; CHECK0-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK0-NEXT:retq + %ret = load atomic <1 x half>, ptr %x acquire, align 2 + ret <1 x half> %ret +} + +define <1 x float> @atomic_vec1_float(ptr %x) { +; CHECK-LABEL: atomic_vec1_float: +; CHECK: ## %bb.0: +; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT:retq + %ret = load atomic <1 x float>, ptr %x acquire, align 4 + ret <1 x float> %ret +} + +define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec1_double_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 8 + ret <1 x double> %ret +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
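The promotion added here amounts to loading the bits atomically as an integer of the same width and reinterpreting them. A hedged IR-level sketch of the f32 case (a conceptual equivalent, not what the DAG literally emits; the function name is hypothetical):

```llvm
; Promoting an f32 atomic load to i32: fetch the bits as an integer
; atomically, then bitcast the result back to float.
define float @float_via_int(ptr %p) {
  %bits = load atomic i32, ptr %p acquire, align 4
  %f    = bitcast i32 %bits to float
  ret float %f
}
```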
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From be154f89c03bf89ee99d213ca954c77c44fc4e9c Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret 
= load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
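The casts this patch inserts can be sketched in IR. Assuming 64-bit pointers and the sized `__atomic_load_16` libcall seen in the X86 test above (the function name below is hypothetical), the integer returned by the libcall is bitcast to a vector of pointer-sized integers and then converted back to pointers:

```llvm
declare i128 @__atomic_load_16(ptr, i32)

; Expansion sketch for `load atomic <2 x ptr>`: libcall result -> <2 x i64>
; via bitcast -> <2 x ptr> via inttoptr (2 is the acquire memory order).
define <2 x ptr> @expanded_vec2_ptr(ptr %x) {
  %raw  = call i128 @__atomic_load_16(ptr %x, i32 2)
  %ints = bitcast i128 %raw to <2 x i64>
  %vec  = inttoptr <2 x i64> %ints to <2 x ptr>
  ret <2 x ptr> %vec
}
```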
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598 >From 867af9f33e61399964c51dd0c40dfcf37ec0a1e0 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 11:19:39 -0500 Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load Vector types of 2 elements must be widened. This change does this for vector types of atomic load in SelectionDAG so that it can translate aligned vectors of >1 size. Also, it also adds Pats to remove an extra MOV. commit-id:2894ccd1 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 104 ++ llvm/lib/Target/X86/X86InstrCompiler.td | 7 ++ llvm/test/CodeGen/X86/atomic-load-store.ll| 81 ++ llvm/test/CodeGen/X86/atomic-unordered.ll | 3 +- 5 files changed, 173 insertions(+), 23 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 89ea7ef4dbe89..bdfa5f7741ad3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N); SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N); SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N); + SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue WidenVecRes_LOAD(SDNode* N); SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N); SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index f9034c94bfb6d..23353b326b02c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) { break; case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break; case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +Res = WidenVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: Res = WidenVecRes_LOAD(N); break; case ISD::STEP_VECTOR: case ISD::SPLAT_VECTOR: @@ -5979,6 +5982,85 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) { N->getOperand(1), N->getOperand(2)); } +/// Either return the same load or provide appropriate casts +/// from the load and return that. +static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT, + TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl, + SelectionDAG &DAG) { + assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth)); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + if (!FirstVT.isVector()) { +unsigned NumElts = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts); +SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp); +return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp); + } else if (FirstVT == WidenVT) +return LdOp; + else { +// TODO: We don't currently have any tests that exercise this code path. 
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0); +unsigned NumConcat = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +SmallVector ConcatOps(NumConcat); +SDValue UndefVal = DAG.getUNDEF(FirstVT); +ConcatOps[0] = LdOp; +for (unsigned i = 1; i != NumConcat; ++i) + ConcatOps[i] = UndefVal; +return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps); + } +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(),LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. + std::optional FirstVT = + fi
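Conceptually, the widened lowering lets a naturally aligned two-element atomic vector be loaded with one scalar atomic operation covering the full width. A rough IR-level equivalent (a sketch under that assumption, not the DAG nodes the patch builds):

```llvm
; What the widening buys for <2 x i32>: a single atomic i64 load of both
; lanes, reinterpreted as the vector type.
define <2 x i32> @vec2_i32_as_scalar(ptr %p) {
  %bits = load atomic i64, ptr %p acquire, align 8
  %v    = bitcast i64 %bits to <2 x i32>
  ret <2 x i32> %v
}
```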
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From 79fce921b0cb321760da43ede34e58d4e0880d33 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG][X86] Split via Concat vector types for atomic load Vector types that aren't widened are 'split' via CONCAT_VECTORS so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 33 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 205 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..7905f5a94c579 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 23353b326b02c..c1cb45ed40d49 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,36 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + SDLoc dl(LD); + + EVT MemoryVT = LD->getMemoryVT(); + unsigned NumElts = MemoryVT.getVectorMinNumElements(); + + EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts); + EVT ElemVT = + EVT::getVectorVT(*DAG.getContext(), MemoryVT.getVectorElementType(), 1); + + // Create a single atomic to load all the elements at once. + SDValue Atomic = + DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, IntMemoryVT, IntMemoryVT, +LD->getChain(), LD->getBasePtr(), +LD->getMemOperand()); + + // Instead of splitting, put all the elements back into a vector. 
+ SmallVector Ops; + for (unsigned i = 0; i < NumElts; ++i) { +SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic, + DAG.getVectorIdxConstant(i, dl)); +Elt = DAG.getBitcast(ElemVT, Elt); +Ops.push_back(Elt); + } + SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, dl, MemoryVT, Ops); + + ReplaceValueWith(SDValue(LD, 0), Concat); + ReplaceValueWith(SDValue(LD, 1), LD->getChain()); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 935d058a52f8f..227bdbf9c0747 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -204,6 +204,76 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:movd %eax, %xmm1 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm2 +; CHECK3-NEXT:movdqa {{.*#+}} xmm0 = [65535,0,65535,65535,65535,65535,65535,65535] +; CHECK3-NEXT:pand %xmm0, %xmm1 +; CHECK3-NEXT:pslld $16, %xmm2 +; CHECK3-NEXT:pandn %xmm2, %xmm0 +; CHECK3-NEXT:por %xmm1, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl (%rdi), %eax +; CHECK0-NEXT:movl %eax, %ecx +; CHECK0-NEXT:shrl $16, %ecx +; CHECK0-NEXT:movw %cx, %dx +; CHECK0-NEXT:## implicit-def: $ecx +; CHECK0-NEXT:movw %dx, %cx +; CHECK0-NEXT:## implicit-def: $xmm2 +; CHEC
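The shape of `SplitVecRes_ATOMIC_LOAD` above can be read off this hedged IR sketch for `<2 x half>` (hypothetical function name): one atomic load brings in all the bits, each i16 lane is extracted and bitcast to the element type, and the lanes are put back into a vector, mirroring the CONCAT_VECTORS the code builds.

```llvm
; One atomic load for the whole vector, then per-lane i16 -> half casts.
define <2 x half> @vec2_half_rebuilt(ptr %p) {
  %bits = load atomic i32, ptr %p acquire, align 4
  %iv   = bitcast i32 %bits to <2 x i16>
  %e0   = extractelement <2 x i16> %iv, i32 0
  %e1   = extractelement <2 x i16> %iv, i32 1
  %h0   = bitcast i16 %e0 to half
  %h1   = bitcast i16 %e1 to half
  %r0   = insertelement <2 x half> poison, half %h0, i32 0
  %r1   = insertelement <2 x half> %r0, half %h1, i32 1
  ret <2 x half> %r1
}
```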
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From a60144bede07d2ca53f5f3a48009b10d10f8c082 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## 
%bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
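The deciding factor in these tests is alignment: the same vector load stays an inline atomic when naturally aligned, but an under-aligned one must go through the generic `__atomic_load` libcall. A distilled example of the pattern the tests check (align 4 on an 8-byte value):

```llvm
; align 4 < 8 bytes, so this cannot be a single inline atomic load and
; lowers to a __atomic_load call, as the CHECK lines above verify.
define <1 x i64> @underaligned(ptr %p) {
  %v = load atomic <1 x i64>, ptr %p acquire, align 4
  ret <1 x i64> %v
}
```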
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From 7f2115c386e164541ba579eefd87bdfbb45c2890 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index a01e1cff74564..f9034c94bfb6d 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), + N->getBasePtr(), N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_vec
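Scalarization here is simply `v1i32` -> `i32`, as the commit message says. A minimal before/after pair (hypothetical function names; both forms load the same four bytes atomically):

```llvm
; Before legalization: a one-element vector atomic load.
define <1 x i32> @vec1(ptr %p) {
  %v = load atomic <1 x i32>, ptr %p acquire, align 4
  ret <1 x i32> %v
}

; After scalarization, the operation is the plain scalar atomic load.
define i32 @scalar(ptr %p) {
  %s = load atomic i32, ptr %p acquire, align 4
  ret i32 %s
}
```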
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)
@@ -3623,6 +3623,9 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder, llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy &combinedInfo, MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) { + assert(!ompBuilder.Config.isTargetDevice() && + "function only supported for host device codegen"); bhandarkar-pranav wrote: > Yes, I do agree that "host device" seems like a rather contradictory term, > since we generally just talk about "host" as the CPU and "device" as whatever > we're offloading to. The reason of using these terms is to align with OpenMP > terminology (5.2 spec, Section 1.2.1): > > > **host device** The _device_ on which the _OpenMP program_ begins execution. > > **target device** A _device_ with respect to which the current _device_ > > performs an operation, as specified by a _device construct_ or an OpenMP > > device memory routine. > > This is also why the `-fopenmp-is-target-device` flag is called that and not > `-fopenmp-is-device` (which [used to be its > name](https://reviews.llvm.org/D154591)). Thank you, good to know. So in this world everything is a device and some devices are hosts and some are targets meant to be offloaded to. https://github.com/llvm/llvm-project/pull/137199 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DirectX] Implementation of DXILResourceImplicitBinding pass (PR #138043)
https://github.com/hekota edited https://github.com/llvm/llvm-project/pull/138043 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DirectX] Implement DXILResourceImplicitBinding pass (PR #138043)
https://github.com/hekota edited https://github.com/llvm/llvm-project/pull/138043 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DirectX] Implement DXILResourceImplicitBinding pass (PR #138043)
https://github.com/hekota updated https://github.com/llvm/llvm-project/pull/138043 >From f58e2d5c079f31d7a4d99d481a569ad4c754c4e7 Mon Sep 17 00:00:00 2001 From: Helena Kotas Date: Wed, 30 Apr 2025 15:08:04 -0700 Subject: [PATCH 1/2] [HLSL] Implementation of DXILResourceImplicitBinding pass This pass takes advantage of the DXILResourceBinding analysis and assigns register slots to resources that do not have explicit binding. Part 2/2 of #136786 Closes #136786 --- llvm/include/llvm/Analysis/DXILResource.h | 8 + llvm/include/llvm/InitializePasses.h | 1 + llvm/lib/Analysis/Analysis.cpp| 1 + llvm/lib/Analysis/DXILResource.cpp| 53 + llvm/lib/Target/DirectX/CMakeLists.txt| 1 + .../DirectX/DXILResourceImplicitBinding.cpp | 182 ++ .../DirectX/DXILResourceImplicitBinding.h | 29 +++ llvm/lib/Target/DirectX/DirectX.h | 6 + .../Target/DirectX/DirectXPassRegistry.def| 1 + .../Target/DirectX/DirectXTargetMachine.cpp | 3 + .../CodeGen/DirectX/ImplicitBinding/arrays.ll | 41 .../ImplicitBinding/multiple-spaces.ll| 44 + .../CodeGen/DirectX/ImplicitBinding/simple.ll | 30 +++ .../ImplicitBinding/unbounded-arrays-error.ll | 34 .../ImplicitBinding/unbounded-arrays.ll | 42 llvm/test/CodeGen/DirectX/llc-pipeline.ll | 2 + 16 files changed, 478 insertions(+) create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll diff --git a/llvm/include/llvm/Analysis/DXILResource.h b/llvm/include/llvm/Analysis/DXILResource.h index 569f5346b9e36..fd34fdfda4c1c 100644 --- a/llvm/include/llvm/Analysis/DXILResource.h +++ b/llvm/include/llvm/Analysis/DXILResource.h @@ -632,12 +632,15 @@ class DXILResourceBindingInfo { RegisterSpace(uint32_t Space) : Space(Space) { FreeRanges.emplace_back(0, UINT32_MAX); } +// Size == -1 means unbounded array +bool findAvailableBinding(int32_t Size, uint32_t *RegSlot); }; struct BindingSpaces { dxil::ResourceClass ResClass; llvm::SmallVector Spaces; BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {} +RegisterSpace &getOrInsertSpace(uint32_t Space); }; private: @@ -658,6 +661,7 @@ class DXILResourceBindingInfo { OverlappingBinding(false) {} bool hasImplicitBinding() const { return ImplicitBinding; } + void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; } bool hasOverlappingBinding() const { return OverlappingBinding; } BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) { @@ -673,6 +677,10 @@ class DXILResourceBindingInfo { } } + // Size == -1 means unbounded array + bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space, +int32_t Size, uint32_t *RegSlot); + friend class DXILResourceBindingAnalysis; friend class DXILResourceBindingWrapperPass; }; diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index b84c6d6123d58..71db56d922676 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &); void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &); void initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &); void 
initializeDXILResourceBindingWrapperPassPass(PassRegistry &); +void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &); void initializeDXILResourceTypeWrapperPassPass(PassRegistry &); void initializeDXILResourceWrapperPassPass(PassRegistry &); void initializeDeadMachineInstructionElimPass(PassRegistry &); diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp index 484a456f49f1b..9f5daf32be9a0 100644 --- a/llvm/lib/Analysis/Analysis.cpp +++ b/llvm/lib/Analysis/Analysis.cpp @@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) { initializeCallGraphViewerPass(Registry); initializeCycleInfoWrapperPassPass(Registry); initializeDXILMetadataAnalysisWrapperPassPass(Registry); + initializeDXILResourceWrapperPassPass(Registry); initializeDXILResourceBindingWrapperPassPass(Registry); initializeDXILResourceTypeWrapperPassPass(Registry); initializeDXILResourceWrapperPassPass(Registry); diff --git a/llvm/lib/Analysis/DXILResource.cpp b/llvm/lib/Analysis/DXILResource.cpp index ce8e250a32ebe..34355c81e77b0 100644 --- a/llvm/lib/Analysis/DXILResource.cpp +++ b/llvm/lib
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From 6d045290b83d4abb2765e6d323d7a441aab6445c Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## 
%bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
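To summarize what these tests pin down, here is a minimal standalone sketch (my own example, not taken from the patch): an atomic vector load whose alignment is smaller than its size cannot be lowered to a single atomic instruction on x86, so it is expanded to a call to the generic `__atomic_load` runtime routine, exactly as the CHECK lines above show.

```llvm
; Sketch only: an under-aligned atomic vector load (4-byte alignment for an
; 8-byte access). Codegen cannot use one atomic instruction here, so it
; emits a __atomic_load libcall instead.
define <1 x i64> @underaligned_vec_load(ptr %p) {
  %v = load atomic <1 x i64>, ptr %p acquire, align 4
  ret <1 x i64> %v
}
```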
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432 >From abb646bbd716e2b12e10961c80c28dcfde8d66f5 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 31 Jan 2025 13:12:56 -0500 Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector. After splitting, all elements are created. The elements are placed back into a concat_vectors. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that its concat_vectors can be mapped to a BUILD_VECTOR and so unused elements are no longer referenced. commit-id:b83937a8 --- llvm/include/llvm/CodeGen/SelectionDAG.h | 4 +- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 20 ++- .../SelectionDAGAddressAnalysis.cpp | 30 ++-- .../SelectionDAG/SelectionDAGBuilder.cpp | 6 +- llvm/lib/Target/X86/X86ISelLowering.cpp | 29 +-- llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++ 6 files changed, 69 insertions(+), 187 deletions(-) diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h index ba11ddbb5b731..d3cd81c146280 100644 --- a/llvm/include/llvm/CodeGen/SelectionDAG.h +++ b/llvm/include/llvm/CodeGen/SelectionDAG.h @@ -1843,7 +1843,7 @@ class SelectionDAG { /// chain to the token factor. This ensures that the new memory node will have /// the same relative memory dependency position as the old load. Returns the /// new merged load chain. - SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp); + SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp); /// Topological-sort the AllNodes list and a /// assign a unique node id for each node in the DAG based on their @@ -2281,7 +2281,7 @@ class SelectionDAG { /// merged. Check that both are nonvolatile and if LD is loading /// 'Bytes' bytes from a location that is 'Dist' units away from the /// location that the 'Base' load is loading from. - bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base, + bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base, unsigned Bytes, int Dist) const; /// Infer alignment of a load / store address. 
Return std::nullopt if it diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index d9d6d43c42430..e4d7e176c51e4 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -12213,7 +12213,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain, return TokenFactor; } -SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, +SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp) { assert(isa(NewMemOp.getNode()) && "Expected a memop node"); SDValue OldChain = SDValue(OldLoad, 1); @@ -12906,17 +12906,21 @@ std::pair SelectionDAG::UnrollVectorOverflowOp( getBuildVector(NewOvVT, dl, OvScalars)); } -bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD, - LoadSDNode *Base, +bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD, + MemSDNode *Base, unsigned Bytes, int Dist) const { if (LD->isVolatile() || Base->isVolatile()) return false; - // TODO: probably too restrictive for atomics, revisit - if (!LD->isSimple()) -return false; - if (LD->isIndexed() || Base->isIndexed()) -return false; + if (auto Ld = dyn_cast(LD)) { +if (!Ld->isSimple()) + return false; +if (Ld->isIndexed()) + return false; + } + if (auto Ld = dyn_cast(Base)) +if (Ld->isIndexed()) + return false; if (LD->getChain() != Base->getChain()) return false; EVT VT = LD->getMemoryVT(); diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp index f2ab88851b780..c29cb424c7a4c 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp @@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize, } /// Parses tree in Ptr for base, index, offset addresses. -static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, - const SelectionDAG &DAG) { +template +static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) { SDValue Ptr = N->getBasePtr(); // (((B + I*M) + c)) + c ... @@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N, bool IsIndexSignExt = false; // pre-inc/pre-dec ops are components of EA. - if (N->get
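For intuition, a hand-written example of the kind of pattern this aims at (an assumption on my part; the patch's own tests are in the truncated diff above): once a split atomic vector load is expressed as a BUILD_VECTOR rather than a concat_vectors, DAG combine can drop the loads feeding lanes that are never extracted.

```llvm
; Illustrative only. If lane 1 is never used, mapping the recombined load
; pieces through BUILD_VECTOR lets the unused element load be removed.
define float @first_lane_only(ptr %p) {
  %v = load atomic <2 x float>, ptr %p acquire, align 8
  %e = extractelement <2 x float> %v, i64 0
  ret float %e
}
```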
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 855430ef05b0a8cfe6696ec62f827cab6d663655 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +- llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 163 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..b6f1e9db9ce35 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + if (I->getType()->getScalarType()->isPointerTy() && + I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) { +unsigned AS = + cast(I->getType()->getScalarType())->getAddressSpace(); +ElementCount EC = cast(I->getType())->getElementCount(); +Value *BC = Builder.CreateBitCast( +Result, +VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)), +EC)); +Value *IntToPtr = Builder.CreateIntToPtr( +BC, VectorType::get(PointerType::get(Ctx, AS), EC)); +V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..36c1305a7c5df 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret 
= load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 08d0405345f57..4293df8c13571 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LAB
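To make the new cast chain concrete, here is a hand-written sketch of the IR the sized-libcall path now produces for a pointer-vector result (function and value names are my own; the ordering constant 2 is memory_order_acquire in the libcall ABI): the integer returned by the call is bitcast to an integer vector, then converted element-wise with inttoptr.

```llvm
; Sketch of the expansion of: %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
; on a 64-bit target, where the sized libcall returns the 16 loaded bytes as i128.
declare i128 @__atomic_load_16(ptr, i32)

define <2 x ptr> @expanded_form(ptr %x) {
  %r = call i128 @__atomic_load_16(ptr %x, i32 2) ; 2 = acquire
  %iv = bitcast i128 %r to <2 x i64>              ; scalar int -> int vector
  %pv = inttoptr <2 x i64> %iv to <2 x ptr>       ; int vector -> ptr vector
  ret <2 x ptr> %pv
}
```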
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598 >From 5bc5d3291ad26fa3af646e45bd5d491d03588d98 Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 11:19:39 -0500 Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load Vector types of 2 elements must be widened. This change does this for vector types of atomic load in SelectionDAG so that it can translate aligned vectors of >1 size. Also, it also adds Pats to remove an extra MOV. commit-id:2894ccd1 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 104 ++ llvm/lib/Target/X86/X86InstrCompiler.td | 7 ++ llvm/test/CodeGen/X86/atomic-load-store.ll| 81 ++ llvm/test/CodeGen/X86/atomic-unordered.ll | 3 +- 5 files changed, 173 insertions(+), 23 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 89ea7ef4dbe89..bdfa5f7741ad3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N); SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N); SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N); + SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue WidenVecRes_LOAD(SDNode* N); SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N); SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 5126d597161d4..43710b77b763b 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) { break; case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break; case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +Res = WidenVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: Res = WidenVecRes_LOAD(N); break; case ISD::STEP_VECTOR: case ISD::SPLAT_VECTOR: @@ -5979,6 +5982,85 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) { N->getOperand(1), N->getOperand(2)); } +/// Either return the same load or provide appropriate casts +/// from the load and return that. +static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT, + TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl, + SelectionDAG &DAG) { + assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth)); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + if (!FirstVT.isVector()) { +unsigned NumElts = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts); +SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp); +return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp); + } else if (FirstVT == WidenVT) +return LdOp; + else { +// TODO: We don't currently have any tests that exercise this code path. 
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0); +unsigned NumConcat = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +SmallVector ConcatOps(NumConcat); +SDValue UndefVal = DAG.getUNDEF(FirstVT); +ConcatOps[0] = LdOp; +for (unsigned i = 1; i != NumConcat; ++i) + ConcatOps[i] = UndefVal; +return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps); + } +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. + std::optional FirstVT = +
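As a before/after reference (my example, not from the patch): with this change a naturally aligned two-element atomic vector load survives type legalization by being widened (e.g. v2i32 to v4i32 on x86) instead of falling back to a libcall, so it can select to a single 8-byte atomic load.

```llvm
; Sketch: 8 bytes loaded with align 8. After widening, this should lower to
; one plain 8-byte load (the access itself stays a single atomic operation).
define <2 x i32> @aligned_pair(ptr %p) {
  %v = load atomic <2 x i32>, ptr %p acquire, align 8
  ret <2 x i32> %v
}
```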
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386 >From 0824a27d19d3a26eaf9d1ce0b04cf09dbf3cab4b Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:38:23 -0500 Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG When lowering atomic <1 x T> vector types with floats, selection can fail since this pattern is unsupported. To support this, floats can be casted to an integer type of the same size. commit-id:f9d761c5 --- llvm/lib/Target/X86/X86ISelLowering.cpp| 4 +++ llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++ 2 files changed, 41 insertions(+) diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 483aceb239b0c..c48344332d707 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, setOperationAction(Op, MVT::f32, Promote); } + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32); + setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64); + // We have target-specific dag combine patterns for the following nodes: setTargetDAGCombine({ISD::VECTOR_SHUFFLE, ISD::SCALAR_TO_VECTOR, diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index d23cfb89f9fc8..6efcbb80c0ce6 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { %ret = load atomic <1 x i64>, ptr %x acquire, align 8 ret <1 x i64> %ret } + +define <1 x half> @atomic_vec1_half(ptr %x) { +; CHECK3-LABEL: atomic_vec1_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %cx +; CHECK0-NEXT:## implicit-def: $eax +; CHECK0-NEXT:movw %cx, %ax +; CHECK0-NEXT:## implicit-def: $xmm0 +; CHECK0-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK0-NEXT:retq + %ret = load atomic <1 x half>, ptr %x acquire, align 2 + ret <1 x half> %ret +} + +define <1 x float> @atomic_vec1_float(ptr %x) { +; CHECK-LABEL: atomic_vec1_float: +; CHECK: ## %bb.0: +; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT:retq + %ret = load atomic <1 x float>, ptr %x acquire, align 4 + ret <1 x float> %ret +} + +define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec1_double_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 8 + ret <1 x double> %ret +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
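The tests in this message show the end result; as a compact restatement (illustrative), the promotion means a float-typed atomic load is legalized as its same-width integer twin and bitcast back, so selection no longer fails:

```llvm
; Sketch: ATOMIC_LOAD of f32 is promoted to i32 internally, so this selects
; to an ordinary 4-byte load (movss in the checks above) rather than erroring.
define <1 x float> @one_float(ptr %p) {
  %v = load atomic <1 x float>, ptr %p acquire, align 4
  ret <1 x float> %v
}
```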
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From 4a47d3fd2f29c66c2230f40d7cca8fdbd1238abd Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index a01e1cff74564..5126d597161d4 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(), + N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_ve
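A one-lane restatement of what scalarization buys (illustrative, mirroring the `atomic_vec1_i32` test above): the illegal `v1i32` result type is rewritten to a scalar `i32` atomic load, which selects to a single `movl`.

```llvm
; Sketch: <1 x i32> is not a legal type; ScalarizeVecRes_ATOMIC_LOAD turns
; this into the equivalent scalar atomic i32 load and rebuilds the chain.
define <1 x i32> @one_lane(ptr %p) {
  %v = load atomic <1 x i32>, ptr %p acquire, align 4
  ret <1 x i32> %v
}
```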
[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)
krzysz00 wrote: > I think we just have to pretend that virtual addresses or segment-relative addresses are the only thing that matters. This is why I've been advocating for the index-based definition of ptrtoaddr. So there's no need to do this 48-bit thing on the assumption that the base of a buffer fat pointer is "aligned enough" (in most contexts where I've seen this, the start point is some sort of allocation that'll definitely be align(16) or better). https://github.com/llvm/llvm-project/pull/137418 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)
https://github.com/hekota created https://github.com/llvm/llvm-project/pull/138043 The `DXILResourceImplicitBinding` pass uses the results of `DXILResourceBindingAnalysis` to assigns register slots to resources that do not have explicit binding. It replaces all `llvm.dx.resource.handlefromimplicitbinding` calls with `llvm.dx.resource.handlefrombinding` using the newly assigned binding. If a binding cannot be found for a resource, the pass will raise a diagnostic error. Currently this diagnostic message does not include the resource name, which will be addressed in a separate task (#137868). Part 2/2 of #136786 Closes #136786 >From f58e2d5c079f31d7a4d99d481a569ad4c754c4e7 Mon Sep 17 00:00:00 2001 From: Helena Kotas Date: Wed, 30 Apr 2025 15:08:04 -0700 Subject: [PATCH] [HLSL] Implementation of DXILResourceImplicitBinding pass This pass takes advantage of the DXILResourceBinding analysis and assigns register slots to resources that do not have explicit binding. Part 2/2 of #136786 Closes #136786 --- llvm/include/llvm/Analysis/DXILResource.h | 8 + llvm/include/llvm/InitializePasses.h | 1 + llvm/lib/Analysis/Analysis.cpp| 1 + llvm/lib/Analysis/DXILResource.cpp| 53 + llvm/lib/Target/DirectX/CMakeLists.txt| 1 + .../DirectX/DXILResourceImplicitBinding.cpp | 182 ++ .../DirectX/DXILResourceImplicitBinding.h | 29 +++ llvm/lib/Target/DirectX/DirectX.h | 6 + .../Target/DirectX/DirectXPassRegistry.def| 1 + .../Target/DirectX/DirectXTargetMachine.cpp | 3 + .../CodeGen/DirectX/ImplicitBinding/arrays.ll | 41 .../ImplicitBinding/multiple-spaces.ll| 44 + .../CodeGen/DirectX/ImplicitBinding/simple.ll | 30 +++ .../ImplicitBinding/unbounded-arrays-error.ll | 34 .../ImplicitBinding/unbounded-arrays.ll | 42 llvm/test/CodeGen/DirectX/llc-pipeline.ll | 2 + 16 files changed, 478 insertions(+) create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll diff --git a/llvm/include/llvm/Analysis/DXILResource.h b/llvm/include/llvm/Analysis/DXILResource.h index 569f5346b9e36..fd34fdfda4c1c 100644 --- a/llvm/include/llvm/Analysis/DXILResource.h +++ b/llvm/include/llvm/Analysis/DXILResource.h @@ -632,12 +632,15 @@ class DXILResourceBindingInfo { RegisterSpace(uint32_t Space) : Space(Space) { FreeRanges.emplace_back(0, UINT32_MAX); } +// Size == -1 means unbounded array +bool findAvailableBinding(int32_t Size, uint32_t *RegSlot); }; struct BindingSpaces { dxil::ResourceClass ResClass; llvm::SmallVector Spaces; BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {} +RegisterSpace &getOrInsertSpace(uint32_t Space); }; private: @@ -658,6 +661,7 @@ class DXILResourceBindingInfo { OverlappingBinding(false) {} bool hasImplicitBinding() const { return ImplicitBinding; } + void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; } bool hasOverlappingBinding() const { return OverlappingBinding; } BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) { @@ -673,6 +677,10 @@ class DXILResourceBindingInfo { } } + // Size == -1 means unbounded array + bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space, +int32_t Size, uint32_t 
*RegSlot); + friend class DXILResourceBindingAnalysis; friend class DXILResourceBindingWrapperPass; }; diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index b84c6d6123d58..71db56d922676 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &); void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &); void initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &); void initializeDXILResourceBindingWrapperPassPass(PassRegistry &); +void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &); void initializeDXILResourceTypeWrapperPassPass(PassRegistry &); void initializeDXILResourceWrapperPassPass(PassRegistry &); void initializeDeadMachineInstructionElimPass(PassRegistry &); diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp index 484a456f49f1b..9f5daf32be9a0 100644 --- a/llvm/lib/Analysis/Analysis.cpp +++ b/llvm/lib/Analysis/Analysis.cpp @@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) {
[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)
llvmbot wrote: @llvm/pr-subscribers-backend-directx Author: Helena Kotas (hekota) Changes The `DXILResourceImplicitBinding` pass uses the results of `DXILResourceBindingAnalysis` to assigns register slots to resources that do not have explicit binding. It replaces all `llvm.dx.resource.handlefromimplicitbinding` calls with `llvm.dx.resource.handlefrombinding` using the newly assigned binding. If a binding cannot be found for a resource, the pass will raise a diagnostic error. Currently this diagnostic message does not include the resource name, which will be addressed in a separate task (#137868). Part 2/2 of #136786 Closes #136786 --- Patch is 26.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138043.diff 16 Files Affected: - (modified) llvm/include/llvm/Analysis/DXILResource.h (+8) - (modified) llvm/include/llvm/InitializePasses.h (+1) - (modified) llvm/lib/Analysis/Analysis.cpp (+1) - (modified) llvm/lib/Analysis/DXILResource.cpp (+53) - (modified) llvm/lib/Target/DirectX/CMakeLists.txt (+1) - (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp (+182) - (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h (+29) - (modified) llvm/lib/Target/DirectX/DirectX.h (+6) - (modified) llvm/lib/Target/DirectX/DirectXPassRegistry.def (+1) - (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+3) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll (+41) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll (+44) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll (+30) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll (+34) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll (+42) - (modified) llvm/test/CodeGen/DirectX/llc-pipeline.ll (+2) ``diff diff --git a/llvm/include/llvm/Analysis/DXILResource.h b/llvm/include/llvm/Analysis/DXILResource.h index 569f5346b9e36..fd34fdfda4c1c 100644 --- a/llvm/include/llvm/Analysis/DXILResource.h +++ b/llvm/include/llvm/Analysis/DXILResource.h @@ -632,12 +632,15 @@ class DXILResourceBindingInfo { RegisterSpace(uint32_t Space) : Space(Space) { FreeRanges.emplace_back(0, UINT32_MAX); } +// Size == -1 means unbounded array +bool findAvailableBinding(int32_t Size, uint32_t *RegSlot); }; struct BindingSpaces { dxil::ResourceClass ResClass; llvm::SmallVector Spaces; BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {} +RegisterSpace &getOrInsertSpace(uint32_t Space); }; private: @@ -658,6 +661,7 @@ class DXILResourceBindingInfo { OverlappingBinding(false) {} bool hasImplicitBinding() const { return ImplicitBinding; } + void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; } bool hasOverlappingBinding() const { return OverlappingBinding; } BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) { @@ -673,6 +677,10 @@ class DXILResourceBindingInfo { } } + // Size == -1 means unbounded array + bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space, +int32_t Size, uint32_t *RegSlot); + friend class DXILResourceBindingAnalysis; friend class DXILResourceBindingWrapperPass; }; diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index b84c6d6123d58..71db56d922676 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &); void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &); void 
initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &); void initializeDXILResourceBindingWrapperPassPass(PassRegistry &); +void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &); void initializeDXILResourceTypeWrapperPassPass(PassRegistry &); void initializeDXILResourceWrapperPassPass(PassRegistry &); void initializeDeadMachineInstructionElimPass(PassRegistry &); diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp index 484a456f49f1b..9f5daf32be9a0 100644 --- a/llvm/lib/Analysis/Analysis.cpp +++ b/llvm/lib/Analysis/Analysis.cpp @@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) { initializeCallGraphViewerPass(Registry); initializeCycleInfoWrapperPassPass(Registry); initializeDXILMetadataAnalysisWrapperPassPass(Registry); + initializeDXILResourceWrapperPassPass(Registry); initializeDXILResourceBindingWrapperPassPass(Registry); initializeDXILResourceTypeWrapperPassPass(Registry); initializeDXILResourceWrapperPassPass(Registry); diff --git a/llvm/lib/Analysis/DXILResource.cpp b/llvm/lib/Analysis/DXILResource.cpp index ce8e250a32ebe..34355c81e77b0 100644 --- a/llvm/lib/Analysis/DXILResource.cpp +++ b/llvm/lib/Analysis/DXILResource.cpp @@ -995,6 +995,59 @@ void DXILResourceBindingInfo::populate(Module &M, D
[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)
llvmbot wrote: @llvm/pr-subscribers-llvm-analysis Author: Helena Kotas (hekota) Changes The `DXILResourceImplicitBinding` pass uses the results of `DXILResourceBindingAnalysis` to assigns register slots to resources that do not have explicit binding. It replaces all `llvm.dx.resource.handlefromimplicitbinding` calls with `llvm.dx.resource.handlefrombinding` using the newly assigned binding. If a binding cannot be found for a resource, the pass will raise a diagnostic error. Currently this diagnostic message does not include the resource name, which will be addressed in a separate task (#137868). Part 2/2 of #136786 Closes #136786 --- Patch is 26.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138043.diff 16 Files Affected: - (modified) llvm/include/llvm/Analysis/DXILResource.h (+8) - (modified) llvm/include/llvm/InitializePasses.h (+1) - (modified) llvm/lib/Analysis/Analysis.cpp (+1) - (modified) llvm/lib/Analysis/DXILResource.cpp (+53) - (modified) llvm/lib/Target/DirectX/CMakeLists.txt (+1) - (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp (+182) - (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h (+29) - (modified) llvm/lib/Target/DirectX/DirectX.h (+6) - (modified) llvm/lib/Target/DirectX/DirectXPassRegistry.def (+1) - (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+3) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll (+41) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll (+44) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll (+30) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll (+34) - (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll (+42) - (modified) llvm/test/CodeGen/DirectX/llc-pipeline.ll (+2) ``diff diff --git a/llvm/include/llvm/Analysis/DXILResource.h b/llvm/include/llvm/Analysis/DXILResource.h index 569f5346b9e36..fd34fdfda4c1c 100644 --- a/llvm/include/llvm/Analysis/DXILResource.h +++ b/llvm/include/llvm/Analysis/DXILResource.h @@ -632,12 +632,15 @@ class DXILResourceBindingInfo { RegisterSpace(uint32_t Space) : Space(Space) { FreeRanges.emplace_back(0, UINT32_MAX); } +// Size == -1 means unbounded array +bool findAvailableBinding(int32_t Size, uint32_t *RegSlot); }; struct BindingSpaces { dxil::ResourceClass ResClass; llvm::SmallVector Spaces; BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {} +RegisterSpace &getOrInsertSpace(uint32_t Space); }; private: @@ -658,6 +661,7 @@ class DXILResourceBindingInfo { OverlappingBinding(false) {} bool hasImplicitBinding() const { return ImplicitBinding; } + void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; } bool hasOverlappingBinding() const { return OverlappingBinding; } BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) { @@ -673,6 +677,10 @@ class DXILResourceBindingInfo { } } + // Size == -1 means unbounded array + bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space, +int32_t Size, uint32_t *RegSlot); + friend class DXILResourceBindingAnalysis; friend class DXILResourceBindingWrapperPass; }; diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index b84c6d6123d58..71db56d922676 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &); void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &); void 
initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &); void initializeDXILResourceBindingWrapperPassPass(PassRegistry &); +void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &); void initializeDXILResourceTypeWrapperPassPass(PassRegistry &); void initializeDXILResourceWrapperPassPass(PassRegistry &); void initializeDeadMachineInstructionElimPass(PassRegistry &); diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp index 484a456f49f1b..9f5daf32be9a0 100644 --- a/llvm/lib/Analysis/Analysis.cpp +++ b/llvm/lib/Analysis/Analysis.cpp @@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) { initializeCallGraphViewerPass(Registry); initializeCycleInfoWrapperPassPass(Registry); initializeDXILMetadataAnalysisWrapperPassPass(Registry); + initializeDXILResourceWrapperPassPass(Registry); initializeDXILResourceBindingWrapperPassPass(Registry); initializeDXILResourceTypeWrapperPassPass(Registry); initializeDXILResourceWrapperPassPass(Registry); diff --git a/llvm/lib/Analysis/DXILResource.cpp b/llvm/lib/Analysis/DXILResource.cpp index ce8e250a32ebe..34355c81e77b0 100644 --- a/llvm/lib/Analysis/DXILResource.cpp +++ b/llvm/lib/Analysis/DXILResource.cpp @@ -995,6 +995,59 @@ void DXILResourceBindingInfo::populate(Module &M, DXI
[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)
@@ -0,0 +1,44 @@ +; RUN: opt -S -dxil-resource-implicit-binding %s | FileCheck %s hekota wrote: I just realized this test is not complete... working on a fix. https://github.com/llvm/llvm-project/pull/138043 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang-tools-extra] [clang-doc] Add Mustache template assets (PR #138059)
ilovepi wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/138059). Learn more: https://graphite.dev/docs/merge-pull-requests Stack: * #138063 * #138062 * #138061 * #138060 * #138059 (this PR, view in Graphite) * #138058 * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/138059 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [llvm][IR] Treat memcmp and bcmp as libcalls (PR #135706)
ilovepi wrote: @efriedma-quic We've been discussing the topic of LTOing libc quite a bit internally, and are currently sketching out how this could work. Unsurprisingly, there's quite a lot to think about in both the compiler and the linker, how the two combine in our different versions of LTO, and how that may break in new and fun ways. I was wondering if you'd be available to join the libc monthly meeting (5/8, 9am PST) to discuss your take on the whole idea? I'm not sure what time zone you're normally in, but I think there will be quite a number of folks who are interested in making LTO work well with libcs, and LLVM libc in particular. I know a few of my team members would like to pick your brain on the subject as we sketch our ideas out. I can try to post a short summary of our thoughts either here or on Discourse to make the discussion a bit easier as well. https://github.com/llvm/llvm-project/pull/135706 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing for RootFlags (PR #138055)
https://github.com/inbelic created https://github.com/llvm/llvm-project/pull/138055 - defines the `RootFlags` in-memory enum - defines `parseRootFlags` to parse the various flag enums into a single `uint32_t` - adds corresponding unit tests - improves the diagnostic message for when we provide a non-zero integer value to the flags Resolves https://github.com/llvm/llvm-project/issues/126575 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits