[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)

2025-04-30 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian closed 
https://github.com/llvm/llvm-project/pull/137655
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)

2025-04-30 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

Will close this for now. I'll revisit this if we can handle the case under 
discussion properly.

https://github.com/llvm/llvm-project/pull/137655
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` (PR #123995)

2025-04-30 Thread Shilei Tian via llvm-branch-commits


@@ -1425,8 +1453,14 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM,
 }
   }
 
-  ChangeStatus Change = A.run();
-  return Change == ChangeStatus::CHANGED;
+  bool Changed = A.run() == ChangeStatus::CHANGED;
+
+  if (Changed && (LTOPhase == ThinOrFullLTOPhase::None ||
+  LTOPhase == ThinOrFullLTOPhase::FullLTOPostLink ||
+  LTOPhase == ThinOrFullLTOPhase::ThinLTOPostLink))
+checkWavesPerEU(M, TM);

shiltian wrote:

gentle ping

https://github.com/llvm/llvm-project/pull/123995
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-04-30 Thread Kai Nacke via llvm-branch-commits

redstar wrote:

I had to force-push to get the fixed test cases.

https://github.com/llvm/llvm-project/pull/137235
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135662

>From 0d477fa179d365147d2cd809724b07f050603ff6 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 15:08:54 +0300
Subject: [PATCH 1/5] [BOLT] Gadget scanner: refactor issue reporting

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++
 .../AArch64/gs-pauth-debug-output.s   |   8 +-
 3 files changed, 187 insertions(+), 123 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 4c1bef3d2265f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const = 0;
 
-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual ArrayRef getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(ArrayRef Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
   StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in 
different
-  // preceding basic blocks.
-  SmallVector OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-   MCPhysReg AffectedRegister)
-  : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
-  ArrayRef getAffectedRegisters() const override {
-return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+  : Report(Location), Kind(Kind) {}
 
-  void setOverwritingInstrs(ArrayRef Instrs) override {
-OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };
 
 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
   const BinaryContext &BC) const override;
 };
 
+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 
0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef Instrs)
+  : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template  struct BriefReport {
+  BriefReport(std::shared_ptr Issue,
+  const std::optional RequestedDetails)
+  : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr Issue;
+  std::optional RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr Issue,
+ std::shared_ptr Details)
+  : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr Issue;
+  std::shared_ptr Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector> Diagnostics;
+  std::vector Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &BC;
+  BinaryFunction &BF;
+  MCPlusBuilder::AllocatorIdTy AllocatorId;
+  FunctionAnalysisResult Result;
+
+  bool PacRetGadgetsOnly;
+
+  void findUnsafeUses(SmallVector> &Reports);
+  void augmentUnsafeUseReports(const ArrayRef> Reports);
+
+public:
+  FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy 
AllocatorId,
+ 

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135662

>From 0d477fa179d365147d2cd809724b07f050603ff6 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 15:08:54 +0300
Subject: [PATCH 1/5] [BOLT] Gadget scanner: refactor issue reporting

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++
 .../AArch64/gs-pauth-debug-output.s   |   8 +-
 3 files changed, 187 insertions(+), 123 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 4c1bef3d2265f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const = 0;
 
-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual ArrayRef getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(ArrayRef Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
   StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in 
different
-  // preceding basic blocks.
-  SmallVector OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-   MCPhysReg AffectedRegister)
-  : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
-  ArrayRef getAffectedRegisters() const override {
-return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+  : Report(Location), Kind(Kind) {}
 
-  void setOverwritingInstrs(ArrayRef Instrs) override {
-OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };
 
 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
   const BinaryContext &BC) const override;
 };
 
+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 
0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef Instrs)
+  : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template  struct BriefReport {
+  BriefReport(std::shared_ptr Issue,
+  const std::optional RequestedDetails)
+  : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr Issue;
+  std::optional RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr Issue,
+ std::shared_ptr Details)
+  : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr Issue;
+  std::shared_ptr Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector> Diagnostics;
+  std::vector Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &BC;
+  BinaryFunction &BF;
+  MCPlusBuilder::AllocatorIdTy AllocatorId;
+  FunctionAnalysisResult Result;
+
+  bool PacRetGadgetsOnly;
+
+  void findUnsafeUses(SmallVector> &Reports);
+  void augmentUnsafeUseReports(const ArrayRef> Reports);
+
+public:
+  FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy 
AllocatorId,
+ 

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137224

>From 33158bcada96af1d201a5cb5342f0256f30894cd Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++-
 .../AArch64/gs-pacret-autiasp.s   |  31 +-
 .../AArch64/gs-pauth-debug-output.s   |  30 +-
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 4 files changed, 730 insertions(+), 40 deletions(-)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index d4282daad648b..fab92947f6d67 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -660,8 +660,9 @@ class DataflowSrcSafetyAnalysis
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
   BinaryFunction &BF;
   MCPlusBuilder::AllocatorIdTy AllocId;
@@ -672,6 +673,30 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
+  /// Compute a reasonably pessimistic estimation of the register state when
+  /// the previous instruction is not known for sure. Take the set of registers
+  /// which are trusted at function entry and remove all registers that can be
+  /// clobbered inside this function.
+  SrcState computePessimisticState(BinaryFunction &BF) {
+BitVector ClobberedRegs(NumRegs);
+for (auto &I : BF.instrs()) {
+  MCInst &Inst = I.second;
+  BC.MIB->getClobberedRegs(Inst, ClobberedRegs);
+
+  // If this is a call instruction, no register is safe anymore, unless
+  // it is a tail call. Ignore tail calls for the purpose of estimating the
+  // worst-case scenario, assuming no instructions are executed in the
+  // caller after this point anyway.
+  if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst))
+ClobberedRegs.set();
+}
+
+SrcState S = createEntryState();
+S.SafeToDerefRegs.reset(ClobberedRegs);
+S.TrustedRegs.reset(ClobberedRegs);
+return S;
+  }
+
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -682,6 +707,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -696,7 +722,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Check if we need to remove an old annotation (this is the case if
@@ -1233,6 +1259,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_gadget_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return t

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (PR #136147)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136147

>From 59afabd2a0b89f6e529f58cd17ac56673e2c0599 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 15:40:05 +0300
Subject: [PATCH] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks
 interface

Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write
an authenticated pointer to a register (such as "ldra x0, [x1]!").

Use `std::optional` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.

Document a few existing methods, add information about preconditions.
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 61 ++-
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 64 +---
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 76 ---
 .../AArch64/gs-pauth-debug-output.s   |  3 -
 .../AArch64/gs-pauth-signing-oracles.s| 20 +
 5 files changed, 130 insertions(+), 94 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 132d58f3f9f79..83ad70ea97076 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -562,30 +562,50 @@ class MCPlusBuilder {
 return {};
   }
 
-  virtual ErrorOr getAuthenticatedReg(const MCInst &Inst) const {
-llvm_unreachable("not implemented");
-return getNoRegister();
-  }
-
-  virtual bool isAuthenticationOfReg(const MCInst &Inst,
- MCPhysReg AuthenticatedReg) const {
+  /// Returns the register where an authenticated pointer is written to by 
Inst,
+  /// or std::nullopt if not authenticating any register.
+  ///
+  /// Sets IsChecked if the instruction always checks authenticated pointer,
+  /// i.e. it either returns a successfully authenticated pointer or terminates
+  /// the program abnormally (such as "ldra x0, [x1]!" on AArch64, which 
crashes
+  /// on authentication failure even if FEAT_FPAC is not implemented).
+  virtual std::optional
+  getWrittenAuthenticatedReg(const MCInst &Inst, bool &IsChecked) const {
 llvm_unreachable("not implemented");
-return false;
+return std::nullopt;
   }
 
-  virtual MCPhysReg getSignedReg(const MCInst &Inst) const {
+  /// Returns the register signed by Inst, or std::nullopt if not signing any
+  /// register.
+  ///
+  /// The returned register is assumed to be both input and output operand,
+  /// as it is done on AArch64.
+  virtual std::optional getSignedReg(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
-  virtual ErrorOr getRegUsedAsRetDest(const MCInst &Inst) const {
+  /// Returns the register used as a return address. Returns std::nullopt if
+  /// not applicable, such as reading the return address from a system register
+  /// or from the stack.
+  ///
+  /// Sets IsAuthenticatedInternally if the instruction accepts a signed
+  /// pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called when isReturn(Inst) is true.
+  virtual std::optional
+  getRegUsedAsRetDest(const MCInst &Inst,
+  bool &IsAuthenticatedInternally) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Returns the register used as the destination of an indirect branch or 
call
   /// instruction. Sets IsAuthenticatedInternally if the instruction accepts
   /// a signed pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called if isIndirectCall(Inst) or isIndirectBranch(Inst)
+  /// returns true.
   virtual MCPhysReg
   getRegUsedAsIndirectBranchDest(const MCInst &Inst,
  bool &IsAuthenticatedInternally) const {
@@ -602,14 +622,14 @@ class MCPlusBuilder {
   ///controlled, under the Pointer Authentication threat model.
   ///
   /// If the instruction does not write to any register satisfying the above
-  /// two conditions, NoRegister is returned.
+  /// two conditions, std::nullopt is returned.
   ///
   /// The Pointer Authentication threat model assumes an attacker is able to
   /// modify any writable memory, but not executable code (due to W^X).
-  virtual MCPhysReg
+  virtual std::optional
   getMaterializedAddressRegForPtrAuth(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Analyzes if this instruction can safely perform address arithmetics
@@ -622,10 +642,13 @@ class MCPlusBuilder {
   /// controlled, provided InReg and executa

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135661

>From b942d6e9dc8858653e4669d6a1b8e50f0dc19061 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 14:35:56 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: use more appropriate types (NFC)

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  8 +---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 ---
 2 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 6765e2aff414f..3e39b64e59e0f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -12,7 +12,6 @@
 #include "bolt/Core/BinaryContext.h"
 #include "bolt/Core/BinaryFunction.h"
 #include "bolt/Passes/BinaryPasses.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
@@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const 
MCInstReference &);
 
 namespace PAuthGadgetScanner {
 
-class SrcSafetyAnalysis;
-struct SrcState;
-
 /// Description of a gadget kind that can be detected. Intended to be
 /// statically allocated to be attached to reports by reference.
 class GadgetKind {
@@ -210,7 +206,7 @@ class GadgetKind {
 public:
   GadgetKind(const char *Description) : Description(Description) {}
 
-  const StringRef getDescription() const { return Description; }
+  StringRef getDescription() const { return Description; }
 };
 
 /// Base report located at some instruction, without any additional 
information.
@@ -261,7 +257,7 @@ struct GadgetReport : public Report {
 /// Report with a free-form message attached.
 struct GenericReport : public Report {
   std::string Text;
-  GenericReport(MCInstReference Location, const std::string &Text)
+  GenericReport(MCInstReference Location, StringRef Text)
   : Report(Location), Text(Text) {}
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const override;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 12eb9c66130b9..3d723456b6ffd 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -91,14 +91,14 @@ class TrackedRegisters {
   const std::vector Registers;
   std::vector RegToIndexMapping;
 
-  static size_t getMappingSize(const std::vector &RegsToTrack) {
+  static size_t getMappingSize(const ArrayRef RegsToTrack) {
 if (RegsToTrack.empty())
   return 0;
 return 1 + *llvm::max_element(RegsToTrack);
   }
 
 public:
-  TrackedRegisters(const std::vector &RegsToTrack)
+  TrackedRegisters(const ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
 for (unsigned I = 0; I < RegsToTrack.size(); ++I)
@@ -234,7 +234,7 @@ struct SrcState {
 
 static void printLastInsts(
 raw_ostream &OS,
-const std::vector> &LastInstWritingReg) {
+const ArrayRef> LastInstWritingReg) {
   OS << "Insts: ";
   for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) {
 auto &Set = LastInstWritingReg[I];
@@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState 
&S) const {
 class SrcSafetyAnalysis {
 public:
   SrcSafetyAnalysis(BinaryFunction &BF,
-const std::vector &RegsToTrackInstsFor)
+const ArrayRef RegsToTrackInstsFor)
   : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
 RegsToTrackInstsFor(RegsToTrackInstsFor) {}
 
@@ -303,11 +303,10 @@ class SrcSafetyAnalysis {
 
   static std::shared_ptr
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- const std::vector &RegsToTrackInstsFor);
+ const ArrayRef RegsToTrackInstsFor);
 
   virtual void run() = 0;
-  virtual ErrorOr
-  getStateBefore(const MCInst &Inst) const = 0;
+  virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
 
 protected:
   BinaryContext &BC;
@@ -347,7 +346,7 @@ class SrcSafetyAnalysis {
   }
 
   BitVector getClobberedRegs(const MCInst &Point) const {
-BitVector Clobbered(NumRegs, false);
+BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
 // registers. There's a good chance that callee-saved registers will be
 // saved on the stack at some point during execution of the callee.
@@ -409,8 +408,7 @@ class SrcSafetyAnalysis {
   // FirstCheckerInst should belong to the same basic block (see the
   // assertion in DataflowSrcSafetyAnalysis::run()), meaning it was
  

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From 352364dfc00d23111fc44bc807d3484bc67aff00 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: improve handling of unreachable
 basic blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 7a86d9c4c9a59..917649731884e 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -344,6 +344,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -586,6 +592,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -659,12 +672,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1335,19 +1342,30 @@ void FunctionAnalysisContext::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructi

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: do not crash on debug-printing CFI instructions (PR #136151)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136151

>From a0a9cdfe1b7dacc9193317ef2ef8e340076067e7 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 15 Apr 2025 21:47:18 +0300
Subject: [PATCH] [BOLT] Gadget scanner: do not crash on debug-printing CFI
 instructions

Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 16 ++
 .../AArch64/gs-pauth-debug-output.s   | 32 +++
 2 files changed, 48 insertions(+)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 76070bd95cae1..7a86d9c4c9a59 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -432,6 +432,9 @@ class SrcSafetyAnalysis {
   }
 
   SrcState computeNext(const MCInst &Point, const SrcState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 SrcStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  SrcSafetyAnalysis::ComputeNext(";
@@ -675,6 +678,8 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If there is a label before this instruction, it is possible that it
   // can be jumped-to, thus conservatively resetting S. As an exception,
@@ -952,6 +957,9 @@ class DstSafetyAnalysis {
   }
 
   DstState computeNext(const MCInst &Point, const DstState &Cur) {
+if (BC.MIB->isCFI(Point))
+  return Cur;
+
 DstStatePrinter P(BC);
 LLVM_DEBUG({
   dbgs() << "  DstSafetyAnalysis::ComputeNext(";
@@ -1128,6 +1136,8 @@ class CFGUnawareDstSafetyAnalysis : public 
DstSafetyAnalysis {
 DstState S = createUnsafeState();
 for (auto &I : llvm::reverse(BF.instrs())) {
   MCInst &Inst = I.second;
+  if (BC.MIB->isCFI(Inst))
+continue;
 
   // If Inst can change the control flow, we cannot be sure that the next
   // instruction (to be executed in analyzed program) is the one processed
@@ -1326,6 +1336,9 @@ void FunctionAnalysisContext::findUnsafeUses(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const SrcState &S = Analysis->getStateBefore(Inst);
 
 // If non-empty state was never propagated from the entry basic block
@@ -1389,6 +1402,9 @@ void FunctionAnalysisContext::findUnsafeDefs(
   });
 
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
+if (BC.MIB->isCFI(Inst))
+  return;
+
 const DstState &S = Analysis->getStateAfter(Inst);
 
 if (auto Report = shouldReportAuthOracle(BC, Inst, S))
diff --git a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s 
b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
index fd55880921d06..07b61bea77e94 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s
@@ -329,6 +329,38 @@ auth_oracle:
 // PAUTH-EMPTY:
 // PAUTH-NEXT:   Attaching leakage info to: :  autia   x0, x1 
# DataflowDstSafetyAnalysis: dst-state
 
+// Gadget scanner should not crash on CFI instructions, including when 
debug-printing them.
+// Note that the particular debug output is not checked, but BOLT should be
+// compiled with assertions enabled to support -debug-only argument.
+
+.globl  cfi_inst_df
+.type   cfi_inst_df,@function
+cfi_inst_df:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_df, .-cfi_inst_df
+.cfi_endproc
+
+.globl  cfi_inst_nocfg
+.type   cfi_inst_nocfg,@function
+cfi_inst_nocfg:
+.cfi_startproc
+sub sp, sp, #16
+.cfi_def_cfa_offset 16
+
+adr x0, 1f
+br  x0
+1:
+add sp, sp, #16
+.cfi_def_cfa_offset 0
+ret
+.size   cfi_inst_nocfg, .-cfi_inst_nocfg
+.cfi_endproc
+
 // CHECK-LABEL:Analyzing function main, AllocatorId = 1
 .globl  main
 .type   main,@function

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From 352364dfc00d23111fc44bc807d3484bc67aff00 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: improve handling of unreachable
 basic blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 7a86d9c4c9a59..917649731884e 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -344,6 +344,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -586,6 +592,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -659,12 +672,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1335,19 +1342,30 @@ void FunctionAnalysisContext::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructi

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135661

>From b942d6e9dc8858653e4669d6a1b8e50f0dc19061 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 14:35:56 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: use more appropriate types (NFC)

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  8 +---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 ---
 2 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 6765e2aff414f..3e39b64e59e0f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -12,7 +12,6 @@
 #include "bolt/Core/BinaryContext.h"
 #include "bolt/Core/BinaryFunction.h"
 #include "bolt/Passes/BinaryPasses.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
@@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const 
MCInstReference &);
 
 namespace PAuthGadgetScanner {
 
-class SrcSafetyAnalysis;
-struct SrcState;
-
 /// Description of a gadget kind that can be detected. Intended to be
 /// statically allocated to be attached to reports by reference.
 class GadgetKind {
@@ -210,7 +206,7 @@ class GadgetKind {
 public:
   GadgetKind(const char *Description) : Description(Description) {}
 
-  const StringRef getDescription() const { return Description; }
+  StringRef getDescription() const { return Description; }
 };
 
 /// Base report located at some instruction, without any additional 
information.
@@ -261,7 +257,7 @@ struct GadgetReport : public Report {
 /// Report with a free-form message attached.
 struct GenericReport : public Report {
   std::string Text;
-  GenericReport(MCInstReference Location, const std::string &Text)
+  GenericReport(MCInstReference Location, StringRef Text)
   : Report(Location), Text(Text) {}
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const override;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 12eb9c66130b9..3d723456b6ffd 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -91,14 +91,14 @@ class TrackedRegisters {
   const std::vector Registers;
   std::vector RegToIndexMapping;
 
-  static size_t getMappingSize(const std::vector &RegsToTrack) {
+  static size_t getMappingSize(const ArrayRef RegsToTrack) {
 if (RegsToTrack.empty())
   return 0;
 return 1 + *llvm::max_element(RegsToTrack);
   }
 
 public:
-  TrackedRegisters(const std::vector &RegsToTrack)
+  TrackedRegisters(const ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
 for (unsigned I = 0; I < RegsToTrack.size(); ++I)
@@ -234,7 +234,7 @@ struct SrcState {
 
 static void printLastInsts(
 raw_ostream &OS,
-const std::vector> &LastInstWritingReg) {
+const ArrayRef> LastInstWritingReg) {
   OS << "Insts: ";
   for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) {
 auto &Set = LastInstWritingReg[I];
@@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState 
&S) const {
 class SrcSafetyAnalysis {
 public:
   SrcSafetyAnalysis(BinaryFunction &BF,
-const std::vector &RegsToTrackInstsFor)
+const ArrayRef RegsToTrackInstsFor)
   : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
 RegsToTrackInstsFor(RegsToTrackInstsFor) {}
 
@@ -303,11 +303,10 @@ class SrcSafetyAnalysis {
 
   static std::shared_ptr
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- const std::vector &RegsToTrackInstsFor);
+ const ArrayRef RegsToTrackInstsFor);
 
   virtual void run() = 0;
-  virtual ErrorOr
-  getStateBefore(const MCInst &Inst) const = 0;
+  virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
 
 protected:
   BinaryContext &BC;
@@ -347,7 +346,7 @@ class SrcSafetyAnalysis {
   }
 
   BitVector getClobberedRegs(const MCInst &Point) const {
-BitVector Clobbered(NumRegs, false);
+BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
 // registers. There's a good chance that callee-saved registers will be
 // saved on the stack at some point during execution of the callee.
@@ -409,8 +408,7 @@ class SrcSafetyAnalysis {
   // FirstCheckerInst should belong to the same basic block (see the
   // assertion in DataflowSrcSafetyAnalysis::run()), meaning it was
  

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks interface (PR #136147)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136147

>From 59afabd2a0b89f6e529f58cd17ac56673e2c0599 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 15:40:05 +0300
Subject: [PATCH] [BOLT] Gadget scanner: clarify MCPlusBuilder callbacks
 interface

Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write
an authenticated pointer to a register (such as "ldra x0, [x1]!").

Use `std::optional` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.

Document a few existing methods, add information about preconditions.
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 61 ++-
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 64 +---
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 76 ---
 .../AArch64/gs-pauth-debug-output.s   |  3 -
 .../AArch64/gs-pauth-signing-oracles.s| 20 +
 5 files changed, 130 insertions(+), 94 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 132d58f3f9f79..83ad70ea97076 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -562,30 +562,50 @@ class MCPlusBuilder {
 return {};
   }
 
-  virtual ErrorOr getAuthenticatedReg(const MCInst &Inst) const {
-llvm_unreachable("not implemented");
-return getNoRegister();
-  }
-
-  virtual bool isAuthenticationOfReg(const MCInst &Inst,
- MCPhysReg AuthenticatedReg) const {
+  /// Returns the register where an authenticated pointer is written to by 
Inst,
+  /// or std::nullopt if not authenticating any register.
+  ///
+  /// Sets IsChecked if the instruction always checks authenticated pointer,
+  /// i.e. it either returns a successfully authenticated pointer or terminates
+  /// the program abnormally (such as "ldra x0, [x1]!" on AArch64, which 
crashes
+  /// on authentication failure even if FEAT_FPAC is not implemented).
+  virtual std::optional
+  getWrittenAuthenticatedReg(const MCInst &Inst, bool &IsChecked) const {
 llvm_unreachable("not implemented");
-return false;
+return std::nullopt;
   }
 
-  virtual MCPhysReg getSignedReg(const MCInst &Inst) const {
+  /// Returns the register signed by Inst, or std::nullopt if not signing any
+  /// register.
+  ///
+  /// The returned register is assumed to be both input and output operand,
+  /// as it is done on AArch64.
+  virtual std::optional getSignedReg(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
-  virtual ErrorOr getRegUsedAsRetDest(const MCInst &Inst) const {
+  /// Returns the register used as a return address. Returns std::nullopt if
+  /// not applicable, such as reading the return address from a system register
+  /// or from the stack.
+  ///
+  /// Sets IsAuthenticatedInternally if the instruction accepts a signed
+  /// pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called when isReturn(Inst) is true.
+  virtual std::optional
+  getRegUsedAsRetDest(const MCInst &Inst,
+  bool &IsAuthenticatedInternally) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Returns the register used as the destination of an indirect branch or 
call
   /// instruction. Sets IsAuthenticatedInternally if the instruction accepts
   /// a signed pointer as its operand and authenticates it internally.
+  ///
+  /// Should only be called if isIndirectCall(Inst) or isIndirectBranch(Inst)
+  /// returns true.
   virtual MCPhysReg
   getRegUsedAsIndirectBranchDest(const MCInst &Inst,
  bool &IsAuthenticatedInternally) const {
@@ -602,14 +622,14 @@ class MCPlusBuilder {
   ///controlled, under the Pointer Authentication threat model.
   ///
   /// If the instruction does not write to any register satisfying the above
-  /// two conditions, NoRegister is returned.
+  /// two conditions, std::nullopt is returned.
   ///
   /// The Pointer Authentication threat model assumes an attacker is able to
   /// modify any writable memory, but not executable code (due to W^X).
-  virtual MCPhysReg
+  virtual std::optional
   getMaterializedAddressRegForPtrAuth(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
-return getNoRegister();
+return std::nullopt;
   }
 
   /// Analyzes if this instruction can safely perform address arithmetics
@@ -622,10 +642,13 @@ class MCPlusBuilder {
   /// controlled, provided InReg and executa

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137224

>From 33158bcada96af1d201a5cb5342f0256f30894cd Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++-
 .../AArch64/gs-pacret-autiasp.s   |  31 +-
 .../AArch64/gs-pauth-debug-output.s   |  30 +-
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 4 files changed, 730 insertions(+), 40 deletions(-)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index d4282daad648b..fab92947f6d67 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -660,8 +660,9 @@ class DataflowSrcSafetyAnalysis
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
   BinaryFunction &BF;
   MCPlusBuilder::AllocatorIdTy AllocId;
@@ -672,6 +673,30 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
+  /// Compute a reasonably pessimistic estimation of the register state when
+  /// the previous instruction is not known for sure. Take the set of registers
+  /// which are trusted at function entry and remove all registers that can be
+  /// clobbered inside this function.
+  SrcState computePessimisticState(BinaryFunction &BF) {
+BitVector ClobberedRegs(NumRegs);
+for (auto &I : BF.instrs()) {
+  MCInst &Inst = I.second;
+  BC.MIB->getClobberedRegs(Inst, ClobberedRegs);
+
+  // If this is a call instruction, no register is safe anymore, unless
+  // it is a tail call. Ignore tail calls for the purpose of estimating the
+  // worst-case scenario, assuming no instructions are executed in the
+  // caller after this point anyway.
+  if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst))
+ClobberedRegs.set();
+}
+
+SrcState S = createEntryState();
+S.SafeToDerefRegs.reset(ClobberedRegs);
+S.TrustedRegs.reset(ClobberedRegs);
+return S;
+  }
+
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -682,6 +707,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -696,7 +722,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Check if we need to remove an old annotation (this is the case if
@@ -1233,6 +1259,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_gadget_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return t

[llvm-branch-commits] [llvm] [AArch64][llvm] Codegen for 16/32/64-bit floating-point atomicrmw fminimum/faximum (PR #137703)

2025-04-30 Thread via llvm-branch-commits

https://github.com/CarolineConcatto approved this pull request.

Thank you Jonathan,
LGTM!

https://github.com/llvm/llvm-project/pull/137703
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sam Tebbs via llvm-branch-commits


@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa(I))
-return PR_SignExtend;
-  if (isa(I))
+  auto *Cast = dyn_cast(I);
+  if (!Cast)
+return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
 return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+return PR_SignExtend;
+  default:
+return PR_None;

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sam Tebbs via llvm-branch-commits


@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa(I))
-return PR_SignExtend;
-  if (isa(I))
+  auto *Cast = dyn_cast(I);
+  if (!Cast)
+return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sam Tebbs via llvm-branch-commits


@@ -2432,12 +2437,40 @@ static void 
tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }
 
+/// This function tries to create an abstract recipe from a partial reduction 
to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

SamTebbs33 wrote:

At this point we've already created the partial reduction and clamped the range 
so I don't think we need to do any costing (like 
`tryToMatchAndCreateMulAccumulateReduction` does with `getMulAccReductionCost`) 
since we already know it's worthwhile (see `getScaledReductions` in 
LoopVectorize.cpp). This part of the code just puts the partial reduction 
inside the abstract recipe, which shouldn't need to consider any costing.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko ready_for_review 
https://github.com/llvm/llvm-project/pull/137975
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135663

>From 46e3e96a9c0ffab84ee7003621041b7bf76f3d78 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Sat, 5 Apr 2025 14:54:01 +0300
Subject: [PATCH 1/4] [BOLT] Gadget scanner: detect authentication oracles

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  12 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 543 +
 .../AArch64/gs-pauth-authentication-oracles.s | 723 ++
 .../AArch64/gs-pauth-debug-output.s   |  78 ++
 4 files changed, 1356 insertions(+)
 create mode 100644 
bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index a0dd33ff88d6f..73198be81f396 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -286,6 +286,15 @@ class ClobberingInfo : public ExtraInfo {
   void print(raw_ostream &OS, const MCInstReference Location) const override;
 };
 
+class LeakageInfo : public ExtraInfo {
+  SmallVector LeakingInstrs;
+
+public:
+  LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) 
{}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
 /// A brief version of a report that can be further augmented with the details.
 ///
 /// A half-baked report produced on the first run of the analysis. An extra,
@@ -326,6 +335,9 @@ class FunctionAnalysisContext {
   void findUnsafeUses(SmallVector> &Reports);
   void augmentUnsafeUseReports(ArrayRef> Reports);
 
+  void findUnsafeDefs(SmallVector> &Reports);
+  void augmentUnsafeDefReports(ArrayRef> Reports);
+
   /// Process the reports which do not have to be augmented, and remove them
   /// from Reports.
   void handleSimpleReports(SmallVector> &Reports);
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 24ad4844ffb7d..1dd2e3824e385 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -717,6 +717,459 @@ SrcSafetyAnalysis::create(BinaryFunction &BF,
RegsToTrackInstsFor);
 }
 
+/// A state representing which registers are safe to be used as the destination
+/// operand of an authentication instruction.
+///
+/// Similar to SrcState, it is the analysis that should take register aliasing
+/// into account.
+///
+/// Depending on the implementation, it may be possible that an authentication
+/// instruction returns an invalid pointer on failure instead of terminating
+/// the program immediately (assuming the program will crash as soon as that
+/// pointer is dereferenced). To prevent brute-forcing the correct signature,
+/// it should be impossible for an attacker to test if a pointer is correctly
+/// signed - either the program should be terminated on authentication failure
+/// or it should be impossible to tell whether authentication succeeded or not.
+///
+/// For that reason, a restricted set of operations is allowed on any register
+/// containing a value derived from the result of an authentication instruction
+/// until that register is either wiped or checked not to contain a result of a
+/// failed authentication.
+///
+/// Specifically, the safety property for a register is computed by iterating
+/// the instructions in backward order: the source register Xn of an 
instruction
+/// Inst is safe if at least one of the following is true:
+/// * Inst checks if Xn contains the result of a successful authentication and
+///   terminates the program on failure. Note that Inst can either naturally
+///   dereference Xn (load, branch, return, etc. instructions) or be the first
+///   instruction of an explicit checking sequence.
+/// * Inst performs safe address arithmetic AND both source and result
+///   registers, as well as any temporary registers, must be safe after
+///   execution of Inst (temporaries are not used on AArch64 and thus not
+///   currently supported/allowed).
+///   See MCPlusBuilder::analyzeAddressArithmeticsForPtrAuth for the details.
+/// * Inst fully overwrites Xn with an unrelated value.
+struct DstState {
+  /// The set of registers whose values cannot be inspected by an attacker in
+  /// a way usable as an authentication oracle. The results of authentication
+  /// instructions should be written to such registers.
+  BitVector CannotEscapeUnchecked;
+
+  std::vector> FirstInstLeakingReg;
+
+  /// Construct an empty state.
+  DstState() {}
+
+  DstState(unsigned NumRegs, unsigned NumRegsToTrack)
+  : Cannot

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko created 
https://github.com/llvm/llvm-project/pull/137975

An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false positive reports about authentication oracles:

autia   x2, x3
autia   x0, x1
; neither x0 nor x2 are checked at this point
eor x16, x0, x0, lsl #1
tbz x16, #62, on_success ; marks x0 as checked
; end of BB: for x2 to be checked here, it must be checked in both
; successor basic blocks
  on_failure:
brk 0xc470
  on_success:
; x2 is checked
ldr x1, [x2] ; marks x2 as checked

>From 5d0aa042415f6b4321e5d278ce435ceef241dd3e Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Wed, 30 Apr 2025 16:08:10 +0300
Subject: [PATCH] [BOLT] Gadget scanner: account for BRK when searching for
 auth oracles

An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false positive reports about authentication oracles:

autia   x2, x3
autia   x0, x1
; neither x0 nor x2 are checked at this point
eor x16, x0, x0, lsl #1
tbz x16, #62, on_success ; marks x0 as checked
; end of BB: for x2 to be checked here, it must be checked in both
; successor basic blocks
  on_failure:
brk 0xc470
  on_success:
; x2 is checked
ldr x1, [x2] ; marks x2 as checked
---
 bolt/include/bolt/Core/MCPlusBuilder.h| 14 ++
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 13 +-
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   | 24 --
 .../AArch64/gs-pauth-address-checks.s | 44 +--
 .../AArch64/gs-pauth-authentication-oracles.s |  9 ++--
 .../AArch64/gs-pauth-signing-oracles.s|  6 +--
 6 files changed, 75 insertions(+), 35 deletions(-)

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 83ad70ea97076..b0484964cfbad 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -706,6 +706,20 @@ class MCPlusBuilder {
 return false;
   }
 
+  /// Returns true if Inst is a trap instruction.
+  ///
+  /// Tests if Inst is an instruction that immediately causes an abnormal
+  /// program termination, for example when a security violation is detected
+  /// by a compiler-inserted check.
+  ///
+  /// @note An implementation of this method should likely return false for
+  /// calls to library functions like abort(), as it is possible that the
+  /// execution state is partially attacker-controlled at this point.
+  virtual bool isTrap(const MCInst &Inst) const {
+llvm_unreachable("not implemented");
+return false;
+  }
+
   virtual bool isBreakpoint(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
 return false;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index fab92947f6d67..20d4fdf42f03a 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1003,6 +1003,15 @@ class DstSafetyAnalysis {
   dbgs() << ")\n";
 });
 
+// If this instruction terminates the program immediately, no
+// authentication oracles are possible past this point.
+if (BC.MIB->isTrap(Point)) {
+  LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); });
+  DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  Next.CannotEscapeUnchecked.set();
+  return Next;
+}
+
 // If this instruction is reachable by the analysis, a non-empty state will
 // be propagated to it sooner or later. Until then, skip computeNext().
 if (Cur.empty()) {
@@ -1108,8 +1117,8 @@ class DataflowDstSafetyAnalysis
 //
 // A basic block without any successors, on the other hand, can be
 // pessimistically initialized to everything-is-unsafe: this will naturally
-// handle both return and tail call instructions and is harmless for
-// internal indirect branch instructions (such as computed gotos).
+// handle return, trap and tail call instructions. At the same time, it is
+// harmless for internal indirect branch instructions, like computed gotos.
 if (BB.succ_empty())
   return createUnsafeState();
 
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp 
b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index f3c29e6ee43b9..4d11c5b206eab 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusB

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Anatoly Trosinenko (atrosinenko)


Changes

An authenticated pointer can be explicitly checked by the compiler via a
sequence of instructions that executes BRK on failure. It is important
to recognize such BRK instruction as checking every register (as it is
expected to immediately trigger an abnormal program termination) to
prevent false positive reports about authentication oracles:

  autia   x2, x3
  autia   x0, x1
  ; neither x0 nor x2 are checked at this point
  eor x16, x0, x0, lsl #1
  tbz x16, #62, on_success ; marks x0 as checked
  ; end of BB: for x2 to be checked here, it must be checked in both
  ; successor basic blocks
on_failure:
  brk 0xc470
on_success:
  ; x2 is checked
  ldr x1, [x2] ; marks x2 as checked

---
Full diff: https://github.com/llvm/llvm-project/pull/137975.diff


6 Files Affected:

- (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+14) 
- (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+11-2) 
- (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+21-3) 
- (modified) bolt/test/binary-analysis/AArch64/gs-pauth-address-checks.s 
(+22-22) 
- (modified) 
bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s (+4-5) 
- (modified) bolt/test/binary-analysis/AArch64/gs-pauth-signing-oracles.s 
(+3-3) 


``diff
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h 
b/bolt/include/bolt/Core/MCPlusBuilder.h
index 83ad70ea97076..b0484964cfbad 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -706,6 +706,20 @@ class MCPlusBuilder {
 return false;
   }
 
+  /// Returns true if Inst is a trap instruction.
+  ///
+  /// Tests if Inst is an instruction that immediately causes an abnormal
+  /// program termination, for example when a security violation is detected
+  /// by a compiler-inserted check.
+  ///
+  /// @note An implementation of this method should likely return false for
+  /// calls to library functions like abort(), as it is possible that the
+  /// execution state is partially attacker-controlled at this point.
+  virtual bool isTrap(const MCInst &Inst) const {
+llvm_unreachable("not implemented");
+return false;
+  }
+
   virtual bool isBreakpoint(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
 return false;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index fab92947f6d67..20d4fdf42f03a 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -1003,6 +1003,15 @@ class DstSafetyAnalysis {
   dbgs() << ")\n";
 });
 
+// If this instruction terminates the program immediately, no
+// authentication oracles are possible past this point.
+if (BC.MIB->isTrap(Point)) {
+  LLVM_DEBUG({ traceInst(BC, "Trap instruction found", Point); });
+  DstState Next(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  Next.CannotEscapeUnchecked.set();
+  return Next;
+}
+
 // If this instruction is reachable by the analysis, a non-empty state will
 // be propagated to it sooner or later. Until then, skip computeNext().
 if (Cur.empty()) {
@@ -1108,8 +1117,8 @@ class DataflowDstSafetyAnalysis
 //
 // A basic block without any successors, on the other hand, can be
 // pessimistically initialized to everything-is-unsafe: this will naturally
-// handle both return and tail call instructions and is harmless for
-// internal indirect branch instructions (such as computed gotos).
+// handle return, trap and tail call instructions. At the same time, it is
+// harmless for internal indirect branch instructions, like computed gotos.
 if (BB.succ_empty())
   return createUnsafeState();
 
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp 
b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index f3c29e6ee43b9..4d11c5b206eab 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -386,10 +386,9 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 // the list of successors of this basic block as appropriate.
 
 // Any of the above code sequences assume the fall-through basic block
-// is a dead-end BRK instruction (any immediate operand is accepted).
+// is a dead-end trap instruction.
 const BinaryBasicBlock *BreakBB = BB.getFallthrough();
-if (!BreakBB || BreakBB->empty() ||
-BreakBB->front().getOpcode() != AArch64::BRK)
+if (!BreakBB || BreakBB->empty() || !isTrap(BreakBB->front()))
   return std::nullopt;
 
 // Iterate over the instructions of BB in reverse order, matching opcodes
@@ -1745,6 +1744,25 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 Inst.addOperand(MCOperand::createImm(0));
   }
 
+  bool isTrap(const MCInst &Inst) const override {
+if (

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

atrosinenko wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/137975?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#137975** https://app.graphite.dev/github/pr/llvm/llvm-project/137975?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> ๐Ÿ‘ˆ https://app.graphite.dev/github/pr/llvm/llvm-project/137975?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#137224** https://app.graphite.dev/github/pr/llvm/llvm-project/137224?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#136183** https://app.graphite.dev/github/pr/llvm/llvm-project/136183?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#136151** https://app.graphite.dev/github/pr/llvm/llvm-project/136151?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135663** https://app.graphite.dev/github/pr/llvm/llvm-project/135663?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#136147** https://app.graphite.dev/github/pr/llvm/llvm-project/136147?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135662** https://app.graphite.dev/github/pr/llvm/llvm-project/135662?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135661** https://app.graphite.dev/github/pr/llvm/llvm-project/135661?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#134146** https://app.graphite.dev/github/pr/llvm/llvm-project/134146?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#133461** https://app.graphite.dev/github/pr/llvm/llvm-project/133461?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#135073** https://app.graphite.dev/github/pr/llvm/llvm-project/135073?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/137975
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: account for BRK when searching for auth oracles (PR #137975)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko edited 
https://github.com/llvm/llvm-project/pull/137975
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Recommit "[ConstraintElim] Simplify cmp after uadd.sat/usub.sat (#135603)" (PR #136467)

2025-04-30 Thread Iris Shi via llvm-branch-commits

el-ev wrote:

### Merge activity

* **Apr 30, 9:40 AM EDT**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/136467).


https://github.com/llvm/llvm-project/pull/136467
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
 return Invalid;
   break;
 case 16:
-  if (AccumEVT == MVT::i64)
-Cost *= 2;
-  else if (AccumEVT != MVT::i32)
+  if (AccumEVT != MVT::i32)

sdesmalen-arm wrote:

```llvm.partial.reduce(nxv2i64 %acc, mul(zext(nxv16i8 %a) -> nxv16i64, 
zext(nxv16i8 %b) -> nxv16i64))```
This case can be lowered with the correct support in selectiondag (see 
https://github.com/llvm/llvm-project/pull/130935#discussion_r2068162662)

I'd argue that this function will need a bit of a rewrite anyway once we get 
proper support in SelectionDAG for legalising these operations, so I wouldn't 
bother changing that in this PR.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };
 
-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

(We discussed this offline) swapping the operands in the debug-print function 
of the recipe is not something that really concerns me, and I think there's 
still value making this functionally (from end-user perspective) NFC change.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa(I))
-return PR_SignExtend;
-  if (isa(I))
+  auto *Cast = dyn_cast(I);
+  if (!Cast)
+return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
 return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+return PR_SignExtend;
+  default:
+return PR_None;

sdesmalen-arm wrote:

I think the default case should be an `llvm_unreachable()`, because only 
zero/sign-extend are supported partial reduction extends. The `PR_None` is 
handled by the function above that takes an `Instruction*`.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa(I))
-return PR_SignExtend;
-  if (isa(I))
+  auto *Cast = dyn_cast(I);
+  if (!Cast)
+return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

sdesmalen-arm wrote:

```suggestion
  if (auto *Cast = dyn_cast(I))
return getPartialReductionExtendKind(Cast->getOpcode());
  return PR_None;
```

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -2432,12 +2437,40 @@ static void 
tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }
 
+/// This function tries to create an abstract recipe from a partial reduction 
to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

sdesmalen-arm wrote:

Does this need to be given the `Range &` and for that range to be clamped if it 
doesn't match or if the cost is higher than the individual operations (similar 
to what happens in `tryToCreateAbstractReductionRecipe`) ?

(note that the cost part is still missing)

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] release/20.x: [flang] Exempt construct entities from SAVE check for PURE (#131383) (PR #137752)

2025-04-30 Thread Eugene Epshteyn via llvm-branch-commits

https://github.com/eugeneepshteyn approved this pull request.


https://github.com/llvm/llvm-project/pull/137752
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sam Tebbs via llvm-branch-commits


@@ -2432,12 +2437,40 @@ static void 
tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }
 
+/// This function tries to create an abstract recipe from a partial reduction 
to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {
+  if (PRed->getOpcode() != Instruction::Add)
+return;
+
+  VPRecipeBase *BinOpR = PRed->getBinOp()->getDefiningRecipe();
+  auto *BinOp = dyn_cast(BinOpR);
+  if (!BinOp || BinOp->getOpcode() != Instruction::Mul)
+return;

SamTebbs33 wrote:

Done :+1: .

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)

2025-04-30 Thread Krzysztof Drewniak via llvm-branch-commits

krzysz00 wrote:

So p9 is quite weird. It's used, IIRC, by, 
https://github.com/GPUOpen-Drivers/llpc internally to represent "structured" 
pointers, and so its interpretation is `{p8 rsrc, i32 index, i32 offset}` . 
This does still have a 48-bit effective address by way of a thoroughly weird 
ptrtoaddr function that's, in it's full generality, a function of the resource 
flags (see swizzles)

---

As to ptrmask on p8 ... if ptrmask is only supposed to affect the Index bits, 
then I probably shouldn't be emitting it during the p7 => p8 process and only 
applying a mask to the offset. But that hits the same weirdness that masking 
for alignment doesn't work as expected if the base pointer isn't quite aligned 
enough.

(I can't think of many instances where that base pointer will, for example, be 
odd,  but I'm not aware of anything saying it's not allowed. Though I'd 
personally be OK with documenting around that)

I think most codegen - including down in LLPC - assumes the base is at least 
`align 4`, which is the largest value where you sometimes need to change 
behavior, so you can quietly figure that the alignment of the offset is 
*basically* the alignment of the pointer 

---

Actually, more general question on this address with stuff: x86 thread local 
storage. That's a form of pointer with an implicit base that can mess up your 
alignment checks - how's that work with ptrtoaddr?

https://github.com/llvm/llvm-project/pull/137418
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/135663

>From 46e3e96a9c0ffab84ee7003621041b7bf76f3d78 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Sat, 5 Apr 2025 14:54:01 +0300
Subject: [PATCH 1/4] [BOLT] Gadget scanner: detect authentication oracles

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  12 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 543 +
 .../AArch64/gs-pauth-authentication-oracles.s | 723 ++
 .../AArch64/gs-pauth-debug-output.s   |  78 ++
 4 files changed, 1356 insertions(+)
 create mode 100644 
bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h 
b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index a0dd33ff88d6f..73198be81f396 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -286,6 +286,15 @@ class ClobberingInfo : public ExtraInfo {
   void print(raw_ostream &OS, const MCInstReference Location) const override;
 };
 
+class LeakageInfo : public ExtraInfo {
+  SmallVector LeakingInstrs;
+
+public:
+  LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) 
{}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
 /// A brief version of a report that can be further augmented with the details.
 ///
 /// A half-baked report produced on the first run of the analysis. An extra,
@@ -326,6 +335,9 @@ class FunctionAnalysisContext {
   void findUnsafeUses(SmallVector> &Reports);
   void augmentUnsafeUseReports(ArrayRef> Reports);
 
+  void findUnsafeDefs(SmallVector> &Reports);
+  void augmentUnsafeDefReports(ArrayRef> Reports);
+
   /// Process the reports which do not have to be augmented, and remove them
   /// from Reports.
   void handleSimpleReports(SmallVector> &Reports);
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 24ad4844ffb7d..1dd2e3824e385 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -717,6 +717,459 @@ SrcSafetyAnalysis::create(BinaryFunction &BF,
RegsToTrackInstsFor);
 }
 
+/// A state representing which registers are safe to be used as the destination
+/// operand of an authentication instruction.
+///
+/// Similar to SrcState, it is the analysis that should take register aliasing
+/// into account.
+///
+/// Depending on the implementation, it may be possible that an authentication
+/// instruction returns an invalid pointer on failure instead of terminating
+/// the program immediately (assuming the program will crash as soon as that
+/// pointer is dereferenced). To prevent brute-forcing the correct signature,
+/// it should be impossible for an attacker to test if a pointer is correctly
+/// signed - either the program should be terminated on authentication failure
+/// or it should be impossible to tell whether authentication succeeded or not.
+///
+/// For that reason, a restricted set of operations is allowed on any register
+/// containing a value derived from the result of an authentication instruction
+/// until that register is either wiped or checked not to contain a result of a
+/// failed authentication.
+///
+/// Specifically, the safety property for a register is computed by iterating
+/// the instructions in backward order: the source register Xn of an 
instruction
+/// Inst is safe if at least one of the following is true:
+/// * Inst checks if Xn contains the result of a successful authentication and
+///   terminates the program on failure. Note that Inst can either naturally
+///   dereference Xn (load, branch, return, etc. instructions) or be the first
+///   instruction of an explicit checking sequence.
+/// * Inst performs safe address arithmetic AND both source and result
+///   registers, as well as any temporary registers, must be safe after
+///   execution of Inst (temporaries are not used on AArch64 and thus not
+///   currently supported/allowed).
+///   See MCPlusBuilder::analyzeAddressArithmeticsForPtrAuth for the details.
+/// * Inst fully overwrites Xn with an unrelated value.
+struct DstState {
+  /// The set of registers whose values cannot be inspected by an attacker in
+  /// a way usable as an authentication oracle. The results of authentication
+  /// instructions should be written to such registers.
+  BitVector CannotEscapeUnchecked;
+
+  std::vector> FirstInstLeakingReg;
+
+  /// Construct an empty state.
+  DstState() {}
+
+  DstState(unsigned NumRegs, unsigned NumRegsToTrack)
+  : Cannot

[llvm-branch-commits] [llvm] [AArch64][llvm] Pre-commit tests for #137703 (NFC) (PR #137702)

2025-04-30 Thread via llvm-branch-commits

https://github.com/CarolineConcatto approved this pull request.

I could not spot any significant difference between fmin and fminimum.  So I 
believe this is fine. Thank you for the work Jonathan

https://github.com/llvm/llvm-project/pull/137702
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)

2025-04-30 Thread Benjamin Maxwell via llvm-branch-commits

MacDue wrote:

@sdesmalen-arm What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/137683
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-04-30 Thread Kai Nacke via llvm-branch-commits

https://github.com/redstar updated 
https://github.com/llvm/llvm-project/pull/137235



  



Rate limit ยท GitHub


  body {
background-color: #f6f8fa;
color: #24292e;
font-family: -apple-system,BlinkMacSystemFont,Segoe 
UI,Helvetica,Arial,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol;
font-size: 14px;
line-height: 1.5;
margin: 0;
  }

  .container { margin: 50px auto; max-width: 600px; text-align: center; 
padding: 0 24px; }

  a { color: #0366d6; text-decoration: none; }
  a:hover { text-decoration: underline; }

  h1 { line-height: 60px; font-size: 48px; font-weight: 300; margin: 0px; 
text-shadow: 0 1px 0 #fff; }
  p { color: rgba(0, 0, 0, 0.5); margin: 20px 0 40px; }

  ul { list-style: none; margin: 25px 0; padding: 0; }
  li { display: table-cell; font-weight: bold; width: 1%; }

  .logo { display: inline-block; margin-top: 35px; }
  .logo-img-2x { display: none; }
  @media
  only screen and (-webkit-min-device-pixel-ratio: 2),
  only screen and (   min--moz-device-pixel-ratio: 2),
  only screen and ( -o-min-device-pixel-ratio: 2/1),
  only screen and (min-device-pixel-ratio: 2),
  only screen and (min-resolution: 192dpi),
  only screen and (min-resolution: 2dppx) {
.logo-img-1x { display: none; }
.logo-img-2x { display: inline-block; }
  }

  #suggestions {
margin-top: 35px;
color: #ccc;
  }
  #suggestions a {
color: #66;
font-weight: 200;
font-size: 14px;
margin: 0 10px;
  }


  
  



  Whoa there!
  You have exceeded a secondary rate limit.
Please wait a few minutes before you try again;
in some cases this may take up to an hour.
  
  
https://support.github.com/contact";>Contact Support โ€”
https://githubstatus.com";>GitHub Status โ€”
https://twitter.com/githubstatus";>@githubstatus
  

  

  

  

  

  


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)

2025-04-30 Thread Vladislav Khmelevsky via llvm-branch-commits

yota9 wrote:

@paschalis-mpeis  Could you please check if the binaries are identical and it 
is indeed nm problem? E.g. with objdump, is plt entry is there? Maybe there is 
problem related to the plt section type, e.g. one of the binaries has .plt.sec 
or .plt.got section and there is some kind of but in nm that not lists symbols 
from these sections.  Then we can use custom linker script with .plt section 
only.
My next suggestion would be just using llvm-nm here is to pass llvm-nm with 
--nmtool arg to link_fdata.py, since the 
[lit.cfg.py](https://github.com/llvm/llvm-project/blob/main/bolt/test/lit.cfg.py#L118)
 has it in the list of mandatory tools for bolt testing, so we won't have 
environment dependencies here.  Maybe even add nmtool as an link_fdata_cmd arg 
in lit.cfg.py , so all test would use it by default...

https://github.com/llvm/llvm-project/pull/135867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)

2025-04-30 Thread Alexander Richardson via llvm-branch-commits

arichardson wrote:

@krzysz00 I tried setting the p8 index size to 0 and this almost works but 
there are a handful of tests that fail.

The first one is:
```
define ptr addrspace(8) @v_ptrmask_buffer_resource_variable_i128_neg8(ptr 
addrspace(8) %ptr) {
; GCN-LABEL: v_ptrmask_buffer_resource_variable_i128_neg8:
; GCN:   ; %bb.0:
; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT:v_and_b32_e32 v0, -8, v0
; GCN-NEXT:s_setpc_b64 s[30:31]
;
; GFX10PLUS-LABEL: v_ptrmask_buffer_resource_variable_i128_neg8:
; GFX10PLUS:   ; %bb.0:
; GFX10PLUS-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10PLUS-NEXT:v_and_b32_e32 v0, -8, v0
; GFX10PLUS-NEXT:s_setpc_b64 s[30:31]
  %masked = call ptr addrspace(8) @llvm.ptrmask.p8.i128(ptr addrspace(8) %ptr, 
i128 -8)
  ret ptr addrspace(8) %masked
}
```
This fails with `llvm.ptrmask intrinsic second argument bitwidth must match 
pointer index type size of first argument`

How would you expect ptrmask to work here? I would imagine it should use the 
address width and not the index width since you have valid codegen for p8. I 
don't know enough about amdgpu assembly so I can't tell what is actually being 
done here.

Another test that fails is 
llvm/test/CodeGen/AMDGPU/GlobalISel/unsupported-ptr-add.ll which now crashes 
with an assertion failure inside `StraightLineStrengthReduce` instead of 
failing with `unable to legalize instruction`. I imagine we could easily avoid 
the assertion here and handle that with a sensible error message in the IR 
verifier.

llvm/test/CodeGen/AMDGPU/GlobalISel/unsupported-load.ll also asserts but should 
be easy to handle.

https://github.com/llvm/llvm-project/pull/137418
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)

2025-04-30 Thread via llvm-branch-commits

github-actions[bot] wrote:

โš ๏ธ We detected that you are using a GitHub private e-mail address to contribute 
to the repo. Please turn off [Keep my email addresses 
private](https://github.com/settings/emails) setting in your account. See 
[LLVM 
Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it)
 for more information.

https://github.com/llvm/llvm-project/pull/138017
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)

2025-04-30 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic created 
https://github.com/llvm/llvm-project/pull/138002

- defines the `parseRootConstantParams` function and adds handling for the 
mandatory arguments of `num32BitConstants` and `bReg`

- adds corresponding unit tests

Part two of implementing #126576 

>From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001
From: Finn Plummer 
Date: Wed, 30 Apr 2025 17:53:11 +
Subject: [PATCH] [HLSL][RootSignature] Add mandatory parameters for
 RootConstants

- defines the `parseRootConstantParams` function and adds handling for
the mandatory arguments of `num32BitConstants` and `bReg`

- adds corresponding unit tests

Part two of implementing
---
 .../clang/Parse/ParseHLSLRootSignature.h  | 10 ++-
 clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++-
 .../Parse/ParseHLSLRootSignatureTest.cpp  | 14 +++-
 .../llvm/Frontend/HLSL/HLSLRootSignature.h|  5 +-
 4 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index efa735ea03d94..0f05b05ed4df6 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -77,8 +77,14 @@ class RootSignatureParser {
   parseDescriptorTableClause();
 
   /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  /// order and only exactly once. `ParsedClauseParams` denotes the current
-  /// state of parsed params
+  /// order and only exactly once. The following methods define a
+  /// `Parsed.*Params` struct to denote the current state of parsed params
+  struct ParsedConstantParams {
+std::optional Reg;
+std::optional Num32BitConstants;
+  };
+  std::optional parseRootConstantParams();
+
   struct ParsedClauseParams {
 std::optional Reg;
 std::optional NumDescriptors;
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 48d3e38b0519d..2ce8e6e5cca98 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -57,6 +57,27 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   RootConstants Constants;
 
+  auto Params = parseRootConstantParams();
+  if (!Params.has_value())
+return std::nullopt;
+
+  // Check mandatory parameters were provided
+  if (!Params->Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::kw_num32BitConstants;
+return std::nullopt;
+  }
+
+  Constants.Num32BitConstants = Params->Num32BitConstants.value();
+
+  if (!Params->Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::bReg;
+return std::nullopt;
+  }
+
+  Constants.Reg = Params->Reg.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() {
   return Clause;
 }
 
+// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
+// order and only exactly once. The following methods will parse through as
+// many arguments as possible reporting an error if a duplicate is seen.
+std::optional
+RootSignatureParser::parseRootConstantParams() {
+  assert(CurToken.TokKind == TokenKind::pu_l_paren &&
+ "Expects to only be invoked starting at given token");
+
+  ParsedConstantParams Params;
+  do {
+// `num32BitConstants` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) {
+  if (Params.Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Num32BitConstants = parseUIntParam();
+  if (!Num32BitConstants.has_value())
+return std::nullopt;
+  Params.Num32BitConstants = Num32BitConstants;
+}
+
+// `b` POS_INT
+if (tryConsumeExpectedToken(TokenKind::bReg)) {
+  if (Params.Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+  auto Reg = parseRegister();
+  if (!Reg.has_value())
+return std::nullopt;
+  Params.Reg = Reg;
+}
+  } while (tryConsumeExpectedToken(TokenKind::pu_comma));
+
+  return Params;
+}
+
 std::optional
 RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) {
   assert(CurToken.TokKind == TokenKind::pu_l_paren &&
  "Expects to only be invoked starting at given token");
 
-  // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  // order

[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-hlsl

Author: Finn Plummer (inbelic)


Changes

- defines the `parseRootConstantParams` function and adds handling for the 
mandatory arguments of `num32BitConstants` and `bReg`

- adds corresponding unit tests

Part two of implementing #126576 

---
Full diff: https://github.com/llvm/llvm-project/pull/138002.diff


4 Files Affected:

- (modified) clang/include/clang/Parse/ParseHLSLRootSignature.h (+8-2) 
- (modified) clang/lib/Parse/ParseHLSLRootSignature.cpp (+65-3) 
- (modified) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+12-2) 
- (modified) llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h (+4-1) 


``diff
diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index efa735ea03d94..0f05b05ed4df6 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -77,8 +77,14 @@ class RootSignatureParser {
   parseDescriptorTableClause();
 
   /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  /// order and only exactly once. `ParsedClauseParams` denotes the current
-  /// state of parsed params
+  /// order and only exactly once. The following methods define a
+  /// `Parsed.*Params` struct to denote the current state of parsed params
+  struct ParsedConstantParams {
+std::optional Reg;
+std::optional Num32BitConstants;
+  };
+  std::optional parseRootConstantParams();
+
   struct ParsedClauseParams {
 std::optional Reg;
 std::optional NumDescriptors;
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 48d3e38b0519d..2ce8e6e5cca98 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -57,6 +57,27 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   RootConstants Constants;
 
+  auto Params = parseRootConstantParams();
+  if (!Params.has_value())
+return std::nullopt;
+
+  // Check mandatory parameters were provided
+  if (!Params->Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::kw_num32BitConstants;
+return std::nullopt;
+  }
+
+  Constants.Num32BitConstants = Params->Num32BitConstants.value();
+
+  if (!Params->Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::bReg;
+return std::nullopt;
+  }
+
+  Constants.Reg = Params->Reg.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() {
   return Clause;
 }
 
+// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
+// order and only exactly once. The following methods will parse through as
+// many arguments as possible reporting an error if a duplicate is seen.
+std::optional
+RootSignatureParser::parseRootConstantParams() {
+  assert(CurToken.TokKind == TokenKind::pu_l_paren &&
+ "Expects to only be invoked starting at given token");
+
+  ParsedConstantParams Params;
+  do {
+// `num32BitConstants` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) {
+  if (Params.Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Num32BitConstants = parseUIntParam();
+  if (!Num32BitConstants.has_value())
+return std::nullopt;
+  Params.Num32BitConstants = Num32BitConstants;
+}
+
+// `b` POS_INT
+if (tryConsumeExpectedToken(TokenKind::bReg)) {
+  if (Params.Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+  auto Reg = parseRegister();
+  if (!Reg.has_value())
+return std::nullopt;
+  Params.Reg = Reg;
+}
+  } while (tryConsumeExpectedToken(TokenKind::pu_comma));
+
+  return Params;
+}
+
 std::optional
 RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) {
   assert(CurToken.TokKind == TokenKind::pu_l_paren &&
  "Expects to only be invoked starting at given token");
 
-  // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  // order and only exactly once. Parse through as many arguments as possible
-  // reporting an error if a duplicate is seen.
   ParsedClauseParams Params;
   do {
 // ( `b` | `t` | `u` | `s`) POS_INT
diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp 
b/clang/unittests/Parse/ParseHLSLRootSign

[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)

2025-04-30 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/138017

Backport f5a30f111dc4ad6422863722eb708059a68a9d5c

Requested by: @tstellar

>From bc9b07f09ab1c728e4dcce614eb4ccfa2e5e52c7 Mon Sep 17 00:00:00 2001
From: Losy001 <64610343+losy...@users.noreply.github.com>
Date: Sat, 26 Apr 2025 18:04:12 +0200
Subject: [PATCH] [Clang][MicrosoftMangle] Implement mangling for
 ConstantMatrixType (#134930)

This pull request implements mangling for ConstantMatrixType, allowing
matrices to be used on Windows.

Related issues: #53158, #127127

This example code:
```cpp
#include 
#include 

typedef float Matrix4 __attribute__((matrix_type(4, 4)));

int main()
{
  printf("%s\n", typeid(Matrix4).name());
}
```
Outputs this:
```
struct __clang::__matrix
```

(cherry picked from commit f5a30f111dc4ad6422863722eb708059a68a9d5c)
---
 clang/lib/AST/MicrosoftMangle.cpp  | 17 ++-
 clang/test/CodeGenCXX/mangle-ms-matrix.cpp | 57 ++
 2 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGenCXX/mangle-ms-matrix.cpp

diff --git a/clang/lib/AST/MicrosoftMangle.cpp 
b/clang/lib/AST/MicrosoftMangle.cpp
index 42b735ccf4a2c..74c995f2f97f0 100644
--- a/clang/lib/AST/MicrosoftMangle.cpp
+++ b/clang/lib/AST/MicrosoftMangle.cpp
@@ -3552,7 +3552,22 @@ void MicrosoftCXXNameMangler::mangleType(const 
DependentSizedExtVectorType *T,
 
 void MicrosoftCXXNameMangler::mangleType(const ConstantMatrixType *T,
  Qualifiers quals, SourceRange Range) {
-  Error(Range.getBegin(), "matrix type") << Range;
+  QualType EltTy = T->getElementType();
+  const BuiltinType *ET = EltTy->getAs();
+
+  llvm::SmallString<64> TemplateMangling;
+  llvm::raw_svector_ostream Stream(TemplateMangling);
+  MicrosoftCXXNameMangler Extra(Context, Stream);
+
+  Stream << "?$";
+
+  Extra.mangleSourceName("__matrix");
+  Extra.mangleType(EltTy, Range, QMM_Escape);
+
+  Extra.mangleIntegerLiteral(llvm::APSInt::getUnsigned(T->getNumRows()));
+  Extra.mangleIntegerLiteral(llvm::APSInt::getUnsigned(T->getNumColumns()));
+
+  mangleArtificialTagType(TagTypeKind::Struct, TemplateMangling, {"__clang"});
 }
 
 void MicrosoftCXXNameMangler::mangleType(const DependentSizedMatrixType *T,
diff --git a/clang/test/CodeGenCXX/mangle-ms-matrix.cpp 
b/clang/test/CodeGenCXX/mangle-ms-matrix.cpp
new file mode 100644
index 0..b244aa6e33cfa
--- /dev/null
+++ b/clang/test/CodeGenCXX/mangle-ms-matrix.cpp
@@ -0,0 +1,57 @@
+// RUN: %clang_cc1 -fenable-matrix -fms-extensions -fcxx-exceptions 
-ffreestanding -target-feature +avx -emit-llvm %s -o - -triple=i686-pc-win32 | 
FileCheck %s
+// RUN: %clang_cc1 -fenable-matrix -fms-extensions -fcxx-exceptions 
-ffreestanding -target-feature +avx -emit-llvm %s -o - -triple=i686-pc-win32 
-fexperimental-new-constant-interpreter | FileCheck %s
+
+typedef float __attribute__((matrix_type(4, 4))) m4x4f;
+typedef float __attribute__((matrix_type(2, 2))) m2x2f;
+
+typedef int __attribute__((matrix_type(4, 4))) m4x4i;
+typedef int __attribute__((matrix_type(2, 2))) m2x2i;
+
+void thow(int i) {
+  switch (i) {
+case 0: throw m4x4f();
+// CHECK: ??_R0U?$__matrix@M$03$03@__clang@@@8
+// CHECK: _CT??_R0U?$__matrix@M$03$03@__clang@@@864
+// CHECK: _CTA1U?$__matrix@M$03$03@__clang@@
+// CHECK: _TI1U?$__matrix@M$03$03@__clang@@
+case 1: throw m2x2f();
+// CHECK: ??_R0U?$__matrix@M$01$01@__clang@@@8
+// CHECK: _CT??_R0U?$__matrix@M$01$01@__clang@@@816
+// CHECK: _CTA1U?$__matrix@M$01$01@__clang@@
+// CHECK: _TI1U?$__matrix@M$01$01@__clang@@
+case 2: throw m4x4i();
+// CHECK: ??_R0U?$__matrix@H$03$03@__clang@@@8
+// CHECK: _CT??_R0U?$__matrix@H$03$03@__clang@@@864
+// CHECK: _CTA1U?$__matrix@H$03$03@__clang@@
+// CHECK: _TI1U?$__matrix@H$03$03@__clang@@
+case 3: throw m2x2i();
+// CHECK: ??_R0U?$__matrix@H$01$01@__clang@@@8
+// CHECK: _CT??_R0U?$__matrix@H$01$01@__clang@@@816
+// CHECK: _CTA1U?$__matrix@H$01$01@__clang@@
+// CHECK: _TI1U?$__matrix@H$01$01@__clang@@
+  }
+}
+
+void foo44f(m4x4f) {}
+// CHECK: define dso_local void @"?foo44f@@YAXU?$__matrix@M$03$03@__clang@@@Z"
+
+m4x4f rfoo44f() { return m4x4f(); }
+// CHECK: define dso_local noundef <16 x float> 
@"?rfoo44f@@YAU?$__matrix@M$03$03@__clang@@XZ"
+
+void foo22f(m2x2f) {}
+// CHECK: define dso_local void @"?foo22f@@YAXU?$__matrix@M$01$01@__clang@@@Z"
+
+m2x2f rfoo22f() { return m2x2f(); }
+// CHECK: define dso_local noundef <4 x float> 
@"?rfoo22f@@YAU?$__matrix@M$01$01@__clang@@XZ"
+
+void foo44i(m4x4i) {}
+// CHECK: define dso_local void @"?foo44i@@YAXU?$__matrix@H$03$03@__clang@@@Z"
+
+m4x4i rfoo44i() { return m4x4i(); }
+// CHECK: define dso_local noundef <16 x i32> 
@"?rfoo44i@@YAU?$__matrix@H$03$03@__clang@@XZ"
+
+void foo22i(m2x2i) {}
+// CHECK: define dso_local void @"?foo22i@@YAXU?$__matrix@H$01$01@__clang@@@Z"
+
+m2x2i rfoo22i() { return m2x2i(); }
+// CHECK: define dso_local noundef <4 x

[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)

2025-04-30 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/138017
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/20.x: [Clang][MicrosoftMangle] Implement mangling for ConstantMatrixType (#134930) (PR #138017)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:

@rnk What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/138017
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GOFF] Add writing of text records (PR #137235)

2025-04-30 Thread Ulrich Weigand via llvm-branch-commits


@@ -688,28 +689,32 @@ MCSectionGOFF *MCContext::getGOFFSection(SectionKind 
Kind, StringRef Name,
 return Iter->second;
 
   StringRef CachedName = StringRef(Iter->first.c_str(), Name.size());
-  MCSectionGOFF *GOFFSection = new (GOFFAllocator.Allocate()) MCSectionGOFF(
-  CachedName, Kind, Attributes, static_cast(Parent));
+  MCSectionGOFF *GOFFSection = new (GOFFAllocator.Allocate())
+  MCSectionGOFF(CachedName, Kind, IsVirtual, Attributes,
+static_cast(Parent));
   Iter->second = GOFFSection;
   allocInitialFragment(*GOFFSection);
   return GOFFSection;
 }
 
 MCSectionGOFF *MCContext::getGOFFSection(SectionKind Kind, StringRef Name,
  GOFF::SDAttr SDAttributes) {
-  return getGOFFSection(Kind, Name, SDAttributes, nullptr);
+  return getGOFFSection(Kind, Name, SDAttributes, nullptr,
+  /*IsVirtual=*/true);
 }
 
 MCSectionGOFF *MCContext::getGOFFSection(SectionKind Kind, StringRef Name,
  GOFF::EDAttr EDAttributes,
- MCSection *Parent) {
-  return getGOFFSection(Kind, Name, EDAttributes, Parent);
+ MCSection *Parent, bool IsVirtual) {
+  return getGOFFSection(Kind, Name, EDAttributes, Parent,
+  IsVirtual);

uweigand wrote:

Can we deduce the `IsVirtual` property from `EDAttributes.BindAlgorithm == 
GOFF::ESD_BA_Merge` here?  Having to explicitly specific it with every caller 
seems error-prone.

https://github.com/llvm/llvm-project/pull/137235
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][llvm] Codegen for 16/32/64-bit floating-point atomicrmw fminimum/faximum (PR #137703)

2025-04-30 Thread Jonathan Thackray via llvm-branch-commits

https://github.com/jthackray updated 
https://github.com/llvm/llvm-project/pull/137703

>From c47eaad006a6310f667ac5e12589935b5e62a322 Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Wed, 23 Apr 2025 21:10:31 +
Subject: [PATCH] [AArch64][llvm] Codegen for 16/32/64-bit floating-point
 atomicrmw fminimum/fmaximum

Codegen for AArch64 16/32/64-bit floating-point atomic read-modify-write
operations (`atomicrmw {fmaximum,fminimum}`) using LD{B}FMAX and
LD{B}FMIN atomic instructions.
---
 .../Target/AArch64/AArch64ISelLowering.cpp|  14 +-
 .../lib/Target/AArch64/AArch64InstrAtomics.td |   7 +
 .../AArch64/Atomics/aarch64-atomicrmw-lsfe.ll | 686 +++--
 .../Atomics/aarch64_be-atomicrmw-lsfe.ll  | 728 +++---
 4 files changed, 244 insertions(+), 1191 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 63924dc1b30ea..0d3cd575a517e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1001,6 +1001,16 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
 setOperationAction(ISD::ATOMIC_LOAD_FMIN, MVT::f32, LibCall);
 setOperationAction(ISD::ATOMIC_LOAD_FMIN, MVT::f64, LibCall);
 setOperationAction(ISD::ATOMIC_LOAD_FMIN, MVT::bf16, LibCall);
+
+setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::f16, LibCall);
+setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::f32, LibCall);
+setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::f64, LibCall);
+setOperationAction(ISD::ATOMIC_LOAD_FMAXIMUM, MVT::bf16, LibCall);
+
+setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::f16, LibCall);
+setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::f32, LibCall);
+setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::f64, LibCall);
+setOperationAction(ISD::ATOMIC_LOAD_FMINIMUM, MVT::bf16, LibCall);
   }
 
   if (Subtarget->hasLSE128()) {
@@ -27991,7 +28001,9 @@ 
AArch64TargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   // If LSFE available, use atomic FP instructions in preference to expansion
   if (Subtarget->hasLSFE() && (AI->getOperation() == AtomicRMWInst::FAdd ||
AI->getOperation() == AtomicRMWInst::FMax ||
-   AI->getOperation() == AtomicRMWInst::FMin))
+   AI->getOperation() == AtomicRMWInst::FMin ||
+   AI->getOperation() == AtomicRMWInst::FMaximum ||
+   AI->getOperation() == AtomicRMWInst::FMinimum))
 return AtomicExpansionKind::None;
 
   // Nand is not supported in LSE.
diff --git a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td 
b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
index f3734e05ae667..31fcd63b9f2c8 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
@@ -551,14 +551,21 @@ defm atomic_load_fadd  : 
binary_atomic_op_fp;
 defm atomic_load_fmin  : binary_atomic_op_fp;
 defm atomic_load_fmax  : binary_atomic_op_fp;
 
+defm atomic_load_fminimum  : binary_atomic_op_fp;
+defm atomic_load_fmaximum  : binary_atomic_op_fp;
+
 let Predicates = [HasLSFE] in {
   defm : LDFPOPregister_patterns<"LDFADD",   "atomic_load_fadd">;
   defm : LDFPOPregister_patterns<"LDFMAXNM", "atomic_load_fmax">;
   defm : LDFPOPregister_patterns<"LDFMINNM", "atomic_load_fmin">;
+  defm : LDFPOPregister_patterns<"LDFMAX",   "atomic_load_fmaximum">;
+  defm : LDFPOPregister_patterns<"LDFMIN",   "atomic_load_fminimum">;
 
   defm : LDBFPOPregister_patterns<"LDBFADD",   "atomic_load_fadd">;
   defm : LDBFPOPregister_patterns<"LDBFMAXNM", "atomic_load_fmax">;
   defm : LDBFPOPregister_patterns<"LDBFMINNM", "atomic_load_fmin">;
+  defm : LDBFPOPregister_patterns<"LDBFMAX",   "atomic_load_fmaximum">;
+  defm : LDBFPOPregister_patterns<"LDBFMIN",   "atomic_load_fminimum">;
 }
 
 // v8.9a/v9.4a FEAT_LRCPC patterns
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll 
b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll
index 7ee2a0bb19c0e..d5ead2bff2f18 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lsfe.ll
@@ -1600,428 +1600,197 @@ define dso_local double 
@atomicrmw_fmin_double_unaligned_seq_cst(ptr %ptr, doubl
 }
 
 define dso_local half @atomicrmw_fmaximum_half_aligned_monotonic(ptr %ptr, 
half %value) {
-; -O0-LABEL: atomicrmw_fmaximum_half_aligned_monotonic:
-; -O0:ldaxrh w0, [x9]
-; -O0:cmp w0, w10, uxth
-; -O0:stlxrh w8, w11, [x9]
-; -O0:subs w8, w8, w0, uxth
-;
-; -O1-LABEL: atomicrmw_fmaximum_half_aligned_monotonic:
-; -O1:ldxrh w8, [x0]
-; -O1:stxrh w9, w8, [x0]
+; CHECK-LABEL: atomicrmw_fmaximum_half_aligned_monotonic:
+; CHECK:ldfmax h0, h0, [x0]
 %r = atomicrmw fmaximum ptr %ptr, half %value monotonic, align 2
 ret half %r
 }
 

[llvm-branch-commits] [llvm] [AArch64][llvm] Pre-commit tests for #137703 (NFC) (PR #137702)

2025-04-30 Thread Jonathan Thackray via llvm-branch-commits

https://github.com/jthackray updated 
https://github.com/llvm/llvm-project/pull/137702



  



Rate limit ยท GitHub


  body {
background-color: #f6f8fa;
color: #24292e;
font-family: -apple-system,BlinkMacSystemFont,Segoe 
UI,Helvetica,Arial,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol;
font-size: 14px;
line-height: 1.5;
margin: 0;
  }

  .container { margin: 50px auto; max-width: 600px; text-align: center; 
padding: 0 24px; }

  a { color: #0366d6; text-decoration: none; }
  a:hover { text-decoration: underline; }

  h1 { line-height: 60px; font-size: 48px; font-weight: 300; margin: 0px; 
text-shadow: 0 1px 0 #fff; }
  p { color: rgba(0, 0, 0, 0.5); margin: 20px 0 40px; }

  ul { list-style: none; margin: 25px 0; padding: 0; }
  li { display: table-cell; font-weight: bold; width: 1%; }

  .logo { display: inline-block; margin-top: 35px; }
  .logo-img-2x { display: none; }
  @media
  only screen and (-webkit-min-device-pixel-ratio: 2),
  only screen and (   min--moz-device-pixel-ratio: 2),
  only screen and ( -o-min-device-pixel-ratio: 2/1),
  only screen and (min-device-pixel-ratio: 2),
  only screen and (min-resolution: 192dpi),
  only screen and (min-resolution: 2dppx) {
.logo-img-1x { display: none; }
.logo-img-2x { display: inline-block; }
  }

  #suggestions {
margin-top: 35px;
color: #ccc;
  }
  #suggestions a {
color: #66;
font-weight: 200;
font-size: 14px;
margin: 0 10px;
  }


  
  



  Whoa there!
  You have exceeded a secondary rate limit.
Please wait a few minutes before you try again;
in some cases this may take up to an hour.
  
  
https://support.github.com/contact";>Contact Support โ€”
https://githubstatus.com";>GitHub Status โ€”
https://twitter.com/githubstatus";>@githubstatus
  

  

  

  

  

  


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] release/20.x: [wasm-ld] Refactor WasmSym from static globals to per-link context (#134970) (PR #137620)

2025-04-30 Thread Sam Clegg via llvm-branch-commits

sbc100 wrote:

> cc @tstellar
> 
> Not sure you saw my tag above but as I mentioned here [#137620 
> (comment)](https://github.com/llvm/llvm-project/pull/137620#issuecomment-2834979164)
>  I had to manually backport this due to some merge conflicts. This was 
> critical for downstream projects and I was hoping it would be picked up in 
> the 20.1.4 release ๐Ÿ˜ฌ
> 
> EDIT: I relized 20.1.4 is out, so probably you could help me move this a part 
> of 20.1.5 ? To be fair we actually track the release/20.x branch, so as soon 
> as this PR goes in, we can put it to use. Thanks !

Do you not build llvm from source in your project?  Can't you therefore build 
from tip-of-tree?

https://github.com/llvm/llvm-project/pull/137620
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Finn Plummer (inbelic)


Changes

- defines the `parseRootConstantParams` function and adds handling for the 
mandatory arguments of `num32BitConstants` and `bReg`

- adds corresponding unit tests

Part two of implementing #126576 

---
Full diff: https://github.com/llvm/llvm-project/pull/138002.diff


4 Files Affected:

- (modified) clang/include/clang/Parse/ParseHLSLRootSignature.h (+8-2) 
- (modified) clang/lib/Parse/ParseHLSLRootSignature.cpp (+65-3) 
- (modified) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+12-2) 
- (modified) llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h (+4-1) 


``diff
diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index efa735ea03d94..0f05b05ed4df6 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -77,8 +77,14 @@ class RootSignatureParser {
   parseDescriptorTableClause();
 
   /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  /// order and only exactly once. `ParsedClauseParams` denotes the current
-  /// state of parsed params
+  /// order and only exactly once. The following methods define a
+  /// `Parsed.*Params` struct to denote the current state of parsed params
+  struct ParsedConstantParams {
+std::optional Reg;
+std::optional Num32BitConstants;
+  };
+  std::optional parseRootConstantParams();
+
   struct ParsedClauseParams {
 std::optional Reg;
 std::optional NumDescriptors;
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 48d3e38b0519d..2ce8e6e5cca98 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -57,6 +57,27 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   RootConstants Constants;
 
+  auto Params = parseRootConstantParams();
+  if (!Params.has_value())
+return std::nullopt;
+
+  // Check mandatory parameters were provided
+  if (!Params->Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::kw_num32BitConstants;
+return std::nullopt;
+  }
+
+  Constants.Num32BitConstants = Params->Num32BitConstants.value();
+
+  if (!Params->Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::bReg;
+return std::nullopt;
+  }
+
+  Constants.Reg = Params->Reg.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() {
   return Clause;
 }
 
+// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
+// order and only exactly once. The following methods will parse through as
+// many arguments as possible reporting an error if a duplicate is seen.
+std::optional
+RootSignatureParser::parseRootConstantParams() {
+  assert(CurToken.TokKind == TokenKind::pu_l_paren &&
+ "Expects to only be invoked starting at given token");
+
+  ParsedConstantParams Params;
+  do {
+// `num32BitConstants` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) {
+  if (Params.Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Num32BitConstants = parseUIntParam();
+  if (!Num32BitConstants.has_value())
+return std::nullopt;
+  Params.Num32BitConstants = Num32BitConstants;
+}
+
+// `b` POS_INT
+if (tryConsumeExpectedToken(TokenKind::bReg)) {
+  if (Params.Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+  auto Reg = parseRegister();
+  if (!Reg.has_value())
+return std::nullopt;
+  Params.Reg = Reg;
+}
+  } while (tryConsumeExpectedToken(TokenKind::pu_comma));
+
+  return Params;
+}
+
 std::optional
 RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) {
   assert(CurToken.TokKind == TokenKind::pu_l_paren &&
  "Expects to only be invoked starting at given token");
 
-  // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  // order and only exactly once. Parse through as many arguments as possible
-  // reporting an error if a duplicate is seen.
   ParsedClauseParams Params;
   do {
 // ( `b` | `t` | `u` | `s`) POS_INT
diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp 
b/clang/unittests/Parse/ParseHLSLRootSig

[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)

2025-04-30 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic updated 
https://github.com/llvm/llvm-project/pull/138002

>From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001
From: Finn Plummer 
Date: Wed, 30 Apr 2025 17:53:11 +
Subject: [PATCH] [HLSL][RootSignature] Add mandatory parameters for
 RootConstants

- defines the `parseRootConstantParams` function and adds handling for
the mandatory arguments of `num32BitConstants` and `bReg`

- adds corresponding unit tests

Part two of implementing
---
 .../clang/Parse/ParseHLSLRootSignature.h  | 10 ++-
 clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++-
 .../Parse/ParseHLSLRootSignatureTest.cpp  | 14 +++-
 .../llvm/Frontend/HLSL/HLSLRootSignature.h|  5 +-
 4 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index efa735ea03d94..0f05b05ed4df6 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -77,8 +77,14 @@ class RootSignatureParser {
   parseDescriptorTableClause();
 
   /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  /// order and only exactly once. `ParsedClauseParams` denotes the current
-  /// state of parsed params
+  /// order and only exactly once. The following methods define a
+  /// `Parsed.*Params` struct to denote the current state of parsed params
+  struct ParsedConstantParams {
+std::optional Reg;
+std::optional Num32BitConstants;
+  };
+  std::optional parseRootConstantParams();
+
   struct ParsedClauseParams {
 std::optional Reg;
 std::optional NumDescriptors;
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 48d3e38b0519d..2ce8e6e5cca98 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -57,6 +57,27 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   RootConstants Constants;
 
+  auto Params = parseRootConstantParams();
+  if (!Params.has_value())
+return std::nullopt;
+
+  // Check mandatory parameters were provided
+  if (!Params->Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::kw_num32BitConstants;
+return std::nullopt;
+  }
+
+  Constants.Num32BitConstants = Params->Num32BitConstants.value();
+
+  if (!Params->Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::bReg;
+return std::nullopt;
+  }
+
+  Constants.Reg = Params->Reg.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() {
   return Clause;
 }
 
+// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
+// order and only exactly once. The following methods will parse through as
+// many arguments as possible reporting an error if a duplicate is seen.
+std::optional
+RootSignatureParser::parseRootConstantParams() {
+  assert(CurToken.TokKind == TokenKind::pu_l_paren &&
+ "Expects to only be invoked starting at given token");
+
+  ParsedConstantParams Params;
+  do {
+// `num32BitConstants` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) {
+  if (Params.Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Num32BitConstants = parseUIntParam();
+  if (!Num32BitConstants.has_value())
+return std::nullopt;
+  Params.Num32BitConstants = Num32BitConstants;
+}
+
+// `b` POS_INT
+if (tryConsumeExpectedToken(TokenKind::bReg)) {
+  if (Params.Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+  auto Reg = parseRegister();
+  if (!Reg.has_value())
+return std::nullopt;
+  Params.Reg = Reg;
+}
+  } while (tryConsumeExpectedToken(TokenKind::pu_comma));
+
+  return Params;
+}
+
 std::optional
 RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) {
   assert(CurToken.TokKind == TokenKind::pu_l_paren &&
  "Expects to only be invoked starting at given token");
 
-  // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  // order and only exactly once. Parse through as many arguments as possible
-  // reporting an error if a duplicate is seen.
   ParsedClauseParams Params;
   do {
 // ( `b` | `t` | `u` | `s`) POS_INT
dif

[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameter for RootConstants (PR #138007)

2025-04-30 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic created 
https://github.com/llvm/llvm-project/pull/138007

- extends `parseRootConstantParams` and the struct to include the optional 
parameters of a RootConstant

- adds corresponding unit tests

Part three of and resolves https://github.com/llvm/llvm-project/issues/126576

>From f8011807d233d1d0d1f9d4ce08bad0c559cde74a Mon Sep 17 00:00:00 2001
From: Finn Plummer 
Date: Wed, 30 Apr 2025 18:08:31 +
Subject: [PATCH] [HLSL][RootSignature] Add optional parameter for
 RootConstants

- extends `parseRootConstantParams` and the struct to include the
optional parameters of a RootConstant

- adds corresponding unit tests
---
 .../clang/Parse/ParseHLSLRootSignature.h  |  2 +
 clang/lib/Parse/ParseHLSLRootSignature.cpp| 41 +++
 .../Parse/ParseHLSLRootSignatureTest.cpp  |  8 +++-
 .../llvm/Frontend/HLSL/HLSLRootSignature.h|  2 +
 4 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index 0f05b05ed4df6..2ac2083983741 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -82,6 +82,8 @@ class RootSignatureParser {
   struct ParsedConstantParams {
 std::optional Reg;
 std::optional Num32BitConstants;
+std::optional Space;
+std::optional Visibility;
   };
   std::optional parseRootConstantParams();
 
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 2ce8e6e5cca98..a5006b77a6e44 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -78,6 +78,13 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   Constants.Reg = Params->Reg.value();
 
+  // Fill in optional parameters
+  if (Params->Visibility.has_value())
+Constants.Visibility = Params->Visibility.value();
+
+  if (Params->Space.has_value())
+Constants.Space = Params->Space.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -247,6 +254,40 @@ RootSignatureParser::parseRootConstantParams() {
 return std::nullopt;
   Params.Reg = Reg;
 }
+
+// `space` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_space)) {
+  if (Params.Space.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Space = parseUIntParam();
+  if (!Space.has_value())
+return std::nullopt;
+  Params.Space = Space;
+}
+
+// `visibility` `=` SHADER_VISIBILITY
+if (tryConsumeExpectedToken(TokenKind::kw_visibility)) {
+  if (Params.Visibility.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Visibility = parseShaderVisibility();
+  if (!Visibility.has_value())
+return std::nullopt;
+  Params.Visibility = Visibility;
+}
   } while (tryConsumeExpectedToken(TokenKind::pu_comma));
 
   return Params;
diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp 
b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
index 336868b579866..150eb3e6e54ef 100644
--- a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
+++ b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
@@ -255,7 +255,9 @@ TEST_F(ParseHLSLRootSignatureTest, ValidSamplerFlagsTest) {
 TEST_F(ParseHLSLRootSignatureTest, ValidParseRootConsantsTest) {
   const llvm::StringLiteral Source = R"cc(
 RootConstants(num32BitConstants = 1, b0),
-RootConstants(b42, num32BitConstants = 4294967295)
+RootConstants(b42, space = 3, num32BitConstants = 4294967295,
+  visibility = SHADER_VISIBILITY_HULL
+)
   )cc";
 
   TrivialModuleLoader ModLoader;
@@ -278,12 +280,16 @@ TEST_F(ParseHLSLRootSignatureTest, 
ValidParseRootConsantsTest) {
   ASSERT_EQ(std::get(Elem).Num32BitConstants, 1u);
   ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg);
   ASSERT_EQ(std::get(Elem).Reg.Number, 0u);
+  ASSERT_EQ(std::get(Elem).Space, 0u);
+  ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::All);
 
   Elem = Elements[1];
   ASSERT_TRUE(std::holds_alternative(Elem));
   ASSERT_EQ(std::get(Elem).Num32BitConstants, 4294967295u);
   ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg);
   ASSERT_EQ(std::get(Elem).Reg.Number, 42u);
+  ASSERT_EQ(std::get(Elem).Space, 3u);
+  ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::Hull);
 
   ASSERT_TRUE(Consumer->isSatisfied());
 }
diff --git 

[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameter for RootConstants (PR #138007)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-clang

Author: Finn Plummer (inbelic)


Changes

- extends `parseRootConstantParams` and the struct to include the optional 
parameters of a RootConstant

- adds corresponding unit tests

Part three of and resolves https://github.com/llvm/llvm-project/issues/126576

---
Full diff: https://github.com/llvm/llvm-project/pull/138007.diff


4 Files Affected:

- (modified) clang/include/clang/Parse/ParseHLSLRootSignature.h (+2) 
- (modified) clang/lib/Parse/ParseHLSLRootSignature.cpp (+41) 
- (modified) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+7-1) 
- (modified) llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h (+2) 


``diff
diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index 0f05b05ed4df6..2ac2083983741 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -82,6 +82,8 @@ class RootSignatureParser {
   struct ParsedConstantParams {
 std::optional Reg;
 std::optional Num32BitConstants;
+std::optional Space;
+std::optional Visibility;
   };
   std::optional parseRootConstantParams();
 
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 2ce8e6e5cca98..a5006b77a6e44 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -78,6 +78,13 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   Constants.Reg = Params->Reg.value();
 
+  // Fill in optional parameters
+  if (Params->Visibility.has_value())
+Constants.Visibility = Params->Visibility.value();
+
+  if (Params->Space.has_value())
+Constants.Space = Params->Space.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -247,6 +254,40 @@ RootSignatureParser::parseRootConstantParams() {
 return std::nullopt;
   Params.Reg = Reg;
 }
+
+// `space` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_space)) {
+  if (Params.Space.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Space = parseUIntParam();
+  if (!Space.has_value())
+return std::nullopt;
+  Params.Space = Space;
+}
+
+// `visibility` `=` SHADER_VISIBILITY
+if (tryConsumeExpectedToken(TokenKind::kw_visibility)) {
+  if (Params.Visibility.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Visibility = parseShaderVisibility();
+  if (!Visibility.has_value())
+return std::nullopt;
+  Params.Visibility = Visibility;
+}
   } while (tryConsumeExpectedToken(TokenKind::pu_comma));
 
   return Params;
diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp 
b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
index 336868b579866..150eb3e6e54ef 100644
--- a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
+++ b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
@@ -255,7 +255,9 @@ TEST_F(ParseHLSLRootSignatureTest, ValidSamplerFlagsTest) {
 TEST_F(ParseHLSLRootSignatureTest, ValidParseRootConsantsTest) {
   const llvm::StringLiteral Source = R"cc(
 RootConstants(num32BitConstants = 1, b0),
-RootConstants(b42, num32BitConstants = 4294967295)
+RootConstants(b42, space = 3, num32BitConstants = 4294967295,
+  visibility = SHADER_VISIBILITY_HULL
+)
   )cc";
 
   TrivialModuleLoader ModLoader;
@@ -278,12 +280,16 @@ TEST_F(ParseHLSLRootSignatureTest, 
ValidParseRootConsantsTest) {
   ASSERT_EQ(std::get(Elem).Num32BitConstants, 1u);
   ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg);
   ASSERT_EQ(std::get(Elem).Reg.Number, 0u);
+  ASSERT_EQ(std::get(Elem).Space, 0u);
+  ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::All);
 
   Elem = Elements[1];
   ASSERT_TRUE(std::holds_alternative(Elem));
   ASSERT_EQ(std::get(Elem).Num32BitConstants, 4294967295u);
   ASSERT_EQ(std::get(Elem).Reg.ViewType, RegisterType::BReg);
   ASSERT_EQ(std::get(Elem).Reg.Number, 42u);
+  ASSERT_EQ(std::get(Elem).Space, 3u);
+  ASSERT_EQ(std::get(Elem).Visibility, ShaderVisibility::Hull);
 
   ASSERT_TRUE(Consumer->isSatisfied());
 }
diff --git a/llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h 
b/llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h
index a3f98a9f1944f..8b8324df18bb3 100644
--- a/llvm/include/llvm/Frontend/HLSL/HLSLRootSignature.h
+++ b/llvm/include

[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add optional parameters for RootConstants (PR #138007)

2025-04-30 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic edited 
https://github.com/llvm/llvm-project/pull/138007
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)

2025-04-30 Thread Alexander Richardson via llvm-branch-commits

arichardson wrote:

> So p9 is quite weird. It's used, IIRC, by, 
> https://github.com/GPUOpen-Drivers/llpc internally to represent "structured" 
> pointers, and so its interpretation is `{p8 rsrc, i32 index, i32 offset}` . 
> This does still have a 48-bit effective address by way of a thoroughly weird 
> ptrtoaddr function that's, in it's full generality, a function of the 
> resource flags (see swizzles)
> 
> As to ptrmask on p8 ... if ptrmask is only supposed to affect the Index bits, 
> then I probably shouldn't be emitting it during the p7 => p8 process and only 
> applying a mask to the offset. But that hits the same weirdness that masking 
> for alignment doesn't work as expected if the base pointer isn't quite 
> aligned enough.
> 
> (I can't think of many instances where that base pointer will, for example, 
> be odd, but I'm not aware of anything saying it's not allowed. Though I'd 
> personally be OK with documenting around that)
> 
> I think most codegen - including down in LLPC - assumes the base is at least 
> `align 4`, which is the largest value where you sometimes need to change 
> behavior, so you can quietly figure that the alignment of the offset is 
> _basically_ the alignment of the pointer
> 

Thanks for all the background - how do you feel about pretending that p8 has a 
48-bit index width until we figure out how to do "no indexing allowed" 
correctly?

> Actually, more general question on this address with stuff: x86 thread local 
> storage. That's a form of pointer with an implicit base that can mess up your 
> alignment checks - how's that work with ptrtoaddr?

Agreed, segmentation is awkward and I think we just have to assume that the 
segments are sufficiently aligned. I think this is somewhat outside of the LLVM 
abstract machine similar to page based memory. If we perform alignment checks 
on virtual addresses beyond 12/16/etc bits, there is no guarantee the final 
address will be sufficiently aligned but I view that as being outside of the 
LLVM semantics. I think we just have to pretend that virtual addresses or 
segment-relative addresses are the only thing that matters.



https://github.com/llvm/llvm-project/pull/137418
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add mandatory parameters for RootConstants (PR #138002)

2025-04-30 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic updated 
https://github.com/llvm/llvm-project/pull/138002

>From 15857bf8e1303e2325b48e417e7abd26aa77910e Mon Sep 17 00:00:00 2001
From: Finn Plummer 
Date: Wed, 30 Apr 2025 17:53:11 +
Subject: [PATCH 1/2] [HLSL][RootSignature] Add mandatory parameters for
 RootConstants

- defines the `parseRootConstantParams` function and adds handling for
the mandatory arguments of `num32BitConstants` and `bReg`

- adds corresponding unit tests

Part two of implementing
---
 .../clang/Parse/ParseHLSLRootSignature.h  | 10 ++-
 clang/lib/Parse/ParseHLSLRootSignature.cpp| 68 ++-
 .../Parse/ParseHLSLRootSignatureTest.cpp  | 14 +++-
 .../llvm/Frontend/HLSL/HLSLRootSignature.h|  5 +-
 4 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h 
b/clang/include/clang/Parse/ParseHLSLRootSignature.h
index efa735ea03d94..0f05b05ed4df6 100644
--- a/clang/include/clang/Parse/ParseHLSLRootSignature.h
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -77,8 +77,14 @@ class RootSignatureParser {
   parseDescriptorTableClause();
 
   /// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  /// order and only exactly once. `ParsedClauseParams` denotes the current
-  /// state of parsed params
+  /// order and only exactly once. The following methods define a
+  /// `Parsed.*Params` struct to denote the current state of parsed params
+  struct ParsedConstantParams {
+std::optional Reg;
+std::optional Num32BitConstants;
+  };
+  std::optional parseRootConstantParams();
+
   struct ParsedClauseParams {
 std::optional Reg;
 std::optional NumDescriptors;
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp 
b/clang/lib/Parse/ParseHLSLRootSignature.cpp
index 48d3e38b0519d..2ce8e6e5cca98 100644
--- a/clang/lib/Parse/ParseHLSLRootSignature.cpp
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -57,6 +57,27 @@ std::optional 
RootSignatureParser::parseRootConstants() {
 
   RootConstants Constants;
 
+  auto Params = parseRootConstantParams();
+  if (!Params.has_value())
+return std::nullopt;
+
+  // Check mandatory parameters were provided
+  if (!Params->Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::kw_num32BitConstants;
+return std::nullopt;
+  }
+
+  Constants.Num32BitConstants = Params->Num32BitConstants.value();
+
+  if (!Params->Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_missing_param)
+<< TokenKind::bReg;
+return std::nullopt;
+  }
+
+  Constants.Reg = Params->Reg.value();
+
   if (consumeExpectedToken(TokenKind::pu_r_paren,
diag::err_hlsl_unexpected_end_of_params,
/*param of=*/TokenKind::kw_RootConstants))
@@ -187,14 +208,55 @@ RootSignatureParser::parseDescriptorTableClause() {
   return Clause;
 }
 
+// Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
+// order and only exactly once. The following methods will parse through as
+// many arguments as possible reporting an error if a duplicate is seen.
+std::optional
+RootSignatureParser::parseRootConstantParams() {
+  assert(CurToken.TokKind == TokenKind::pu_l_paren &&
+ "Expects to only be invoked starting at given token");
+
+  ParsedConstantParams Params;
+  do {
+// `num32BitConstants` `=` POS_INT
+if (tryConsumeExpectedToken(TokenKind::kw_num32BitConstants)) {
+  if (Params.Num32BitConstants.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+
+  if (consumeExpectedToken(TokenKind::pu_equal))
+return std::nullopt;
+
+  auto Num32BitConstants = parseUIntParam();
+  if (!Num32BitConstants.has_value())
+return std::nullopt;
+  Params.Num32BitConstants = Num32BitConstants;
+}
+
+// `b` POS_INT
+if (tryConsumeExpectedToken(TokenKind::bReg)) {
+  if (Params.Reg.has_value()) {
+getDiags().Report(CurToken.TokLoc, diag::err_hlsl_rootsig_repeat_param)
+<< CurToken.TokKind;
+return std::nullopt;
+  }
+  auto Reg = parseRegister();
+  if (!Reg.has_value())
+return std::nullopt;
+  Params.Reg = Reg;
+}
+  } while (tryConsumeExpectedToken(TokenKind::pu_comma));
+
+  return Params;
+}
+
 std::optional
 RootSignatureParser::parseDescriptorTableClauseParams(TokenKind RegType) {
   assert(CurToken.TokKind == TokenKind::pu_l_paren &&
  "Expects to only be invoked starting at given token");
 
-  // Parameter arguments (eg. `bReg`, `space`, ...) can be specified in any
-  // order and only exactly once. Parse through as many arguments as possible
-  // reporting an error if a duplicate is seen.
   ParsedClauseParams Params;
   do {
 // ( `b` | `t` | `u` | `s`) POS_INT

[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)

2025-04-30 Thread Amir Ayupov via llvm-branch-commits

aaupov wrote:

Thanks for tracking it down, looks like it's an issue with GNU nm. However 
llvm-nm has no functionality equivalent to `nm --synthetic` which prints the 
address of the PLT entry, and the test relies on that.

Let me try to decouple this test from GNU nm.

https://github.com/llvm/llvm-project/pull/135867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/137683
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-30 Thread Sergio Afonso via llvm-branch-commits


@@ -3623,6 +3623,9 @@ static llvm::omp::OpenMPOffloadMappingFlags 
mapParentWithMembers(
 LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder,
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");

skatrak wrote:

Yes, I do agree that "host device" seems like a rather contradictory term, 
since we generally just talk about "host" as the CPU and "device" as whatever 
we're offloading to. The reason of using these terms is to align with OpenMP 
terminology (5.2 spec, Section 1.2.1):
> **host device** The _device_ on which the _OpenMP program_ begins execution.
> **target device** A _device_ with respect to which the current _device_ 
> performs an operation, as specified by a _device construct_ or an OpenMP 
> device memory routine.

This is also why the `-fopenmp-is-target-device` flag is called that and not 
`-fopenmp-is-device` (which [used to be its 
name](https://reviews.llvm.org/D154591)).

https://github.com/llvm/llvm-project/pull/137199
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang-tools-extra] release/20.x: [clang-tidy] Do not pass any file when listing checks in run_clang_tiโ€ฆ (#137286) (PR #137775)

2025-04-30 Thread Carlos Galvez via llvm-branch-commits

https://github.com/carlosgalvezp updated 
https://github.com/llvm/llvm-project/pull/137775

>From 387c2886b68b1eaf68916e4b08e8d92966a1 Mon Sep 17 00:00:00 2001
From: Carlos Galvez 
Date: Tue, 29 Apr 2025 11:13:30 +0200
Subject: [PATCH] =?UTF-8?q?[clang-tidy]=20Do=20not=20pass=20any=20file=20w?=
 =?UTF-8?q?hen=20listing=20checks=20in=20run=5Fclang=5Fti=E2=80=A6=20(#137?=
 =?UTF-8?q?286)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

โ€ฆdy.py

Currently, run_clang_tidy.py does not correctly display the list of
checks picked up from the top-level .clang-tidy file. The reason for
that is that we are passing an empty string as input file.

However, that's not how we are supposed to use clang-tidy to list
checks. Per
https://github.com/llvm/llvm-project/commit/65eccb463df7fe511c813ee6a1794c80d7489ff2,
we simply should not pass any file at all - the internal code of
clang-tidy will pass a "dummy" file if that's the case and get the
.clang-tidy file from the current working directory.

Fixes #136659

Co-authored-by: Carlos Gรกlvez 
(cherry picked from commit 014ab736dc741f24c007f9861e24b31faba0e1e7)
---
 clang-tools-extra/clang-tidy/tool/run-clang-tidy.py | 7 ---
 clang-tools-extra/docs/ReleaseNotes.rst | 3 +++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py 
b/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py
index f1b934f7139e9..8741147a4f8a3 100755
--- a/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py
+++ b/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py
@@ -87,7 +87,7 @@ def find_compilation_database(path: str) -> str:
 
 
 def get_tidy_invocation(
-f: str,
+f: Optional[str],
 clang_tidy_binary: str,
 checks: str,
 tmpdir: Optional[str],
@@ -147,7 +147,8 @@ def get_tidy_invocation(
 start.append(f"--warnings-as-errors={warnings_as_errors}")
 if allow_no_checks:
 start.append("--allow-no-checks")
-start.append(f)
+if f:
+start.append(f)
 return start
 
 
@@ -490,7 +491,7 @@ async def main() -> None:
 
 try:
 invocation = get_tidy_invocation(
-"",
+None,
 clang_tidy_binary,
 args.checks,
 None,
diff --git a/clang-tools-extra/docs/ReleaseNotes.rst 
b/clang-tools-extra/docs/ReleaseNotes.rst
index 2ab597eb37048..0b2e9c5fabc36 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -190,6 +190,9 @@ Improvements to clang-tidy
 - Fixed bug in :program:`clang-tidy` by which `HeaderFilterRegex` did not take
   effect when passed via the `.clang-tidy` file.
 
+- Fixed bug in :program:`run_clang_tidy.py` where the program would not
+  correctly display the checks enabled by the top-level `.clang-tidy` file.
+
 New checks
 ^^
 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)

2025-04-30 Thread Paschalis Mpeis via llvm-branch-commits

paschalis-mpeis wrote:


Hey  @yota9, thanks for the suggestions!

Indeed, the PLT entries exist in both binaries. For example running:

```
build/bin/llvm-objdump -d -j .plt 
build/tools/bolt/test/X86/Output/callcont-fallthru.s.tmp
```

shows:

```
build/tools/bolt/test/X86/Output/callcont-fallthru.s.tmp:   file format 
elf64-x86-64

Disassembly of section .plt:
1430 <.plt>:
1430: ff 35 f2 20 00 00 pushq   0x20f2(%rip)# 
0x3528 
1436: ff 25 f4 20 00 00 jmpq*0x20f4(%rip)   # 
0x3530 
143c: 0f 1f 40 00   nopl(%rax)

1440 :
1440: ff 25 f2 20 00 00 jmpq*0x20f2(%rip)   # 
0x3538 
1446: 68 00 00 00 00pushq   $0x0
144b: e9 e0 ff ff ffjmp 0x1430 <.plt>
```

I noticed some code differences in the binaries but I haven't looked deeper 
into it.


**It looks like it's differences in GNU nm though:**

On my AArch64 dev-machine, `nm --synthetic` lists `puts@plt`, but when I copy 
that same binary over to our upcoming AArch64 buildbot, it's missing.

Conversely, `nm --synthetic` on the buildbot does not list `puts@plt`, but when 
if I copy that binary to the dev-machine it does appear.

---

I too agree that relying on GNU is not ideal. Essentially using any binary tool 
that does not come from the built LLVM revision. However, `llvm-nm` does not 
seem support `--synthetic`.

BTW, thanks for all the help! I'm focused on AArch64, so while I may be 
involved to some extent with this, I'll let Amir drive the fix. That's why I'm 
looking for a code owner to get #137831 stamped. :)
(also cc'ing: @aaupov, @maksfb)

https://github.com/llvm/llvm-project/pull/135867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137224

>From 7d1cbb97817e615a5fa8841d72a18643b5722114 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++-
 .../AArch64/gs-pacret-autiasp.s   |  31 +-
 .../AArch64/gs-pauth-debug-output.s   |  30 +-
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 4 files changed, 730 insertions(+), 40 deletions(-)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index d4282daad648b..fab92947f6d67 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -660,8 +660,9 @@ class DataflowSrcSafetyAnalysis
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
   BinaryFunction &BF;
   MCPlusBuilder::AllocatorIdTy AllocId;
@@ -672,6 +673,30 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
+  /// Compute a reasonably pessimistic estimation of the register state when
+  /// the previous instruction is not known for sure. Take the set of registers
+  /// which are trusted at function entry and remove all registers that can be
+  /// clobbered inside this function.
+  SrcState computePessimisticState(BinaryFunction &BF) {
+BitVector ClobberedRegs(NumRegs);
+for (auto &I : BF.instrs()) {
+  MCInst &Inst = I.second;
+  BC.MIB->getClobberedRegs(Inst, ClobberedRegs);
+
+  // If this is a call instruction, no register is safe anymore, unless
+  // it is a tail call. Ignore tail calls for the purpose of estimating the
+  // worst-case scenario, assuming no instructions are executed in the
+  // caller after this point anyway.
+  if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst))
+ClobberedRegs.set();
+}
+
+SrcState S = createEntryState();
+S.SafeToDerefRegs.reset(ClobberedRegs);
+S.TrustedRegs.reset(ClobberedRegs);
+return S;
+  }
+
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -682,6 +707,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -696,7 +722,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Check if we need to remove an old annotation (this is the case if
@@ -1233,6 +1259,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_gadget_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return t

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect untrusted LR before tail call (PR #137224)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/137224

>From 7d1cbb97817e615a5fa8841d72a18643b5722114 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Tue, 22 Apr 2025 21:43:14 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect untrusted LR before tail call

Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.

Unlike other pauth gadgets, this one involves some amount of guessing
which branch instructions should be checked as tail calls.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 112 +++-
 .../AArch64/gs-pacret-autiasp.s   |  31 +-
 .../AArch64/gs-pauth-debug-output.s   |  30 +-
 .../AArch64/gs-pauth-tail-calls.s | 597 ++
 4 files changed, 730 insertions(+), 40 deletions(-)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-tail-calls.s

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index d4282daad648b..fab92947f6d67 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -660,8 +660,9 @@ class DataflowSrcSafetyAnalysis
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, assuming
-// pessimistically that all registers are unsafe at the start of each sequence.
+// the same way basic blocks are processed by data-flow analysis, with the same
+// pessimistic estimation of the initial state at the start of each sequence
+// (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis {
   BinaryFunction &BF;
   MCPlusBuilder::AllocatorIdTy AllocId;
@@ -672,6 +673,30 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
+  /// Compute a reasonably pessimistic estimation of the register state when
+  /// the previous instruction is not known for sure. Take the set of registers
+  /// which are trusted at function entry and remove all registers that can be
+  /// clobbered inside this function.
+  SrcState computePessimisticState(BinaryFunction &BF) {
+BitVector ClobberedRegs(NumRegs);
+for (auto &I : BF.instrs()) {
+  MCInst &Inst = I.second;
+  BC.MIB->getClobberedRegs(Inst, ClobberedRegs);
+
+  // If this is a call instruction, no register is safe anymore, unless
+  // it is a tail call. Ignore tail calls for the purpose of estimating the
+  // worst-case scenario, assuming no instructions are executed in the
+  // caller after this point anyway.
+  if (BC.MIB->isCall(Inst) && !BC.MIB->isTailCall(Inst))
+ClobberedRegs.set();
+}
+
+SrcState S = createEntryState();
+S.SafeToDerefRegs.reset(ClobberedRegs);
+S.TrustedRegs.reset(ClobberedRegs);
+return S;
+  }
+
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -682,6 +707,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   }
 
   void run() override {
+const SrcState DefaultState = computePessimisticState(BF);
 SrcState S = createEntryState();
 for (auto &I : BF.instrs()) {
   MCInst &Inst = I.second;
@@ -696,7 +722,7 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
 LLVM_DEBUG({
   traceInst(BC, "Due to label, resetting the state before", Inst);
 });
-S = createUnsafeState();
+S = DefaultState;
   }
 
   // Check if we need to remove an old annotation (this is the case if
@@ -1233,6 +1259,83 @@ shouldReportReturnGadget(const BinaryContext &BC, const 
MCInstReference &Inst,
   return make_gadget_report(RetKind, Inst, *RetReg);
 }
 
+/// While BOLT already marks some of the branch instructions as tail calls,
+/// this function tries to improve the coverage by including less obvious cases
+/// when it is possible to do without introducing too many false positives.
+static bool shouldAnalyzeTailCallInst(const BinaryContext &BC,
+  const BinaryFunction &BF,
+  const MCInstReference &Inst) {
+  // Some BC.MIB->isXYZ(Inst) methods simply delegate to MCInstrDesc::isXYZ()
+  // (such as isBranch at the time of writing this comment), some don't (such
+  // as isCall). For that reason, call MCInstrDesc's methods explicitly when
+  // it is important.
+  const MCInstrDesc &Desc =
+  BC.MII->get(static_cast(Inst).getOpcode());
+  // Tail call should be a branch (but not necessarily an indirect one).
+  if (!Desc.isBranch())
+return false;
+
+  // Always analyze the branches already marked as tail calls by BOLT.
+  if (BC.MIB->isTailCall(Inst))
+return t

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

2025-04-30 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/136183

>From 96e1b04ab61480c080350d9fc2c658cb0364f4da Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Thu, 17 Apr 2025 20:51:16 +0300
Subject: [PATCH 1/2] [BOLT] Gadget scanner: improve handling of unreachable
 basic blocks

Instead of refusing to analyze an instruction completely, when it is
unreachable according to the CFG reconstructed by BOLT, pessimistically
assume all registers to be unsafe at the start of basic blocks without
any predecessors. Nevertheless, unreachable basic blocks found in
optimized code likely means imprecise CFG reconstruction, thus report a
warning once per basic block without predecessors.
---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 46 ++-
 .../AArch64/gs-pacret-autiasp.s   |  7 ++-
 .../binary-analysis/AArch64/gs-pauth-calls.s  | 57 +++
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp 
b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 7a86d9c4c9a59..917649731884e 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -344,6 +344,12 @@ class SrcSafetyAnalysis {
 return S;
   }
 
+  /// Creates a state with all registers marked unsafe (not to be confused
+  /// with empty state).
+  SrcState createUnsafeState() const {
+return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+  }
+
   BitVector getClobberedRegs(const MCInst &Point) const {
 BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
@@ -586,6 +592,13 @@ class DataflowSrcSafetyAnalysis
 if (BB.isEntryPoint())
   return createEntryState();
 
+// If a basic block without any predecessors is found in an optimized code,
+// this likely means that some CFG edges were not detected. Pessimistically
+// assume all registers to be unsafe before this basic block and warn about
+// this fact in FunctionAnalysis::findUnsafeUses().
+if (BB.pred_empty())
+  return createUnsafeState();
+
 return SrcState();
   }
 
@@ -659,12 +672,6 @@ class CFGUnawareSrcSafetyAnalysis : public 
SrcSafetyAnalysis {
   BC.MIB->removeAnnotation(I.second, StateAnnotationIndex);
   }
 
-  /// Creates a state with all registers marked unsafe (not to be confused
-  /// with empty state).
-  SrcState createUnsafeState() const {
-return SrcState(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
-  }
-
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
   MCPlusBuilder::AllocatorIdTy AllocId,
@@ -1335,19 +1342,30 @@ void FunctionAnalysisContext::findUnsafeUses(
 BF.dump();
   });
 
+  if (BF.hasCFG()) {
+// Warn on basic blocks being unreachable according to BOLT, as this
+// likely means CFG is imprecise.
+for (BinaryBasicBlock &BB : BF) {
+  if (!BB.pred_empty() || BB.isEntryPoint())
+continue;
+  // Arbitrarily attach the report to the first instruction of BB.
+  MCInst *InstToReport = BB.getFirstNonPseudoInstr();
+  if (!InstToReport)
+continue; // BB has no real instructions
+
+  Reports.push_back(
+  make_generic_report(MCInstReference::get(InstToReport, BF),
+  "Warning: no predecessor basic blocks detected "
+  "(possibly incomplete CFG)"));
+}
+  }
+
   iterateOverInstrs(BF, [&](MCInstReference Inst) {
 if (BC.MIB->isCFI(Inst))
   return;
 
 const SrcState &S = Analysis->getStateBefore(Inst);
-
-// If non-empty state was never propagated from the entry basic block
-// to Inst, assume it to be unreachable and report a warning.
-if (S.empty()) {
-  Reports.push_back(
-  make_generic_report(Inst, "Warning: unreachable instruction found"));
-  return;
-}
+assert(!S.empty() && "Instruction has no associated state");
 
 if (auto Report = shouldReportReturnGadget(BC, Inst, S))
   Reports.push_back(*Report);
diff --git a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s 
b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
index 284f0bea607a5..6559ba336e8de 100644
--- a/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
+++ b/bolt/test/binary-analysis/AArch64/gs-pacret-autiasp.s
@@ -215,12 +215,17 @@ f_callclobbered_calleesaved:
 .globl  f_unreachable_instruction
 .type   f_unreachable_instruction,@function
 f_unreachable_instruction:
-// CHECK-LABEL: GS-PAUTH: Warning: unreachable instruction found in function 
f_unreachable_instruction, basic block {{[0-9a-zA-Z.]+}}, at address
+// CHECK-LABEL: GS-PAUTH: Warning: no predecessor basic blocks detected 
(possibly incomplete CFG) in function f_unreachable_instruction, basic block 
{{[0-9a-zA-Z.]+}}, at address
 // CHECK-NEXT:The instruction is {{[0-9a-f]+}}:   add x0, x1, 
x2
 // CHECK-NOT:   instructi

[llvm-branch-commits] [llvm] [ConstraintElim] Simplify `usub_with_overflow` when A uge B (PR #135785)

2025-04-30 Thread Iris Shi via llvm-branch-commits

https://github.com/el-ev updated 
https://github.com/llvm/llvm-project/pull/135785

>From c5328e2075b5f98311eb9c9657da79d800cf28f4 Mon Sep 17 00:00:00 2001
From: Iris Shi <0...@owo.li>
Date: Tue, 15 Apr 2025 20:20:45 +0800
Subject: [PATCH 1/2] [ConstraintElim] Simplify `usub_with_overflow` when A uge
 B

---
 .../Scalar/ConstraintElimination.cpp  | 10 ++
 .../usub-with-overflow.ll | 33 +++
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp 
b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 6d17b36b7d9ea..e030f0fac4c0d 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -1123,6 +1123,7 @@ void State::addInfoFor(BasicBlock &BB) {
 // Enqueue overflow intrinsics for simplification.
 case Intrinsic::sadd_with_overflow:
 case Intrinsic::ssub_with_overflow:
+case Intrinsic::usub_with_overflow:
 case Intrinsic::ucmp:
 case Intrinsic::scmp:
   WorkList.push_back(
@@ -1785,6 +1786,15 @@ tryToSimplifyOverflowMath(IntrinsicInst *II, 
ConstraintInfo &Info,
 Changed = replaceSubOverflowUses(II, A, B, ToRemove);
 break;
   }
+  case Intrinsic::usub_with_overflow: {
+// usub overflows iff A < B
+// TODO: If the operation is guaranteed to overflow, we could
+// also apply some simplifications.
+if (DoesConditionHold(CmpInst::ICMP_UGE, A, B, Info)) {
+  Changed = replaceSubOverflowUses(II, A, B, ToRemove);
+}
+break;
+  }
   }
 
   return Changed;
diff --git a/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll 
b/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll
index 06bfd8269d97d..0a3c4eb00fd25 100644
--- a/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll
+++ b/llvm/test/Transforms/ConstraintElimination/usub-with-overflow.ll
@@ -9,11 +9,9 @@ define i8 @usub_no_overflow_due_to_cmp_condition(i8 %a, i8 %b) 
{
 ; CHECK-NEXT:[[C_1:%.*]] = icmp uge i8 [[B:%.*]], [[A:%.*]]
 ; CHECK-NEXT:br i1 [[C_1]], label [[MATH:%.*]], label [[EXIT_FAIL:%.*]]
 ; CHECK:   math:
-; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } 
@llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]])
-; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1
-; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
+; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]]
+; CHECK-NEXT:br i1 false, label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
 ; CHECK:   exit.ok:
-; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0
 ; CHECK-NEXT:ret i8 [[RES]]
 ; CHECK:   exit.fail:
 ; CHECK-NEXT:ret i8 0
@@ -41,11 +39,9 @@ define i8 @usub_no_overflow_due_to_cmp_condition2(i8 %a, i8 
%b) {
 ; CHECK-NEXT:[[C_1:%.*]] = icmp ule i8 [[B:%.*]], [[A:%.*]]
 ; CHECK-NEXT:br i1 [[C_1]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]]
 ; CHECK:   math:
-; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } 
@llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]])
-; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1
-; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
+; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]]
+; CHECK-NEXT:br i1 false, label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
 ; CHECK:   exit.ok:
-; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0
 ; CHECK-NEXT:ret i8 [[RES]]
 ; CHECK:   exit.fail:
 ; CHECK-NEXT:ret i8 0
@@ -75,12 +71,11 @@ define i8 
@sub_no_overflow_due_to_cmp_condition_result_used(i8 %a, i8 %b) {
 ; CHECK-NEXT:[[C_1:%.*]] = icmp ule i8 [[B:%.*]], [[A:%.*]]
 ; CHECK-NEXT:br i1 [[C_1]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]]
 ; CHECK:   math:
+; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]]
 ; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } 
@llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]])
 ; CHECK-NEXT:call void @use_res({ i8, i1 } [[OP]])
-; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1
-; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
+; CHECK-NEXT:br i1 false, label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
 ; CHECK:   exit.ok:
-; CHECK-NEXT:[[RES:%.*]] = extractvalue { i8, i1 } [[OP]], 0
 ; CHECK-NEXT:ret i8 [[RES]]
 ; CHECK:   exit.fail:
 ; CHECK-NEXT:ret i8 0
@@ -111,11 +106,9 @@ define i8 @usub_no_overflow_due_to_or_conds(i8 %a, i8 %b) {
 ; CHECK-NEXT:[[OR:%.*]] = or i1 [[C_2]], [[C_1]]
 ; CHECK-NEXT:br i1 [[OR]], label [[EXIT_FAIL:%.*]], label [[MATH:%.*]]
 ; CHECK:   math:
-; CHECK-NEXT:[[OP:%.*]] = tail call { i8, i1 } 
@llvm.usub.with.overflow.i8(i8 [[B]], i8 [[A]])
-; CHECK-NEXT:[[STATUS:%.*]] = extractvalue { i8, i1 } [[OP]], 1
-; CHECK-NEXT:br i1 [[STATUS]], label [[EXIT_FAIL]], label [[EXIT_OK:%.*]]
+; CHECK-NEXT:[[RES:%.*]] = sub i8 [[B]], [[A]]
+; CHECK-NEXT:br i1 false, label [[EX

[llvm-branch-commits] [llvm] [BOLT][test] Fix callcont-fallthru.s after #129481 (PR #135867)

2025-04-30 Thread Paschalis Mpeis via llvm-branch-commits

paschalis-mpeis wrote:

Great. A quick way to use an llvm tool could be:
```bash 
llvm-objdump -d -j .plt %t | grep @plt
```

This produces output similar to what `nm --synthetic` produces (when it works):
```bash
1430 :
```

You'll need ofc to tweak `link_fdata` to properly parse symbol+address:
https://github.com/llvm/llvm-project/blob/fb8d61d8163f1439749a62be614c09215fe65e9f/bolt/test/link_fdata.py#L96-L99

Not sure of any cleaner approach? (@yota9, @MaskRay)

https://github.com/llvm/llvm-project/pull/135867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/125432

>From d382b955279cf3699599f1885e4c47af5f5bda2e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic
 vector.

After splitting, all elements are created. The elements are
placed back into a concat_vectors. This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that its concat_vectors can be
mapped to a BUILD_VECTOR and so unused elements are no longer
referenced.

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |   4 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |  20 ++-
 .../SelectionDAGAddressAnalysis.cpp   |  30 ++--
 .../SelectionDAG/SelectionDAGBuilder.cpp  |   6 +-
 llvm/lib/Target/X86/X86ISelLowering.cpp   |  29 +--
 llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++
 6 files changed, 69 insertions(+), 187 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba11ddbb5b731..d3cd81c146280 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1843,7 +1843,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2281,7 +2281,7 @@ class SelectionDAG {
   /// merged. Check that both are nonvolatile and if LD is loading
   /// 'Bytes' bytes from a location that is 'Dist' units away from the
   /// location that the 'Base' load is loading from.
-  bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
+  bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base,
   unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index d9d6d43c42430..e4d7e176c51e4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12213,7 +12213,7 @@ SDValue 
SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
SDValue NewMemOp) {
   assert(isa(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
@@ -12906,17 +12906,21 @@ std::pair 
SelectionDAG::UnrollVectorOverflowOp(
 getBuildVector(NewOvVT, dl, OvScalars));
 }
 
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
-  LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+  MemSDNode *Base,
   unsigned Bytes,
   int Dist) const {
   if (LD->isVolatile() || Base->isVolatile())
 return false;
-  // TODO: probably too restrictive for atomics, revisit
-  if (!LD->isSimple())
-return false;
-  if (LD->isIndexed() || Base->isIndexed())
-return false;
+  if (auto Ld = dyn_cast(LD)) {
+if (!Ld->isSimple())
+  return false;
+if (Ld->isIndexed())
+  return false;
+  }
+  if (auto Ld = dyn_cast(Base))
+if (Ld->isIndexed())
+  return false;
   if (LD->getChain() != Base->getChain())
 return false;
   EVT VT = LD->getMemoryVT();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
index f2ab88851b780..c29cb424c7a4c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
@@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, 
int64_t BitSize,
 }
 
 /// Parses tree in Ptr for base, index, offset addresses.
-static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
-   const SelectionDAG &DAG) {
+template 
+static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) {
   SDValue Ptr = N->getBasePtr();
 
   // (((B + I*M) + c)) + c ...
@@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
   bool IsIndexSignExt = false;
 
   // pre-inc/pre-dec ops are components of EA.
-  if (N->get

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From cb2e5bc0ea3c3d0ccc3f684ef9bced8b30c7e74e Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be casted to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 483aceb239b0c..c48344332d707 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc8..6efcbb80c0ce6 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From be154f89c03bf89ee99d213ca954c77c44fc4e9c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +-
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..b6f1e9db9ce35 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's scalar type to I's  vector type
+  if (I->getType()->getScalarType()->isPointerTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+unsigned AS =
+
cast(I->getType()->getScalarType())->getAddressSpace();
+ElementCount EC = cast(I->getType())->getElementCount();
+Value *BC = Builder.CreateBitCast(
+Result,
+VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)),
+EC));
+Value *IntToPtr = Builder.CreateIntToPtr(
+BC, VectorType::get(PointerType::get(Ctx, AS), EC));
+V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..36c1305a7c5df 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 08d0405345f57..4293df8c13571 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:popq %rax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LAB

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 867af9f33e61399964c51dd0c40dfcf37ec0a1e0 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 104 ++
 llvm/lib/Target/X86/X86InstrCompiler.td   |   7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll|  81 ++
 llvm/test/CodeGen/X86/atomic-unordered.ll |   3 +-
 5 files changed, 173 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f9034c94bfb6d..23353b326b02c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5979,6 +5982,85 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+   TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl,
+   SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  } else if (FirstVT == WidenVT)
+return LdOp;
+  else {
+// TODO: We don't currently have any tests that exercise this code path.
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0);
+unsigned NumConcat =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+SmallVector ConcatOps(NumConcat);
+SDValue UndefVal = DAG.getUNDEF(FirstVT);
+ConcatOps[0] = LdOp;
+for (unsigned i = 1; i != NumConcat; ++i)
+  ConcatOps[i] = UndefVal;
+return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps);
+  }
+}
+
+static std::optional findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned 
Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(),LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+  "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+  "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can load from.
+  std::optional FirstVT =
+  fi

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 867af9f33e61399964c51dd0c40dfcf37ec0a1e0 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 104 ++
 llvm/lib/Target/X86/X86InstrCompiler.td   |   7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll|  81 ++
 llvm/test/CodeGen/X86/atomic-unordered.ll |   3 +-
 5 files changed, 173 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f9034c94bfb6d..23353b326b02c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5979,6 +5982,85 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+   TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl,
+   SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  } else if (FirstVT == WidenVT)
+return LdOp;
+  else {
+// TODO: We don't currently have any tests that exercise this code path.
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0);
+unsigned NumConcat =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+SmallVector ConcatOps(NumConcat);
+SDValue UndefVal = DAG.getUNDEF(FirstVT);
+ConcatOps[0] = LdOp;
+for (unsigned i = 1; i != NumConcat; ++i)
+  ConcatOps[i] = UndefVal;
+return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps);
+  }
+}
+
+static std::optional findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned 
Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(),LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+  "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+  "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can load from.
+  std::optional FirstVT =
+  fi

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Split via Concat vector types for atomic load (PR #120640)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

>From 79fce921b0cb321760da43ede34e58d4e0880d33 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG][X86] Split via Concat  vector types for
 atomic load

Vector types that aren't widened are 'split' via CONCAT_VECTORS
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with type bfloat,half.

commit-id:3a045357
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  33 
 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++
 3 files changed, 205 insertions(+)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index bdfa5f7741ad3..7905f5a94c579 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 23353b326b02c..c1cb45ed40d49 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast(N), Lo, Hi);
 break;
@@ -1421,6 +1424,36 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  SDLoc dl(LD);
+
+  EVT MemoryVT = LD->getMemoryVT();
+  unsigned NumElts = MemoryVT.getVectorMinNumElements();
+
+  EVT IntMemoryVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16, NumElts);
+  EVT ElemVT =
+  EVT::getVectorVT(*DAG.getContext(), MemoryVT.getVectorElementType(), 1);
+
+  // Create a single atomic to load all the elements at once.
+  SDValue Atomic =
+  DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, IntMemoryVT, IntMemoryVT,
+LD->getChain(), LD->getBasePtr(),
+LD->getMemOperand());
+
+  // Instead of splitting, put all the elements back into a vector.
+  SmallVector Ops;
+  for (unsigned i = 0; i < NumElts; ++i) {
+SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i16, Atomic,
+  DAG.getVectorIdxConstant(i, dl));
+Elt = DAG.getBitcast(ElemVT, Elt);
+Ops.push_back(Elt);
+  }
+  SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, dl, MemoryVT, Ops);
+
+  ReplaceValueWith(SDValue(LD, 0), Concat);
+  ReplaceValueWith(SDValue(LD, 1), LD->getChain());
+}
+
 void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT,
 MachinePointerInfo &MPI, SDValue &Ptr,
 uint64_t *ScaledOffset) {
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 935d058a52f8f..227bdbf9c0747 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -204,6 +204,76 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
   ret <2 x float> %ret
 }
 
+define <2 x half> @atomic_vec2_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec2_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movl (%rdi), %eax
+; CHECK3-NEXT:movd %eax, %xmm1
+; CHECK3-NEXT:shrl $16, %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm2
+; CHECK3-NEXT:movdqa {{.*#+}} xmm0 = 
[65535,0,65535,65535,65535,65535,65535,65535]
+; CHECK3-NEXT:pand %xmm0, %xmm1
+; CHECK3-NEXT:pslld $16, %xmm2
+; CHECK3-NEXT:pandn %xmm2, %xmm0
+; CHECK3-NEXT:por %xmm1, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movl (%rdi), %eax
+; CHECK0-NEXT:movl %eax, %ecx
+; CHECK0-NEXT:shrl $16, %ecx
+; CHECK0-NEXT:movw %cx, %dx
+; CHECK0-NEXT:## implicit-def: $ecx
+; CHECK0-NEXT:movw %dx, %cx
+; CHECK0-NEXT:## implicit-def: $xmm2
+; CHEC

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From be154f89c03bf89ee99d213ca954c77c44fc4e9c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +-
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..b6f1e9db9ce35 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's scalar type to I's  vector type
+  if (I->getType()->getScalarType()->isPointerTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+unsigned AS =
+
cast(I->getType()->getScalarType())->getAddressSpace();
+ElementCount EC = cast(I->getType())->getElementCount();
+Value *BC = Builder.CreateBitCast(
+Result,
+VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)),
+EC));
+Value *IntToPtr = Builder.CreateIntToPtr(
+BC, VectorType::get(PointerType::get(Ctx, AS), EC));
+V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..36c1305a7c5df 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 08d0405345f57..4293df8c13571 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:popq %rax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LAB

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From a60144bede07d2ca53f5f3a48009b10d10f8c082 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6..39e9fdfa5e62b 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 867af9f33e61399964c51dd0c40dfcf37ec0a1e0 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 104 ++
 llvm/lib/Target/X86/X86InstrCompiler.td   |   7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll|  81 ++
 llvm/test/CodeGen/X86/atomic-unordered.ll |   3 +-
 5 files changed, 173 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f9034c94bfb6d..23353b326b02c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5979,6 +5982,85 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+   TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl,
+   SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  } else if (FirstVT == WidenVT)
+return LdOp;
+  else {
+// TODO: We don't currently have any tests that exercise this code path.
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0);
+unsigned NumConcat =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+SmallVector ConcatOps(NumConcat);
+SDValue UndefVal = DAG.getUNDEF(FirstVT);
+ConcatOps[0] = LdOp;
+for (unsigned i = 1; i != NumConcat; ++i)
+  ConcatOps[i] = UndefVal;
+return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps);
+  }
+}
+
+static std::optional findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned 
Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(),LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+  "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+  "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can load from.
+  std::optional FirstVT =
+  fi

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 7f2115c386e164541ba579eefd87bdfbb45c2890 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 720393158aa5e..89ea7ef4dbe89 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index a01e1cff74564..f9034c94bfb6d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -458,6 +461,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomicLoad(
+  ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(),
+  N->getBasePtr(), N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb..d23cfb89f9fc8 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_vec

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 7f2115c386e164541ba579eefd87bdfbb45c2890 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 720393158aa5e..89ea7ef4dbe89 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index a01e1cff74564..f9034c94bfb6d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -458,6 +461,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomicLoad(
+  ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(),
+  N->getBasePtr(), N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb..d23cfb89f9fc8 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_vec

[llvm-branch-commits] [mlir] [MLIR][OpenMP] Assert on map translation functions, NFC (PR #137199)

2025-04-30 Thread Pranav Bhandarkar via llvm-branch-commits


@@ -3623,6 +3623,9 @@ static llvm::omp::OpenMPOffloadMappingFlags 
mapParentWithMembers(
 LLVM::ModuleTranslation &moduleTranslation, llvm::IRBuilderBase &builder,
 llvm::OpenMPIRBuilder &ompBuilder, DataLayout &dl, MapInfosTy 
&combinedInfo,
 MapInfoData &mapData, uint64_t mapDataIndex, bool isTargetParams) {
+  assert(!ompBuilder.Config.isTargetDevice() &&
+ "function only supported for host device codegen");

bhandarkar-pranav wrote:

> Yes, I do agree that "host device" seems like a rather contradictory term, 
> since we generally just talk about "host" as the CPU and "device" as whatever 
> we're offloading to. The reason of using these terms is to align with OpenMP 
> terminology (5.2 spec, Section 1.2.1):
> 
> > **host device** The _device_ on which the _OpenMP program_ begins execution.
> > **target device** A _device_ with respect to which the current _device_ 
> > performs an operation, as specified by a _device construct_ or an OpenMP 
> > device memory routine.
> 
> This is also why the `-fopenmp-is-target-device` flag is called that and not 
> `-fopenmp-is-device` (which [used to be its 
> name](https://reviews.llvm.org/D154591)).

Thank you, good to know. So in this world every thing is a device and some 
devices are hosts and some are targets meant to be offloaded to.

https://github.com/llvm/llvm-project/pull/137199
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Implementation of DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/138043
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Implement DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/138043
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Implement DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota updated 
https://github.com/llvm/llvm-project/pull/138043

>From f58e2d5c079f31d7a4d99d481a569ad4c754c4e7 Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Wed, 30 Apr 2025 15:08:04 -0700
Subject: [PATCH 1/2] [HLSL] Implementation of DXILResourceImplicitBinding pass

This pass takes advantage of the DXILResourceBinding analysis and assigns
register slots to resources that do not have explicit binding.

Part 2/2 of #136786

Closes #136786
---
 llvm/include/llvm/Analysis/DXILResource.h |   8 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/lib/Analysis/Analysis.cpp|   1 +
 llvm/lib/Analysis/DXILResource.cpp|  53 +
 llvm/lib/Target/DirectX/CMakeLists.txt|   1 +
 .../DirectX/DXILResourceImplicitBinding.cpp   | 182 ++
 .../DirectX/DXILResourceImplicitBinding.h |  29 +++
 llvm/lib/Target/DirectX/DirectX.h |   6 +
 .../Target/DirectX/DirectXPassRegistry.def|   1 +
 .../Target/DirectX/DirectXTargetMachine.cpp   |   3 +
 .../CodeGen/DirectX/ImplicitBinding/arrays.ll |  41 
 .../ImplicitBinding/multiple-spaces.ll|  44 +
 .../CodeGen/DirectX/ImplicitBinding/simple.ll |  30 +++
 .../ImplicitBinding/unbounded-arrays-error.ll |  34 
 .../ImplicitBinding/unbounded-arrays.ll   |  42 
 llvm/test/CodeGen/DirectX/llc-pipeline.ll |   2 +
 16 files changed, 478 insertions(+)
 create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp
 create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h
 create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll
 create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll
 create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll
 create mode 100644 
llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll
 create mode 100644 
llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll

diff --git a/llvm/include/llvm/Analysis/DXILResource.h 
b/llvm/include/llvm/Analysis/DXILResource.h
index 569f5346b9e36..fd34fdfda4c1c 100644
--- a/llvm/include/llvm/Analysis/DXILResource.h
+++ b/llvm/include/llvm/Analysis/DXILResource.h
@@ -632,12 +632,15 @@ class DXILResourceBindingInfo {
 RegisterSpace(uint32_t Space) : Space(Space) {
   FreeRanges.emplace_back(0, UINT32_MAX);
 }
+// Size == -1 means unbounded array
+bool findAvailableBinding(int32_t Size, uint32_t *RegSlot);
   };
 
   struct BindingSpaces {
 dxil::ResourceClass ResClass;
 llvm::SmallVector Spaces;
 BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {}
+RegisterSpace &getOrInsertSpace(uint32_t Space);
   };
 
 private:
@@ -658,6 +661,7 @@ class DXILResourceBindingInfo {
 OverlappingBinding(false) {}
 
   bool hasImplicitBinding() const { return ImplicitBinding; }
+  void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; }
   bool hasOverlappingBinding() const { return OverlappingBinding; }
 
   BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) {
@@ -673,6 +677,10 @@ class DXILResourceBindingInfo {
 }
   }
 
+  // Size == -1 means unbounded array
+  bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space,
+int32_t Size, uint32_t *RegSlot);
+
   friend class DXILResourceBindingAnalysis;
   friend class DXILResourceBindingWrapperPass;
 };
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index b84c6d6123d58..71db56d922676 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &);
 void initializeDXILResourceBindingWrapperPassPass(PassRegistry &);
+void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &);
 void initializeDXILResourceTypeWrapperPassPass(PassRegistry &);
 void initializeDXILResourceWrapperPassPass(PassRegistry &);
 void initializeDeadMachineInstructionElimPass(PassRegistry &);
diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp
index 484a456f49f1b..9f5daf32be9a0 100644
--- a/llvm/lib/Analysis/Analysis.cpp
+++ b/llvm/lib/Analysis/Analysis.cpp
@@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) {
   initializeCallGraphViewerPass(Registry);
   initializeCycleInfoWrapperPassPass(Registry);
   initializeDXILMetadataAnalysisWrapperPassPass(Registry);
+  initializeDXILResourceWrapperPassPass(Registry);
   initializeDXILResourceBindingWrapperPassPass(Registry);
   initializeDXILResourceTypeWrapperPassPass(Registry);
   initializeDXILResourceWrapperPassPass(Registry);
diff --git a/llvm/lib/Analysis/DXILResource.cpp 
b/llvm/lib/Analysis/DXILResource.cpp
index ce8e250a32ebe..34355c81e77b0 100644
--- a/llvm/lib/Analysis/DXILResource.cpp
+++ b/llvm/lib

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

>From 6d045290b83d4abb2765e6d323d7a441aab6445c Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Adding their tests separately here.

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6..39e9fdfa5e62b 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/125432

>From abb646bbd716e2b12e10961c80c28dcfde8d66f5 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic
 vector.

After splitting, all elements are created. The elements are
placed back into a concat_vectors. This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that its concat_vectors can be
mapped to a BUILD_VECTOR and so unused elements are no longer
referenced.

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |   4 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |  20 ++-
 .../SelectionDAGAddressAnalysis.cpp   |  30 ++--
 .../SelectionDAG/SelectionDAGBuilder.cpp  |   6 +-
 llvm/lib/Target/X86/X86ISelLowering.cpp   |  29 +--
 llvm/test/CodeGen/X86/atomic-load-store.ll| 167 ++
 6 files changed, 69 insertions(+), 187 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ba11ddbb5b731..d3cd81c146280 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1843,7 +1843,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2281,7 +2281,7 @@ class SelectionDAG {
   /// merged. Check that both are nonvolatile and if LD is loading
   /// 'Bytes' bytes from a location that is 'Dist' units away from the
   /// location that the 'Base' load is loading from.
-  bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
+  bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base,
   unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index d9d6d43c42430..e4d7e176c51e4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12213,7 +12213,7 @@ SDValue 
SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
SDValue NewMemOp) {
   assert(isa(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
@@ -12906,17 +12906,21 @@ std::pair 
SelectionDAG::UnrollVectorOverflowOp(
 getBuildVector(NewOvVT, dl, OvScalars));
 }
 
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
-  LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+  MemSDNode *Base,
   unsigned Bytes,
   int Dist) const {
   if (LD->isVolatile() || Base->isVolatile())
 return false;
-  // TODO: probably too restrictive for atomics, revisit
-  if (!LD->isSimple())
-return false;
-  if (LD->isIndexed() || Base->isIndexed())
-return false;
+  if (auto Ld = dyn_cast(LD)) {
+if (!Ld->isSimple())
+  return false;
+if (Ld->isIndexed())
+  return false;
+  }
+  if (auto Ld = dyn_cast(Base))
+if (Ld->isIndexed())
+  return false;
   if (LD->getChain() != Base->getChain())
 return false;
   EVT VT = LD->getMemoryVT();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
index f2ab88851b780..c29cb424c7a4c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
@@ -195,8 +195,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, 
int64_t BitSize,
 }
 
 /// Parses tree in Ptr for base, index, offset addresses.
-static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
-   const SelectionDAG &DAG) {
+template 
+static BaseIndexOffset matchSDNode(const T *N, const SelectionDAG &DAG) {
   SDValue Ptr = N->getBasePtr();
 
   // (((B + I*M) + c)) + c ...
@@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
   bool IsIndexSignExt = false;
 
   // pre-inc/pre-dec ops are components of EA.
-  if (N->get

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 855430ef05b0a8cfe6696ec62f827cab6d663655 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +-
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..b6f1e9db9ce35 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's scalar type to I's  vector type
+  if (I->getType()->getScalarType()->isPointerTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+unsigned AS =
+
cast(I->getType()->getScalarType())->getAddressSpace();
+ElementCount EC = cast(I->getType())->getElementCount();
+Value *BC = Builder.CreateBitCast(
+Result,
+VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)),
+EC));
+Value *IntToPtr = Builder.CreateIntToPtr(
+BC, VectorType::get(PointerType::get(Ctx, AS), EC));
+V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..36c1305a7c5df 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 08d0405345f57..4293df8c13571 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:popq %rax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LAB

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 5bc5d3291ad26fa3af646e45bd5d491d03588d98 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 104 ++
 llvm/lib/Target/X86/X86InstrCompiler.td   |   7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll|  81 ++
 llvm/test/CodeGen/X86/atomic-unordered.ll |   3 +-
 5 files changed, 173 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 5126d597161d4..43710b77b763b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5979,6 +5982,85 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+   TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl,
+   SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  } else if (FirstVT == WidenVT)
+return LdOp;
+  else {
+// TODO: We don't currently have any tests that exercise this code path.
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0);
+unsigned NumConcat =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+SmallVector ConcatOps(NumConcat);
+SDValue UndefVal = DAG.getUNDEF(FirstVT);
+ConcatOps[0] = LdOp;
+for (unsigned i = 1; i != NumConcat; ++i)
+  ConcatOps[i] = UndefVal;
+return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps);
+  }
+}
+
+static std::optional findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned 
Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+ "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+ "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can load from.
+  std::optional FirstVT =
+ 

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 0824a27d19d3a26eaf9d1ce0b04cf09dbf3cab4b Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be casted to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 483aceb239b0c..c48344332d707 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc8..6efcbb80c0ce6 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 855430ef05b0a8cfe6696ec62f827cab6d663655 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic ` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 20 +-
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..b6f1e9db9ce35 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,23 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's scalar type to I's  vector type
+  if (I->getType()->getScalarType()->isPointerTy() &&
+  I->getType()->isVectorTy() && !Result->getType()->isVectorTy()) {
+unsigned AS =
+
cast(I->getType()->getScalarType())->getAddressSpace();
+ElementCount EC = cast(I->getType())->getElementCount();
+Value *BC = Builder.CreateBitCast(
+Result,
+VectorType::get(IntegerType::get(Ctx, DL.getPointerSizeInBits(AS)),
+EC));
+Value *IntToPtr = Builder.CreateIntToPtr(
+BC, VectorType::get(PointerType::get(Ctx, AS), EC));
+V = Builder.CreateBitOrPointerCast(IntToPtr, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..36c1305a7c5df 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 08d0405345f57..4293df8c13571 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -371,6 +371,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:popq %rax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LAB

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Widen <2 x T> vector types for atomic load (PR #120598)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 5bc5d3291ad26fa3af646e45bd5d491d03588d98 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG][X86] Widen <2 x T> vector types for atomic
 load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size. Also,
it also adds Pats to remove an extra MOV.

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 104 ++
 llvm/lib/Target/X86/X86InstrCompiler.td   |   7 ++
 llvm/test/CodeGen/X86/atomic-load-store.ll|  81 ++
 llvm/test/CodeGen/X86/atomic-unordered.ll |   3 +-
 5 files changed, 173 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 5126d597161d4..43710b77b763b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4590,6 +4590,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -5979,6 +5982,85 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue loadElement(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+   TypeSize LdWidth, TypeSize FirstVTWidth, SDLoc dl,
+   SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  } else if (FirstVT == WidenVT)
+return LdOp;
+  else {
+// TODO: We don't currently have any tests that exercise this code path.
+assert(WidenWidth.getFixedValue() % FirstVTWidth.getFixedValue() == 0);
+unsigned NumConcat =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+SmallVector ConcatOps(NumConcat);
+SDValue UndefVal = DAG.getUNDEF(FirstVT);
+ConcatOps[0] = LdOp;
+for (unsigned i = 1; i != NumConcat; ++i)
+  ConcatOps[i] = UndefVal;
+return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, ConcatOps);
+  }
+}
+
+static std::optional findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned 
Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+ "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+ "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can load from.
+  std::optional FirstVT =
+ 

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 4a47d3fd2f29c66c2230f40d7cca8fdbd1238abd Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 720393158aa5e..89ea7ef4dbe89 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index a01e1cff74564..5126d597161d4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -458,6 +461,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomicLoad(
+  ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb..d23cfb89f9fc8 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_ve

[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

>From 4a47d3fd2f29c66c2230f40d7cca8fdbd1238abd Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 720393158aa5e..89ea7ef4dbe89 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index a01e1cff74564..5126d597161d4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast(N));
+break;
   case ISD::LOAD:   R = 
ScalarizeVecRes_LOAD(cast(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -458,6 +461,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomicLoad(
+  ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), 
N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb..d23cfb89f9fc8 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | 
FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | 
FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_ve

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-04-30 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From 0824a27d19d3a26eaf9d1ce0b04cf09dbf3cab4b Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be casted to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 483aceb239b0c..c48344332d707 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2651,6 +2651,10 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc8..6efcbb80c0ce6 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AutoUpgrade][AMDGPU] Adjust AS7 address width to 48 bits (PR #137418)

2025-04-30 Thread Krzysztof Drewniak via llvm-branch-commits

krzysz00 wrote:

>  I think we just have to pretend that virtual addresses or segment-relative 
> addresses are the only thing that matters.

This is why I've been advocating for the index-based definition of ptrtoaddr. 
So there's no need to do this 48-bit thing on the assumption that the base of a 
buffer fat pointer is "aligned enough" (in most contrxts where I've seen this 
the start point is some sort of allocation that'll definitely be align(16) or 
better)

https://github.com/llvm/llvm-project/pull/137418
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota created 
https://github.com/llvm/llvm-project/pull/138043

The `DXILResourceImplicitBinding` pass uses the results of 
`DXILResourceBindingAnalysis` to assigns register slots to resources that do 
not have explicit binding. It replaces all 
`llvm.dx.resource.handlefromimplicitbinding` calls with 
`llvm.dx.resource.handlefrombinding` using the newly assigned binding.

If a binding cannot be found for a resource, the pass will raise a diagnostic 
error. Currently this diagnostic message does not include the resource name, 
which will be addressed in a separate task (#137868). 

Part 2/2 of #136786
Closes #136786

>From f58e2d5c079f31d7a4d99d481a569ad4c754c4e7 Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Wed, 30 Apr 2025 15:08:04 -0700
Subject: [PATCH] [HLSL] Implementation of DXILResourceImplicitBinding pass

This pass takes advantage of the DXILResourceBinding analysis and assigns
register slots to resources that do not have explicit binding.

Part 2/2 of #136786

Closes #136786
---
 llvm/include/llvm/Analysis/DXILResource.h |   8 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/lib/Analysis/Analysis.cpp|   1 +
 llvm/lib/Analysis/DXILResource.cpp|  53 +
 llvm/lib/Target/DirectX/CMakeLists.txt|   1 +
 .../DirectX/DXILResourceImplicitBinding.cpp   | 182 ++
 .../DirectX/DXILResourceImplicitBinding.h |  29 +++
 llvm/lib/Target/DirectX/DirectX.h |   6 +
 .../Target/DirectX/DirectXPassRegistry.def|   1 +
 .../Target/DirectX/DirectXTargetMachine.cpp   |   3 +
 .../CodeGen/DirectX/ImplicitBinding/arrays.ll |  41 
 .../ImplicitBinding/multiple-spaces.ll|  44 +
 .../CodeGen/DirectX/ImplicitBinding/simple.ll |  30 +++
 .../ImplicitBinding/unbounded-arrays-error.ll |  34 
 .../ImplicitBinding/unbounded-arrays.ll   |  42 
 llvm/test/CodeGen/DirectX/llc-pipeline.ll |   2 +
 16 files changed, 478 insertions(+)
 create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp
 create mode 100644 llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h
 create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll
 create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll
 create mode 100644 llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll
 create mode 100644 
llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll
 create mode 100644 
llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll

diff --git a/llvm/include/llvm/Analysis/DXILResource.h 
b/llvm/include/llvm/Analysis/DXILResource.h
index 569f5346b9e36..fd34fdfda4c1c 100644
--- a/llvm/include/llvm/Analysis/DXILResource.h
+++ b/llvm/include/llvm/Analysis/DXILResource.h
@@ -632,12 +632,15 @@ class DXILResourceBindingInfo {
 RegisterSpace(uint32_t Space) : Space(Space) {
   FreeRanges.emplace_back(0, UINT32_MAX);
 }
+// Size == -1 means unbounded array
+bool findAvailableBinding(int32_t Size, uint32_t *RegSlot);
   };
 
   struct BindingSpaces {
 dxil::ResourceClass ResClass;
 llvm::SmallVector Spaces;
 BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {}
+RegisterSpace &getOrInsertSpace(uint32_t Space);
   };
 
 private:
@@ -658,6 +661,7 @@ class DXILResourceBindingInfo {
 OverlappingBinding(false) {}
 
   bool hasImplicitBinding() const { return ImplicitBinding; }
+  void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; }
   bool hasOverlappingBinding() const { return OverlappingBinding; }
 
   BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) {
@@ -673,6 +677,10 @@ class DXILResourceBindingInfo {
 }
   }
 
+  // Size == -1 means unbounded array
+  bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space,
+int32_t Size, uint32_t *RegSlot);
+
   friend class DXILResourceBindingAnalysis;
   friend class DXILResourceBindingWrapperPass;
 };
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index b84c6d6123d58..71db56d922676 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &);
 void initializeDXILResourceBindingWrapperPassPass(PassRegistry &);
+void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &);
 void initializeDXILResourceTypeWrapperPassPass(PassRegistry &);
 void initializeDXILResourceWrapperPassPass(PassRegistry &);
 void initializeDeadMachineInstructionElimPass(PassRegistry &);
diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp
index 484a456f49f1b..9f5daf32be9a0 100644
--- a/llvm/lib/Analysis/Analysis.cpp
+++ b/llvm/lib/Analysis/Analysis.cpp
@@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) {

[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-directx

Author: Helena Kotas (hekota)


Changes

The `DXILResourceImplicitBinding` pass uses the results of 
`DXILResourceBindingAnalysis` to assigns register slots to resources that do 
not have explicit binding. It replaces all 
`llvm.dx.resource.handlefromimplicitbinding` calls with 
`llvm.dx.resource.handlefrombinding` using the newly assigned binding.

If a binding cannot be found for a resource, the pass will raise a diagnostic 
error. Currently this diagnostic message does not include the resource name, 
which will be addressed in a separate task (#137868). 

Part 2/2 of #136786
Closes #136786

---

Patch is 26.40 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/138043.diff


16 Files Affected:

- (modified) llvm/include/llvm/Analysis/DXILResource.h (+8) 
- (modified) llvm/include/llvm/InitializePasses.h (+1) 
- (modified) llvm/lib/Analysis/Analysis.cpp (+1) 
- (modified) llvm/lib/Analysis/DXILResource.cpp (+53) 
- (modified) llvm/lib/Target/DirectX/CMakeLists.txt (+1) 
- (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp (+182) 
- (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h (+29) 
- (modified) llvm/lib/Target/DirectX/DirectX.h (+6) 
- (modified) llvm/lib/Target/DirectX/DirectXPassRegistry.def (+1) 
- (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+3) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll (+41) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll (+44) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll (+30) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll 
(+34) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll (+42) 
- (modified) llvm/test/CodeGen/DirectX/llc-pipeline.ll (+2) 


``diff
diff --git a/llvm/include/llvm/Analysis/DXILResource.h 
b/llvm/include/llvm/Analysis/DXILResource.h
index 569f5346b9e36..fd34fdfda4c1c 100644
--- a/llvm/include/llvm/Analysis/DXILResource.h
+++ b/llvm/include/llvm/Analysis/DXILResource.h
@@ -632,12 +632,15 @@ class DXILResourceBindingInfo {
 RegisterSpace(uint32_t Space) : Space(Space) {
   FreeRanges.emplace_back(0, UINT32_MAX);
 }
+// Size == -1 means unbounded array
+bool findAvailableBinding(int32_t Size, uint32_t *RegSlot);
   };
 
   struct BindingSpaces {
 dxil::ResourceClass ResClass;
 llvm::SmallVector Spaces;
 BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {}
+RegisterSpace &getOrInsertSpace(uint32_t Space);
   };
 
 private:
@@ -658,6 +661,7 @@ class DXILResourceBindingInfo {
 OverlappingBinding(false) {}
 
   bool hasImplicitBinding() const { return ImplicitBinding; }
+  void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; }
   bool hasOverlappingBinding() const { return OverlappingBinding; }
 
   BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) {
@@ -673,6 +677,10 @@ class DXILResourceBindingInfo {
 }
   }
 
+  // Size == -1 means unbounded array
+  bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space,
+int32_t Size, uint32_t *RegSlot);
+
   friend class DXILResourceBindingAnalysis;
   friend class DXILResourceBindingWrapperPass;
 };
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index b84c6d6123d58..71db56d922676 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &);
 void initializeDXILResourceBindingWrapperPassPass(PassRegistry &);
+void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &);
 void initializeDXILResourceTypeWrapperPassPass(PassRegistry &);
 void initializeDXILResourceWrapperPassPass(PassRegistry &);
 void initializeDeadMachineInstructionElimPass(PassRegistry &);
diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp
index 484a456f49f1b..9f5daf32be9a0 100644
--- a/llvm/lib/Analysis/Analysis.cpp
+++ b/llvm/lib/Analysis/Analysis.cpp
@@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) {
   initializeCallGraphViewerPass(Registry);
   initializeCycleInfoWrapperPassPass(Registry);
   initializeDXILMetadataAnalysisWrapperPassPass(Registry);
+  initializeDXILResourceWrapperPassPass(Registry);
   initializeDXILResourceBindingWrapperPassPass(Registry);
   initializeDXILResourceTypeWrapperPassPass(Registry);
   initializeDXILResourceWrapperPassPass(Registry);
diff --git a/llvm/lib/Analysis/DXILResource.cpp 
b/llvm/lib/Analysis/DXILResource.cpp
index ce8e250a32ebe..34355c81e77b0 100644
--- a/llvm/lib/Analysis/DXILResource.cpp
+++ b/llvm/lib/Analysis/DXILResource.cpp
@@ -995,6 +995,59 @@ void DXILResourceBindingInfo::populate(Module &M, 
D

[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-analysis

Author: Helena Kotas (hekota)


Changes

The `DXILResourceImplicitBinding` pass uses the results of 
`DXILResourceBindingAnalysis` to assigns register slots to resources that do 
not have explicit binding. It replaces all 
`llvm.dx.resource.handlefromimplicitbinding` calls with 
`llvm.dx.resource.handlefrombinding` using the newly assigned binding.

If a binding cannot be found for a resource, the pass will raise a diagnostic 
error. Currently this diagnostic message does not include the resource name, 
which will be addressed in a separate task (#137868). 

Part 2/2 of #136786
Closes #136786

---

Patch is 26.40 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/138043.diff


16 Files Affected:

- (modified) llvm/include/llvm/Analysis/DXILResource.h (+8) 
- (modified) llvm/include/llvm/InitializePasses.h (+1) 
- (modified) llvm/lib/Analysis/Analysis.cpp (+1) 
- (modified) llvm/lib/Analysis/DXILResource.cpp (+53) 
- (modified) llvm/lib/Target/DirectX/CMakeLists.txt (+1) 
- (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.cpp (+182) 
- (added) llvm/lib/Target/DirectX/DXILResourceImplicitBinding.h (+29) 
- (modified) llvm/lib/Target/DirectX/DirectX.h (+6) 
- (modified) llvm/lib/Target/DirectX/DirectXPassRegistry.def (+1) 
- (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+3) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/arrays.ll (+41) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/multiple-spaces.ll (+44) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/simple.ll (+30) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays-error.ll 
(+34) 
- (added) llvm/test/CodeGen/DirectX/ImplicitBinding/unbounded-arrays.ll (+42) 
- (modified) llvm/test/CodeGen/DirectX/llc-pipeline.ll (+2) 


``diff
diff --git a/llvm/include/llvm/Analysis/DXILResource.h 
b/llvm/include/llvm/Analysis/DXILResource.h
index 569f5346b9e36..fd34fdfda4c1c 100644
--- a/llvm/include/llvm/Analysis/DXILResource.h
+++ b/llvm/include/llvm/Analysis/DXILResource.h
@@ -632,12 +632,15 @@ class DXILResourceBindingInfo {
 RegisterSpace(uint32_t Space) : Space(Space) {
   FreeRanges.emplace_back(0, UINT32_MAX);
 }
+// Size == -1 means unbounded array
+bool findAvailableBinding(int32_t Size, uint32_t *RegSlot);
   };
 
   struct BindingSpaces {
 dxil::ResourceClass ResClass;
 llvm::SmallVector Spaces;
 BindingSpaces(dxil::ResourceClass ResClass) : ResClass(ResClass) {}
+RegisterSpace &getOrInsertSpace(uint32_t Space);
   };
 
 private:
@@ -658,6 +661,7 @@ class DXILResourceBindingInfo {
 OverlappingBinding(false) {}
 
   bool hasImplicitBinding() const { return ImplicitBinding; }
+  void setHasImplicitBinding(bool Value) { ImplicitBinding = Value; }
   bool hasOverlappingBinding() const { return OverlappingBinding; }
 
   BindingSpaces &getBindingSpaces(dxil::ResourceClass RC) {
@@ -673,6 +677,10 @@ class DXILResourceBindingInfo {
 }
   }
 
+  // Size == -1 means unbounded array
+  bool findAvailableBinding(dxil::ResourceClass RC, uint32_t Space,
+int32_t Size, uint32_t *RegSlot);
+
   friend class DXILResourceBindingAnalysis;
   friend class DXILResourceBindingWrapperPass;
 };
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index b84c6d6123d58..71db56d922676 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -85,6 +85,7 @@ void initializeDCELegacyPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPassPass(PassRegistry &);
 void initializeDXILMetadataAnalysisWrapperPrinterPass(PassRegistry &);
 void initializeDXILResourceBindingWrapperPassPass(PassRegistry &);
+void initializeDXILResourceImplicitBindingLegacyPass(PassRegistry &);
 void initializeDXILResourceTypeWrapperPassPass(PassRegistry &);
 void initializeDXILResourceWrapperPassPass(PassRegistry &);
 void initializeDeadMachineInstructionElimPass(PassRegistry &);
diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp
index 484a456f49f1b..9f5daf32be9a0 100644
--- a/llvm/lib/Analysis/Analysis.cpp
+++ b/llvm/lib/Analysis/Analysis.cpp
@@ -27,6 +27,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) {
   initializeCallGraphViewerPass(Registry);
   initializeCycleInfoWrapperPassPass(Registry);
   initializeDXILMetadataAnalysisWrapperPassPass(Registry);
+  initializeDXILResourceWrapperPassPass(Registry);
   initializeDXILResourceBindingWrapperPassPass(Registry);
   initializeDXILResourceTypeWrapperPassPass(Registry);
   initializeDXILResourceWrapperPassPass(Registry);
diff --git a/llvm/lib/Analysis/DXILResource.cpp 
b/llvm/lib/Analysis/DXILResource.cpp
index ce8e250a32ebe..34355c81e77b0 100644
--- a/llvm/lib/Analysis/DXILResource.cpp
+++ b/llvm/lib/Analysis/DXILResource.cpp
@@ -995,6 +995,59 @@ void DXILResourceBindingInfo::populate(Module &M, 
DXI

[llvm-branch-commits] [llvm] [HLSL] Implementation of DXILResourceImplicitBinding pass (PR #138043)

2025-04-30 Thread Helena Kotas via llvm-branch-commits


@@ -0,0 +1,44 @@
+; RUN: opt -S -dxil-resource-implicit-binding %s | FileCheck %s

hekota wrote:

I just realized this test is not complete... working on a fix.

https://github.com/llvm/llvm-project/pull/138043
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang-tools-extra] [clang-doc] Add Mustache template assets (PR #138059)

2025-04-30 Thread Paul Kirth via llvm-branch-commits

ilovepi wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/138059?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#138063** https://app.graphite.dev/github/pr/llvm/llvm-project/138063?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#138062** https://app.graphite.dev/github/pr/llvm/llvm-project/138062?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#138061** https://app.graphite.dev/github/pr/llvm/llvm-project/138061?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#138060** https://app.graphite.dev/github/pr/llvm/llvm-project/138060?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#138059** https://app.graphite.dev/github/pr/llvm/llvm-project/138059?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> ๐Ÿ‘ˆ https://app.graphite.dev/github/pr/llvm/llvm-project/138059?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#138058** https://app.graphite.dev/github/pr/llvm/llvm-project/138058?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/138059
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [llvm][IR] Treat memcmp and bcmp as libcalls (PR #135706)

2025-04-30 Thread Paul Kirth via llvm-branch-commits

ilovepi wrote:

@efriedma-quic We've been discussing the topic of LTOing libc quite a bit 
internally, and are currently sketching out how this could work. 
Unsurprisingly, there's quite a lot to think about in both the compiler and 
linker, and how the two combine in our different versions of LTO, and how that 
may break in new and fun ways. 

I was wondering if you'd be available to join the libc monthly meeting (5/8 9am 
PST) to discuss your take on the whole idea?  I'm not sure what time zone 
you're normally in, but I think there will be quite a number of folks who are 
interested in making LTO work well with libc's and LLVM libc in particular. I 
know a few of my team members would like to pick your brain on the subject, as 
we're sketching our ideas out. I can try to post a short summary of our 
thoughts either here or on discouse to make the discussion a bit easier as well.

https://github.com/llvm/llvm-project/pull/135706
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [HLSL][RootSignature] Add parsing for RootFlags (PR #138055)

2025-04-30 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic created 
https://github.com/llvm/llvm-project/pull/138055

- defines the `RootFlags` in-memory enum
- defines `parseRootFlags` to parse the various flag enums into a single 
`uint32_t`
- adds corresponding unit tests

- improves the diagnostic message for when we provide a non-zero integer value 
to the flags

Resolves https://github.com/llvm/llvm-project/issues/126575



  



Rate limit ยท GitHub


  body {
background-color: #f6f8fa;
color: #24292e;
font-family: -apple-system,BlinkMacSystemFont,Segoe 
UI,Helvetica,Arial,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol;
font-size: 14px;
line-height: 1.5;
margin: 0;
  }

  .container { margin: 50px auto; max-width: 600px; text-align: center; 
padding: 0 24px; }

  a { color: #0366d6; text-decoration: none; }
  a:hover { text-decoration: underline; }

  h1 { line-height: 60px; font-size: 48px; font-weight: 300; margin: 0px; 
text-shadow: 0 1px 0 #fff; }
  p { color: rgba(0, 0, 0, 0.5); margin: 20px 0 40px; }

  ul { list-style: none; margin: 25px 0; padding: 0; }
  li { display: table-cell; font-weight: bold; width: 1%; }

  .logo { display: inline-block; margin-top: 35px; }
  .logo-img-2x { display: none; }
  @media
  only screen and (-webkit-min-device-pixel-ratio: 2),
  only screen and (   min--moz-device-pixel-ratio: 2),
  only screen and ( -o-min-device-pixel-ratio: 2/1),
  only screen and (min-device-pixel-ratio: 2),
  only screen and (min-resolution: 192dpi),
  only screen and (min-resolution: 2dppx) {
.logo-img-1x { display: none; }
.logo-img-2x { display: inline-block; }
  }

  #suggestions {
margin-top: 35px;
color: #ccc;
  }
  #suggestions a {
color: #66;
font-weight: 200;
font-size: 14px;
margin: 0 10px;
  }


  
  



  Whoa there!
  You have exceeded a secondary rate limit.
Please wait a few minutes before you try again;
in some cases this may take up to an hour.
  
  
https://support.github.com/contact";>Contact Support โ€”
https://githubstatus.com";>GitHub Status โ€”
https://twitter.com/githubstatus";>@githubstatus
  

  

  

  

  

  


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


  1   2   >