[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.

LGTM with a nit.

https://github.com/llvm/llvm-project/pull/102163


[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Amara Emerson via llvm-branch-commits


@@ -45,61 +45,189 @@ cl::OptionCategory GICombinerOptionCategory(
 );
 } // end namespace llvm
 
-/// This class acts as the glue the joins the CombinerHelper to the overall
+/// This class acts as the glue that joins the CombinerHelper to the overall
 /// Combine algorithm. The CombinerHelper is intended to report the
 /// modifications it makes to the MIR to the GISelChangeObserver and the
-/// observer subclass will act on these events. In this case, instruction
-/// erasure will cancel any future visits to the erased instruction and
-/// instruction creation will schedule that instruction for a future visit.
-/// Other Combiner implementations may require more complex behaviour from
-/// their GISelChangeObserver subclass.
+/// observer subclass will act on these events.
 class Combiner::WorkListMaintainer : public GISelChangeObserver {
-  using WorkListTy = GISelWorkList<512>;
-  WorkListTy &WorkList;
+protected:
+#ifndef NDEBUG
   /// The instructions that have been created but we want to report once they
   /// have their operands. This is only maintained if debug output is requested.
-#ifndef NDEBUG
-  SetVector<MachineInstr *> CreatedInstrs;
+  SmallSetVector<MachineInstr *, 32> CreatedInstrs;
 #endif
+  using Level = CombinerInfo::ObserverLevel;
 
 public:
-  WorkListMaintainer(WorkListTy &WorkList) : WorkList(WorkList) {}
+  static std::unique_ptr<WorkListMaintainer>
+  create(Level Lvl, WorkListTy &WorkList, MachineRegisterInfo &MRI);
+
   virtual ~WorkListMaintainer() = default;
 
+  void reportFullyCreatedInstrs() {
+LLVM_DEBUG({
+  for (auto *MI : CreatedInstrs) {
+dbgs() << "Created: " << *MI;
+  }
+  CreatedInstrs.clear();
+});
+  }
+
+  virtual void reset() = 0;
+  virtual void appliedCombine() = 0;
+};
+
+/// A configurable WorkListMaintainer implementation.
+/// The ObserverLevel determines how the WorkListMaintainer reacts to MIR
+/// changes.
+template <CombinerInfo::ObserverLevel Lvl>
+class Combiner::WorkListMaintainerImpl : public Combiner::WorkListMaintainer {
+  WorkListTy &WorkList;
+  MachineRegisterInfo &MRI;
+
+  // Defer handling these instructions until the combine finishes.
+  SmallSetVector<MachineInstr *, 32> DeferList;
+
+  // Track VRegs that (might) have lost a use.
+  SmallSetVector<Register, 32> LostUses;
+
+public:
+  WorkListMaintainerImpl(WorkListTy &WorkList, MachineRegisterInfo &MRI)
+  : WorkList(WorkList), MRI(MRI) {}
+
+  virtual ~WorkListMaintainerImpl() = default;
+
+  void reset() override {
+DeferList.clear();
+LostUses.clear();
+  }
+
   void erasingInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Erasing: " << MI << "\n");
+// MI will become dangling, remove it from all lists.
+LLVM_DEBUG(dbgs() << "Erasing: " << MI; CreatedInstrs.remove(&MI));
 WorkList.remove(&MI);
+if constexpr (Lvl != Level::Basic) {
+  DeferList.remove(&MI);
+  noteLostUses(MI);
+}
   }
+
   void createdInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Creating: " << MI << "\n");
-WorkList.insert(&MI);
-LLVM_DEBUG(CreatedInstrs.insert(&MI));
+LLVM_DEBUG(dbgs() << "Creating: " << MI; CreatedInstrs.insert(&MI));
+if constexpr (Lvl == Level::Basic)
+  WorkList.insert(&MI);
+else
+  // Defer handling newly created instructions, because they don't have
+  // operands yet. We also insert them into the WorkList in reverse
+  // order so that they will be combined top down.
+  DeferList.insert(&MI);
   }
+
   void changingInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Changing: " << MI << "\n");
-WorkList.insert(&MI);
+LLVM_DEBUG(dbgs() << "Changing: " << MI);
+// Some uses might get dropped when MI is changed.
+// For now, overapproximate by assuming all uses will be dropped.
+// TODO: Is a more precise heuristic or manual tracking of use count
+// decrements worth it?
+if constexpr (Lvl != Level::Basic)
+  noteLostUses(MI);
   }
+
   void changedInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Changed: " << MI << "\n");
-WorkList.insert(&MI);
+LLVM_DEBUG(dbgs() << "Changed: " << MI);
+if constexpr (Lvl == Level::Basic)
+  WorkList.insert(&MI);
+else
+  // Defer this for DCE
+  DeferList.insert(&MI);
   }
 
-  void reportFullyCreatedInstrs() {
-LLVM_DEBUG(for (const auto *MI
-: CreatedInstrs) {
-  dbgs() << "Created: ";
-  MI->print(dbgs());
-});
-LLVM_DEBUG(CreatedInstrs.clear());
+  // Only track changes during the combine and then walk the def/use-chains once
+  // the combine is finished, because:
+  // - instructions might have multiple defs during the combine.
+  // - use counts aren't accurate during the combine.
+  void appliedCombine() override {
+if constexpr (Lvl == Level::Basic)
+  return;
+
+// DCE deferred instructions and add them to the WorkList bottom up.
+while (!DeferList.empty()) {
+  MachineInstr &MI = *DeferList.pop_back_val();
+  if (tryDCE(MI, MRI))
+continue;
+
+  if const
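
For orientation, here is a minimal standalone sketch of the pattern the patch implements, with simplified control flow: a hypothetical drainDeferList that uses isTriviallyDead in place of the patch's tryDCE helper. The real code additionally consults the tracked LostUses so that combines can be retried on instructions whose operands lost a use.

#include "llvm/ADT/SetVector.h"
#include "llvm/CodeGen/GlobalISel/GISelWorkList.h"
#include "llvm/CodeGen/GlobalISel/Utils.h" // isTriviallyDead()
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Sketch only: drain the defer list once a combine has been applied,
// erasing anything that died and (re)scheduling the survivors.
static void drainDeferList(SmallSetVector<MachineInstr *, 32> &DeferList,
                           GISelWorkList<512> &WorkList,
                           MachineRegisterInfo &MRI) {
  while (!DeferList.empty()) {
    MachineInstr *MI = DeferList.pop_back_val();
    if (isTriviallyDead(*MI, MRI)) {
      MI->eraseFromParent(); // observer-based DCE: dead MIs are never visited
      continue;
    }
    WorkList.insert(MI); // still alive: schedule a (re)visit
  }
}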

[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/102163


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.

These are some very nice improvements; thanks for working on this. None of the test output changes appears to expose a problem with this patch, so LGTM.

https://github.com/llvm/llvm-project/pull/102167


[llvm-branch-commits] [llvm] release/18.x: [GlobalISel] Fix store merging incorrectly classifying an unknown index expr as 0. (#90375) (PR #90673)

2024-04-30 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/90673


[llvm-branch-commits] [llvm] release/18.x: [GlobalISel] Don't form anyextending atomic loads. (PR #90435)

2024-04-30 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/90435


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)

2024-05-02 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/90827


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)

2024-05-04 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

@tstellar It looks like this cherry-pick has a test failure. What's the recommended way to resolve this: make a new PR, or modify this one (if that's possible)?

https://github.com/llvm/llvm-project/pull/90827


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)

2024-05-07 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

@nikic do you know the procedure here?

https://github.com/llvm/llvm-project/pull/90827


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

> @aemerson Did you submit a new pull request with a fix?

I have not yet, will do so now...

https://github.com/llvm/llvm-project/pull/90827


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/91672


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/91672


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson ready_for_review 
https://github.com/llvm/llvm-project/pull/91672


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson closed 
https://github.com/llvm/llvm-project/pull/90827


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

New PR: https://github.com/llvm/llvm-project/pull/91672

https://github.com/llvm/llvm-project/pull/90827


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

The test has been changed from the original commit due to a fallback on a G_BITCAST. Added abort=2 so we can see the partial legalization and check that there is no crash.

https://github.com/llvm/llvm-project/pull/91672


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)

2024-05-09 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson milestoned 
https://github.com/llvm/llvm-project/pull/91672


[llvm-branch-commits] [llvm] release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)

2024-05-10 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

@tstellar could we merge this now?

https://github.com/llvm/llvm-project/pull/91672


[llvm-branch-commits] [compiler-rt] [llvm] [clang] [GlobalISel] Always direct-call IFuncs and Aliases (PR #74902)

2023-12-10 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/74902


[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)

2024-09-05 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/107406


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)

2024-09-05 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/107435


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)

2024-09-05 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

To justify this for the 19 release: this is easily triggered by small IR, so we should take it.

https://github.com/llvm/llvm-project/pull/107435


[llvm-branch-commits] [llvm] bd64ad3 - Recommit "[AArch64][GlobalISel] Make G_USUBO legal and select it."

2021-01-22 Thread Amara Emerson via llvm-branch-commits

Author: Cassie Jones
Date: 2021-01-22T17:29:54-08:00
New Revision: bd64ad3fe17506933ac2971dcc900271d6ae5969

URL: 
https://github.com/llvm/llvm-project/commit/bd64ad3fe17506933ac2971dcc900271d6ae5969
DIFF: 
https://github.com/llvm/llvm-project/commit/bd64ad3fe17506933ac2971dcc900271d6ae5969.diff

LOG: Recommit "[AArch64][GlobalISel] Make G_USUBO legal and select it."

The expansion for wide subtractions includes G_USUBO.

Differential Revision: https://reviews.llvm.org/D95032

The original commit was miscompiling on ubsan bots.

Added: 
llvm/test/CodeGen/AArch64/GlobalISel/select-saddo.mir
llvm/test/CodeGen/AArch64/GlobalISel/select-ssubo.mir
llvm/test/CodeGen/AArch64/GlobalISel/select-usubo.mir

Modified: 
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 9619bb43ae9c..5259f4f5a4d0 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2745,7 +2745,8 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
   }
   case TargetOpcode::G_SADDO:
   case TargetOpcode::G_UADDO:
-  case TargetOpcode::G_SSUBO: {
+  case TargetOpcode::G_SSUBO:
+  case TargetOpcode::G_USUBO: {
 // Emit the operation and get the correct condition code.
 MachineIRBuilder MIRBuilder(I);
 auto OpAndCC = emitOverflowOp(Opcode, I.getOperand(0).getReg(),
@@ -4376,6 +4377,8 @@ AArch64InstructionSelector::emitOverflowOp(unsigned 
Opcode, Register Dst,
 return std::make_pair(emitADDS(Dst, LHS, RHS, MIRBuilder), AArch64CC::HS);
   case TargetOpcode::G_SSUBO:
 return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::VS);
+  case TargetOpcode::G_USUBO:
+return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::LO);
   }
 }
 

diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index cc7aada211bb..5a6c904e3f5d 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -165,7 +165,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
 
   getActionDefinitionsBuilder({G_SMULH, G_UMULH}).legalFor({s32, s64});
 
-  getActionDefinitionsBuilder({G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO})
+  getActionDefinitionsBuilder(
+  {G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO, G_USUBO})
   .legalFor({{s32, s1}, {s64, s1}})
   .minScalar(0, s32);
 

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
index ab8510bf9d92..4f97d153d28b 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
@@ -73,6 +73,44 @@ body: |
 %5:_(s64) = G_ANYEXT %4(s8)
 $x0 = COPY %5(s64)
 
+...
+---
+name:test_scalar_uaddo_32
+body: |
+  bb.0.entry:
+; CHECK-LABEL: name: test_scalar_uaddo_32
+; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
+; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
+; CHECK: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO 
[[COPY]], [[COPY1]]
+; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UADDO1]](s1)
+; CHECK: $w0 = COPY [[UADDO]](s32)
+; CHECK: $w1 = COPY [[ANYEXT]](s32)
+%0:_(s32) = COPY $w0
+%1:_(s32) = COPY $w1
+%2:_(s32), %3:_(s1) = G_UADDO %0, %1
+%4:_(s32) = G_ANYEXT %3
+$w0 = COPY %2(s32)
+$w1 = COPY %4(s32)
+
+...
+---
+name:test_scalar_saddo_32
+body: |
+  bb.0.entry:
+; CHECK-LABEL: name: test_scalar_saddo_32
+; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
+; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
+; CHECK: [[SADDO:%[0-9]+]]:_(s32), [[SADDO1:%[0-9]+]]:_(s1) = G_SADDO 
[[COPY]], [[COPY1]]
+; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SADDO1]](s1)
+; CHECK: $w0 = COPY [[SADDO]](s32)
+; CHECK: $w1 = COPY [[ANYEXT]](s32)
+%0:_(s32) = COPY $w0
+%1:_(s32) = COPY $w1
+%2:_(s32), %3:_(s1) = G_SADDO %0, %1
+%4:_(s32) = G_ANYEXT %3
+$w0 = COPY %2(s32)
+$w1 = COPY %4(s32)
+
 ...
 ---
 name:test_vector_add

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
index 32796e0948cc..b372a32eb7fc 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
@@ -1,6 +1,59 @@
 # NOTE: As
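
One detail worth noting when comparing this recommit against the reverted original (3dedad4, quoted later in this digest): the functional change visible between the two diffs is the condition code chosen for the G_USUBO overflow result, AArch64CC::LO here versus AArch64CC::HS before. On AArch64, SUBS sets the carry flag exactly when the subtraction does not borrow, so the overflow (borrow) bit corresponds to carry clear. A plain-integer sketch of that mapping:

#include <cstdint>

// Sketch: SUBS sets the C flag iff the subtraction does not borrow, so
// G_USUBO's overflow bit is the LO ("unsigned lower", carry clear)
// condition rather than HS ("unsigned higher or same", carry set).
static bool usubOverflows(uint64_t LHS, uint64_t RHS) {
  bool Carry = LHS >= RHS; // C after SUBS: set iff no borrow
  return !Carry;           // LO: a borrow happened iff carry is clear
}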

[llvm-branch-commits] [llvm] fa0971b - GlobalISel: check type size before getZExtValue()ing it.

2021-04-09 Thread Amara Emerson via llvm-branch-commits

Author: Tim Northover
Date: 2021-04-09T11:19:39-07:00
New Revision: fa0971b87fb2c9d14d1bba2551e61f02f18f329b

URL: 
https://github.com/llvm/llvm-project/commit/fa0971b87fb2c9d14d1bba2551e61f02f18f329b
DIFF: 
https://github.com/llvm/llvm-project/commit/fa0971b87fb2c9d14d1bba2551e61f02f18f329b.diff

LOG: GlobalISel: check type size before getZExtValue()ing it.

Otherwise getZExtValue() asserts.

(cherry picked from commit c2b322fc19e829162ed4c7dcd04d9e9b2cd4e66c)

Added: 
llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll

Modified: 
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp 
b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index b97c369b832da..b7883cbc3120f 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -840,9 +840,8 @@ void IRTranslator::emitSwitchCase(SwitchCG::CaseBlock &CB,
 // For conditional branch lowering, we might try to do something silly like
 // emit an G_ICMP to compare an existing G_ICMP i1 result with true. If so,
 // just re-use the existing condition vreg.
-if (CI && CI->getZExtValue() == 1 &&
-MRI->getType(CondLHS).getSizeInBits() == 1 &&
-CB.PredInfo.Pred == CmpInst::ICMP_EQ) {
+if (MRI->getType(CondLHS).getSizeInBits() == 1 && CI &&
+CI->getZExtValue() == 1 && CB.PredInfo.Pred == CmpInst::ICMP_EQ) {
   Cond = CondLHS;
 } else {
   Register CondRHS = getOrCreateVReg(*CB.CmpRHS);

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll 
b/llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll
new file mode 100644
index 0..8742a848c4af1
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll
@@ -0,0 +1,22 @@
+; RUN: llc -mtriple=arm64-apple-ios %s -o - -O0 -global-isel=1 | FileCheck %s
+define void @foo(i512 %in) {
+; CHECK-LABEL: foo:
+; CHECK: cbz
+  switch i512 %in, label %default [
+i512 3923188584616675477397368389504791510063972152790021570560, label %l1
+i512 3923188584616675477397368389504791510063972152790021570561, label %l2
+i512 3923188584616675477397368389504791510063972152790021570562, label %l3
+  ]
+
+default:
+  ret void
+
+l1:
+  ret void
+
+l2:
+  ret void
+
+l3:
+  ret void
+}
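
The reordering works because of short-circuit evaluation: the cheap type-size check now runs before CI->getZExtValue(), and APInt::getZExtValue() asserts for values wider than 64 bits, such as the i512 case values in the test above; once the compared type is known to be s1, the constant is guaranteed to fit. An equivalent standalone guard, with hypothetical names:

#include "llvm/ADT/APInt.h"
using namespace llvm;

// Sketch: the width guard must come first so it short-circuits, because
// getZExtValue() asserts for values that need more than 64 bits.
static bool isConstOne(const APInt &V) {
  return V.getActiveBits() <= 64 && V.getZExtValue() == 1;
}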





[llvm-branch-commits] [llvm] bc3606d - [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions.

2021-01-14 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-14T16:14:59-08:00
New Revision: bc3606d0b27b2ba13a826b5c3fcba81f7e737387

URL: 
https://github.com/llvm/llvm-project/commit/bc3606d0b27b2ba13a826b5c3fcba81f7e737387
DIFF: 
https://github.com/llvm/llvm-project/commit/bc3606d0b27b2ba13a826b5c3fcba81f7e737387.diff

LOG: [AArch64][GlobalISel] Assign FPR banks to loads which are used by 
integer->float conversions.

G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars. Because of this, if their source is a load, that load can be assigned to an fpr bank, avoiding a cross-bank copy via a gpr->fpr conversion.

Differential Revision: https://reviews.llvm.org/D94701

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
index eeb7d5bc6eb7..c76c43389b37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
@@ -680,11 +680,18 @@ AArch64RegisterBankInfo::getInstrMapping(const 
MachineInstr &MI) const {
 break;
   }
   case TargetOpcode::G_SITOFP:
-  case TargetOpcode::G_UITOFP:
+  case TargetOpcode::G_UITOFP: {
 if (MRI.getType(MI.getOperand(0).getReg()).isVector())
   break;
-OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR};
+// Integer to FP conversions don't necessarily happen between GPR -> FPR
+// regbanks. They can also be done within an FPR register.
+Register SrcReg = MI.getOperand(1).getReg();
+if (getRegBank(SrcReg, MRI, TRI) == &AArch64::FPRRegBank)
+  OpRegBankIdx = {PMI_FirstFPR, PMI_FirstFPR};
+else
+  OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR};
 break;
+  }
   case TargetOpcode::G_FPTOSI:
   case TargetOpcode::G_FPTOUI:
 if (MRI.getType(MI.getOperand(0).getReg()).isVector())
@@ -722,7 +729,8 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr 
&MI) const {
 // assume this was a floating point load in the IR.
 // If it was not, we would have had a bitcast before
 // reaching that instruction.
-if (onlyUsesFP(UseMI, MRI, TRI)) {
+// Int->FP conversion operations are also captured in onlyDefinesFP().
+if (onlyUsesFP(UseMI, MRI, TRI) || onlyDefinesFP(UseMI, MRI, TRI)) {
   OpRegBankIdx[0] = PMI_FirstFPR;
   break;
 }

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir
index a7aae275fa5d..46177b4f1b1f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir
@@ -4,7 +4,7 @@
 # Check that we correctly assign register banks based off of instructions which
 # only use or only define FPRs.
 #
-# For example, G_SITOFP takes in a GPR, but only ever produces values on FPRs.
+# For example, G_SITOFP may take in a GPR, but only ever produces values on 
FPRs.
 # Some instructions can have inputs/outputs on either FPRs or GPRs. If one of
 # those instructions takes in the result of a G_SITOFP as a source, we should
 # put that source on a FPR.
@@ -361,3 +361,47 @@ body: |
 %phi:_(s32) = G_PHI %gpr_copy(s32), %bb.0, %unmerge_1(s32), %bb.1
 $s0 = COPY %phi(s32)
 RET_ReallyLR implicit $s0
+
+...
+---
+name:load_used_by_sitofp
+legalized:   true
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $x0
+; The load should be assigned an fpr bank because it's used by the sitofp.
+; The sitofp should assign both src and dest to FPR, resulting in no 
copies.
+; CHECK-LABEL: name: load_used_by_sitofp
+; CHECK: liveins: $x0
+; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0
+; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4)
+; CHECK: [[SITOFP:%[0-9]+]]:fpr(s32) = G_SITOFP [[LOAD]](s32)
+; CHECK: $s0 = COPY [[SITOFP]](s32)
+; CHECK: RET_ReallyLR implicit $s0
+%0:_(p0) = COPY $x0
+%1:_(s32) = G_LOAD %0 :: (load 4)
+%2:_(s32) = G_SITOFP %1:_(s32)
+$s0 = COPY %2(s32)
+RET_ReallyLR implicit $s0
+...
+---
+name:load_used_by_uitofp
+legalized:   true
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $x0
+; CHECK-LABEL: name: load_used_by_uitofp
+; CHECK: liveins: $x0
+; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0
+; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4)
+; CHECK: [[UITOFP:%[0-9]+]]:fpr(s32) = G_UITOFP [[LOAD]](s32)
+; CHECK: $s0 = COPY [[UITOFP]](s32)
+; CHECK: RET_ReallyLR implicit $s0
+%0:_(p0) = COPY $x0
+%1:_(s32) = G_LOAD %0 :: (load 4)
+%2:_(s32) = G_UITOFP %1:_(s32)
+$s0 = COPY %2
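
The log's reasoning lends itself to a short sketch. This hypothetical helper is not the patch's actual control flow (the real logic lives inline in getInstrMapping and reuses the onlyUsesFP/onlyDefinesFP helpers seen in the diff), but it shows the consumer-driven idea:

// Sketch (hypothetical helper): a scalar load gets the FPR bank when one
// of its users only reads or only produces FP values, e.g. G_SITOFP.
static bool loadWantsFPR(const MachineInstr &Load,
                         const MachineRegisterInfo &MRI,
                         const TargetRegisterInfo &TRI) {
  Register DefReg = Load.getOperand(0).getReg();
  for (const MachineInstr &UseMI : MRI.use_nodbg_instructions(DefReg))
    if (onlyUsesFP(UseMI, MRI, TRI) || onlyDefinesFP(UseMI, MRI, TRI))
      return true; // no gpr->fpr COPY needed for this load's result
  return false;
}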

[llvm-branch-commits] [llvm] 036bc79 - [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions.

2021-01-14 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-14T16:33:34-08:00
New Revision: 036bc798f2ae4d266fe01e70778afe0b3381c088

URL: 
https://github.com/llvm/llvm-project/commit/036bc798f2ae4d266fe01e70778afe0b3381c088
DIFF: 
https://github.com/llvm/llvm-project/commit/036bc798f2ae4d266fe01e70778afe0b3381c088.diff

LOG: [AArch64][GlobalISel] Assign FPR banks to loads which are used by 
integer->float conversions.

G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars. Because of this, if their source is a load, that load can be assigned to an fpr bank, avoiding a cross-bank copy via a gpr->fpr conversion.

Differential Revision: https://reviews.llvm.org/D94701

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
index eeb7d5bc6eb7..c76c43389b37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
@@ -680,11 +680,18 @@ AArch64RegisterBankInfo::getInstrMapping(const 
MachineInstr &MI) const {
 break;
   }
   case TargetOpcode::G_SITOFP:
-  case TargetOpcode::G_UITOFP:
+  case TargetOpcode::G_UITOFP: {
 if (MRI.getType(MI.getOperand(0).getReg()).isVector())
   break;
-OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR};
+// Integer to FP conversions don't necessarily happen between GPR -> FPR
+// regbanks. They can also be done within an FPR register.
+Register SrcReg = MI.getOperand(1).getReg();
+if (getRegBank(SrcReg, MRI, TRI) == &AArch64::FPRRegBank)
+  OpRegBankIdx = {PMI_FirstFPR, PMI_FirstFPR};
+else
+  OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR};
 break;
+  }
   case TargetOpcode::G_FPTOSI:
   case TargetOpcode::G_FPTOUI:
 if (MRI.getType(MI.getOperand(0).getReg()).isVector())
@@ -722,7 +729,8 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr 
&MI) const {
 // assume this was a floating point load in the IR.
 // If it was not, we would have had a bitcast before
 // reaching that instruction.
-if (onlyUsesFP(UseMI, MRI, TRI)) {
+// Int->FP conversion operations are also captured in onlyDefinesFP().
+if (onlyUsesFP(UseMI, MRI, TRI) || onlyDefinesFP(UseMI, MRI, TRI)) {
   OpRegBankIdx[0] = PMI_FirstFPR;
   break;
 }

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir
index a7aae275fa5d..46177b4f1b1f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir
@@ -4,7 +4,7 @@
 # Check that we correctly assign register banks based off of instructions which
 # only use or only define FPRs.
 #
-# For example, G_SITOFP takes in a GPR, but only ever produces values on FPRs.
+# For example, G_SITOFP may take in a GPR, but only ever produces values on 
FPRs.
 # Some instructions can have inputs/outputs on either FPRs or GPRs. If one of
 # those instructions takes in the result of a G_SITOFP as a source, we should
 # put that source on a FPR.
@@ -361,3 +361,47 @@ body: |
 %phi:_(s32) = G_PHI %gpr_copy(s32), %bb.0, %unmerge_1(s32), %bb.1
 $s0 = COPY %phi(s32)
 RET_ReallyLR implicit $s0
+
+...
+---
+name:load_used_by_sitofp
+legalized:   true
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $x0
+; The load should be assigned an fpr bank because it's used by the sitofp.
+; The sitofp should assign both src and dest to FPR, resulting in no 
copies.
+; CHECK-LABEL: name: load_used_by_sitofp
+; CHECK: liveins: $x0
+; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0
+; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4)
+; CHECK: [[SITOFP:%[0-9]+]]:fpr(s32) = G_SITOFP [[LOAD]](s32)
+; CHECK: $s0 = COPY [[SITOFP]](s32)
+; CHECK: RET_ReallyLR implicit $s0
+%0:_(p0) = COPY $x0
+%1:_(s32) = G_LOAD %0 :: (load 4)
+%2:_(s32) = G_SITOFP %1:_(s32)
+$s0 = COPY %2(s32)
+RET_ReallyLR implicit $s0
+...
+---
+name:load_used_by_uitofp
+legalized:   true
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $x0
+; CHECK-LABEL: name: load_used_by_uitofp
+; CHECK: liveins: $x0
+; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0
+; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4)
+; CHECK: [[UITOFP:%[0-9]+]]:fpr(s32) = G_UITOFP [[LOAD]](s32)
+; CHECK: $s0 = COPY [[UITOFP]](s32)
+; CHECK: RET_ReallyLR implicit $s0
+%0:_(p0) = COPY $x0
+%1:_(s32) = G_LOAD %0 :: (load 4)
+%2:_(s32) = G_UITOFP %1:_(s32)
+$s0 = COPY %2

[llvm-branch-commits] [llvm] 8f283ca - [AArch64][GlobalISel] Add selection support for fpr bank source variants of G_SITOFP and G_UITOFP.

2021-01-14 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-14T19:31:19-08:00
New Revision: 8f283cafddfa8d6d01a94b48cdc5d25817569e91

URL: 
https://github.com/llvm/llvm-project/commit/8f283cafddfa8d6d01a94b48cdc5d25817569e91
DIFF: 
https://github.com/llvm/llvm-project/commit/8f283cafddfa8d6d01a94b48cdc5d25817569e91.diff

LOG: [AArch64][GlobalISel] Add selection support for fpr bank source variants 
of G_SITOFP and G_UITOFP.

In order to import patterns for these, we need to define new ops that can map to
the AArch64ISD::[SU]ITOF nodes. We then transform fpr->fpr variants of the generic opcodes to these custom opcodes in preisel-lowering. We have to do it here, and not in the PostLegalizer combiner, because this has to run after regbankselect.

Differential Revision: https://reviews.llvm.org/D94702

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64InstrGISel.td
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64InstrGISel.td 
b/llvm/lib/Target/AArch64/AArch64InstrGISel.td
index eadb6847ceb6..25656fac1d2f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrGISel.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrGISel.td
@@ -146,6 +146,16 @@ def G_VLSHR : AArch64GenericInstruction {
   let InOperandList = (ins type0:$src1, untyped_imm_0:$imm);
 }
 
+// Represents an integer to FP conversion on the FPR bank.
+def G_SITOF : AArch64GenericInstruction {
+  let OutOperandList = (outs type0:$dst);
+  let InOperandList = (ins type0:$src);
+}
+def G_UITOF : AArch64GenericInstruction {
+  let OutOperandList = (outs type0:$dst);
+  let InOperandList = (ins type0:$src);
+}
+
 def : GINodeEquiv;
 def : GINodeEquiv;
 def : GINodeEquiv;
@@ -163,6 +173,8 @@ def : GINodeEquiv;
 def : GINodeEquiv;
 def : GINodeEquiv;
 def : GINodeEquiv;
+def : GINodeEquiv<G_SITOF, AArch64sitof>;
+def : GINodeEquiv<G_UITOF, AArch64uitof>;
 
 def : GINodeEquiv;
 

diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 6dc0d1fb97e2..c2e3d9484207 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -1941,6 +1941,24 @@ bool 
AArch64InstructionSelector::preISelLower(MachineInstr &I) {
 I.getOperand(1).setReg(NewSrc.getReg(0));
 return true;
   }
+  case TargetOpcode::G_UITOFP:
+  case TargetOpcode::G_SITOFP: {
+// If both source and destination regbanks are FPR, then convert the opcode
+// to G_SITOF so that the importer can select it to an fpr variant.
+// Otherwise, it ends up matching an fpr/gpr variant and adding a 
cross-bank
+// copy.
+Register SrcReg = I.getOperand(1).getReg();
+if (MRI.getType(SrcReg).isVector())
+  return false;
+if (RBI.getRegBank(SrcReg, MRI, TRI)->getID() == AArch64::FPRRegBankID) {
+  if (I.getOpcode() == TargetOpcode::G_SITOFP)
+I.setDesc(TII.get(AArch64::G_SITOF));
+  else
+I.setDesc(TII.get(AArch64::G_UITOF));
+  return true;
+}
+return false;
+  }
   default:
 return false;
   }

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir
index aea10c5c6c9d..aad71bd99f8f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir
@@ -218,7 +218,7 @@ body: |
 ...
 
 ---
-name:sitofp_s32_s32_fpr
+name:sitofp_s32_s32_fpr_gpr
 legalized:   true
 regBankSelected: true
 
@@ -230,7 +230,7 @@ body: |
   bb.0:
 liveins: $w0
 
-; CHECK-LABEL: name: sitofp_s32_s32_fpr
+; CHECK-LABEL: name: sitofp_s32_s32_fpr_gpr
 ; CHECK: [[COPY:%[0-9]+]]:gpr32 = COPY $w0
 ; CHECK: [[SCVTFUWSri:%[0-9]+]]:fpr32 = SCVTFUWSri [[COPY]]
 ; CHECK: $s0 = COPY [[SCVTFUWSri]]
@@ -239,6 +239,50 @@ body: |
 $s0 = COPY %1(s32)
 ...
 
+---
+name:sitofp_s32_s32_fpr_fpr
+legalized:   true
+regBankSelected: true
+
+registers:
+  - { id: 0, class: fpr }
+  - { id: 1, class: fpr }
+
+body: |
+  bb.0:
+liveins: $s0
+
+; CHECK-LABEL: name: sitofp_s32_s32_fpr_fpr
+; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0
+; CHECK: [[SCVTFv1i32:%[0-9]+]]:fpr32 = SCVTFv1i32 [[COPY]]
+; CHECK: $s0 = COPY [[SCVTFv1i32]]
+%0(s32) = COPY $s0
+%1(s32) = G_SITOFP %0
+$s0 = COPY %1(s32)
+...
+
+---
+name:uitofp_s32_s32_fpr_fpr
+legalized:   true
+regBankSelected: true
+
+registers:
+  - { id: 0, class: fpr }
+  - { id: 1, class: fpr }
+
+body: |
+  bb.0:
+liveins: $s0
+
+; CHECK-LABEL: name: uitofp_s32_s32_fpr_fpr
+; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0
+; CHECK: [[UCVTFv1i32:%[0-9]+]]:fpr32 = UCVTFv1i32 [[COPY]]
+; CHECK: $s0 = COPY [[UCVTFv1

[llvm-branch-commits] [llvm] 89e84de - [AArch64][GlobalISel] Fix fallbacks introduced for G_SITOFP in 8f283cafddfa8d6d01a94b48cdc5d25817569e91

2021-01-15 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-15T01:10:49-08:00
New Revision: 89e84dec1879417fb7eb96edaa55dac7eca204ab

URL: 
https://github.com/llvm/llvm-project/commit/89e84dec1879417fb7eb96edaa55dac7eca204ab
DIFF: 
https://github.com/llvm/llvm-project/commit/89e84dec1879417fb7eb96edaa55dac7eca204ab.diff

LOG: [AArch64][GlobalISel] Fix fallbacks introduced for G_SITOFP in 
8f283cafddfa8d6d01a94b48cdc5d25817569e91

If we have an integer->fp convert that has differing sizes, e.g. s32 to s64,
then don't try to convert it to AArch64::G_SITOF since it won't select.

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 5dcb9b2d00da..797f33ce2ab4 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -1947,8 +1947,11 @@ bool 
AArch64InstructionSelector::preISelLower(MachineInstr &I) {
 // Otherwise, it ends up matching an fpr/gpr variant and adding a 
cross-bank
 // copy.
 Register SrcReg = I.getOperand(1).getReg();
-if (MRI.getType(SrcReg).isVector())
+LLT SrcTy = MRI.getType(SrcReg);
+LLT DstTy = MRI.getType(I.getOperand(0).getReg());
+if (SrcTy.isVector() || SrcTy.getSizeInBits() != DstTy.getSizeInBits())
   return false;
+
 if (RBI.getRegBank(SrcReg, MRI, TRI)->getID() == AArch64::FPRRegBankID) {
   if (I.getOpcode() == TargetOpcode::G_SITOFP)
 I.setDesc(TII.get(AArch64::G_SITOF));

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir
index aad71bd99f8f..4274f91dba49 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir
@@ -327,6 +327,29 @@ body: |
 $d0 = COPY %1(s64)
 ...
 
+---
+name:sitofp_s64_s32_fpr_both
+legalized:   true
+regBankSelected: true
+
+registers:
+  - { id: 0, class: fpr }
+  - { id: 1, class: fpr }
+
+body: |
+  bb.0:
+liveins: $s0
+
+; CHECK-LABEL: name: sitofp_s64_s32_fpr
+; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0
+; CHECK: [[COPY2:%[0-9]+]]:gpr32 = COPY [[COPY]]
+; CHECK: [[SCVTFUWDri:%[0-9]+]]:fpr64 = SCVTFUWDri [[COPY2]]
+; CHECK: $d0 = COPY [[SCVTFUWDri]]
+%0(s32) = COPY $s0
+%1(s64) = G_SITOFP %0
+$d0 = COPY %1(s64)
+...
+
 ---
 name:sitofp_s64_s64_fpr
 legalized:   true





[llvm-branch-commits] [llvm] aa8a2d8 - [AArch64][GlobalISel] Select immediate fcmp if the zero is on the LHS.

2021-01-15 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-15T14:31:39-08:00
New Revision: aa8a2d8a3da3704f82ba4ea3a6e7b463737597e1

URL: 
https://github.com/llvm/llvm-project/commit/aa8a2d8a3da3704f82ba4ea3a6e7b463737597e1
DIFF: 
https://github.com/llvm/llvm-project/commit/aa8a2d8a3da3704f82ba4ea3a6e7b463737597e1.diff

LOG: [AArch64][GlobalISel] Select immediate fcmp if the zero is on the LHS.

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 797f33ce2ab4..b24fad35e32b 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -4224,6 +4224,14 @@ AArch64InstructionSelector::emitFPCompare(Register LHS, 
Register RHS,
   // to explicitly materialize a constant.
   const ConstantFP *FPImm = getConstantFPVRegVal(RHS, MRI);
   bool ShouldUseImm = FPImm && (FPImm->isZero() && !FPImm->isNegative());
+  if (!ShouldUseImm) {
+// Try commutating the operands.
+const ConstantFP *LHSImm = getConstantFPVRegVal(LHS, MRI);
+if (LHSImm && (LHSImm->isZero() && !LHSImm->isNegative())) {
+  ShouldUseImm = true;
+  std::swap(LHS, RHS);
+}
+  }
   unsigned CmpOpcTbl[2][2] = {{AArch64::FCMPSrr, AArch64::FCMPDrr},
   {AArch64::FCMPSri, AArch64::FCMPDri}};
   unsigned CmpOpc = CmpOpcTbl[ShouldUseImm][OpSize == 64];

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
index 45799079f920..c12cd3343c7e 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
@@ -107,3 +107,30 @@ body: |
 %3:gpr(s32) = G_FCMP floatpred(oeq), %0(s64), %2
 $s0 = COPY %3(s32)
 RET_ReallyLR implicit $s0
+...
+
+---
+name:zero_lhs
+alignment:   4
+legalized:   true
+regBankSelected: true
+tracksRegLiveness: true
+body: |
+  bb.1:
+liveins: $s0, $s1
+
+; CHECK-LABEL: name: zero_lhs
+; CHECK: liveins: $s0, $s1
+; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0
+; CHECK: FCMPSri [[COPY]], implicit-def $nzcv
+; CHECK: [[CSINCWr:%[0-9]+]]:gpr32 = CSINCWr $wzr, $wzr, 1, implicit $nzcv
+; CHECK: $s0 = COPY [[CSINCWr]]
+; CHECK: RET_ReallyLR implicit $s0
+%0:fpr(s32) = COPY $s0
+%1:fpr(s32) = COPY $s1
+%2:fpr(s32) = G_FCONSTANT float 0.00e+00
+%3:gpr(s32) = G_FCMP floatpred(oeq), %2(s32), %0
+$s0 = COPY %3(s32)
+RET_ReallyLR implicit $s0
+
+...





[llvm-branch-commits] [llvm] 8456c3a - AArch64: fix regression introduced by fcmp immediate selection.

2021-01-15 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-15T22:53:25-08:00
New Revision: 8456c3a789285079ad35d146e487436b5a27b027

URL: 
https://github.com/llvm/llvm-project/commit/8456c3a789285079ad35d146e487436b5a27b027
DIFF: 
https://github.com/llvm/llvm-project/commit/8456c3a789285079ad35d146e487436b5a27b027.diff

LOG: AArch64: fix regression introduced by fcmp immediate selection.

Forgot to check that the predicate is safe for commuting the operands.

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index b24fad35e32b..0021456a596d 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -34,6 +34,7 @@
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/TargetOpcodes.h"
 #include "llvm/IR/Constants.h"
+#include "llvm/IR/Instructions.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/IR/Type.h"
 #include "llvm/IR/IntrinsicsAArch64.h"
@@ -177,8 +178,10 @@ class AArch64InstructionSelector : public 
InstructionSelector {
MachineIRBuilder &MIRBuilder) const;
 
   /// Emit a floating point comparison between \p LHS and \p RHS.
+  /// \p Pred if given is the intended predicate to use.
   MachineInstr *emitFPCompare(Register LHS, Register RHS,
-  MachineIRBuilder &MIRBuilder) const;
+  MachineIRBuilder &MIRBuilder,
+  Optional<CmpInst::Predicate> = None) const;
 
   MachineInstr *emitInstr(unsigned Opcode,
   std::initializer_list<DstOp> DstOps,
@@ -1483,11 +1486,11 @@ bool 
AArch64InstructionSelector::selectCompareBranchFedByFCmp(
   assert(I.getOpcode() == TargetOpcode::G_BRCOND);
   // Unfortunately, the mapping of LLVM FP CC's onto AArch64 CC's isn't
   // totally clean.  Some of them require two branches to implement.
-  emitFPCompare(FCmp.getOperand(2).getReg(), FCmp.getOperand(3).getReg(), MIB);
+  auto Pred = (CmpInst::Predicate)FCmp.getOperand(1).getPredicate();
+  emitFPCompare(FCmp.getOperand(2).getReg(), FCmp.getOperand(3).getReg(), MIB,
+Pred);
   AArch64CC::CondCode CC1, CC2;
-  changeFCMPPredToAArch64CC(
-  static_cast(FCmp.getOperand(1).getPredicate()), CC1,
-  CC2);
+  changeFCMPPredToAArch64CC(static_cast(Pred), CC1, CC2);
   MachineBasicBlock *DestMBB = I.getOperand(1).getMBB();
   MIB.buildInstr(AArch64::Bcc, {}, {}).addImm(CC1).addMBB(DestMBB);
   if (CC2 != AArch64CC::AL)
@@ -3090,7 +3093,7 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
 CmpInst::Predicate Pred =
 static_cast(I.getOperand(1).getPredicate());
 if (!emitFPCompare(I.getOperand(2).getReg(), I.getOperand(3).getReg(),
-   MIRBuilder) ||
+   MIRBuilder, Pred) ||
 !emitCSetForFCmp(I.getOperand(0).getReg(), Pred, MIRBuilder))
   return false;
 I.eraseFromParent();
@@ -4211,7 +4214,8 @@ MachineInstr *AArch64InstructionSelector::emitCSetForFCmp(
 
 MachineInstr *
 AArch64InstructionSelector::emitFPCompare(Register LHS, Register RHS,
-  MachineIRBuilder &MIRBuilder) const {
+  MachineIRBuilder &MIRBuilder,
+  Optional<CmpInst::Predicate> Pred) const {
   MachineRegisterInfo &MRI = *MIRBuilder.getMRI();
   LLT Ty = MRI.getType(LHS);
   if (Ty.isVector())
@@ -4224,7 +4228,12 @@ AArch64InstructionSelector::emitFPCompare(Register LHS, 
Register RHS,
   // to explicitly materialize a constant.
   const ConstantFP *FPImm = getConstantFPVRegVal(RHS, MRI);
   bool ShouldUseImm = FPImm && (FPImm->isZero() && !FPImm->isNegative());
-  if (!ShouldUseImm) {
+
+  auto IsEqualityPred = [](CmpInst::Predicate P) {
+return P == CmpInst::FCMP_OEQ || P == CmpInst::FCMP_ONE ||
+   P == CmpInst::FCMP_UEQ || P == CmpInst::FCMP_UNE;
+  };
+  if (!ShouldUseImm && Pred && IsEqualityPred(*Pred)) {
 // Try commutating the operands.
 const ConstantFP *LHSImm = getConstantFPVRegVal(LHS, MRI);
 if (LHSImm && (LHSImm->isZero() && !LHSImm->isNegative())) {

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
index c12cd3343c7e..cde785a6a446 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
@@ -134,3 +134,29 @@ body: |
 RET_ReallyLR implicit $s0
 
 ...
+---
+name:zero_lhs_not_commutative_pred
+alignment:   4
+legalized:   true
+regBankSelected: true
+tracksRegLiveness: true
+body: |
+  bb.1:
+liveins: $s0, $s1
+
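
Spelled out, the reason the IsEqualityPred filter above is needed: swapping the operands of an fcmp while keeping the predicate is only sound for the equality-style predicates; a directional predicate such as OLT would have to become OGT, which emitFPCompare does not attempt. A minimal sketch of the soundness condition:

#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Sketch: "fcmp P 0.0, x" equals "fcmp P x, 0.0" only when P treats its
// operands symmetrically, i.e. the (un)ordered equality predicates.
static bool fcmpOperandsCommute(CmpInst::Predicate P) {
  return P == CmpInst::FCMP_OEQ || P == CmpInst::FCMP_ONE ||
         P == CmpInst::FCMP_UEQ || P == CmpInst::FCMP_UNE;
}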
+

[llvm-branch-commits] [llvm] 3dedad4 - [AArch64][GlobalISel] Make G_USUBO legal and select it.

2021-01-21 Thread Amara Emerson via llvm-branch-commits

Author: Cassie Jones
Date: 2021-01-21T18:53:33-08:00
New Revision: 3dedad475da45c05bc4f66cd14e9f44581edf0bc

URL: 
https://github.com/llvm/llvm-project/commit/3dedad475da45c05bc4f66cd14e9f44581edf0bc
DIFF: 
https://github.com/llvm/llvm-project/commit/3dedad475da45c05bc4f66cd14e9f44581edf0bc.diff

LOG: [AArch64][GlobalISel] Make G_USUBO legal and select it.

The expansion for wide subtractions includes G_USUBO.

Differential Revision: https://reviews.llvm.org/D95032

Added: 
llvm/test/CodeGen/AArch64/GlobalISel/select-saddo.mir
llvm/test/CodeGen/AArch64/GlobalISel/select-ssubo.mir
llvm/test/CodeGen/AArch64/GlobalISel/select-usubo.mir

Modified: 
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 9619bb43ae9c..43ad18101069 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2745,7 +2745,8 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
   }
   case TargetOpcode::G_SADDO:
   case TargetOpcode::G_UADDO:
-  case TargetOpcode::G_SSUBO: {
+  case TargetOpcode::G_SSUBO:
+  case TargetOpcode::G_USUBO: {
 // Emit the operation and get the correct condition code.
 MachineIRBuilder MIRBuilder(I);
 auto OpAndCC = emitOverflowOp(Opcode, I.getOperand(0).getReg(),
@@ -4376,6 +4377,8 @@ AArch64InstructionSelector::emitOverflowOp(unsigned 
Opcode, Register Dst,
 return std::make_pair(emitADDS(Dst, LHS, RHS, MIRBuilder), AArch64CC::HS);
   case TargetOpcode::G_SSUBO:
 return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::VS);
+  case TargetOpcode::G_USUBO:
+return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::HS);
   }
 }
 

diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index cc7aada211bb..5a6c904e3f5d 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -165,7 +165,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
 
   getActionDefinitionsBuilder({G_SMULH, G_UMULH}).legalFor({s32, s64});
 
-  getActionDefinitionsBuilder({G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO})
+  getActionDefinitionsBuilder(
+  {G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO, G_USUBO})
   .legalFor({{s32, s1}, {s64, s1}})
   .minScalar(0, s32);
 

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
index ab8510bf9d92..4f97d153d28b 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
@@ -73,6 +73,44 @@ body: |
 %5:_(s64) = G_ANYEXT %4(s8)
 $x0 = COPY %5(s64)
 
+...
+---
+name:test_scalar_uaddo_32
+body: |
+  bb.0.entry:
+; CHECK-LABEL: name: test_scalar_uaddo_32
+; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
+; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
+; CHECK: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO 
[[COPY]], [[COPY1]]
+; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UADDO1]](s1)
+; CHECK: $w0 = COPY [[UADDO]](s32)
+; CHECK: $w1 = COPY [[ANYEXT]](s32)
+%0:_(s32) = COPY $w0
+%1:_(s32) = COPY $w1
+%2:_(s32), %3:_(s1) = G_UADDO %0, %1
+%4:_(s32) = G_ANYEXT %3
+$w0 = COPY %2(s32)
+$w1 = COPY %4(s32)
+
+...
+---
+name:test_scalar_saddo_32
+body: |
+  bb.0.entry:
+; CHECK-LABEL: name: test_scalar_saddo_32
+; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
+; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
+; CHECK: [[SADDO:%[0-9]+]]:_(s32), [[SADDO1:%[0-9]+]]:_(s1) = G_SADDO 
[[COPY]], [[COPY1]]
+; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SADDO1]](s1)
+; CHECK: $w0 = COPY [[SADDO]](s32)
+; CHECK: $w1 = COPY [[ANYEXT]](s32)
+%0:_(s32) = COPY $w0
+%1:_(s32) = COPY $w1
+%2:_(s32), %3:_(s1) = G_SADDO %0, %1
+%4:_(s32) = G_ANYEXT %3
+$w0 = COPY %2(s32)
+$w1 = COPY %4(s32)
+
 ...
 ---
 name:test_vector_add

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
index 32796e0948cc..b372a32eb7fc 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
@@ -1,6 +1,59 @@
 # NOTE: Assertions have been autogenerated by utils/update_

[llvm-branch-commits] [llvm] 541d98e - [AArch64][GlobalISel] Implement widenScalar for signed overflow

2021-01-21 Thread Amara Emerson via llvm-branch-commits

Author: Cassie Jones
Date: 2021-01-21T22:55:42-08:00
New Revision: 541d98efa222b00e16c67348810898c2fa11f398

URL: 
https://github.com/llvm/llvm-project/commit/541d98efa222b00e16c67348810898c2fa11f398
DIFF: 
https://github.com/llvm/llvm-project/commit/541d98efa222b00e16c67348810898c2fa11f398.diff

LOG: [AArch64][GlobalISel] Implement widenScalar for signed overflow

Implement widening for G_SADDO and G_SSUBO. Previously it was only
implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for
narrow overflowing add/sub on AArch64.

Differential Revision: https://reviews.llvm.org/D95034

Added: 


Modified: 
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir

Removed: 




diff  --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index b9e32257d2c8..aef9e6f70c65 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -1814,6 +1814,26 @@ LegalizerHelper::widenScalar(MachineInstr &MI, unsigned 
TypeIdx, LLT WideTy) {
 return widenScalarMergeValues(MI, TypeIdx, WideTy);
   case TargetOpcode::G_UNMERGE_VALUES:
 return widenScalarUnmergeValues(MI, TypeIdx, WideTy);
+  case TargetOpcode::G_SADDO:
+  case TargetOpcode::G_SSUBO: {
+if (TypeIdx == 1)
+  return UnableToLegalize; // TODO
+auto LHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(2));
+auto RHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(3));
+unsigned Opcode = MI.getOpcode() == TargetOpcode::G_SADDO
+  ? TargetOpcode::G_ADD
+  : TargetOpcode::G_SUB;
+auto NewOp = MIRBuilder.buildInstr(Opcode, {WideTy}, {LHSExt, RHSExt});
+LLT OrigTy = MRI.getType(MI.getOperand(0).getReg());
+auto TruncOp = MIRBuilder.buildTrunc(OrigTy, NewOp);
+auto ExtOp = MIRBuilder.buildSExt(WideTy, TruncOp);
+// There is no overflow if the re-extended result is the same as NewOp.
+MIRBuilder.buildICmp(CmpInst::ICMP_NE, MI.getOperand(1), NewOp, ExtOp);
+// Now trunc the NewOp to the original result.
+MIRBuilder.buildTrunc(MI.getOperand(0), NewOp);
+MI.eraseFromParent();
+return Legalized;
+  }
   case TargetOpcode::G_UADDO:
   case TargetOpcode::G_USUBO: {
 if (TypeIdx == 1)

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
index 4f97d153d28b..f3564a950310 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
@@ -73,6 +73,66 @@ body: |
 %5:_(s64) = G_ANYEXT %4(s8)
 $x0 = COPY %5(s64)
 
+...
+---
+name:test_scalar_uaddo_small
+body: |
+  bb.0.entry:
+; CHECK-LABEL: name: test_scalar_uaddo_small
+; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
+; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
+; CHECK: [[AND:%[0-9]+]]:_(s32) = G_AND [[TRUNC]], [[C]]
+; CHECK: [[TRUNC1:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64)
+; CHECK: [[AND1:%[0-9]+]]:_(s32) = G_AND [[TRUNC1]], [[C]]
+; CHECK: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[AND]], [[AND1]]
+; CHECK: [[AND2:%[0-9]+]]:_(s32) = G_AND [[ADD]], [[C]]
+; CHECK: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(ne), [[ADD]](s32), 
[[AND2]]
+; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[ADD]](s32)
+; CHECK: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[ICMP]](s32)
+; CHECK: $x0 = COPY [[ANYEXT]](s64)
+; CHECK: $x1 = COPY [[ANYEXT1]](s64)
+%0:_(s64) = COPY $x0
+%1:_(s64) = COPY $x1
+%2:_(s8) = G_TRUNC %0(s64)
+%3:_(s8) = G_TRUNC %1(s64)
+%4:_(s8), %5:_(s1) = G_UADDO %2, %3
+%6:_(s64) = G_ANYEXT %4(s8)
+%7:_(s64) = G_ANYEXT %5(s1)
+$x0 = COPY %6(s64)
+$x1 = COPY %7(s64)
+
+...
+---
+name:test_scalar_saddo_small
+body: |
+  bb.0.entry:
+; CHECK-LABEL: name: test_scalar_saddo_small
+; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
+; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[TRUNC]], 8
+; CHECK: [[TRUNC1:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64)
+; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[TRUNC1]], 8
+; CHECK: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG]], [[SEXT_INREG1]]
+; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[ADD]](s32)
+; CHECK: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8
+; CHECK: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(ne), [[ADD]](s32), 
[[SEXT_INREG2]]
+; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[ADD]](s32)
+; C
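
The widening trick in these tests is easier to follow in plain-integer form: sign-extend both narrow operands, do the arithmetic at the wide type, then re-extend the truncated result and compare. A sketch under the assumption of 8-bit payloads widened to 32 bits:

#include <cstdint>

// Plain-integer analogue of the G_SADDO widening: sext + wide add +
// trunc, then the G_SEXT_INREG + G_ICMP ne pair from the CHECK lines.
static bool saddOverflows8(int8_t A, int8_t B, int8_t &Res) {
  int32_t Wide = int32_t{A} + int32_t{B}; // the wide G_ADD cannot overflow
  Res = static_cast<int8_t>(Wide);        // G_TRUNC
  return int32_t{Res} != Wide;            // re-extension differs => overflow
}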

[llvm-branch-commits] [llvm] 2bb92bf - [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method

2021-01-22 Thread Amara Emerson via llvm-branch-commits

Author: Cassie Jones
Date: 2021-01-22T14:08:46-08:00
New Revision: 2bb92bf451d7eb2c817f3e5403353e7c0c14d350

URL: 
https://github.com/llvm/llvm-project/commit/2bb92bf451d7eb2c817f3e5403353e7c0c14d350
DIFF: 
https://github.com/llvm/llvm-project/commit/2bb92bf451d7eb2c817f3e5403353e7c0c14d350.diff

LOG: [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method

The widenScalar implementations for signed and unsigned overflowing
operations were very similar: both check for overflow by truncating the
result, re-sign/zero-extending it, and checking that it matches the
originally computed wide operation.

Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.

Differential Revision: https://reviews.llvm.org/D95035
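
For illustration, a minimal plain-C++ sketch of the check this widening
performs for an unsigned 8-bit add widened to 32 bits (function and variable
names invented, not part of the patch):

#include <cstdint>

bool uaddo8ViaWidening(uint8_t LHS, uint8_t RHS, uint8_t &Result) {
  uint32_t Wide = uint32_t(LHS) + uint32_t(RHS); // G_ZEXT operands, wide G_ADD
  Result = uint8_t(Wide);                        // G_TRUNC to the result type
  // Overflow iff re-extending the truncated result changes the wide value,
  // which is exactly the G_ICMP ne the new helper emits.
  return Wide != uint32_t(uint8_t(Wide));
}

The signed case has the same shape with sign extensions in place of the
zero extensions.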

Added: 


Modified: 
llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir
llvm/test/CodeGen/AArch64/legalize-uaddo.mir
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-uaddo.mir
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-usubo.mir
llvm/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index 2e9c7d8250ba..c3b494e94ff1 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -170,8 +170,10 @@ class LegalizerHelper {
   widenScalarExtract(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
   LegalizeResult
   widenScalarInsert(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
-  LegalizeResult
-  widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
+  LegalizeResult widenScalarAddoSubo(MachineInstr &MI, unsigned TypeIdx,
+ LLT WideTy);
+  LegalizeResult widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx,
+ LLT WideTy);
 
   /// Helper function to split a wide generic register into bitwise blocks with
   /// the given Type (which implies the number of blocks needed). The generic

diff  --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index aef9e6f70c65..e7f40523efaf 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -1757,6 +1757,34 @@ LegalizerHelper::widenScalarInsert(MachineInstr &MI, 
unsigned TypeIdx,
   return Legalized;
 }
 
+LegalizerHelper::LegalizeResult
+LegalizerHelper::widenScalarAddoSubo(MachineInstr &MI, unsigned TypeIdx,
+ LLT WideTy) {
+  if (TypeIdx == 1)
+return UnableToLegalize; // TODO
+  unsigned Op = MI.getOpcode();
+  unsigned Opcode = Op == TargetOpcode::G_UADDO || Op == TargetOpcode::G_SADDO
+? TargetOpcode::G_ADD
+: TargetOpcode::G_SUB;
+  unsigned ExtOpcode =
+  Op == TargetOpcode::G_UADDO || Op == TargetOpcode::G_USUBO
+  ? TargetOpcode::G_ZEXT
+  : TargetOpcode::G_SEXT;
+  auto LHSExt = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {MI.getOperand(2)});
+  auto RHSExt = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {MI.getOperand(3)});
+  // Do the arithmetic in the larger type.
+  auto NewOp = MIRBuilder.buildInstr(Opcode, {WideTy}, {LHSExt, RHSExt});
+  LLT OrigTy = MRI.getType(MI.getOperand(0).getReg());
+  auto TruncOp = MIRBuilder.buildTrunc(OrigTy, NewOp);
+  auto ExtOp = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {TruncOp});
+  // There is no overflow if the ExtOp is the same as NewOp.
+  MIRBuilder.buildICmp(CmpInst::ICMP_NE, MI.getOperand(1), NewOp, ExtOp);
+  // Now trunc the NewOp to the original result.
+  MIRBuilder.buildTrunc(MI.getOperand(0), NewOp);
+  MI.eraseFromParent();
+  return Legalized;
+}
+
 LegalizerHelper::LegalizeResult
 LegalizerHelper::widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx,
  LLT WideTy) {
@@ -1815,48 +1843,10 @@ LegalizerHelper::widenScalar(MachineInstr &MI, unsigned 
TypeIdx, LLT WideTy) {
   case TargetOpcode::G_UNMERGE_VALUES:
 return widenScalarUnmergeValues(MI, TypeIdx, WideTy);
   case TargetOpcode::G_SADDO:
-  case TargetOpcode::G_SSUBO: {
-if (TypeIdx == 1)
-  return UnableToLegalize; // TODO
-auto LHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(2));
-auto RHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(3));
-unsigned Opcode = MI.getOpcode() == TargetOpcode::G_SADDO
-  ? TargetOpcode::G_ADD
-  : TargetOpcode::G_SUB;
-auto NewOp = MIRBuilder.buildInstr(Op

[llvm-branch-commits] [llvm] a126569 - Fix failing triple test for macOS 11 with non-zero minor versions.

2021-01-06 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2021-01-06T14:57:37-08:00
New Revision: a1265690cf614bde8a7fd1d503c5f13c184dc786

URL: 
https://github.com/llvm/llvm-project/commit/a1265690cf614bde8a7fd1d503c5f13c184dc786
DIFF: 
https://github.com/llvm/llvm-project/commit/a1265690cf614bde8a7fd1d503c5f13c184dc786.diff

LOG: Fix failing triple test for macOS 11 with non-zero minor versions.

Differential Revision: https://reviews.llvm.org/D94197
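
The rule being tested, restated as a small sketch (helper name invented): on
macOS 11 and later the darwin triple only carries a reliable major version,
so only the major component can be compared.

bool versionsMatch(unsigned SysMajor, unsigned SysMinor,
                   unsigned HostMajor, unsigned HostMinor) {
  // Darwin triples on 11.x report minor/micro as 0, so compare majors only.
  return SysMajor > 10 ? SysMajor == HostMajor
                       : SysMajor == HostMajor && SysMinor == HostMinor;
}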

Added: 


Modified: 
llvm/unittests/ADT/TripleTest.cpp
llvm/unittests/Support/Host.cpp

Removed: 




diff  --git a/llvm/unittests/ADT/TripleTest.cpp 
b/llvm/unittests/ADT/TripleTest.cpp
index ffce07ba2b12..ff6c2dde4b16 100644
--- a/llvm/unittests/ADT/TripleTest.cpp
+++ b/llvm/unittests/ADT/TripleTest.cpp
@@ -1264,6 +1264,14 @@ TEST(TripleTest, getOSVersion) {
   EXPECT_EQ((unsigned)0, Minor);
   EXPECT_EQ((unsigned)0, Micro);
 
+  // For darwin triples on macOS 11, only compare the major version.
+  T = Triple("x86_64-apple-darwin20.2");
+  EXPECT_TRUE(T.isMacOSX());
+  T.getMacOSXVersion(Major, Minor, Micro);
+  EXPECT_EQ((unsigned)11, Major);
+  EXPECT_EQ((unsigned)0, Minor);
+  EXPECT_EQ((unsigned)0, Micro);
+
   T = Triple("armv7-apple-ios");
   EXPECT_FALSE(T.isMacOSX());
   EXPECT_TRUE(T.isiOS());

diff  --git a/llvm/unittests/Support/Host.cpp b/llvm/unittests/Support/Host.cpp
index 8029bb5830fc..b452048361db 100644
--- a/llvm/unittests/Support/Host.cpp
+++ b/llvm/unittests/Support/Host.cpp
@@ -348,9 +348,15 @@ TEST_F(HostTest, getMacOSHostVersion) {
   unsigned HostMajor, HostMinor, HostMicro;
   ASSERT_EQ(HostTriple.getMacOSXVersion(HostMajor, HostMinor, HostMicro), 
true);
 
-  // Don't compare the 'Micro' version, as it's always '0' for the 'Darwin'
-  // triples.
-  ASSERT_EQ(std::tie(SystemMajor, SystemMinor), std::tie(HostMajor, 
HostMinor));
+  if (SystemMajor > 10) {
+// Don't compare the 'Minor' and 'Micro' versions, as they're always '0' 
for
+// the 'Darwin' triples on 11.x.
+ASSERT_EQ(SystemMajor, HostMajor);
+  } else {
+// Don't compare the 'Micro' version, as it's always '0' for the 'Darwin'
+// triples.
+ASSERT_EQ(std::tie(SystemMajor, SystemMinor), std::tie(HostMajor, 
HostMinor));
+  }
 }
 #endif
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] a69b76c - [GlobalISel][IRTranslator] Ensure branch probabilities are added when translating invoke edges.

2020-12-14 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-14T23:36:54-08:00
New Revision: a69b76c500849bacc0ba494df03b465e4bcff0ef

URL: 
https://github.com/llvm/llvm-project/commit/a69b76c500849bacc0ba494df03b465e4bcff0ef
DIFF: 
https://github.com/llvm/llvm-project/commit/a69b76c500849bacc0ba494df03b465e4bcff0ef.diff

LOG: [GlobalISel][IRTranslator] Ensure branch probabilities are added when 
translating invoke edges.

This uses a straightforward port of findUnwindDestinations() from SelectionDAG.

Differential Revision: https://reviews.llvm.org/D93256

Added: 
llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-invoke-probabilities.ll

Modified: 
llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h 
b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
index 37c94ccbbd20..8eab8a5846a7 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
@@ -260,6 +260,19 @@ class IRTranslator : public MachineFunctionPass {
   /// \pre \p U is a call instruction.
   bool translateCall(const User &U, MachineIRBuilder &MIRBuilder);
 
+  /// When an invoke or a cleanupret unwinds to the next EH pad, there are
+  /// many places it could ultimately go. In the IR, we have a single unwind
+  /// destination, but in the machine CFG, we enumerate all the possible 
blocks.
+  /// This function skips over imaginary basic blocks that hold catchswitch
+  /// instructions, and finds all the "real" machine
+  /// basic block destinations. As those destinations may not be successors of
+  /// EHPadBB, here we also calculate the edge probability to those
+  /// destinations. The passed-in Prob is the edge probability to EHPadBB.
+  bool findUnwindDestinations(
+  const BasicBlock *EHPadBB, BranchProbability Prob,
+  SmallVectorImpl<std::pair<MachineBasicBlock *, BranchProbability>>
+  &UnwindDests);
+
   bool translateInvoke(const User &U, MachineIRBuilder &MIRBuilder);
 
   bool translateCallBr(const User &U, MachineIRBuilder &MIRBuilder);
@@ -659,8 +672,9 @@ class IRTranslator : public MachineFunctionPass {
   BranchProbability getEdgeProbability(const MachineBasicBlock *Src,
const MachineBasicBlock *Dst) const;
 
-  void addSuccessorWithProb(MachineBasicBlock *Src, MachineBasicBlock *Dst,
-BranchProbability Prob);
+  void addSuccessorWithProb(
+  MachineBasicBlock *Src, MachineBasicBlock *Dst,
+  BranchProbability Prob = BranchProbability::getUnknown());
 
 public:
   IRTranslator(CodeGenOpt::Level OptLevel = CodeGenOpt::None);

diff  --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp 
b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index 202163ff9507..dde97ba599b9 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -2347,6 +2347,62 @@ bool IRTranslator::translateCall(const User &U, 
MachineIRBuilder &MIRBuilder) {
   return true;
 }
 
+bool IRTranslator::findUnwindDestinations(
+const BasicBlock *EHPadBB,
+BranchProbability Prob,
+SmallVectorImpl<std::pair<MachineBasicBlock *, BranchProbability>>
+&UnwindDests) {
+  EHPersonality Personality = classifyEHPersonality(
+  EHPadBB->getParent()->getFunction().getPersonalityFn());
+  bool IsMSVCCXX = Personality == EHPersonality::MSVC_CXX;
+  bool IsCoreCLR = Personality == EHPersonality::CoreCLR;
+  bool IsWasmCXX = Personality == EHPersonality::Wasm_CXX;
+  bool IsSEH = isAsynchronousEHPersonality(Personality);
+
+  if (IsWasmCXX) {
+// Ignore this for now.
+return false;
+  }
+
+  while (EHPadBB) {
+const Instruction *Pad = EHPadBB->getFirstNonPHI();
+BasicBlock *NewEHPadBB = nullptr;
+if (isa<LandingPadInst>(Pad)) {
+  // Stop on landingpads. They are not funclets.
+  UnwindDests.emplace_back(&getMBB(*EHPadBB), Prob);
+  break;
+}
+if (isa<CleanupPadInst>(Pad)) {
+  // Stop on cleanup pads. Cleanups are always funclet entries for all 
known
+  // personalities.
+  UnwindDests.emplace_back(&getMBB(*EHPadBB), Prob);
+  UnwindDests.back().first->setIsEHScopeEntry();
+  UnwindDests.back().first->setIsEHFuncletEntry();
+  break;
+}
+if (auto *CatchSwitch = dyn_cast<CatchSwitchInst>(Pad)) {
+  // Add the catchpad handlers to the possible destinations.
+  for (const BasicBlock *CatchPadBB : CatchSwitch->handlers()) {
+UnwindDests.emplace_back(&getMBB(*CatchPadBB), Prob);
+// For MSVC++ and the CLR, catchblocks are funclets and need prologues.
+if (IsMSVCCXX || IsCoreCLR)
+  UnwindDests.back().first->setIsEHFuncletEntry();
+if (!IsSEH)
+  UnwindDests.back().first->setIsEHScopeEntry();
+  }
+  NewEHPadBB = CatchSwitch->getUnwindDest();
+} else {
+  continue;
+}
+
+BranchProbabilityInfo *BPI = FuncInfo.BPI;
+if (BPI && NewEHPadBB)
+  Prob *= BPI->g

[llvm-branch-commits] [llvm] 9caca72 - [AArch64][GlobalISel] Use the look-through constant helper for the shift s32->s64 custom legalization.

2020-12-18 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-18T11:57:24-08:00
New Revision: 9caca7241d447266a23a99ea0536f30faaf19694

URL: 
https://github.com/llvm/llvm-project/commit/9caca7241d447266a23a99ea0536f30faaf19694
DIFF: 
https://github.com/llvm/llvm-project/commit/9caca7241d447266a23a99ea0536f30faaf19694.diff

LOG: [AArch64][GlobalISel] Use the look-through constant helper for the shift 
s32->s64 custom legalization.

Almost NFC, except it catches more cases and gives a 0.1% CTMark -O0 size win.

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 2eaec0b970fa..3dcc244a08fa 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -710,16 +710,14 @@ bool AArch64LegalizerInfo::legalizeShlAshrLshr(
   // If the shift amount is a G_CONSTANT, promote it to a 64 bit type so the
   // imported patterns can select it later. Either way, it will be legal.
   Register AmtReg = MI.getOperand(2).getReg();
-  auto *CstMI = MRI.getVRegDef(AmtReg);
-  assert(CstMI && "expected to find a vreg def");
-  if (CstMI->getOpcode() != TargetOpcode::G_CONSTANT)
+  auto VRegAndVal = getConstantVRegValWithLookThrough(AmtReg, MRI);
+  if (!VRegAndVal)
 return true;
   // Check the shift amount is in range for an immediate form.
-  unsigned Amount = CstMI->getOperand(1).getCImm()->getZExtValue();
+  int64_t Amount = VRegAndVal->Value;
   if (Amount > 31)
 return true; // This will have to remain a register variant.
-  assert(MRI.getType(AmtReg).getSizeInBits() == 32);
-  auto ExtCst = MIRBuilder.buildZExt(LLT::scalar(64), AmtReg);
+  auto ExtCst = MIRBuilder.buildConstant(LLT::scalar(64), Amount);
   MI.getOperand(2).setReg(ExtCst.getReg(0));
   return true;
 }

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
index 56c5b8a8f1e2..9c1f6fc6f41b 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
@@ -24,9 +24,10 @@ body: |
 ; CHECK-LABEL: name: test_unmerge_s4
 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
 ; CHECK: [[UV:%[0-9]+]]:_(s8), [[UV1:%[0-9]+]]:_(s8), 
[[UV2:%[0-9]+]]:_(s8), [[UV3:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[COPY]](s32)
-; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4
+; CHECK: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 4
 ; CHECK: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[UV]](s8)
-; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[ZEXT]], [[C]](s32)
+; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
+; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[ZEXT]], [[C1]](s64)
 ; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[UV]](s8)
 ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[LSHR]](s32)
 ; CHECK: $x0 = COPY [[ANYEXT]](s64)



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 43ff75f - [AArch64][GlobalISel] Promote scalar G_SHL constant shift amounts to s64.

2020-12-18 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-18T11:57:38-08:00
New Revision: 43ff75f2c3feef64f9d73328230d34dac8832a91

URL: 
https://github.com/llvm/llvm-project/commit/43ff75f2c3feef64f9d73328230d34dac8832a91
DIFF: 
https://github.com/llvm/llvm-project/commit/43ff75f2c3feef64f9d73328230d34dac8832a91.diff

LOG: [AArch64][GlobalISel] Promote scalar G_SHL constant shift amounts to s64.

This was supposed to be done in the first place as is currently the case for
G_ASHR and G_LSHR but was forgotten when the original shift legalization
overhaul was done last year.

This was exposed because we started falling back on s32 = s32, s64 SHLs
due to a recent combiner change.

Gives a very minor (0.1%) code size -O0 improvement on consumer-typeset.

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir
llvm/test/CodeGen/AArch64/arm64-clrsb.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 3dcc244a08fa..4ffde2a7e3c4 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -97,15 +97,25 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
   .moreElementsToNextPow2(0);
 
   getActionDefinitionsBuilder(G_SHL)
-.legalFor({{s32, s32}, {s64, s64},
-   {v2s32, v2s32}, {v4s32, v4s32}, {v2s64, v2s64}})
-.clampScalar(1, s32, s64)
-.clampScalar(0, s32, s64)
-.widenScalarToNextPow2(0)
-.clampNumElements(0, v2s32, v4s32)
-.clampNumElements(0, v2s64, v2s64)
-.moreElementsToNextPow2(0)
-.minScalarSameAs(1, 0);
+  .customIf([=](const LegalityQuery &Query) {
+const auto &SrcTy = Query.Types[0];
+const auto &AmtTy = Query.Types[1];
+return !SrcTy.isVector() && SrcTy.getSizeInBits() == 32 &&
+   AmtTy.getSizeInBits() == 32;
+  })
+  .legalFor({{s32, s32},
+ {s64, s64},
+ {s32, s64},
+ {v2s32, v2s32},
+ {v4s32, v4s32},
+ {v2s64, v2s64}})
+  .clampScalar(1, s32, s64)
+  .clampScalar(0, s32, s64)
+  .widenScalarToNextPow2(0)
+  .clampNumElements(0, v2s32, v4s32)
+  .clampNumElements(0, v2s64, v2s64)
+  .moreElementsToNextPow2(0)
+  .minScalarSameAs(1, 0);
 
   getActionDefinitionsBuilder(G_PTR_ADD)
   .legalFor({{p0, s64}, {v2p0, v2s64}})

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir
index 09ae228b4f1d..a802baca4c8d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir
@@ -6,11 +6,12 @@ name:test_merge_s4
 body: |
   bb.0:
 ; CHECK-LABEL: name: test_merge_s4
-; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4
+; CHECK: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 4
 ; CHECK: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 15
 ; CHECK: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
 ; CHECK: [[AND:%[0-9]+]]:_(s32) = G_AND [[C2]], [[C1]]
-; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND]], [[C]](s32)
+; CHECK: [[C3:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
+; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND]], [[C3]](s64)
 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
 ; CHECK: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY]], [[C1]]
 ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY [[SHL]](s32)

diff  --git 
a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir
index 7d7b77aa7535..6dc28e738dbc 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir
@@ -28,12 +28,11 @@ body: |
 ; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
 ; CHECK: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[C1]](s64)
 ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD]](p0) :: (load 1 from 
%ir.ptr + 2, align 4)
-; CHECK: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
-; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[LOAD]], [[C2]](s32)
+; CHECK: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 16
+; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[LOAD]], [[C2]](s64)
 ; CHECK: [[OR:%[0-9]+]]:_(s32) = G_OR [[SHL]], [[ZEXTLOAD]]
 ; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[OR]](s32)
-; CHECK: [[C3:%[0-9]+]]:_(s64) = G_CONSTANT i64 16
-; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY2]], [[C3]](s64)
+; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY

[llvm-branch-commits] [llvm] e0721a0 - [AArch64][GlobalISel] Notify observer of mutated instruction for shift custom legalization.

2020-12-25 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-25T00:31:47-08:00
New Revision: e0721a0992288122d62940f622b4c2127098a2da

URL: 
https://github.com/llvm/llvm-project/commit/e0721a0992288122d62940f622b4c2127098a2da
DIFF: 
https://github.com/llvm/llvm-project/commit/e0721a0992288122d62940f622b4c2127098a2da.diff

LOG: [AArch64][GlobalISel] Notify observer of mutated instruction for shift 
custom legalization.

No test for this because it's a CSE verifier failure that's only exposed in a
WIP patch for enabling CSE throughout the AArch64 GISel pipeline.
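
The protocol being fixed, as a short sketch (the helper name is invented; the
observer calls are the real GISelChangeObserver API used in the diff below):
any in-place mutation must be bracketed so listeners such as the CSE info can
rehash the instruction.

#include "llvm/CodeGen/GlobalISel/GISelChangeObserver.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/Register.h"
using namespace llvm;

void mutateShiftAmount(MachineInstr &MI, Register NewAmtReg,
                       GISelChangeObserver &Observer) {
  Observer.changingInstr(MI); // about to modify MI in place
  MI.getOperand(2).setReg(NewAmtReg);
  Observer.changedInstr(MI);  // done; observers may update their records
}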

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 0774f7b02dd2..a611d68cb2e5 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -841,7 +841,9 @@ bool AArch64LegalizerInfo::legalizeShlAshrLshr(
   if (Amount > 31)
 return true; // This will have to remain a register variant.
   auto ExtCst = MIRBuilder.buildConstant(LLT::scalar(64), Amount);
+  Observer.changingInstr(MI);
   MI.getOperand(2).setReg(ExtCst.getReg(0));
+  Observer.changedInstr(MI);
   return true;
 }
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 7df3544 - [GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from getConstantVRegVal" landed.

2020-12-27 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-26T23:51:44-08:00
New Revision: 7df3544e80fb40c742707613cd39ca77f7fea558

URL: 
https://github.com/llvm/llvm-project/commit/7df3544e80fb40c742707613cd39ca77f7fea558
DIFF: 
https://github.com/llvm/llvm-project/commit/7df3544e80fb40c742707613cd39ca77f7fea558.diff

LOG: [GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from 
getConstantVRegVal" landed.

APInt binary ops don't promote mismatched bit widths but instead assert;
a combine was relying on that implicit promotion.
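
Concretely, a minimal sketch of the failure mode (widths chosen for
illustration; the function name is invented):

#include "llvm/ADT/APInt.h"
using llvm::APInt;

int64_t combineShiftImms(const APInt &Imm /*e.g. 32 bits*/,
                         const APInt &Imm2 /*e.g. 64 bits*/) {
  // (Imm + Imm2).getSExtValue() would assert on the width mismatch, since
  // APInt arithmetic requires equal bit widths. Converting one side to a
  // plain integer first, as the fix does, sidesteps the assert:
  return (Imm.getSExtValue() + Imm2).getSExtValue();
}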

Added: 
llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir

Modified: 
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 90b1dcea2648..abc23da3d418 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -1570,7 +1570,8 @@ bool CombinerHelper::matchShiftImmedChain(MachineInstr 
&MI,
 return false;
 
   // Pass the combined immediate to the apply function.
-  MatchInfo.Imm = (MaybeImmVal->Value + MaybeImm2Val->Value).getSExtValue();
+  MatchInfo.Imm =
+  (MaybeImmVal->Value.getSExtValue() + MaybeImm2Val->Value).getSExtValue();
   MatchInfo.Reg = Base;
 
   // There is no simple replacement for a saturating unsigned left shift that

diff  --git 
a/llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir
new file mode 100644
index ..481c71fbed60
--- /dev/null
+++ 
b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir
@@ -0,0 +1,58 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs 
-mtriple aarch64-unknown-unknown %s -o - | FileCheck %s
+---
+name:shift_immed_chain_mismatch_size_crash
+alignment:   4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x0' }
+body: |
+  ; CHECK-LABEL: name: shift_immed_chain_mismatch_size_crash
+  ; CHECK: bb.0:
+  ; CHECK:   successors: %bb.1(0x4000), %bb.2(0x4000)
+  ; CHECK:   liveins: $x0
+  ; CHECK:   [[DEF:%[0-9]+]]:_(p0) = G_IMPLICIT_DEF
+  ; CHECK:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
+  ; CHECK:   [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 9
+  ; CHECK:   [[DEF1:%[0-9]+]]:_(s1) = G_IMPLICIT_DEF
+  ; CHECK:   G_BRCOND [[DEF1]](s1), %bb.2
+  ; CHECK:   G_BR %bb.1
+  ; CHECK: bb.1:
+  ; CHECK:   successors:
+  ; CHECK: bb.2:
+  ; CHECK:   [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[DEF]](p0) :: (load 4 from 
`i32* undef`, align 8)
+  ; CHECK:   [[MUL:%[0-9]+]]:_(s32) = nsw G_MUL [[C]], [[LOAD]]
+  ; CHECK:   [[MUL1:%[0-9]+]]:_(s32) = nsw G_MUL [[MUL]], [[C1]]
+  ; CHECK:   [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+  ; CHECK:   [[SHL:%[0-9]+]]:_(s32) = G_SHL [[MUL1]], [[C2]](s64)
+  ; CHECK:   $w0 = COPY [[SHL]](s32)
+  ; CHECK:   RET_ReallyLR implicit $w0
+  bb.1:
+liveins: $x0
+
+%0:_(p0) = COPY $x0
+%1:_(s1) = G_IMPLICIT_DEF
+%3:_(p0) = G_IMPLICIT_DEF
+%4:_(s32) = G_CONSTANT i32 16
+%6:_(s32) = G_CONSTANT i32 9
+%8:_(s32) = G_CONSTANT i32 2
+%11:_(s64) = G_CONSTANT i64 2
+G_BRCOND %1(s1), %bb.2
+G_BR %bb.3
+
+  bb.2:
+successors:
+
+
+  bb.3:
+%2:_(s32) = G_LOAD %3(p0) :: (load 4 from `i32* undef`, align 8)
+%5:_(s32) = nsw G_MUL %4, %2
+%7:_(s32) = nsw G_MUL %5, %6
+%9:_(s32) = nsw G_MUL %7, %8
+%10:_(s64) = G_SEXT %9(s32)
+%12:_(s64) = G_MUL %10, %11
+%13:_(s32) = G_TRUNC %12(s64)
+$w0 = COPY %13(s32)
+RET_ReallyLR implicit $w0
+
+...



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 4ab45cc - [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.

2020-09-30 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-09-30T17:25:33-07:00
New Revision: 4ab45cc2260d87f18e1b05517d5d366b2e754b72

URL: 
https://github.com/llvm/llvm-project/commit/4ab45cc2260d87f18e1b05517d5d366b2e754b72
DIFF: 
https://github.com/llvm/llvm-project/commit/4ab45cc2260d87f18e1b05517d5d366b2e754b72.diff

LOG: [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, 
G_FREEZE.

Also use this opportunity to start cleaning up the mess of vector type lists
we have in the LegalizerInfo. Unfortunately, since the legalizer rule builders
require std::initializer_list objects as parameters, we can't programmatically
generate the type lists.
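
The constraint in question, as a tiny standalone C++ sketch (names invented):
a std::initializer_list can only be formed from a braced list at the call
site, not from a runtime container, which is why the type lists are spelled
out by hand.

#include <initializer_list>
#include <vector>

static void takesList(std::initializer_list<int>) {}

void demo() {
  takesList({1, 2, 3});          // fine: a braced list at the call site
  std::vector<int> V = {1, 2, 3};
  // takesList(V);               // error: no conversion from a container
}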

Added: 


Modified: 
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 7d013c439883..206e40999224 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -54,6 +54,12 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
   const LLT v2s64 = LLT::vector(2, 64);
   const LLT v2p0 = LLT::vector(2, p0);
 
+  const auto PackedVectorAllTypeList = {/* Begin 128bit types */
+v16s8, v8s16, v4s32, v2s64, v2p0,
+/* End 128bit types */
+/* Begin 64bit types */
+v8s8, v4s16, v2s32};
+
   const TargetMachine &TM = ST.getTargetLowering()->getTargetMachine();
 
   // FIXME: support subtargets which have neon/fp-armv8 disabled.
@@ -63,7 +69,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
   }
 
   getActionDefinitionsBuilder({G_IMPLICIT_DEF, G_FREEZE})
-  .legalFor({p0, s1, s8, s16, s32, s64, v2s32, v4s32, v2s64, v16s8, v8s16})
+  .legalFor({p0, s1, s8, s16, s32, s64})
+  .legalFor(PackedVectorAllTypeList)
   .clampScalar(0, s1, s64)
   .widenScalarToNextPow2(0, 8)
   .fewerElementsIf(
@@ -79,8 +86,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
 return std::make_pair(0, EltTy);
   });
 
-  getActionDefinitionsBuilder(G_PHI)
-  .legalFor({p0, s16, s32, s64, v2s32, v4s32, v2s64})
+  getActionDefinitionsBuilder(G_PHI).legalFor({p0, s16, s32, s64})
+  .legalFor(PackedVectorAllTypeList)
   .clampScalar(0, s16, s64)
   .widenScalarToNextPow2(0);
 

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
index 9417df066a46..f6c15ec4925d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
@@ -1,5 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -march=aarch64 -run-pass=legalizer -O0 %s -o - | FileCheck %s
+# RUN: llc -march=aarch64 -run-pass=legalizer -global-isel-abort=1 -O0 %s -o - 
| FileCheck %s
 ---
 name:test_freeze_s64
 body: |
@@ -67,3 +67,21 @@ body: |
 $w0 = COPY %1
 $w1 = COPY %2
 ...
+---
+name: test_freeze_v8s8
+body: |
+  bb.0:
+liveins: $d0
+
+; CHECK-LABEL: name: test_freeze_v8s8
+; CHECK: %d0:_(<8 x s8>) = COPY $d0
+; CHECK: [[FREEZE:%[0-9]+]]:_(<8 x s8>) = G_FREEZE %d0
+; CHECK: [[UV:%[0-9]+]]:_(<4 x s8>), [[UV1:%[0-9]+]]:_(<4 x s8>) = 
G_UNMERGE_VALUES [[FREEZE]](<8 x s8>)
+; CHECK: $w0 = COPY [[UV]](<4 x s8>)
+; CHECK: $w1 = COPY [[UV1]](<4 x s8>)
+%d0:_(<8 x s8>) = COPY $d0
+%0:_(<8 x s8>) = G_FREEZE %d0
+%1:_(<4 x s8>), %2:_(<4 x s8>) = G_UNMERGE_VALUES %0
+$w0 = COPY %1
+$w1 = COPY %2
+...

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
index c909b27b83cc..b9fbd17c07da 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
@@ -1,51 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -O0 -mtriple=aarch64-unknown-unknown -verify-machineinstrs 
-run-pass=legalizer %s -o - | FileCheck %s
 |
-  ; ModuleID = '/tmp/test.ll'
-  source_filename = "/tmp/test.ll"
-  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
-  target triple = "aarch64-unknown-unknown"
-
-  define i32 @legalize_phi(i32 %argc) {
-  entry:
-ret i32 0
-  }
-
-  define i64* @legalize_phi_ptr(i64* %a, i64* %b, i1 %cond) {
-  entry:
-ret i64* null
-  }
-
-  define i32 @legalize_phi_empty(i32 %argc) {
-  entry:
-ret i32 0
-  }
-
-  define i32 @legalize_phi_loop(i32 %argc) {
-  entry:
-  

[llvm-branch-commits] [llvm] 87ff156 - [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask.

2020-11-30 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-11-30T16:37:49-08:00
New Revision: 87ff156414370043cf149e0c77513c5227b336b1

URL: 
https://github.com/llvm/llvm-project/commit/87ff156414370043cf149e0c77513c5227b336b1
DIFF: 
https://github.com/llvm/llvm-project/commit/87ff156414370043cf149e0c77513c5227b336b1.diff

LOG: [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT 
with scalar mask.

The lowering of vector selects needs to first splat the scalar mask into a
vector.

This was causing a crash when building oggenc in the test suite.

Differential Revision: https://reviews.llvm.org/D91655
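
A scalar C++ analogue of the lowering (a sketch with invented names, four
lanes for concreteness): the scalar condition is broadcast into a lane mask
first, after which the usual and/or-based vector-select lowering applies.

void selectWithScalarMask(bool Mask, const int (&A)[4], const int (&B)[4],
                          int (&Out)[4]) {
  int Splat[4];
  for (int &Lane : Splat)
    Lane = Mask ? -1 : 0; // sign-extended i1: all-ones or all-zeros lanes
  for (int I = 0; I < 4; ++I)
    Out[I] = (Splat[I] & A[I]) | (~Splat[I] & B[I]); // select via bitwise ops
}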

Added: 


Modified: 
llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
llvm/test/CodeGen/AArch64/GlobalISel/legalize-select.mir

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h 
b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
index 0ce40e60e6fc..739600ead21a 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -958,6 +958,23 @@ class MachineIRBuilder {
   MachineInstrBuilder buildBuildVectorTrunc(const DstOp &Res,
 ArrayRef Ops);
 
+  /// Build and insert a vector splat of a scalar \p Src using a
+  /// G_INSERT_VECTOR_ELT and G_SHUFFLE_VECTOR idiom.
+  ///
+  /// \pre setBasicBlock or setMI must have been called.
+  /// \pre \p Src must have the same type as the element type of \p Dst
+  ///
+  /// \return a MachineInstrBuilder for the newly created instruction.
+  MachineInstrBuilder buildShuffleSplat(const DstOp &Res, const SrcOp &Src);
+
+  /// Build and insert \p Res = G_SHUFFLE_VECTOR \p Src1, \p Src2, \p Mask
+  ///
+  /// \pre setBasicBlock or setMI must have been called.
+  ///
+  /// \return a MachineInstrBuilder for the newly created instruction.
+  MachineInstrBuilder buildShuffleVector(const DstOp &Res, const SrcOp &Src1,
+ const SrcOp &Src2, ArrayRef<int> Mask);
+
   /// Build and insert \p Res = G_CONCAT_VECTORS \p Op0, ...
   ///
   /// G_CONCAT_VECTORS creates a vector from the concatenation of 2 or more

diff  --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 1ad6109a65be..7b346a1bbbec 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -6217,8 +6217,23 @@ LegalizerHelper::LegalizeResult 
LegalizerHelper::lowerSelect(MachineInstr &MI) {
   if (!DstTy.isVector())
 return UnableToLegalize;
 
-  if (MaskTy.getSizeInBits() != Op1Ty.getSizeInBits())
+  // Vector selects can have a scalar predicate. If so, splat into a vector and
+  // finish for later legalization attempts to try again.
+  if (MaskTy.isScalar()) {
+Register MaskElt = MaskReg;
+if (MaskTy.getSizeInBits() < DstTy.getScalarSizeInBits())
+  MaskElt = MIRBuilder.buildSExt(DstTy.getElementType(), 
MaskElt).getReg(0);
+// Generate a vector splat idiom to be pattern matched later.
+auto ShufSplat = MIRBuilder.buildShuffleSplat(DstTy, MaskElt);
+Observer.changingInstr(MI);
+MI.getOperand(1).setReg(ShufSplat.getReg(0));
+Observer.changedInstr(MI);
+return Legalized;
+  }
+
+  if (MaskTy.getSizeInBits() != Op1Ty.getSizeInBits()) {
 return UnableToLegalize;
+  }
 
   auto NotMask = MIRBuilder.buildNot(MaskTy, MaskReg);
   auto NewOp1 = MIRBuilder.buildAnd(MaskTy, Op1Reg, MaskReg);

diff  --git a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp 
b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
index 4a0f70811057..1e39605c90be 100644
--- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
@@ -635,6 +635,33 @@ MachineIRBuilder::buildBuildVectorTrunc(const DstOp &Res,
   return buildInstr(TargetOpcode::G_BUILD_VECTOR_TRUNC, Res, TmpVec);
 }
 
+MachineInstrBuilder MachineIRBuilder::buildShuffleSplat(const DstOp &Res,
+const SrcOp &Src) {
+  LLT DstTy = Res.getLLTTy(*getMRI());
+  LLT SrcTy = Src.getLLTTy(*getMRI());
+  assert(SrcTy == DstTy.getElementType() && "Expected Src to match Dst elt 
ty");
+  auto UndefVec = buildUndef(DstTy);
+  auto Zero = buildConstant(LLT::scalar(64), 0);
+  auto InsElt = buildInsertVectorElement(DstTy, UndefVec, Src, Zero);
+  SmallVector<int, 16> ZeroMask(DstTy.getNumElements());
+  return buildShuffleVector(DstTy, InsElt, UndefVec, ZeroMask);
+}
+
+MachineInstrBuilder MachineIRBuilder::buildShuffleVector(const DstOp &Res,
+ const SrcOp &Src1,
+ const SrcOp &Src2,
+

[llvm-branch-commits] [llvm] 2ac4d0f - [AArch64] Fix some minor coding style issues in AArch64CompressJumpTables

2020-12-07 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-07T12:48:09-08:00
New Revision: 2ac4d0f45a2a301163ca53f3e23e675f4f5bdbd3

URL: 
https://github.com/llvm/llvm-project/commit/2ac4d0f45a2a301163ca53f3e23e675f4f5bdbd3
DIFF: 
https://github.com/llvm/llvm-project/commit/2ac4d0f45a2a301163ca53f3e23e675f4f5bdbd3.diff

LOG: [AArch64] Fix some minor coding style issues in AArch64CompressJumpTables

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp 
b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
index 57dc8a4061f1..c265592d05a7 100644
--- a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
+++ b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
@@ -59,7 +59,7 @@ class AArch64CompressJumpTables : public MachineFunctionPass {
   }
 };
 char AArch64CompressJumpTables::ID = 0;
-}
+} // namespace
 
 INITIALIZE_PASS(AArch64CompressJumpTables, DEBUG_TYPE,
 "AArch64 compress jump tables pass", false, false)
@@ -104,7 +104,7 @@ bool 
AArch64CompressJumpTables::compressJumpTable(MachineInstr &MI,
   int MaxOffset = std::numeric_limits<int>::min(),
   MinOffset = std::numeric_limits<int>::max();
   MachineBasicBlock *MinBlock = nullptr;
-  for (auto Block : JT.MBBs) {
+  for (auto *Block : JT.MBBs) {
 int BlockOffset = BlockInfo[Block->getNumber()];
 assert(BlockOffset % 4 == 0 && "misaligned basic block");
 
@@ -124,13 +124,14 @@ bool 
AArch64CompressJumpTables::compressJumpTable(MachineInstr &MI,
   }
 
   int Span = MaxOffset - MinOffset;
-  auto AFI = MF->getInfo<AArch64FunctionInfo>();
+  auto *AFI = MF->getInfo<AArch64FunctionInfo>();
   if (isUInt<8>(Span / 4)) {
 AFI->setJumpTableEntryInfo(JTIdx, 1, MinBlock->getSymbol());
 MI.setDesc(TII->get(AArch64::JumpTableDest8));
 ++NumJT8;
 return true;
-  } else if (isUInt<16>(Span / 4)) {
+  }
+  if (isUInt<16>(Span / 4)) {
 AFI->setJumpTableEntryInfo(JTIdx, 2, MinBlock->getSymbol());
 MI.setDesc(TII->get(AArch64::JumpTableDest16));
 ++NumJT16;



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] c29af37 - [AArch64] Don't try to compress jump tables if there are any inline asm instructions.

2020-12-10 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-10T12:20:02-08:00
New Revision: c29af37c6c9d74ca330bd7f1d084f1f676ba2824

URL: 
https://github.com/llvm/llvm-project/commit/c29af37c6c9d74ca330bd7f1d084f1f676ba2824
DIFF: 
https://github.com/llvm/llvm-project/commit/c29af37c6c9d74ca330bd7f1d084f1f676ba2824.diff

LOG: [AArch64] Don't try to compress jump tables if there are any inline asm 
instructions.

Inline asm can contain constructs like .byte directives which may have arbitrary size.
In some cases, this causes us to miscalculate the size of blocks and therefore
offsets, causing us to incorrectly compress a JT.

To be safe, just bail out of the whole thing if we find any inline asm.

Fixes PR48255

Differential Revision: https://reviews.llvm.org/D92865
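
To make the hazard concrete, a hedged example (hypothetical, not from the
patch): the compiler sees a single INLINEASM instruction here, but its
encoded size is whatever the directives expand to.

void padWithInlineAsm() {
  // One "instruction" to the compiler that actually occupies 400 bytes, so
  // any fixed per-instruction size estimate misjudges block offsets.
  asm volatile(".zero 400");
}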

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
llvm/test/CodeGen/AArch64/jump-table-compress.mir
llvm/test/CodeGen/AArch64/jump-table.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp 
b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
index c265592d05a7..2328a8b4deb8 100644
--- a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
+++ b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
@@ -37,8 +37,13 @@ class AArch64CompressJumpTables : public MachineFunctionPass 
{
   MachineFunction *MF;
   SmallVector BlockInfo;
 
-  int computeBlockSize(MachineBasicBlock &MBB);
-  void scanFunction();
+  /// Returns the size in instructions of the block \p MBB, or None if we
+  /// couldn't get a safe upper bound.
+  Optional<int> computeBlockSize(MachineBasicBlock &MBB);
+
+  /// Gather information about the function, returns false if we can't perform
+  /// this optimization for some reason.
+  bool scanFunction();
 
   bool compressJumpTable(MachineInstr &MI, int Offset);
 
@@ -64,14 +69,22 @@ char AArch64CompressJumpTables::ID = 0;
 INITIALIZE_PASS(AArch64CompressJumpTables, DEBUG_TYPE,
 "AArch64 compress jump tables pass", false, false)
 
-int AArch64CompressJumpTables::computeBlockSize(MachineBasicBlock &MBB) {
+Optional<int>
+AArch64CompressJumpTables::computeBlockSize(MachineBasicBlock &MBB) {
   int Size = 0;
-  for (const MachineInstr &MI : MBB)
+  for (const MachineInstr &MI : MBB) {
+// Inline asm may contain some directives like .bytes which we don't
+// currently have the ability to parse accurately. To be safe, just avoid
+// computing a size and bail out.
+if (MI.getOpcode() == AArch64::INLINEASM ||
+MI.getOpcode() == AArch64::INLINEASM_BR)
+  return None;
 Size += TII->getInstSizeInBytes(MI);
+  }
   return Size;
 }
 
-void AArch64CompressJumpTables::scanFunction() {
+bool AArch64CompressJumpTables::scanFunction() {
   BlockInfo.clear();
   BlockInfo.resize(MF->getNumBlockIDs());
 
@@ -84,8 +97,12 @@ void AArch64CompressJumpTables::scanFunction() {
 else
   AlignedOffset = alignTo(Offset, Alignment);
 BlockInfo[MBB.getNumber()] = AlignedOffset;
-Offset = AlignedOffset + computeBlockSize(MBB);
+auto BlockSize = computeBlockSize(MBB);
+if (!BlockSize)
+  return false;
+Offset = AlignedOffset + *BlockSize;
   }
+  return true;
 }
 
 bool AArch64CompressJumpTables::compressJumpTable(MachineInstr &MI,
@@ -152,7 +169,8 @@ bool 
AArch64CompressJumpTables::runOnMachineFunction(MachineFunction &MFIn) {
   if (ST.force32BitJumpTables() && !MF->getFunction().hasMinSize())
 return false;
 
-  scanFunction();
+  if (!scanFunction())
+return false;
 
   for (MachineBasicBlock &MBB : *MF) {
 int Offset = BlockInfo[MBB.getNumber()];

diff  --git a/llvm/test/CodeGen/AArch64/jump-table-compress.mir 
b/llvm/test/CodeGen/AArch64/jump-table-compress.mir
index 272de36f8b6e..a46b7c6ac9c0 100644
--- a/llvm/test/CodeGen/AArch64/jump-table-compress.mir
+++ b/llvm/test/CodeGen/AArch64/jump-table-compress.mir
@@ -4,6 +4,8 @@
 unreachable
   }
 
+  define void @test_inline_asm_no_compress() { ret void }
+
 ...
 ---
 name:test_jumptable
@@ -110,3 +112,88 @@ body: |
 early-clobber $x10, dead early-clobber $x11 = JumpTableDest32 undef killed 
$x9, undef killed $x8, %jump-table.5
 BR killed $x10
 ...
+---
+name:test_inline_asm_no_compress
+alignment:   4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$w0' }
+  - { reg: '$w1' }
+  - { reg: '$w2' }
+frameInfo:
+  maxAlignment:1
+  maxCallFrameSize: 0
+machineFunctionInfo:
+  hasRedZone:  false
+jumpTable:
+  kind: label-difference32
+  entries:
+- id:  0
+  blocks:  [ '%bb.2', '%bb.4', '%bb.5', '%bb.6', '%bb.7', '%bb.8' ]
+body: |
+  bb.0:
+successors: %bb.3(0x12492492), %bb.1(0x6db6db6e)
+liveins: $w0, $w1, $w2
+  
+dead $wzr = SUBSWri renamable $w0, 5, 0, implicit-def $nzcv
+Bcc 8, %bb.3, implicit $nzcv
+  
+  bb.1:
+successors: %bb.2, %bb.4, %bb.5, %bb.6, %bb.7, %bb.8

[llvm-branch-commits] [llvm] 21de99d - [GlobalISel][IRTranslator] Fix a crash when the use of an extractvalue is a non-dominated metadata use.

2020-12-12 Thread Amara Emerson via llvm-branch-commits

Author: Amara Emerson
Date: 2020-12-12T14:58:54-08:00
New Revision: 21de99d43c88c00c007a2b3e350d56328f26660e

URL: 
https://github.com/llvm/llvm-project/commit/21de99d43c88c00c007a2b3e350d56328f26660e
DIFF: 
https://github.com/llvm/llvm-project/commit/21de99d43c88c00c007a2b3e350d56328f26660e.diff

LOG: [GlobalISel][IRTranslator] Fix a crash when the use of an extractvalue is 
a non-dominated metadata use.

We don't expect uses to come before defs in the CFG, so allocateVRegs() 
asserted.

Fixes PR48211

Added: 
llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll

Modified: 
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp 
b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index a912b9c1bd00..202163ff9507 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -170,7 +170,9 @@ void IRTranslator::getAnalysisUsage(AnalysisUsage &AU) 
const {
 
 IRTranslator::ValueToVRegInfo::VRegListT &
 IRTranslator::allocateVRegs(const Value &Val) {
-  assert(!VMap.contains(Val) && "Value already allocated in VMap");
+  auto VRegsIt = VMap.findVRegs(Val);
+  if (VRegsIt != VMap.vregs_end())
+return *VRegsIt->second;
   auto *Regs = VMap.getVRegs(Val);
   auto *Offsets = VMap.getOffsets(Val);
   SmallVector SplitTys;

diff  --git 
a/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll 
b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll
new file mode 100644
index ..dae85e6404b2
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll
@@ -0,0 +1,400 @@
+; RUN: llc -O0 -stop-after=irtranslator -global-isel -verify-machineinstrs %s 
-o - 2>&1 | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-fuchsia"
+
+declare void @llvm.dbg.value(metadata, metadata, metadata) #0
+; Check that we don't crash when we have a metadata use of %i not being 
dominated by the def.
+; CHECK-LABEL: @foo
+; CHECK: DBG_VALUE %1:_(p0), $noreg, !370, !DIExpression(DW_OP_LLVM_fragment, 
0, 64)
+define hidden void @foo() unnamed_addr #1 !dbg !230 {
+  br i1 undef, label %bb4, label %bb5
+
+bb4:  ; preds = %bb3
+  %i = extractvalue { i8*, i64 } undef, 0
+  ret void
+
+bb5:  ; preds = %bb3
+  call void @llvm.dbg.value(metadata i8* %i, metadata !370, metadata 
!DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !372
+  ret void
+}
+
+attributes #0 = { nofree nosync nounwind readnone speculatable willreturn }
+attributes #1 = { "target-cpu"="generic" }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!229}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !1, producer: 
"rustc", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: 
!2, globals: !228)
+!1 = !DIFile(filename: "library/std/src/lib.rs", directory: 
"/b/s/w/ir/x/w/rust")
+!2 = !{!3, !11, !16, !25, !31, !36, !45, !68, !75, !83, !90, !97, !106, !115, 
!121, !131, !153, !159, !163, !168, !179, !184, !189, !192, !194, !210}
+!3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "c_void", scope: !5, 
file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, elements: !8)
+!4 = !DIFile(filename: "", directory: "")
+!5 = !DINamespace(name: "ffi", scope: !6)
+!6 = !DINamespace(name: "core", scope: null)
+!7 = !DIBasicType(name: "u8", size: 8, encoding: DW_ATE_unsigned)
+!8 = !{!9, !10}
+!9 = !DIEnumerator(name: "__variant1", value: 0, isUnsigned: true)
+!10 = !DIEnumerator(name: "__variant2", value: 1, isUnsigned: true)
+!11 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Option", scope: 
!12, file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, 
elements: !13)
+!12 = !DINamespace(name: "option", scope: !6)
+!13 = !{!14, !15}
+!14 = !DIEnumerator(name: "None", value: 0)
+!15 = !DIEnumerator(name: "Some", value: 1)
+!16 = !DICompositeType(tag: DW_TAG_enumeration_type, name: 
"EscapeUnicodeState", scope: !17, file: !4, baseType: !7, size: 8, align: 8, 
flags: DIFlagEnumClass, elements: !18)
+!17 = !DINamespace(name: "char", scope: !6)
+!18 = !{!19, !20, !21, !22, !23, !24}
+!19 = !DIEnumerator(name: "Done", value: 0)
+!20 = !DIEnumerator(name: "RightBrace", value: 1)
+!21 = !DIEnumerator(name: "Value", value: 2)
+!22 = !DIEnumerator(name: "LeftBrace", value: 3)
+!23 = !DIEnumerator(name: "Type", value: 4)
+!24 = !DIEnumerator(name: "Backslash", value: 5)
+!25 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Format", scope: 
!26, file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, 
elements: !28)
+!26 = !DINamespace(name: "common", scope: !27)
+!27 = !DINamespace(name: "gimli", scope: null)
+!28 = !{!29, !30}
+!29 = !DIEnumerator(name: "Dwarf6

[llvm-branch-commits] [llvm] GlobalISel: Fix combine duplicating atomic loads (PR #111730)

2024-10-15 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/111730
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/121169
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits


@@ -4142,9 +4142,40 @@ LegalizerHelper::LegalizeResult 
LegalizerHelper::lowerStore(GStore &StoreMI) {
   }
 
   if (MemTy.isVector()) {
-// TODO: Handle vector trunc stores
-if (MemTy != SrcTy)
+LLT MemScalarTy = MemTy.getElementType();
+if (MemTy != SrcTy) {
+  if (!MemScalarTy.isByteSized()) {
+// We need to build an integer scalar of the vector bit pattern.
+// It's not legal for us to add padding when storing a vector.
+unsigned NumBits = MemTy.getSizeInBits();
+LLT IntTy = LLT::scalar(NumBits);
+auto CurrVal = MIRBuilder.buildConstant(IntTy, 0);
+LLT IdxTy = getLLTForMVT(TLI.getVectorIdxTy(MF.getDataLayout()));

aemerson wrote:

I'd rather make that change separately, since it would then make sense to 
convert all of our existing uses of the MVT version over.
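
For context, a plain-C++ analogue of the bit-pattern construction in the
quoted hunk (a sketch with invented names; little-endian lane order shown,
and ShiftIntoIdx reverses the index on big-endian targets):

#include <cstdint>

uint8_t packBoolVector4(const bool (&Lanes)[4]) {
  uint8_t Packed = 0;
  for (unsigned I = 0; I < 4; ++I)
    Packed |= uint8_t(Lanes[I] ? 1 : 0) << I; // lane I lands at bit I
  return Packed; // the vector's bits stored as one integer, no padding
}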

https://github.com/llvm/llvm-project/pull/121169
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121171

>From b9214baba592d4c7860d714b6d0dffd519a48400 Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 17:34:25 -0800
Subject: [PATCH] Factor out into function.

Created using spr 1.3.5
---
 .../llvm/CodeGen/GlobalISel/LegalizerHelper.h |   3 +
 .../CodeGen/GlobalISel/LegalizerHelper.cpp|  47 +-
 .../AArch64/GISel/AArch64LegalizerInfo.cpp|   3 +-
 .../AArch64/GlobalISel/legalize-bitcast.mir   |  59 +-
 .../legalize-store-vector-bools.mir   |  68 +-
 .../AArch64/vec-combine-compare-to-bitmask.ll | 605 ++
 6 files changed, 640 insertions(+), 145 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index fac059803b9489..4e18f5cc913a7e 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -302,6 +302,9 @@ class LegalizerHelper {
   /// same type as \p Res.
   MachineInstrBuilder createStackStoreLoad(const DstOp &Res, const SrcOp &Val);
 
+  /// Given a store of a boolean vector, scalarize it.
+  LegalizeResult scalarizeVectorBooleanStore(GStore &MI);
+
   /// Get a pointer to vector element \p Index located in memory for a vector 
of
   /// type \p VecTy starting at a base address of \p VecPtr. If \p Index is out
   /// of bounds the returned pointer is unspecified, but will be within the
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 7dece931e8e0eb..0bfa897ecf4047 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -4143,9 +4143,8 @@ LegalizerHelper::LegalizeResult 
LegalizerHelper::lowerStore(GStore &StoreMI) {
   }
 
   if (MemTy.isVector()) {
-// TODO: Handle vector trunc stores
 if (MemTy != SrcTy)
-  return UnableToLegalize;
+  return scalarizeVectorBooleanStore(StoreMI);
 
 // TODO: We can do better than scalarizing the vector and at least split it
 // in half.
@@ -4200,6 +4199,50 @@ LegalizerHelper::LegalizeResult 
LegalizerHelper::lowerStore(GStore &StoreMI) {
   return Legalized;
 }
 
+LegalizerHelper::LegalizeResult
+LegalizerHelper::scalarizeVectorBooleanStore(GStore &StoreMI) {
+  Register SrcReg = StoreMI.getValueReg();
+  Register PtrReg = StoreMI.getPointerReg();
+  LLT SrcTy = MRI.getType(SrcReg);
+  MachineMemOperand &MMO = **StoreMI.memoperands_begin();
+  LLT MemTy = MMO.getMemoryType();
+  LLT MemScalarTy = MemTy.getElementType();
+  MachineFunction &MF = MIRBuilder.getMF();
+
+  assert(SrcTy.isVector() && "Expect a vector store type");
+
+  if (!MemScalarTy.isByteSized()) {
+// We need to build an integer scalar of the vector bit pattern.
+// It's not legal for us to add padding when storing a vector.
+unsigned NumBits = MemTy.getSizeInBits();
+LLT IntTy = LLT::scalar(NumBits);
+auto CurrVal = MIRBuilder.buildConstant(IntTy, 0);
+LLT IdxTy = getLLTForMVT(TLI.getVectorIdxTy(MF.getDataLayout()));
+
+for (unsigned I = 0, E = MemTy.getNumElements(); I < E; ++I) {
+  auto Elt = MIRBuilder.buildExtractVectorElement(
+  SrcTy.getElementType(), SrcReg, MIRBuilder.buildConstant(IdxTy, I));
+  auto Trunc = MIRBuilder.buildTrunc(MemScalarTy, Elt);
+  auto ZExt = MIRBuilder.buildZExt(IntTy, Trunc);
+  unsigned ShiftIntoIdx = MF.getDataLayout().isBigEndian()
+  ? (MemTy.getNumElements() - 1) - I
+  : I;
+  auto ShiftAmt = MIRBuilder.buildConstant(
+  IntTy, ShiftIntoIdx * MemScalarTy.getSizeInBits());
+  auto Shifted = MIRBuilder.buildShl(IntTy, ZExt, ShiftAmt);
+  CurrVal = MIRBuilder.buildOr(IntTy, CurrVal, Shifted);
+}
+auto PtrInfo = MMO.getPointerInfo();
+auto *NewMMO = MF.getMachineMemOperand(&MMO, PtrInfo, IntTy);
+MIRBuilder.buildStore(CurrVal, PtrReg, *NewMMO);
+StoreMI.eraseFromParent();
+return Legalized;
+  }
+
+  // TODO: implement simple scalarization.
+  return UnableToLegalize;
+}
+
 LegalizerHelper::LegalizeResult
 LegalizerHelper::bitcast(MachineInstr &MI, unsigned TypeIdx, LLT CastTy) {
   switch (MI.getOpcode()) {
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 2fac100f81519a..641f06530a5c23 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -474,7 +474,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
  })
   .customIf(IsPtrVecPred)
   .scalarizeIf(typeInSet(0, {v2s16, v2s8}), 0)
-  .scalarizeIf(scalarOrEltWiderThan(0, 64), 0);
+  .scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
+  .lower();
 
   getActionDefinitionsBuilder(G_INDEXED_STORE)
   // Idx 0 

[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2025-01-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121169

>From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Thu, 26 Dec 2024 23:50:07 -0800
Subject: [PATCH 1/2] Add -aarch64-enable-collect-loh to RUN lines to remove
 unnecessary LOH labels.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +-
 1 file changed, 172 insertions(+), 455 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index 496f7ebf300e50..1fa96979f45530 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 2
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < 
%s | FileCheck %s --check-prefixes=CHECK,SDAG
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel 
-global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s 
--check-prefixes=CHECK,GISEL
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s 
--check-prefixes=CHECK,SDAG
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 
-verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL
 
 ; Basic tests from input vector to bitmask
 ; IR generated from clang for:
@@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask
 ; SDAG-LABEL: convert_to_bitmask16:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh0:
 ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE
 ; SDAG-NEXT:cmeq.16b v0, v0, #0
-; SDAG-NEXT:  Lloh1:
 ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:ext.16b v1, v0, v0, #8
@@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w0, s0
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1
 ;
 ; GISEL-LABEL: convert_to_bitmask16:
 ; GISEL:   ; %bb.0:
@@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 ; SDAG-LABEL: convert_to_bitmask8:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh2:
 ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE
 ; SDAG-NEXT:cmeq.8h v0, v0, #0
-; SDAG-NEXT:  Lloh3:
 ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w8, s0
 ; SDAG-NEXT:and w0, w8, #0xff
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3
 ;
 ; GISEL-LABEL: convert_to_bitmask8:
 ; GISEL:   ; %bb.0:
@@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 }
 
 define i4 @convert_to_bitmask4(<4 x i32> %vec) {
-; SDAG-LABEL: convert_to_bitmask4:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh4:
-; SDAG-NEXT:adrp x8, lCPI2_0@PAGE
-; SDAG-NEXT:cmeq.4s v0, v0, #0
-; SDAG-NEXT:  Lloh5:
-; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addv.4s s0, v0
-; SDAG-NEXT:fmov w0, s0
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5
-;
-; GISEL-LABEL: convert_to_bitmask4:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh0:
-; GISEL-NEXT:adrp x8, lCPI2_0@PAGE
-; GISEL-NEXT:cmeq.4s v0, v0, #0
-; GISEL-NEXT:  Lloh1:
-; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1, v0
-; GISEL-NEXT:addv.4s s0, v0
-; GISEL-NEXT:fmov w0, s0
-; GISEL-NEXT:ret
-; GISEL-NEXT:.loh AdrpLdr Lloh0, Lloh1
+; CHECK-LABEL: convert_to_bitmask4:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:adrp x8, lCPI2_0@PAGE
+; CHECK-NEXT:cmeq.4s v0, v0, #0
+; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
+; CHECK-NEXT:bic.16b v0, v1, v0
+; CHECK-NEXT:addv.4s s0, v0
+; CHECK-NEXT:fmov w0, s0
+; CHECK-NEXT:ret
 
 
   %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer
@@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) {
 }
 
 define i8 @convert_to_bitmask2(<2 x i64> %vec) {
-; SDAG-LABEL: convert_to_bitmask2:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh6:
-; SDAG-NEXT:adrp x8, lCPI3_0@PAGE
-; SDAG-NEXT:cmeq.2d v0, v0, #0
-; SDAG-NEXT:  Lloh7:
-; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addp.2d d0, v0
-; SDAG-NEXT:fmov w8, s0
-; SDAG-NEXT:and w0, w8, #0x3
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7
-;
-; GISEL-LABEL: convert_to_bitmask2:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh2:
-; GISEL-NEXT:adrp x8, lCPI3_0@PAGE
-; GISEL-NEXT:cmeq.2d v0, v0, #0
-; GISEL-NEXT:  Lloh3:
-; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0

[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2025-01-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121185

>From 3efe80b9457a33c68362489fc8c946d51113856a Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 00:06:30 -0800
Subject: [PATCH] Fix remark checks in test.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll   | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index cbb90c52835df8..7f3c1fdc93380e 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -6,21 +6,10 @@
 ; IR generated from clang for:
 ; __builtin_convertvector + reinterpret_cast
 
-; GISEL: warning: Instruction selection used fallback path for 
convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_no_compare
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_compare_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_trunc_in_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_unknown_type_in_long_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_different_types_in_chain
+; GISEL: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_2xi32
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_4xi8
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_8xi2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_float
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_legalized_illegal_element_size
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_direct_convert_for_bad_concat
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_combine_illegal_num_elements
 
 define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2025-01-06 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

> I think this sounds OK. LGTM
> 
> (Which of bitcast or load/store is considered the most fundamental for 
> v4i1/v8i1? I think I would have expected in GISel the loads to be converted 
> to a i4/i8 load with bitcast, and the bitcast legalizes however it does. It 
> could obviously go the other way where a bitcast is just legalized to 
> load+store. I wasn't sure why the v4i1 load needed to produce an extending 
load just to scalarize again, but perhaps it is necessary to get past
> legalization successfully, I haven't looked a lot into it lately. )

I think for loads of v4i1 we should do as you say and bitcast to i4 and then 
legalize the bitcast. It looks like we currently don't do that and instead we 
try to lower it, which ends up failing.
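
To illustrate the direction being described, a minimal sketch (not code from
this patch; the surrounding legalizer state such as MIRBuilder, MF, MMO and
the load's registers is assumed):

  // Hypothetical: legalize %v:_(<4 x s1>) = G_LOAD %p as a scalar i4 load
  // plus a bitcast, instead of lowering the vector load directly.
  LLT S4 = LLT::scalar(4);
  auto *IntMMO = MF.getMachineMemOperand(&MMO, MMO.getPointerInfo(), S4);
  auto IntLoad = MIRBuilder.buildLoad(S4, PtrReg, *IntMMO); // load the 4 packed bits
  MIRBuilder.buildBitcast(DstReg, IntLoad); // reinterpret as <4 x s1>
  LoadMI.eraseFromParent();

The G_BITCAST then legalizes however the target prefers, e.g. via the stack
lowering added in #121171.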

https://github.com/llvm/llvm-project/pull/121185
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)

2025-01-05 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121171

>From 0be38ccf5c865b4fddc357b33c378c70a20532b9 Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Thu, 26 Dec 2024 16:13:55 -0800
Subject: [PATCH 1/4] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20ch?=
 =?UTF-8?q?anges=20to=20main=20this=20commit=20is=20based=20on?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.5

[skip ci]
---
 .../CodeGen/GlobalISel/LegalizerHelper.cpp| 14 ++--
 .../AArch64/GISel/AArch64LegalizerInfo.cpp|  1 +
 .../legalize-store-vector-bools.mir   | 32 +++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir

diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index e2247f76098e97..a931123638ffb9 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -3022,8 +3022,18 @@ LegalizerHelper::widenScalar(MachineInstr &MI, unsigned 
TypeIdx, LLT WideTy) {
   return UnableToLegalize;
 
 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
-if (!Ty.isScalar())
-  return UnableToLegalize;
+if (!Ty.isScalar()) {
+  // We need to widen the vector element type.
+  Observer.changingInstr(MI);
+  widenScalarSrc(MI, WideTy, 0, TargetOpcode::G_ANYEXT);
+  // We also need to adjust the MMO to turn this into a truncating store.
+  MachineMemOperand &MMO = **MI.memoperands_begin();
+  MachineFunction &MF = MIRBuilder.getMF();
+  auto *NewMMO = MF.getMachineMemOperand(&MMO, MMO.getPointerInfo(), Ty);
+  MI.setMemRefs(MF, {NewMMO});
+  Observer.changedInstr(MI);
+  return Legalized;
+}
 
 Observer.changingInstr(MI);
 
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 4b7d4158faf069..2c35482b7c9e5f 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -454,6 +454,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
   {nxv2s64, p0, nxv2s64, 8},
   })
   .clampScalar(0, s8, s64)
+  .minScalarOrElt(0, s8)
   .lowerIf([=](const LegalityQuery &Query) {
 return Query.Types[0].isScalar() &&
Query.Types[0] != Query.MMODescrs[0].MemoryTy;
diff --git 
a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir 
b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir
new file mode 100644
index 00..de70f89461780b
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir
@@ -0,0 +1,32 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -O0 -mtriple=aarch64 -run-pass=legalizer -global-isel-abort=2 %s -o 
- | FileCheck %s
+# This test currently is expected to fall back after reaching truncstore of <8 
x s8> as <8 x s1>.
+---
+name:store_8xs1
+tracksRegLiveness: true
+body: |
+  bb.1:
+liveins: $q0, $q1, $x0
+; CHECK-LABEL: name: store_8xs1
+; CHECK: liveins: $q0, $q1, $x0
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
+; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
+; CHECK-NEXT: %ptr:_(p0) = COPY $x0
+; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<8 x s32>) = G_CONCAT_VECTORS 
[[COPY]](<4 x s32>), [[COPY1]](<4 x s32>)
+; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR 
[[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), 
[[C]](s32), [[C]](s32)
+; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(<8 x s1>) = G_ICMP intpred(slt), 
[[CONCAT_VECTORS]](<8 x s32>), [[BUILD_VECTOR]]
+; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<8 x s8>) = G_ANYEXT [[ICMP]](<8 x s1>)
+; CHECK-NEXT: G_STORE [[ANYEXT]](<8 x s8>), %ptr(p0) :: (store (<8 x s1>))
+; CHECK-NEXT: RET_ReallyLR
+%1:_(<4 x s32>) = COPY $q0
+%2:_(<4 x s32>) = COPY $q1
+%ptr:_(p0) = COPY $x0
+%0:_(<8 x s32>) = G_CONCAT_VECTORS %1(<4 x s32>), %2(<4 x s32>)
+%4:_(s32) = G_CONSTANT i32 0
+%3:_(<8 x s32>) = G_BUILD_VECTOR %4(s32), %4(s32), %4(s32), %4(s32), 
%4(s32), %4(s32), %4(s32), %4(s32)
+%5:_(<8 x s1>) = G_ICMP intpred(slt), %0(<8 x s32>), %3
+G_STORE %5(<8 x s1>), %ptr(p0) :: (store (<8 x s1>))
+RET_ReallyLR
+...

>From 18da0bff65252d4ef62f7dcefa73b7b508d10bec Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 10:49:17 -0800
Subject: [PATCH 2/4] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20ch?=
 =?UTF-8?q?anges=20introduced=20through=20rebase?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-T

[llvm-branch-commits] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121171


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121169

>From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Thu, 26 Dec 2024 23:50:07 -0800
Subject: [PATCH] Add -aarch64-enable-collect-loh to RUN line to remove
 unnecessary LOH labels.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +-
 1 file changed, 172 insertions(+), 455 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index 496f7ebf300e50..1fa96979f45530 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 2
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < 
%s | FileCheck %s --check-prefixes=CHECK,SDAG
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel 
-global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s 
--check-prefixes=CHECK,GISEL
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s 
--check-prefixes=CHECK,SDAG
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 
-verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL
 
 ; Basic tests from input vector to bitmask
 ; IR generated from clang for:
@@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask
 ; SDAG-LABEL: convert_to_bitmask16:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh0:
 ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE
 ; SDAG-NEXT:cmeq.16b v0, v0, #0
-; SDAG-NEXT:  Lloh1:
 ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:ext.16b v1, v0, v0, #8
@@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w0, s0
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1
 ;
 ; GISEL-LABEL: convert_to_bitmask16:
 ; GISEL:   ; %bb.0:
@@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 ; SDAG-LABEL: convert_to_bitmask8:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh2:
 ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE
 ; SDAG-NEXT:cmeq.8h v0, v0, #0
-; SDAG-NEXT:  Lloh3:
 ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w8, s0
 ; SDAG-NEXT:and w0, w8, #0xff
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3
 ;
 ; GISEL-LABEL: convert_to_bitmask8:
 ; GISEL:   ; %bb.0:
@@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 }
 
 define i4 @convert_to_bitmask4(<4 x i32> %vec) {
-; SDAG-LABEL: convert_to_bitmask4:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh4:
-; SDAG-NEXT:adrp x8, lCPI2_0@PAGE
-; SDAG-NEXT:cmeq.4s v0, v0, #0
-; SDAG-NEXT:  Lloh5:
-; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addv.4s s0, v0
-; SDAG-NEXT:fmov w0, s0
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5
-;
-; GISEL-LABEL: convert_to_bitmask4:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh0:
-; GISEL-NEXT:adrp x8, lCPI2_0@PAGE
-; GISEL-NEXT:cmeq.4s v0, v0, #0
-; GISEL-NEXT:  Lloh1:
-; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1, v0
-; GISEL-NEXT:addv.4s s0, v0
-; GISEL-NEXT:fmov w0, s0
-; GISEL-NEXT:ret
-; GISEL-NEXT:.loh AdrpLdr Lloh0, Lloh1
+; CHECK-LABEL: convert_to_bitmask4:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:adrp x8, lCPI2_0@PAGE
+; CHECK-NEXT:cmeq.4s v0, v0, #0
+; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
+; CHECK-NEXT:bic.16b v0, v1, v0
+; CHECK-NEXT:addv.4s s0, v0
+; CHECK-NEXT:fmov w0, s0
+; CHECK-NEXT:ret
 
 
   %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer
@@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) {
 }
 
 define i8 @convert_to_bitmask2(<2 x i64> %vec) {
-; SDAG-LABEL: convert_to_bitmask2:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh6:
-; SDAG-NEXT:adrp x8, lCPI3_0@PAGE
-; SDAG-NEXT:cmeq.2d v0, v0, #0
-; SDAG-NEXT:  Lloh7:
-; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addp.2d d0, v0
-; SDAG-NEXT:fmov w8, s0
-; SDAG-NEXT:and w0, w8, #0x3
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7
-;
-; GISEL-LABEL: convert_to_bitmask2:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh2:
-; GISEL-NEXT:adrp x8, lCPI3_0@PAGE
-; GISEL-NEXT:cmeq.2d v0, v0, #0
-; GISEL-NEXT:  Lloh3:
-; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1

[llvm-branch-commits] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson created 
https://github.com/llvm/llvm-project/pull/121185

This case is different from the <8 x i1> case handled earlier because it
triggers a legalization failure in lowerStore() that's intended for scalar code.

It was also triggering incorrect bitcast actions in the AArch64 rules, which
weren't expecting truncating stores.

With these two fixed, more cases are handled. The code is still bad, including
some missing load promotion in our combiners that results in dead stores hanging
around at the end of codegen. Again, we can fix these in separate changes.
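
A minimal IR reproducer for this case, in the spirit of the existing tests
(the function name is illustrative, not taken from the patch):

  define void @store_4xi1(<4 x i32> %v, ptr %p) {
    %c = icmp ne <4 x i32> %v, zeroinitializer
    store <4 x i1> %c, ptr %p   ; truncating vector store of the i1 mask
    ret void
  }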



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

Depends on #121169 

https://github.com/llvm/llvm-project/pull/121185
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121185

>From 3efe80b9457a33c68362489fc8c946d51113856a Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 00:06:30 -0800
Subject: [PATCH] Fix remark checks in test.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll   | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index cbb90c52835df8..7f3c1fdc93380e 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -6,21 +6,10 @@
 ; IR generated from clang for:
 ; __builtin_convertvector + reinterpret_cast
 
-; GISEL: warning: Instruction selection used fallback path for 
convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_no_compare
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_compare_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_trunc_in_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_unknown_type_in_long_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_different_types_in_chain
+; GISEL: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_2xi32
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_4xi8
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_8xi2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_float
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_legalized_illegal_element_size
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_direct_convert_for_bad_concat
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_combine_illegal_num_elements
 
 define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

Depends on #121170

https://github.com/llvm/llvm-project/pull/121171
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

Depends on #121171

https://github.com/llvm/llvm-project/pull/121169
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson created 
https://github.com/llvm/llvm-project/pull/121169

This is essentially a port of TargetLowering::scalarizeVectorStore(), which
is used for the case where we have something like a store of <8 x s8> truncating
to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute
a scalar value to store.

AArch64's DAG implementation has some more smarts to improve this further, which
we can do later.
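
As a plain C++ model of that naive lowering (illustrative only; it assumes
little-endian lane order, and the real code emits G_EXTRACT_VECTOR_ELT,
G_SHL and G_OR instead of running a loop):

  #include <cstdint>

  // Pack eight boolean lanes into one byte, mirroring the extract/shift/or
  // sequence the legalizer emits when scalarizing a <8 x s1> store.
  uint8_t packLanes(const bool (&Lanes)[8]) {
    uint8_t Packed = 0;
    for (unsigned I = 0; I != 8; ++I)
      Packed |= static_cast<uint8_t>(Lanes[I]) << I; // lane I lands in bit I
    return Packed;
  }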



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)

2024-12-26 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson created 
https://github.com/llvm/llvm-project/pull/121171

None


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

Eh, the heck... I ended up somehow folding in the factoring-out change from
this PR into #121171 

... some weird `spr` bug?

https://github.com/llvm/llvm-project/pull/121169
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121171

>From b9214baba592d4c7860d714b6d0dffd519a48400 Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 17:34:25 -0800
Subject: [PATCH 1/2] Factor out into funct.

Created using spr 1.3.5
---
 .../llvm/CodeGen/GlobalISel/LegalizerHelper.h |   3 +
 .../CodeGen/GlobalISel/LegalizerHelper.cpp|  47 +-
 .../AArch64/GISel/AArch64LegalizerInfo.cpp|   3 +-
 .../AArch64/GlobalISel/legalize-bitcast.mir   |  59 +-
 .../legalize-store-vector-bools.mir   |  68 +-
 .../AArch64/vec-combine-compare-to-bitmask.ll | 605 ++
 6 files changed, 640 insertions(+), 145 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index fac059803b9489..4e18f5cc913a7e 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -302,6 +302,9 @@ class LegalizerHelper {
   /// same type as \p Res.
   MachineInstrBuilder createStackStoreLoad(const DstOp &Res, const SrcOp &Val);
 
+  /// Given a store of a boolean vector, scalarize it.
+  LegalizeResult scalarizeVectorBooleanStore(GStore &MI);
+
   /// Get a pointer to vector element \p Index located in memory for a vector 
of
   /// type \p VecTy starting at a base address of \p VecPtr. If \p Index is out
   /// of bounds the returned pointer is unspecified, but will be within the
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp 
b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 7dece931e8e0eb..0bfa897ecf4047 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -4143,9 +4143,8 @@ LegalizerHelper::LegalizeResult 
LegalizerHelper::lowerStore(GStore &StoreMI) {
   }
 
   if (MemTy.isVector()) {
-// TODO: Handle vector trunc stores
 if (MemTy != SrcTy)
-  return UnableToLegalize;
+  return scalarizeVectorBooleanStore(StoreMI);
 
 // TODO: We can do better than scalarizing the vector and at least split it
 // in half.
@@ -4200,6 +4199,50 @@ LegalizerHelper::LegalizeResult 
LegalizerHelper::lowerStore(GStore &StoreMI) {
   return Legalized;
 }
 
+LegalizerHelper::LegalizeResult
+LegalizerHelper::scalarizeVectorBooleanStore(GStore &StoreMI) {
+  Register SrcReg = StoreMI.getValueReg();
+  Register PtrReg = StoreMI.getPointerReg();
+  LLT SrcTy = MRI.getType(SrcReg);
+  MachineMemOperand &MMO = **StoreMI.memoperands_begin();
+  LLT MemTy = MMO.getMemoryType();
+  LLT MemScalarTy = MemTy.getElementType();
+  MachineFunction &MF = MIRBuilder.getMF();
+
+  assert(SrcTy.isVector() && "Expect a vector store type");
+
+  if (!MemScalarTy.isByteSized()) {
+// We need to build an integer scalar of the vector bit pattern.
+// It's not legal for us to add padding when storing a vector.
+unsigned NumBits = MemTy.getSizeInBits();
+LLT IntTy = LLT::scalar(NumBits);
+auto CurrVal = MIRBuilder.buildConstant(IntTy, 0);
+LLT IdxTy = getLLTForMVT(TLI.getVectorIdxTy(MF.getDataLayout()));
+
+for (unsigned I = 0, E = MemTy.getNumElements(); I < E; ++I) {
+  auto Elt = MIRBuilder.buildExtractVectorElement(
+  SrcTy.getElementType(), SrcReg, MIRBuilder.buildConstant(IdxTy, I));
+  auto Trunc = MIRBuilder.buildTrunc(MemScalarTy, Elt);
+  auto ZExt = MIRBuilder.buildZExt(IntTy, Trunc);
+  unsigned ShiftIntoIdx = MF.getDataLayout().isBigEndian()
+  ? (MemTy.getNumElements() - 1) - I
+  : I;
+  auto ShiftAmt = MIRBuilder.buildConstant(
+  IntTy, ShiftIntoIdx * MemScalarTy.getSizeInBits());
+  auto Shifted = MIRBuilder.buildShl(IntTy, ZExt, ShiftAmt);
+  CurrVal = MIRBuilder.buildOr(IntTy, CurrVal, Shifted);
+}
+auto PtrInfo = MMO.getPointerInfo();
+auto *NewMMO = MF.getMachineMemOperand(&MMO, PtrInfo, IntTy);
+MIRBuilder.buildStore(CurrVal, PtrReg, *NewMMO);
+StoreMI.eraseFromParent();
+return Legalized;
+  }
+
+  // TODO: implement simple scalarization.
+  return UnableToLegalize;
+}
+
 LegalizerHelper::LegalizeResult
 LegalizerHelper::bitcast(MachineInstr &MI, unsigned TypeIdx, LLT CastTy) {
   switch (MI.getOpcode()) {
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp 
b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 2fac100f81519a..641f06530a5c23 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -474,7 +474,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const 
AArch64Subtarget &ST)
  })
   .customIf(IsPtrVecPred)
   .scalarizeIf(typeInSet(0, {v2s16, v2s8}), 0)
-  .scalarizeIf(scalarOrEltWiderThan(0, 64), 0);
+  .scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
+  .lower();
 
   getActionDefinitionsBuilder(G_INDEXED_STORE)
   // Id
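
For reference, a worked example of the bit-packing loop in
scalarizeVectorBooleanStore above, for a <4 x s1> store of lanes [1,0,1,1]:
on little-endian targets ShiftIntoIdx == I, so the stored value is
(1<<0)|(0<<1)|(1<<2)|(1<<3) = 0b1101; on big-endian targets ShiftIntoIdx is
(4-1)-I, so the same lanes pack to 0b1011.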

[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121169

>From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Thu, 26 Dec 2024 23:50:07 -0800
Subject: [PATCH] Add -aarch64-enable-collect-loh to RUN line to remove
 unnecessary LOH labels.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +-
 1 file changed, 172 insertions(+), 455 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index 496f7ebf300e50..1fa96979f45530 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 2
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < 
%s | FileCheck %s --check-prefixes=CHECK,SDAG
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel 
-global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s 
--check-prefixes=CHECK,GISEL
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s 
--check-prefixes=CHECK,SDAG
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 
-verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL
 
 ; Basic tests from input vector to bitmask
 ; IR generated from clang for:
@@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask
 ; SDAG-LABEL: convert_to_bitmask16:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh0:
 ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE
 ; SDAG-NEXT:cmeq.16b v0, v0, #0
-; SDAG-NEXT:  Lloh1:
 ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:ext.16b v1, v0, v0, #8
@@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w0, s0
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1
 ;
 ; GISEL-LABEL: convert_to_bitmask16:
 ; GISEL:   ; %bb.0:
@@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 ; SDAG-LABEL: convert_to_bitmask8:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh2:
 ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE
 ; SDAG-NEXT:cmeq.8h v0, v0, #0
-; SDAG-NEXT:  Lloh3:
 ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w8, s0
 ; SDAG-NEXT:and w0, w8, #0xff
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3
 ;
 ; GISEL-LABEL: convert_to_bitmask8:
 ; GISEL:   ; %bb.0:
@@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 }
 
 define i4 @convert_to_bitmask4(<4 x i32> %vec) {
-; SDAG-LABEL: convert_to_bitmask4:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh4:
-; SDAG-NEXT:adrp x8, lCPI2_0@PAGE
-; SDAG-NEXT:cmeq.4s v0, v0, #0
-; SDAG-NEXT:  Lloh5:
-; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addv.4s s0, v0
-; SDAG-NEXT:fmov w0, s0
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5
-;
-; GISEL-LABEL: convert_to_bitmask4:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh0:
-; GISEL-NEXT:adrp x8, lCPI2_0@PAGE
-; GISEL-NEXT:cmeq.4s v0, v0, #0
-; GISEL-NEXT:  Lloh1:
-; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1, v0
-; GISEL-NEXT:addv.4s s0, v0
-; GISEL-NEXT:fmov w0, s0
-; GISEL-NEXT:ret
-; GISEL-NEXT:.loh AdrpLdr Lloh0, Lloh1
+; CHECK-LABEL: convert_to_bitmask4:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:adrp x8, lCPI2_0@PAGE
+; CHECK-NEXT:cmeq.4s v0, v0, #0
+; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
+; CHECK-NEXT:bic.16b v0, v1, v0
+; CHECK-NEXT:addv.4s s0, v0
+; CHECK-NEXT:fmov w0, s0
+; CHECK-NEXT:ret
 
 
   %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer
@@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) {
 }
 
 define i8 @convert_to_bitmask2(<2 x i64> %vec) {
-; SDAG-LABEL: convert_to_bitmask2:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh6:
-; SDAG-NEXT:adrp x8, lCPI3_0@PAGE
-; SDAG-NEXT:cmeq.2d v0, v0, #0
-; SDAG-NEXT:  Lloh7:
-; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addp.2d d0, v0
-; SDAG-NEXT:fmov w8, s0
-; SDAG-NEXT:and w0, w8, #0x3
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7
-;
-; GISEL-LABEL: convert_to_bitmask2:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh2:
-; GISEL-NEXT:adrp x8, lCPI3_0@PAGE
-; GISEL-NEXT:cmeq.2d v0, v0, #0
-; GISEL-NEXT:  Lloh3:
-; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1

[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121169

>From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Thu, 26 Dec 2024 23:50:07 -0800
Subject: [PATCH] Add -aarch64-enable-collect-loh torun line to remove
 unnecessary LOH labels.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +-
 1 file changed, 172 insertions(+), 455 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index 496f7ebf300e50..1fa96979f45530 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 2
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < 
%s | FileCheck %s --check-prefixes=CHECK,SDAG
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel 
-global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s 
--check-prefixes=CHECK,GISEL
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s 
--check-prefixes=CHECK,SDAG
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon 
-aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 
-verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL
 
 ; Basic tests from input vector to bitmask
 ; IR generated from clang for:
@@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask
 ; SDAG-LABEL: convert_to_bitmask16:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh0:
 ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE
 ; SDAG-NEXT:cmeq.16b v0, v0, #0
-; SDAG-NEXT:  Lloh1:
 ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:ext.16b v1, v0, v0, #8
@@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w0, s0
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1
 ;
 ; GISEL-LABEL: convert_to_bitmask16:
 ; GISEL:   ; %bb.0:
@@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 ; SDAG-LABEL: convert_to_bitmask8:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh2:
 ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE
 ; SDAG-NEXT:cmeq.8h v0, v0, #0
-; SDAG-NEXT:  Lloh3:
 ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w8, s0
 ; SDAG-NEXT:and w0, w8, #0xff
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3
 ;
 ; GISEL-LABEL: convert_to_bitmask8:
 ; GISEL:   ; %bb.0:
@@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 }
 
 define i4 @convert_to_bitmask4(<4 x i32> %vec) {
-; SDAG-LABEL: convert_to_bitmask4:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh4:
-; SDAG-NEXT:adrp x8, lCPI2_0@PAGE
-; SDAG-NEXT:cmeq.4s v0, v0, #0
-; SDAG-NEXT:  Lloh5:
-; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addv.4s s0, v0
-; SDAG-NEXT:fmov w0, s0
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5
-;
-; GISEL-LABEL: convert_to_bitmask4:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh0:
-; GISEL-NEXT:adrp x8, lCPI2_0@PAGE
-; GISEL-NEXT:cmeq.4s v0, v0, #0
-; GISEL-NEXT:  Lloh1:
-; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1, v0
-; GISEL-NEXT:addv.4s s0, v0
-; GISEL-NEXT:fmov w0, s0
-; GISEL-NEXT:ret
-; GISEL-NEXT:.loh AdrpLdr Lloh0, Lloh1
+; CHECK-LABEL: convert_to_bitmask4:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:adrp x8, lCPI2_0@PAGE
+; CHECK-NEXT:cmeq.4s v0, v0, #0
+; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
+; CHECK-NEXT:bic.16b v0, v1, v0
+; CHECK-NEXT:addv.4s s0, v0
+; CHECK-NEXT:fmov w0, s0
+; CHECK-NEXT:ret
 
 
   %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer
@@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) {
 }
 
 define i8 @convert_to_bitmask2(<2 x i64> %vec) {
-; SDAG-LABEL: convert_to_bitmask2:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh6:
-; SDAG-NEXT:adrp x8, lCPI3_0@PAGE
-; SDAG-NEXT:cmeq.2d v0, v0, #0
-; SDAG-NEXT:  Lloh7:
-; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addp.2d d0, v0
-; SDAG-NEXT:fmov w8, s0
-; SDAG-NEXT:and w0, w8, #0x3
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7
-;
-; GISEL-LABEL: convert_to_bitmask2:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh2:
-; GISEL-NEXT:adrp x8, lCPI3_0@PAGE
-; GISEL-NEXT:cmeq.2d v0, v0, #0
-; GISEL-NEXT:  Lloh3:
-; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1

[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121185

>From 3efe80b9457a33c68362489fc8c946d51113856a Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 00:06:30 -0800
Subject: [PATCH] Fix remark checks in test.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll   | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index cbb90c52835df8..7f3c1fdc93380e 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -6,21 +6,10 @@
 ; IR generated from clang for:
 ; __builtin_convertvector + reinterpret_cast
 
-; GISEL: warning: Instruction selection used fallback path for 
convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_no_compare
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_compare_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_trunc_in_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_unknown_type_in_long_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_different_types_in_chain
+; GISEL: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_2xi32
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_4xi8
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_8xi2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_float
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_legalized_illegal_element_size
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_direct_convert_for_bad_concat
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_combine_illegal_num_elements
 
 define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121185

>From 3efe80b9457a33c68362489fc8c946d51113856a Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Fri, 27 Dec 2024 00:06:30 -0800
Subject: [PATCH] Fix remark checks in test.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll   | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll 
b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index cbb90c52835df8..7f3c1fdc93380e 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -6,21 +6,10 @@
 ; IR generated from clang for:
 ; __builtin_convertvector + reinterpret_cast
 
-; GISEL: warning: Instruction selection used fallback path for 
convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_no_compare
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_compare_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_trunc_in_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_unknown_type_in_long_chain
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_with_different_types_in_chain
+; GISEL: warning: Instruction selection used fallback path for 
clang_builtins_undef_concat_convert_to_bitmask4
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_2xi32
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_4xi8
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_8xi2
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_to_bitmask_float
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
convert_legalized_illegal_element_size
 ; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_direct_convert_for_bad_concat
-; GISEL-NEXT: warning: Instruction selection used fallback path for 
no_combine_illegal_num_elements
 
 define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

Ok, should be fixed now. The factoring-out change is now in this PR, where it
belongs.

https://github.com/llvm/llvm-project/pull/121169
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)

2024-12-27 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/121169

>From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Thu, 26 Dec 2024 23:50:07 -0800
Subject: [PATCH 1/2] Add -aarch64-enable-collect-loh to RUN line to remove
 unnecessary LOH labels.

Created using spr 1.3.5
---
 .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +-
 1 file changed, 172 insertions(+), 455 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
index 496f7ebf300e50..1fa96979f45530 100644
--- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
+++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG
-; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG
+; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL
 
 ; Basic tests from input vector to bitmask
 ; IR generated from clang for:
@@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; Bits used in mask
 ; SDAG-LABEL: convert_to_bitmask16:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh0:
 ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE
 ; SDAG-NEXT:cmeq.16b v0, v0, #0
-; SDAG-NEXT:  Lloh1:
 ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:ext.16b v1, v0, v0, #8
@@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w0, s0
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1
 ;
 ; GISEL-LABEL: convert_to_bitmask16:
 ; GISEL:   ; %bb.0:
@@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) {
 define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 ; SDAG-LABEL: convert_to_bitmask8:
 ; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh2:
 ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE
 ; SDAG-NEXT:cmeq.8h v0, v0, #0
-; SDAG-NEXT:  Lloh3:
 ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF]
 ; SDAG-NEXT:bic.16b v0, v1, v0
 ; SDAG-NEXT:addv.8h h0, v0
 ; SDAG-NEXT:fmov w8, s0
 ; SDAG-NEXT:and w0, w8, #0xff
 ; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3
 ;
 ; GISEL-LABEL: convert_to_bitmask8:
 ; GISEL:   ; %bb.0:
@@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) {
 }
 
 define i4 @convert_to_bitmask4(<4 x i32> %vec) {
-; SDAG-LABEL: convert_to_bitmask4:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh4:
-; SDAG-NEXT:adrp x8, lCPI2_0@PAGE
-; SDAG-NEXT:cmeq.4s v0, v0, #0
-; SDAG-NEXT:  Lloh5:
-; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addv.4s s0, v0
-; SDAG-NEXT:fmov w0, s0
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5
-;
-; GISEL-LABEL: convert_to_bitmask4:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh0:
-; GISEL-NEXT:adrp x8, lCPI2_0@PAGE
-; GISEL-NEXT:cmeq.4s v0, v0, #0
-; GISEL-NEXT:  Lloh1:
-; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0, v1, v0
-; GISEL-NEXT:addv.4s s0, v0
-; GISEL-NEXT:fmov w0, s0
-; GISEL-NEXT:ret
-; GISEL-NEXT:.loh AdrpLdr Lloh0, Lloh1
+; CHECK-LABEL: convert_to_bitmask4:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:adrp x8, lCPI2_0@PAGE
+; CHECK-NEXT:cmeq.4s v0, v0, #0
+; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF]
+; CHECK-NEXT:bic.16b v0, v1, v0
+; CHECK-NEXT:addv.4s s0, v0
+; CHECK-NEXT:fmov w0, s0
+; CHECK-NEXT:ret
 
 
   %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer
@@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) {
 }
 
 define i8 @convert_to_bitmask2(<2 x i64> %vec) {
-; SDAG-LABEL: convert_to_bitmask2:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:  Lloh6:
-; SDAG-NEXT:adrp x8, lCPI3_0@PAGE
-; SDAG-NEXT:cmeq.2d v0, v0, #0
-; SDAG-NEXT:  Lloh7:
-; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; SDAG-NEXT:bic.16b v0, v1, v0
-; SDAG-NEXT:addp.2d d0, v0
-; SDAG-NEXT:fmov w8, s0
-; SDAG-NEXT:and w0, w8, #0x3
-; SDAG-NEXT:ret
-; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7
-;
-; GISEL-LABEL: convert_to_bitmask2:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:  Lloh2:
-; GISEL-NEXT:adrp x8, lCPI3_0@PAGE
-; GISEL-NEXT:cmeq.2d v0, v0, #0
-; GISEL-NEXT:  Lloh3:
-; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF]
-; GISEL-NEXT:bic.16b v0

[llvm-branch-commits] [clang] release/20.x: [AArch64] Enable vscale_range with +sme (#124466) (PR #125386)

2025-02-02 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/125386
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-24 Thread Amara Emerson via llvm-branch-commits


@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
   return Changed;
 }
 
+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+  if (AllocasToPromote.empty())
+return;
+
+  SmallVector<AllocaInst *> PromotableAllocas;
+  llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+isAllocaPromotable);
+  if (PromotableAllocas.empty())
+return;
+
+  DominatorTree DT(*Caller);
+  AssumptionCache &AC = GetAssumptionCache(*Caller);
+  PromoteMemToReg(PromotableAllocas, DT, &AC);
+  NumAllocasPromoted += PromotableAllocas.size();
+  // Emit a remark for the promotion.
+  OptimizationRemarkEmitter ORE(Caller);
+  DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+  ORE.emit([&]() {
+return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+  &Caller->getEntryBlock())
+   << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+   << " allocas to SSA registers in function '"
+   << ore::NV("Function", Caller) << "'";
+  });
+  LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+<< " allocas to registers in function " << 
Caller->getName()
+<< "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee that
+/// are now promotable to registers, using SROA/mem2reg. However if we just let
+/// the AlwaysInliner continue inlining everything at once, the later SROA pass
+/// in the pipeline will end up placing phis for these allocas into blocks along
+/// the dominance frontier which may extend further than desired (e.g. loop
+/// headers). This can happen when the caller is then inlined into another
+/// caller, and the allocas end up hoisted further before SROA is run.
+///
+/// Instead, what we want to do, as best as we can, is to inline leaf
+/// functions into callers, and then run PromoteMemToReg() on the allocas that
+/// were passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends too
+/// far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+///   for (...) {
+/// middle_caller();
+///   }
+/// }
+///
+/// middle_caller() {
+///   int stack_var;
+///   inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+///   // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller() into
+/// outermost_caller(). The regular always_inliner would inline everything at
+/// once, and then SROA/mem2reg would promote stack_var to a register but in
+/// the context of outermost_caller() which is not what we want.

aemerson wrote:

Sure. The problem is that mem2reg promotion has to place phi nodes for the 
value along the dominance frontier. This frontier is different depending on 
inlining order. For allocas, you want to insert phis while the dominance
frontier is as small as possible. The motivation is that allocas
inside nested loops can "leak" phis beyond the innermost loop header, and 
that's bad for register pressure.

The main inliner already handles this because the pass manager interleaves
optimizations with inlining, but for always-inliner we don't have that 
capability.
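
To make the phi-placement effect concrete, here is a minimal LLVM IR sketch
(a hypothetical function, not code from this patch) of an alloca whose stores
span a loop:

define i32 @sum(i32 %n) {
entry:
  %x = alloca i32                      ; stack slot written on every iteration
  store i32 0, ptr %x
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %v = load i32, ptr %x                ; becomes a phi at %loop after mem2reg
  %v.next = add i32 %v, %i
  store i32 %v.next, ptr %x
  %i.next = add i32 %i, 1
  %c = icmp slt i32 %i.next, %n
  br i1 %c, label %loop, label %exit
exit:
  %r = load i32, ptr %x
  ret i32 %r
}

Running mem2reg (e.g. `opt -passes=mem2reg`) deletes %x and inserts
`%v = phi i32 [ 0, %entry ], [ %v.next, %loop ]` at the %loop header, the
dominance frontier of the two stores. If earlier inlining has already widened
the alloca's scope into an enclosing loop, the phi lands at that outer header
instead, which is exactly the register-pressure problem described above.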

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-24 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-24 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/145613


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-24 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/145613


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-07-03 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

> ⚠️ undef deprecator found issues in your code. ⚠️

This looks to be just the IR output containing undef, not the input.

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-25 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-25 Thread Amara Emerson via llvm-branch-commits


@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
   return Changed;
 }
 
+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+  if (AllocasToPromote.empty())
+return;
+
+  SmallVector<AllocaInst *> PromotableAllocas;
+  llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+isAllocaPromotable);
+  if (PromotableAllocas.empty())
+return;
+
+  DominatorTree DT(*Caller);
+  AssumptionCache &AC = GetAssumptionCache(*Caller);
+  PromoteMemToReg(PromotableAllocas, DT, &AC);
+  NumAllocasPromoted += PromotableAllocas.size();
+  // Emit a remark for the promotion.
+  OptimizationRemarkEmitter ORE(Caller);
+  DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+  ORE.emit([&]() {
+return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+  &Caller->getEntryBlock())
+   << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+   << " allocas to SSA registers in function '"
+   << ore::NV("Function", Caller) << "'";
+  });
+  LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+<< " allocas to registers in function " << 
Caller->getName()
+<< "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee that
+/// are now promotable to registers, using SROA/mem2reg. However if we just let
+/// the AlwaysInliner continue inlining everything at once, the later SROA pass
+/// in the pipeline will end up placing phis for these allocas into blocks along
+/// the dominance frontier which may extend further than desired (e.g. loop
+/// headers). This can happen when the caller is then inlined into another
+/// caller, and the allocas end up hoisted further before SROA is run.
+///
+/// Instead, what we want to do, as best as we can, is to inline leaf
+/// functions into callers, and then run PromoteMemToReg() on the allocas that
+/// were passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends too
+/// far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+///   for (...) {
+/// middle_caller();
+///   }
+/// }
+///
+/// middle_caller() {
+///   int stack_var;
+///   inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+///   // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller() into
+/// outermost_caller(). The regular always_inliner would inline everything at
+/// once, and then SROA/mem2reg would promote stack_var to a register but in
+/// the context of outermost_caller() which is not what we want.

aemerson wrote:

Yes, the traversal order matters here, because for optimal codegen we want
mem2reg to happen between the inner->middle and middle->outer inlines. If you
do it the other way around, mem2reg can't do anything until the final
inner->outer inline, and by that point it's too late.

For now I think only this promotion is a known issue; I don't know of general
issues with simplification.
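
As a concrete sketch (hypothetical IR based on the doc comment's example,
with an invented @use callee): after inner_callee has been inlined into
middle_caller, but before middle_caller is itself inlined, the function looks
roughly like

define void @middle_caller() {
entry:
  %stack_var = alloca i32              ; address no longer escapes after inlining
  store i32 1, ptr %stack_var          ; inlined body of inner_callee
  %v = load i32, ptr %stack_var
  call void @use(i32 %v)
  ret void
}

declare void @use(i32)

and mem2reg can promote %stack_var in straight-line code with no phis at all.
If middle_caller were first inlined into outermost_caller's loop, the same
promotion would have to place phis at the loop header.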

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-06-25 Thread Amara Emerson via llvm-branch-commits


@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
   return Changed;
 }
 
+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+  if (AllocasToPromote.empty())
+return;
+
+  SmallVector<AllocaInst *> PromotableAllocas;
+  llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+isAllocaPromotable);
+  if (PromotableAllocas.empty())
+return;
+
+  DominatorTree DT(*Caller);
+  AssumptionCache &AC = GetAssumptionCache(*Caller);
+  PromoteMemToReg(PromotableAllocas, DT, &AC);
+  NumAllocasPromoted += PromotableAllocas.size();
+  // Emit a remark for the promotion.
+  OptimizationRemarkEmitter ORE(Caller);
+  DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+  ORE.emit([&]() {
+return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+  &Caller->getEntryBlock())
+   << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+   << " allocas to SSA registers in function '"
+   << ore::NV("Function", Caller) << "'";
+  });
+  LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+<< " allocas to registers in function " << 
Caller->getName()
+<< "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee that
+/// are now promotable to registers, using SROA/mem2reg. However if we just let
+/// the AlwaysInliner continue inlining everything at once, the later SROA pass
+/// in the pipeline will end up placing phis for these allocas into blocks along
+/// the dominance frontier which may extend further than desired (e.g. loop
+/// headers). This can happen when the caller is then inlined into another
+/// caller, and the allocas end up hoisted further before SROA is run.
+///
+/// Instead, what we want to do, as best as we can, is to inline leaf
+/// functions into callers, and then run PromoteMemToReg() on the allocas that
+/// were passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends too
+/// far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+///   for (...) {
+/// middle_caller();
+///   }
+/// }
+///
+/// middle_caller() {
+///   int stack_var;
+///   inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+///   // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller() into
+/// outermost_caller(). The regular always_inliner would inline everything at
+/// once, and then SROA/mem2reg would promote stack_var to a register but in
+/// the context of outermost_caller() which is not what we want.

aemerson wrote:

> In that context, could the problem addressed here be decoupled from inlining 
> order? It seems like it'd result in a more robust system.

I don't *think* so, unless there's something I've missed. Before doing this I 
tried other approaches, such as:
  - Trying to detect these over-extended PHIs and then demoting them back to
allocas. Didn't work, as we ended up pessimizing codegen.
  - Avoiding hoisting large vector allocas to the entry block, in order to 
block mem2reg. This works but is conceptually the wrong place to do it (no 
other heuristics code exists there).

I wasn't aware of ModuleInliner. Is the long-term plan for it to replace the
existing inliner? If so, we could in future merge it with AlwaysInliner; if we
then interleave optimization as the current SCC manager does, this should fix
the problem.

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-07-02 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

ping

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-07-02 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/145613


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-07-02 Thread Amara Emerson via llvm-branch-commits

aemerson wrote:

I managed to reduce the original SME test to
`Transforms/PhaseOrdering/always-inline-alloca-promotion.ll`. Compiling that to
assembly with clang, with and without the change, shows the difference in
codegen quality, but the IR shows the kind of scenario this patch is meant to
handle.

https://github.com/llvm/llvm-project/pull/145613
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)

2025-07-02 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson updated 
https://github.com/llvm/llvm-project/pull/145613


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits