[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)
https://github.com/aemerson approved this pull request. LGTM with nit. https://github.com/llvm/llvm-project/pull/102163
[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)
@@ -45,61 +45,189 @@ cl::OptionCategory GICombinerOptionCategory( ); } // end namespace llvm -/// This class acts as the glue the joins the CombinerHelper to the overall +/// This class acts as the glue that joins the CombinerHelper to the overall /// Combine algorithm. The CombinerHelper is intended to report the /// modifications it makes to the MIR to the GISelChangeObserver and the -/// observer subclass will act on these events. In this case, instruction -/// erasure will cancel any future visits to the erased instruction and -/// instruction creation will schedule that instruction for a future visit. -/// Other Combiner implementations may require more complex behaviour from -/// their GISelChangeObserver subclass. +/// observer subclass will act on these events. class Combiner::WorkListMaintainer : public GISelChangeObserver { - using WorkListTy = GISelWorkList<512>; - WorkListTy &WorkList; +protected: +#ifndef NDEBUG /// The instructions that have been created but we want to report once they /// have their operands. This is only maintained if debug output is requested. -#ifndef NDEBUG - SetVector CreatedInstrs; + SmallSetVector CreatedInstrs; #endif + using Level = CombinerInfo::ObserverLevel; public: - WorkListMaintainer(WorkListTy &WorkList) : WorkList(WorkList) {} + static std::unique_ptr + create(Level Lvl, WorkListTy &WorkList, MachineRegisterInfo &MRI); + virtual ~WorkListMaintainer() = default; + void reportFullyCreatedInstrs() { +LLVM_DEBUG({ + for (auto *MI : CreatedInstrs) { +dbgs() << "Created: " << *MI; + } + CreatedInstrs.clear(); +}); + } + + virtual void reset() = 0; + virtual void appliedCombine() = 0; +}; + +/// A configurable WorkListMaintainer implementation. +/// The ObserverLevel determines how the WorkListMaintainer reacts to MIR +/// changes. +template +class Combiner::WorkListMaintainerImpl : public Combiner::WorkListMaintainer { + WorkListTy &WorkList; + MachineRegisterInfo &MRI; + + // Defer handling these instructions until the combine finishes. + SmallSetVector DeferList; + + // Track VRegs that (might) have lost a use. + SmallSetVector LostUses; + +public: + WorkListMaintainerImpl(WorkListTy &WorkList, MachineRegisterInfo &MRI) + : WorkList(WorkList), MRI(MRI) {} + + virtual ~WorkListMaintainerImpl() = default; + + void reset() override { +DeferList.clear(); +LostUses.clear(); + } + void erasingInstr(MachineInstr &MI) override { -LLVM_DEBUG(dbgs() << "Erasing: " << MI << "\n"); +// MI will become dangling, remove it from all lists. +LLVM_DEBUG(dbgs() << "Erasing: " << MI; CreatedInstrs.remove(&MI)); WorkList.remove(&MI); +if constexpr (Lvl != Level::Basic) { + DeferList.remove(&MI); + noteLostUses(MI); +} } + void createdInstr(MachineInstr &MI) override { -LLVM_DEBUG(dbgs() << "Creating: " << MI << "\n"); -WorkList.insert(&MI); -LLVM_DEBUG(CreatedInstrs.insert(&MI)); +LLVM_DEBUG(dbgs() << "Creating: " << MI; CreatedInstrs.insert(&MI)); +if constexpr (Lvl == Level::Basic) + WorkList.insert(&MI); +else + // Defer handling newly created instructions, because they don't have + // operands yet. We also insert them into the WorkList in reverse + // order so that they will be combined top down. + DeferList.insert(&MI); } + void changingInstr(MachineInstr &MI) override { -LLVM_DEBUG(dbgs() << "Changing: " << MI << "\n"); -WorkList.insert(&MI); +LLVM_DEBUG(dbgs() << "Changing: " << MI); +// Some uses might get dropped when MI is changed. +// For now, overapproximate by assuming all uses will be dropped. 
+// TODO: Is a more precise heuristic or manual tracking of use count +// decrements worth it? +if constexpr (Lvl != Level::Basic) + noteLostUses(MI); } + void changedInstr(MachineInstr &MI) override { -LLVM_DEBUG(dbgs() << "Changed: " << MI << "\n"); -WorkList.insert(&MI); +LLVM_DEBUG(dbgs() << "Changed: " << MI); +if constexpr (Lvl == Level::Basic) + WorkList.insert(&MI); +else + // Defer this for DCE + DeferList.insert(&MI); } - void reportFullyCreatedInstrs() { -LLVM_DEBUG(for (const auto *MI -: CreatedInstrs) { - dbgs() << "Created: "; - MI->print(dbgs()); -}); -LLVM_DEBUG(CreatedInstrs.clear()); + // Only track changes during the combine and then walk the def/use-chains once + // the combine is finished, because: + // - instructions might have multiple defs during the combine. + // - use counts aren't accurate during the combine. + void appliedCombine() override { +if constexpr (Lvl == Level::Basic) + return; + +// DCE deferred instructions and add them to the WorkList bottom up. +while (!DeferList.empty()) { + MachineInstr &MI = *DeferList.pop_back_val(); + if (tryDCE(MI, MRI)) +continue; + + if const
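For readers following the thread, here is a minimal standalone sketch of the deferred-worklist scheme the patch implements. All names below (InstrStub, DeferredMaintainer) are simplified stand-ins, not the real GISel API: created or changed instructions are parked on a defer list until the combine finishes, then dead ones are pruned and the survivors are re-queued so each is visited with its final operands.

#include <vector>

struct InstrStub {
  bool Dead = false; // stands in for a real deadness check via use counts
};

struct DeferredMaintainer {
  std::vector<InstrStub *> WorkList;
  std::vector<InstrStub *> DeferList;

  void createdInstr(InstrStub &MI) { DeferList.push_back(&MI); }
  void changedInstr(InstrStub &MI) { DeferList.push_back(&MI); }

  // Mirrors appliedCombine(): walk deferred instructions only once the
  // combine is done, so operands and use counts are in their final state.
  void appliedCombine() {
    while (!DeferList.empty()) {
      InstrStub *MI = DeferList.back();
      DeferList.pop_back();
      if (MI->Dead)           // stands in for tryDCE(MI, MRI)
        continue;
      WorkList.push_back(MI); // re-queue survivors for another visit
    }
  }
};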
[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/102163
[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)
https://github.com/aemerson approved this pull request. These are some very nice improvements, thanks for working on this. None of the test output changes look to be exposing problems with this patch, so LGTM. https://github.com/llvm/llvm-project/pull/102167
[llvm-branch-commits] [llvm] release/18.x: [GlobalISel] Fix store merging incorrectly classifying an unknown index expr as 0. (#90375) (PR #90673)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/90673
[llvm-branch-commits] [llvm] release/18.x: [GlobalISel] Don't form anyextending atomic loads. (PR #90435)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/90435
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/90827
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)
aemerson wrote: @tstellar It looks like this cherry-pick has a test failure; what's the recommended way to resolve it? Make a new PR, or modify this one (if that's possible)? https://github.com/llvm/llvm-project/pull/90827
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)
aemerson wrote: @nikic do you know the procedure here? https://github.com/llvm/llvm-project/pull/90827
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)
aemerson wrote: > @aemerson Did you submit a new pull request with a fix? I have not yet, will do so now... https://github.com/llvm/llvm-project/pull/90827
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/91672
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/91672
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)
https://github.com/aemerson ready_for_review https://github.com/llvm/llvm-project/pull/91672
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)
https://github.com/aemerson closed https://github.com/llvm/llvm-project/pull/90827
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT (PR #90827)
aemerson wrote: New PR: https://github.com/llvm/llvm-project/pull/91672 https://github.com/llvm/llvm-project/pull/90827
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)
aemerson wrote: The test has been changed from the original commit due to a fallback on a G_BITCAST. Added abort=2 so we can see the partial legalization and check that there is no crash. https://github.com/llvm/llvm-project/pull/91672
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)
https://github.com/aemerson milestoned https://github.com/llvm/llvm-project/pull/91672
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge (PR #91672)
aemerson wrote: @tstellar could we merge this now? https://github.com/llvm/llvm-project/pull/91672
[llvm-branch-commits] [compiler-rt] [llvm] [clang] [GlobalISel] Always direct-call IFuncs and Aliases (PR #74902)
https://github.com/aemerson approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/74902
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/107435
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
aemerson wrote: To justify this for the 19 release: this is easily triggered by small IR, so we should take it. https://github.com/llvm/llvm-project/pull/107435
[llvm-branch-commits] [llvm] bd64ad3 - Recommit "[AArch64][GlobalISel] Make G_USUBO legal and select it."
Author: Cassie Jones Date: 2021-01-22T17:29:54-08:00 New Revision: bd64ad3fe17506933ac2971dcc900271d6ae5969 URL: https://github.com/llvm/llvm-project/commit/bd64ad3fe17506933ac2971dcc900271d6ae5969 DIFF: https://github.com/llvm/llvm-project/commit/bd64ad3fe17506933ac2971dcc900271d6ae5969.diff LOG: Recommit "[AArch64][GlobalISel] Make G_USUBO legal and select it." The expansion for wide subtractions includes G_USUBO. Differential Revision: https://reviews.llvm.org/D95032 This was miscompiling on ubsan bots. Added: llvm/test/CodeGen/AArch64/GlobalISel/select-saddo.mir llvm/test/CodeGen/AArch64/GlobalISel/select-ssubo.mir llvm/test/CodeGen/AArch64/GlobalISel/select-usubo.mir Modified: llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp index 9619bb43ae9c..5259f4f5a4d0 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp @@ -2745,7 +2745,8 @@ bool AArch64InstructionSelector::select(MachineInstr &I) { } case TargetOpcode::G_SADDO: case TargetOpcode::G_UADDO: - case TargetOpcode::G_SSUBO: { + case TargetOpcode::G_SSUBO: + case TargetOpcode::G_USUBO: { // Emit the operation and get the correct condition code. MachineIRBuilder MIRBuilder(I); auto OpAndCC = emitOverflowOp(Opcode, I.getOperand(0).getReg(), @@ -4376,6 +4377,8 @@ AArch64InstructionSelector::emitOverflowOp(unsigned Opcode, Register Dst, return std::make_pair(emitADDS(Dst, LHS, RHS, MIRBuilder), AArch64CC::HS); case TargetOpcode::G_SSUBO: return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::VS); + case TargetOpcode::G_USUBO: +return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::LO); } } diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index cc7aada211bb..5a6c904e3f5d 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -165,7 +165,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) getActionDefinitionsBuilder({G_SMULH, G_UMULH}).legalFor({s32, s64}); - getActionDefinitionsBuilder({G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO}) + getActionDefinitionsBuilder( + {G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO, G_USUBO}) .legalFor({{s32, s1}, {s64, s1}}) .minScalar(0, s32); diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir index ab8510bf9d92..4f97d153d28b 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir @@ -73,6 +73,44 @@ body: | %5:_(s64) = G_ANYEXT %4(s8) $x0 = COPY %5(s64) +... 
+--- +name:test_scalar_uaddo_32 +body: | + bb.0.entry: +; CHECK-LABEL: name: test_scalar_uaddo_32 +; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0 +; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1 +; CHECK: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[COPY]], [[COPY1]] +; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UADDO1]](s1) +; CHECK: $w0 = COPY [[UADDO]](s32) +; CHECK: $w1 = COPY [[ANYEXT]](s32) +%0:_(s32) = COPY $w0 +%1:_(s32) = COPY $w1 +%2:_(s32), %3:_(s1) = G_UADDO %0, %1 +%4:_(s32) = G_ANYEXT %3 +$w0 = COPY %2(s32) +$w1 = COPY %4(s32) + +... +--- +name:test_scalar_saddo_32 +body: | + bb.0.entry: +; CHECK-LABEL: name: test_scalar_saddo_32 +; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0 +; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1 +; CHECK: [[SADDO:%[0-9]+]]:_(s32), [[SADDO1:%[0-9]+]]:_(s1) = G_SADDO [[COPY]], [[COPY1]] +; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SADDO1]](s1) +; CHECK: $w0 = COPY [[SADDO]](s32) +; CHECK: $w1 = COPY [[ANYEXT]](s32) +%0:_(s32) = COPY $w0 +%1:_(s32) = COPY $w1 +%2:_(s32), %3:_(s1) = G_SADDO %0, %1 +%4:_(s32) = G_ANYEXT %3 +$w0 = COPY %2(s32) +$w1 = COPY %4(s32) + ... --- name:test_vector_add diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir index 32796e0948cc..b372a32eb7fc 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir @@ -1,6 +1,59 @@ # NOTE: As
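The only functional change relative to the reverted commit (3dedad4, quoted later in this digest) is the condition code for G_USUBO: the recommit maps it to AArch64CC::LO instead of AArch64CC::HS. A hedged model of the semantics behind that fix (illustrative C++, not LLVM source): on AArch64, SUBS sets the carry flag when no borrow occurs, so unsigned-subtract overflow corresponds to carry-clear, which is the LO ("unsigned lower") condition.

#include <cstdint>
#include <utility>

std::pair<uint32_t, bool> usubo32(uint32_t LHS, uint32_t RHS) {
  uint32_t Result = LHS - RHS; // wraps modulo 2^32 on borrow
  bool Overflow = LHS < RHS;   // borrow occurred <=> LO condition after SUBS
  return {Result, Overflow};
}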
[llvm-branch-commits] [llvm] fa0971b - GlobalISel: check type size before getZExtValue()ing it.
Author: Tim Northover Date: 2021-04-09T11:19:39-07:00 New Revision: fa0971b87fb2c9d14d1bba2551e61f02f18f329b URL: https://github.com/llvm/llvm-project/commit/fa0971b87fb2c9d14d1bba2551e61f02f18f329b DIFF: https://github.com/llvm/llvm-project/commit/fa0971b87fb2c9d14d1bba2551e61f02f18f329b.diff LOG: GlobalISel: check type size before getZExtValue()ing it. Otherwise getZExtValue() asserts. (cherry picked from commit c2b322fc19e829162ed4c7dcd04d9e9b2cd4e66c) Added: llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll Modified: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Removed: diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp index b97c369b832da..b7883cbc3120f 100644 --- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp +++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp @@ -840,9 +840,8 @@ void IRTranslator::emitSwitchCase(SwitchCG::CaseBlock &CB, // For conditional branch lowering, we might try to do something silly like // emit an G_ICMP to compare an existing G_ICMP i1 result with true. If so, // just re-use the existing condition vreg. -if (CI && CI->getZExtValue() == 1 && -MRI->getType(CondLHS).getSizeInBits() == 1 && -CB.PredInfo.Pred == CmpInst::ICMP_EQ) { +if (MRI->getType(CondLHS).getSizeInBits() == 1 && CI && +CI->getZExtValue() == 1 && CB.PredInfo.Pred == CmpInst::ICMP_EQ) { Cond = CondLHS; } else { Register CondRHS = getOrCreateVReg(*CB.CmpRHS); diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll b/llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll new file mode 100644 index 0..8742a848c4af1 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/huge-switch.ll @@ -0,0 +1,22 @@ +; RUN: llc -mtriple=arm64-apple-ios %s -o - -O0 -global-isel=1 | FileCheck %s +define void @foo(i512 %in) { +; CHECK-LABEL: foo: +; CHECK: cbz + switch i512 %in, label %default [ +i512 3923188584616675477397368389504791510063972152790021570560, label %l1 +i512 3923188584616675477397368389504791510063972152790021570561, label %l2 +i512 3923188584616675477397368389504791510063972152790021570562, label %l3 + ] + +default: + ret void + +l1: + ret void + +l2: + ret void + +l3: + ret void +}
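A hedged sketch of the guard pattern behind this fix (the helper below is an assumption for illustration; the actual change just reorders an && chain in emitSwitchCase): APInt::getZExtValue() asserts when the value needs more than 64 bits, so a width check must short-circuit before the call — which the i512 switch in the new test would otherwise trip.

#include "llvm/ADT/APInt.h"
#include <cstdint>

// Hypothetical helper, for illustration only.
uint64_t zextOrZero(const llvm::APInt &V) {
  if (V.getActiveBits() > 64) // cheap width test first, as in the reordered condition
    return 0;                 // caller must handle the wide case separately
  return V.getZExtValue();    // now guaranteed not to assert
}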
[llvm-branch-commits] [llvm] bc3606d - [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions.
Author: Amara Emerson Date: 2021-01-14T16:14:59-08:00 New Revision: bc3606d0b27b2ba13a826b5c3fcba81f7e737387 URL: https://github.com/llvm/llvm-project/commit/bc3606d0b27b2ba13a826b5c3fcba81f7e737387 DIFF: https://github.com/llvm/llvm-project/commit/bc3606d0b27b2ba13a826b5c3fcba81f7e737387.diff LOG: [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions. G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars. Because of this, if their source is a load, then that load can be assigned to an fpr bank and therefore avoid having to do a cross bank copy via a gpr->fpr conversion. Differential Revision: https://reviews.llvm.org/D94701 Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp index eeb7d5bc6eb7..c76c43389b37 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp @@ -680,11 +680,18 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { break; } case TargetOpcode::G_SITOFP: - case TargetOpcode::G_UITOFP: + case TargetOpcode::G_UITOFP: { if (MRI.getType(MI.getOperand(0).getReg()).isVector()) break; -OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR}; +// Integer to FP conversions don't necessarily happen between GPR -> FPR +// regbanks. They can also be done within an FPR register. +Register SrcReg = MI.getOperand(1).getReg(); +if (getRegBank(SrcReg, MRI, TRI) == &AArch64::FPRRegBank) + OpRegBankIdx = {PMI_FirstFPR, PMI_FirstFPR}; +else + OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR}; break; + } case TargetOpcode::G_FPTOSI: case TargetOpcode::G_FPTOUI: if (MRI.getType(MI.getOperand(0).getReg()).isVector()) @@ -722,7 +729,8 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { // assume this was a floating point load in the IR. // If it was not, we would have had a bitcast before // reaching that instruction. -if (onlyUsesFP(UseMI, MRI, TRI)) { +// Int->FP conversion operations are also captured in onlyDefinesFP(). +if (onlyUsesFP(UseMI, MRI, TRI) || onlyDefinesFP(UseMI, MRI, TRI)) { OpRegBankIdx[0] = PMI_FirstFPR; break; } diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir index a7aae275fa5d..46177b4f1b1f 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir @@ -4,7 +4,7 @@ # Check that we correctly assign register banks based off of instructions which # only use or only define FPRs. # -# For example, G_SITOFP takes in a GPR, but only ever produces values on FPRs. +# For example, G_SITOFP may take in a GPR, but only ever produces values on FPRs. # Some instructions can have inputs/outputs on either FPRs or GPRs. If one of # those instructions takes in the result of a G_SITOFP as a source, we should # put that source on a FPR. @@ -361,3 +361,47 @@ body: | %phi:_(s32) = G_PHI %gpr_copy(s32), %bb.0, %unmerge_1(s32), %bb.1 $s0 = COPY %phi(s32) RET_ReallyLR implicit $s0 + +... +--- +name:load_used_by_sitofp +legalized: true +tracksRegLiveness: true +body: | + bb.0: +liveins: $x0 +; The load should be assigned an fpr bank because it's used by the sitofp. +; The sitofp should assign both src and dest to FPR, resulting in no copies. 
+; CHECK-LABEL: name: load_used_by_sitofp +; CHECK: liveins: $x0 +; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0 +; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4) +; CHECK: [[SITOFP:%[0-9]+]]:fpr(s32) = G_SITOFP [[LOAD]](s32) +; CHECK: $s0 = COPY [[SITOFP]](s32) +; CHECK: RET_ReallyLR implicit $s0 +%0:_(p0) = COPY $x0 +%1:_(s32) = G_LOAD %0 :: (load 4) +%2:_(s32) = G_SITOFP %1:_(s32) +$s0 = COPY %2(s32) +RET_ReallyLR implicit $s0 +... +--- +name:load_used_by_uitofp +legalized: true +tracksRegLiveness: true +body: | + bb.0: +liveins: $x0 +; CHECK-LABEL: name: load_used_by_uitofp +; CHECK: liveins: $x0 +; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0 +; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4) +; CHECK: [[UITOFP:%[0-9]+]]:fpr(s32) = G_UITOFP [[LOAD]](s32) +; CHECK: $s0 = COPY [[UITOFP]](s32) +; CHECK: RET_ReallyLR implicit $s0 +%0:_(p0) = COPY $x0 +%1:_(s32) = G_LOAD %0 :: (load 4) +%2:_(s32) = G_UITOFP %1:_(s32) +$s0 = COPY %2
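A simplified model of the load-bank heuristic after this patch (all names below are illustrative stand-ins; the real logic lives in AArch64RegisterBankInfo::getInstrMapping): a scalar load is put on the FPR bank if some user either only uses FP values or only defines FP values, the latter now catching G_SITOFP/G_UITOFP users.

#include <vector>

// Illustrative stand-ins, not the LLVM API.
struct UserTraits {
  bool OnlyUsesFP = false;    // e.g. a G_FADD consuming the loaded value
  bool OnlyDefinesFP = false; // e.g. G_SITOFP: integer in, FP out
};

bool loadPrefersFPR(const std::vector<UserTraits> &Users) {
  for (const UserTraits &U : Users)
    if (U.OnlyUsesFP || U.OnlyDefinesFP) // the second test is what this patch adds
      return true;
  return false;
}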
[llvm-branch-commits] [llvm] 036bc79 - [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions.
Author: Amara Emerson Date: 2021-01-14T16:33:34-08:00 New Revision: 036bc798f2ae4d266fe01e70778afe0b3381c088 URL: https://github.com/llvm/llvm-project/commit/036bc798f2ae4d266fe01e70778afe0b3381c088 DIFF: https://github.com/llvm/llvm-project/commit/036bc798f2ae4d266fe01e70778afe0b3381c088.diff LOG: [AArch64][GlobalISel] Assign FPR banks to loads which are used by integer->float conversions. G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars. Because of this, if their source is a load, then that load can be assigned to an fpr bank and therefore avoid having to do a cross bank copy via a gpr->fpr conversion. Differential Revision: https://reviews.llvm.org/D94701 Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp index eeb7d5bc6eb7..c76c43389b37 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp @@ -680,11 +680,18 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { break; } case TargetOpcode::G_SITOFP: - case TargetOpcode::G_UITOFP: + case TargetOpcode::G_UITOFP: { if (MRI.getType(MI.getOperand(0).getReg()).isVector()) break; -OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR}; +// Integer to FP conversions don't necessarily happen between GPR -> FPR +// regbanks. They can also be done within an FPR register. +Register SrcReg = MI.getOperand(1).getReg(); +if (getRegBank(SrcReg, MRI, TRI) == &AArch64::FPRRegBank) + OpRegBankIdx = {PMI_FirstFPR, PMI_FirstFPR}; +else + OpRegBankIdx = {PMI_FirstFPR, PMI_FirstGPR}; break; + } case TargetOpcode::G_FPTOSI: case TargetOpcode::G_FPTOUI: if (MRI.getType(MI.getOperand(0).getReg()).isVector()) @@ -722,7 +729,8 @@ AArch64RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const { // assume this was a floating point load in the IR. // If it was not, we would have had a bitcast before // reaching that instruction. -if (onlyUsesFP(UseMI, MRI, TRI)) { +// Int->FP conversion operations are also captured in onlyDefinesFP(). +if (onlyUsesFP(UseMI, MRI, TRI) || onlyDefinesFP(UseMI, MRI, TRI)) { OpRegBankIdx[0] = PMI_FirstFPR; break; } diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir index a7aae275fa5d..46177b4f1b1f 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/regbank-fp-use-def.mir @@ -4,7 +4,7 @@ # Check that we correctly assign register banks based off of instructions which # only use or only define FPRs. # -# For example, G_SITOFP takes in a GPR, but only ever produces values on FPRs. +# For example, G_SITOFP may take in a GPR, but only ever produces values on FPRs. # Some instructions can have inputs/outputs on either FPRs or GPRs. If one of # those instructions takes in the result of a G_SITOFP as a source, we should # put that source on a FPR. @@ -361,3 +361,47 @@ body: | %phi:_(s32) = G_PHI %gpr_copy(s32), %bb.0, %unmerge_1(s32), %bb.1 $s0 = COPY %phi(s32) RET_ReallyLR implicit $s0 + +... +--- +name:load_used_by_sitofp +legalized: true +tracksRegLiveness: true +body: | + bb.0: +liveins: $x0 +; The load should be assigned an fpr bank because it's used by the sitofp. +; The sitofp should assign both src and dest to FPR, resulting in no copies. 
+; CHECK-LABEL: name: load_used_by_sitofp +; CHECK: liveins: $x0 +; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0 +; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4) +; CHECK: [[SITOFP:%[0-9]+]]:fpr(s32) = G_SITOFP [[LOAD]](s32) +; CHECK: $s0 = COPY [[SITOFP]](s32) +; CHECK: RET_ReallyLR implicit $s0 +%0:_(p0) = COPY $x0 +%1:_(s32) = G_LOAD %0 :: (load 4) +%2:_(s32) = G_SITOFP %1:_(s32) +$s0 = COPY %2(s32) +RET_ReallyLR implicit $s0 +... +--- +name:load_used_by_uitofp +legalized: true +tracksRegLiveness: true +body: | + bb.0: +liveins: $x0 +; CHECK-LABEL: name: load_used_by_uitofp +; CHECK: liveins: $x0 +; CHECK: [[COPY:%[0-9]+]]:gpr(p0) = COPY $x0 +; CHECK: [[LOAD:%[0-9]+]]:fpr(s32) = G_LOAD [[COPY]](p0) :: (load 4) +; CHECK: [[UITOFP:%[0-9]+]]:fpr(s32) = G_UITOFP [[LOAD]](s32) +; CHECK: $s0 = COPY [[UITOFP]](s32) +; CHECK: RET_ReallyLR implicit $s0 +%0:_(p0) = COPY $x0 +%1:_(s32) = G_LOAD %0 :: (load 4) +%2:_(s32) = G_UITOFP %1:_(s32) +$s0 = COPY %2
[llvm-branch-commits] [llvm] 8f283ca - [AArch64][GlobalISel] Add selection support for fpr bank source variants of G_SITOFP and G_UITOFP.
Author: Amara Emerson Date: 2021-01-14T19:31:19-08:00 New Revision: 8f283cafddfa8d6d01a94b48cdc5d25817569e91 URL: https://github.com/llvm/llvm-project/commit/8f283cafddfa8d6d01a94b48cdc5d25817569e91 DIFF: https://github.com/llvm/llvm-project/commit/8f283cafddfa8d6d01a94b48cdc5d25817569e91.diff LOG: [AArch64][GlobalISel] Add selection support for fpr bank source variants of G_SITOFP and G_UITOFP. In order to import patterns for these, we need to define new ops that can map to the AArch64ISD::[SU]ITOF nodes. We then transform fpr->fpr variants of the generic opcodes to these custom opcodes in preisel-lowering. We have to do it here and not the PostLegalizer combiner because this has to run after regbankselect. Differential Revision: https://reviews.llvm.org/D94702 Added: Modified: llvm/lib/Target/AArch64/AArch64InstrGISel.td llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir Removed: diff --git a/llvm/lib/Target/AArch64/AArch64InstrGISel.td b/llvm/lib/Target/AArch64/AArch64InstrGISel.td index eadb6847ceb6..25656fac1d2f 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrGISel.td +++ b/llvm/lib/Target/AArch64/AArch64InstrGISel.td @@ -146,6 +146,16 @@ def G_VLSHR : AArch64GenericInstruction { let InOperandList = (ins type0:$src1, untyped_imm_0:$imm); } +// Represents an integer to FP conversion on the FPR bank. +def G_SITOF : AArch64GenericInstruction { + let OutOperandList = (outs type0:$dst); + let InOperandList = (ins type0:$src); +} +def G_UITOF : AArch64GenericInstruction { + let OutOperandList = (outs type0:$dst); + let InOperandList = (ins type0:$src); +} + def : GINodeEquiv; def : GINodeEquiv; def : GINodeEquiv; @@ -163,6 +173,8 @@ def : GINodeEquiv; def : GINodeEquiv; def : GINodeEquiv; def : GINodeEquiv; +def : GINodeEquiv; +def : GINodeEquiv; def : GINodeEquiv; diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp index 6dc0d1fb97e2..c2e3d9484207 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp @@ -1941,6 +1941,24 @@ bool AArch64InstructionSelector::preISelLower(MachineInstr &I) { I.getOperand(1).setReg(NewSrc.getReg(0)); return true; } + case TargetOpcode::G_UITOFP: + case TargetOpcode::G_SITOFP: { +// If both source and destination regbanks are FPR, then convert the opcode +// to G_SITOF so that the importer can select it to an fpr variant. +// Otherwise, it ends up matching an fpr/gpr variant and adding a cross-bank +// copy. +Register SrcReg = I.getOperand(1).getReg(); +if (MRI.getType(SrcReg).isVector()) + return false; +if (RBI.getRegBank(SrcReg, MRI, TRI)->getID() == AArch64::FPRRegBankID) { + if (I.getOpcode() == TargetOpcode::G_SITOFP) +I.setDesc(TII.get(AArch64::G_SITOF)); + else +I.setDesc(TII.get(AArch64::G_UITOF)); + return true; +} +return false; + } default: return false; } diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir index aea10c5c6c9d..aad71bd99f8f 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir @@ -218,7 +218,7 @@ body: | ... 
--- -name:sitofp_s32_s32_fpr +name:sitofp_s32_s32_fpr_gpr legalized: true regBankSelected: true @@ -230,7 +230,7 @@ body: | bb.0: liveins: $w0 -; CHECK-LABEL: name: sitofp_s32_s32_fpr +; CHECK-LABEL: name: sitofp_s32_s32_fpr_gpr ; CHECK: [[COPY:%[0-9]+]]:gpr32 = COPY $w0 ; CHECK: [[SCVTFUWSri:%[0-9]+]]:fpr32 = SCVTFUWSri [[COPY]] ; CHECK: $s0 = COPY [[SCVTFUWSri]] @@ -239,6 +239,50 @@ body: | $s0 = COPY %1(s32) ... +--- +name:sitofp_s32_s32_fpr_fpr +legalized: true +regBankSelected: true + +registers: + - { id: 0, class: fpr } + - { id: 1, class: fpr } + +body: | + bb.0: +liveins: $s0 + +; CHECK-LABEL: name: sitofp_s32_s32_fpr_fpr +; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0 +; CHECK: [[SCVTFv1i32:%[0-9]+]]:fpr32 = SCVTFv1i32 [[COPY]] +; CHECK: $s0 = COPY [[SCVTFv1i32]] +%0(s32) = COPY $s0 +%1(s32) = G_SITOFP %0 +$s0 = COPY %1(s32) +... + +--- +name:uitofp_s32_s32_fpr_fpr +legalized: true +regBankSelected: true + +registers: + - { id: 0, class: fpr } + - { id: 1, class: fpr } + +body: | + bb.0: +liveins: $s0 + +; CHECK-LABEL: name: uitofp_s32_s32_fpr_fpr +; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0 +; CHECK: [[UCVTFv1i32:%[0-9]+]]:fpr32 = UCVTFv1i32 [[COPY]] +; CHECK: $s0 = COPY [[UCVTFv1
[llvm-branch-commits] [llvm] 89e84de - [AArch64][GlobalISel] Fix fallbacks introduced for G_SITOFP in 8f283cafddfa8d6d01a94b48cdc5d25817569e91
Author: Amara Emerson Date: 2021-01-15T01:10:49-08:00 New Revision: 89e84dec1879417fb7eb96edaa55dac7eca204ab URL: https://github.com/llvm/llvm-project/commit/89e84dec1879417fb7eb96edaa55dac7eca204ab DIFF: https://github.com/llvm/llvm-project/commit/89e84dec1879417fb7eb96edaa55dac7eca204ab.diff LOG: [AArch64][GlobalISel] Fix fallbacks introduced for G_SITOFP in 8f283cafddfa8d6d01a94b48cdc5d25817569e91 If we have an integer->fp convert that has differing sizes, e.g. s32 to s64, then don't try to convert it to AArch64::G_SITOF since it won't select. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp index 5dcb9b2d00da..797f33ce2ab4 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp @@ -1947,8 +1947,11 @@ bool AArch64InstructionSelector::preISelLower(MachineInstr &I) { // Otherwise, it ends up matching an fpr/gpr variant and adding a cross-bank // copy. Register SrcReg = I.getOperand(1).getReg(); -if (MRI.getType(SrcReg).isVector()) +LLT SrcTy = MRI.getType(SrcReg); +LLT DstTy = MRI.getType(I.getOperand(0).getReg()); +if (SrcTy.isVector() || SrcTy.getSizeInBits() != DstTy.getSizeInBits()) return false; + if (RBI.getRegBank(SrcReg, MRI, TRI)->getID() == AArch64::FPRRegBankID) { if (I.getOpcode() == TargetOpcode::G_SITOFP) I.setDesc(TII.get(AArch64::G_SITOF)); diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir index aad71bd99f8f..4274f91dba49 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fp-casts.mir @@ -327,6 +327,29 @@ body: | $d0 = COPY %1(s64) ... +--- +name:sitofp_s64_s32_fpr_both +legalized: true +regBankSelected: true + +registers: + - { id: 0, class: fpr } + - { id: 1, class: fpr } + +body: | + bb.0: +liveins: $s0 + +; CHECK-LABEL: name: sitofp_s64_s32_fpr +; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0 +; CHECK: [[COPY2:%[0-9]+]]:gpr32 = COPY [[COPY]] +; CHECK: [[SCVTFUWDri:%[0-9]+]]:fpr64 = SCVTFUWDri [[COPY2]] +; CHECK: $d0 = COPY [[SCVTFUWDri]] +%0(s32) = COPY $s0 +%1(s64) = G_SITOFP %0 +$d0 = COPY %1(s64) +... + --- name:sitofp_s64_s64_fpr legalized: true
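A minimal model of the bail-out this fix adds (hypothetical signature; the real check lives in preISelLower): the custom AArch64::G_SITOF/G_UITOF opcodes use a single type for source and destination, so only same-size scalar conversions may be rewritten to them; mixed sizes such as s32 -> s64 keep the generic opcode and the gpr->fpr pattern.

// Illustrative only; not the LLVM signature.
bool canUseFPROnlyConvert(unsigned SrcBits, unsigned DstBits,
                          bool SrcIsVector, bool SrcIsFPR) {
  if (SrcIsVector || SrcBits != DstBits)
    return false; // would not select as G_SITOF/G_UITOF; keep G_SITOFP/G_UITOFP
  return SrcIsFPR; // rewrite only when the source is already on FPR
}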
[llvm-branch-commits] [llvm] aa8a2d8 - [AArch64][GlobalISel] Select immediate fcmp if the zero is on the LHS.
Author: Amara Emerson Date: 2021-01-15T14:31:39-08:00 New Revision: aa8a2d8a3da3704f82ba4ea3a6e7b463737597e1 URL: https://github.com/llvm/llvm-project/commit/aa8a2d8a3da3704f82ba4ea3a6e7b463737597e1 DIFF: https://github.com/llvm/llvm-project/commit/aa8a2d8a3da3704f82ba4ea3a6e7b463737597e1.diff LOG: [AArch64][GlobalISel] Select immediate fcmp if the zero is on the LHS. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp index 797f33ce2ab4..b24fad35e32b 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp @@ -4224,6 +4224,14 @@ AArch64InstructionSelector::emitFPCompare(Register LHS, Register RHS, // to explicitly materialize a constant. const ConstantFP *FPImm = getConstantFPVRegVal(RHS, MRI); bool ShouldUseImm = FPImm && (FPImm->isZero() && !FPImm->isNegative()); + if (!ShouldUseImm) { +// Try commutating the operands. +const ConstantFP *LHSImm = getConstantFPVRegVal(LHS, MRI); +if (LHSImm && (LHSImm->isZero() && !LHSImm->isNegative())) { + ShouldUseImm = true; + std::swap(LHS, RHS); +} + } unsigned CmpOpcTbl[2][2] = {{AArch64::FCMPSrr, AArch64::FCMPDrr}, {AArch64::FCMPSri, AArch64::FCMPDri}}; unsigned CmpOpc = CmpOpcTbl[ShouldUseImm][OpSize == 64]; diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir index 45799079f920..c12cd3343c7e 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir @@ -107,3 +107,30 @@ body: | %3:gpr(s32) = G_FCMP floatpred(oeq), %0(s64), %2 $s0 = COPY %3(s32) RET_ReallyLR implicit $s0 +... + +--- +name:zero_lhs +alignment: 4 +legalized: true +regBankSelected: true +tracksRegLiveness: true +body: | + bb.1: +liveins: $s0, $s1 + +; CHECK-LABEL: name: zero_lhs +; CHECK: liveins: $s0, $s1 +; CHECK: [[COPY:%[0-9]+]]:fpr32 = COPY $s0 +; CHECK: FCMPSri [[COPY]], implicit-def $nzcv +; CHECK: [[CSINCWr:%[0-9]+]]:gpr32 = CSINCWr $wzr, $wzr, 1, implicit $nzcv +; CHECK: $s0 = COPY [[CSINCWr]] +; CHECK: RET_ReallyLR implicit $s0 +%0:fpr(s32) = COPY $s0 +%1:fpr(s32) = COPY $s1 +%2:fpr(s32) = G_FCONSTANT float 0.00e+00 +%3:gpr(s32) = G_FCMP floatpred(oeq), %2(s32), %0 +$s0 = COPY %3(s32) +RET_ReallyLR implicit $s0 + +...
[llvm-branch-commits] [llvm] 8456c3a - AArch64: fix regression introduced by fcmp immediate selection.
Author: Amara Emerson Date: 2021-01-15T22:53:25-08:00 New Revision: 8456c3a789285079ad35d146e487436b5a27b027 URL: https://github.com/llvm/llvm-project/commit/8456c3a789285079ad35d146e487436b5a27b027 DIFF: https://github.com/llvm/llvm-project/commit/8456c3a789285079ad35d146e487436b5a27b027.diff LOG: AArch64: fix regression introduced by fcmp immediate selection. Forgot to check if the predicate is safe to commutate operands. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp index b24fad35e32b..0021456a596d 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp @@ -34,6 +34,7 @@ #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/TargetOpcodes.h" #include "llvm/IR/Constants.h" +#include "llvm/IR/Instructions.h" #include "llvm/IR/PatternMatch.h" #include "llvm/IR/Type.h" #include "llvm/IR/IntrinsicsAArch64.h" @@ -177,8 +178,10 @@ class AArch64InstructionSelector : public InstructionSelector { MachineIRBuilder &MIRBuilder) const; /// Emit a floating point comparison between \p LHS and \p RHS. + /// \p Pred if given is the intended predicate to use. MachineInstr *emitFPCompare(Register LHS, Register RHS, - MachineIRBuilder &MIRBuilder) const; + MachineIRBuilder &MIRBuilder, + Optional = None) const; MachineInstr *emitInstr(unsigned Opcode, std::initializer_list DstOps, @@ -1483,11 +1486,11 @@ bool AArch64InstructionSelector::selectCompareBranchFedByFCmp( assert(I.getOpcode() == TargetOpcode::G_BRCOND); // Unfortunately, the mapping of LLVM FP CC's onto AArch64 CC's isn't // totally clean. Some of them require two branches to implement. - emitFPCompare(FCmp.getOperand(2).getReg(), FCmp.getOperand(3).getReg(), MIB); + auto Pred = (CmpInst::Predicate)FCmp.getOperand(1).getPredicate(); + emitFPCompare(FCmp.getOperand(2).getReg(), FCmp.getOperand(3).getReg(), MIB, +Pred); AArch64CC::CondCode CC1, CC2; - changeFCMPPredToAArch64CC( - static_cast(FCmp.getOperand(1).getPredicate()), CC1, - CC2); + changeFCMPPredToAArch64CC(static_cast(Pred), CC1, CC2); MachineBasicBlock *DestMBB = I.getOperand(1).getMBB(); MIB.buildInstr(AArch64::Bcc, {}, {}).addImm(CC1).addMBB(DestMBB); if (CC2 != AArch64CC::AL) @@ -3090,7 +3093,7 @@ bool AArch64InstructionSelector::select(MachineInstr &I) { CmpInst::Predicate Pred = static_cast(I.getOperand(1).getPredicate()); if (!emitFPCompare(I.getOperand(2).getReg(), I.getOperand(3).getReg(), - MIRBuilder) || + MIRBuilder, Pred) || !emitCSetForFCmp(I.getOperand(0).getReg(), Pred, MIRBuilder)) return false; I.eraseFromParent(); @@ -4211,7 +4214,8 @@ MachineInstr *AArch64InstructionSelector::emitCSetForFCmp( MachineInstr * AArch64InstructionSelector::emitFPCompare(Register LHS, Register RHS, - MachineIRBuilder &MIRBuilder) const { + MachineIRBuilder &MIRBuilder, + Optional Pred) const { MachineRegisterInfo &MRI = *MIRBuilder.getMRI(); LLT Ty = MRI.getType(LHS); if (Ty.isVector()) @@ -4224,7 +4228,12 @@ AArch64InstructionSelector::emitFPCompare(Register LHS, Register RHS, // to explicitly materialize a constant. 
const ConstantFP *FPImm = getConstantFPVRegVal(RHS, MRI); bool ShouldUseImm = FPImm && (FPImm->isZero() && !FPImm->isNegative()); - if (!ShouldUseImm) { + + auto IsEqualityPred = [](CmpInst::Predicate P) { +return P == CmpInst::FCMP_OEQ || P == CmpInst::FCMP_ONE || + P == CmpInst::FCMP_UEQ || P == CmpInst::FCMP_UNE; + }; + if (!ShouldUseImm && Pred && IsEqualityPred(*Pred)) { // Try commutating the operands. const ConstantFP *LHSImm = getConstantFPVRegVal(LHS, MRI); if (LHSImm && (LHSImm->isZero() && !LHSImm->isNegative())) { diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir index c12cd3343c7e..cde785a6a446 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir @@ -134,3 +134,29 @@ body: | RET_ReallyLR implicit $s0 ... +--- +name:zero_lhs_not_commutative_pred +alignment: 4 +legalized: true +regBankSelected: true +tracksRegLiveness: true +body: | + bb.1: +liveins: $s0, $s1 + +
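A plain-C++ illustration of why the commute is now gated on equality-style predicates (illustrative, not LLVM source): swapping fcmp operands without also flipping the predicate is only sound when the predicate is symmetric.

bool fcmpOEQ(float A, float B) { return A == B; } // symmetric: swap is safe
bool fcmpOLT(float A, float B) { return A < B; }  // ordering: swap changes meaning

// fcmpOEQ(0.0f, X) == fcmpOEQ(X, 0.0f) for every X, but
// fcmpOLT(0.0f, 1.0f) is true while fcmpOLT(1.0f, 0.0f) is false,
// hence the IsEqualityPred guard before std::swap(LHS, RHS).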
[llvm-branch-commits] [llvm] 3dedad4 - [AArch64][GlobalISel] Make G_USUBO legal and select it.
Author: Cassie Jones Date: 2021-01-21T18:53:33-08:00 New Revision: 3dedad475da45c05bc4f66cd14e9f44581edf0bc URL: https://github.com/llvm/llvm-project/commit/3dedad475da45c05bc4f66cd14e9f44581edf0bc DIFF: https://github.com/llvm/llvm-project/commit/3dedad475da45c05bc4f66cd14e9f44581edf0bc.diff LOG: [AArch64][GlobalISel] Make G_USUBO legal and select it. The expansion for wide subtractions includes G_USUBO. Differential Revision: https://reviews.llvm.org/D95032 Added: llvm/test/CodeGen/AArch64/GlobalISel/select-saddo.mir llvm/test/CodeGen/AArch64/GlobalISel/select-ssubo.mir llvm/test/CodeGen/AArch64/GlobalISel/select-usubo.mir Modified: llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp index 9619bb43ae9c..43ad18101069 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp @@ -2745,7 +2745,8 @@ bool AArch64InstructionSelector::select(MachineInstr &I) { } case TargetOpcode::G_SADDO: case TargetOpcode::G_UADDO: - case TargetOpcode::G_SSUBO: { + case TargetOpcode::G_SSUBO: + case TargetOpcode::G_USUBO: { // Emit the operation and get the correct condition code. MachineIRBuilder MIRBuilder(I); auto OpAndCC = emitOverflowOp(Opcode, I.getOperand(0).getReg(), @@ -4376,6 +4377,8 @@ AArch64InstructionSelector::emitOverflowOp(unsigned Opcode, Register Dst, return std::make_pair(emitADDS(Dst, LHS, RHS, MIRBuilder), AArch64CC::HS); case TargetOpcode::G_SSUBO: return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::VS); + case TargetOpcode::G_USUBO: +return std::make_pair(emitSUBS(Dst, LHS, RHS, MIRBuilder), AArch64CC::HS); } } diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index cc7aada211bb..5a6c904e3f5d 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -165,7 +165,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) getActionDefinitionsBuilder({G_SMULH, G_UMULH}).legalFor({s32, s64}); - getActionDefinitionsBuilder({G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO}) + getActionDefinitionsBuilder( + {G_UADDE, G_USUBE, G_SADDO, G_SSUBO, G_UADDO, G_USUBO}) .legalFor({{s32, s1}, {s64, s1}}) .minScalar(0, s32); diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir index ab8510bf9d92..4f97d153d28b 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir @@ -73,6 +73,44 @@ body: | %5:_(s64) = G_ANYEXT %4(s8) $x0 = COPY %5(s64) +... 
+--- +name:test_scalar_uaddo_32 +body: | + bb.0.entry: +; CHECK-LABEL: name: test_scalar_uaddo_32 +; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0 +; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1 +; CHECK: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[COPY]], [[COPY1]] +; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[UADDO1]](s1) +; CHECK: $w0 = COPY [[UADDO]](s32) +; CHECK: $w1 = COPY [[ANYEXT]](s32) +%0:_(s32) = COPY $w0 +%1:_(s32) = COPY $w1 +%2:_(s32), %3:_(s1) = G_UADDO %0, %1 +%4:_(s32) = G_ANYEXT %3 +$w0 = COPY %2(s32) +$w1 = COPY %4(s32) + +... +--- +name:test_scalar_saddo_32 +body: | + bb.0.entry: +; CHECK-LABEL: name: test_scalar_saddo_32 +; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0 +; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1 +; CHECK: [[SADDO:%[0-9]+]]:_(s32), [[SADDO1:%[0-9]+]]:_(s1) = G_SADDO [[COPY]], [[COPY1]] +; CHECK: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[SADDO1]](s1) +; CHECK: $w0 = COPY [[SADDO]](s32) +; CHECK: $w1 = COPY [[ANYEXT]](s32) +%0:_(s32) = COPY $w0 +%1:_(s32) = COPY $w1 +%2:_(s32), %3:_(s1) = G_SADDO %0, %1 +%4:_(s32) = G_ANYEXT %3 +$w0 = COPY %2(s32) +$w1 = COPY %4(s32) + ... --- name:test_vector_add diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir index 32796e0948cc..b372a32eb7fc 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir @@ -1,6 +1,59 @@ # NOTE: Assertions have been autogenerated by utils/update_
[llvm-branch-commits] [llvm] 541d98e - [AArch64][GlobalISel] Implement widenScalar for signed overflow
Author: Cassie Jones Date: 2021-01-21T22:55:42-08:00 New Revision: 541d98efa222b00e16c67348810898c2fa11f398 URL: https://github.com/llvm/llvm-project/commit/541d98efa222b00e16c67348810898c2fa11f398 DIFF: https://github.com/llvm/llvm-project/commit/541d98efa222b00e16c67348810898c2fa11f398.diff LOG: [AArch64][GlobalISel] Implement widenScalar for signed overflow Implement widening for G_SADDO and G_SSUBO. Previously it was only implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for narrow overflowing add/sub on AArch64. Differential Revision: https://reviews.llvm.org/D95034 Added: Modified: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir Removed: diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp index b9e32257d2c8..aef9e6f70c65 100644 --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -1814,6 +1814,26 @@ LegalizerHelper::widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { return widenScalarMergeValues(MI, TypeIdx, WideTy); case TargetOpcode::G_UNMERGE_VALUES: return widenScalarUnmergeValues(MI, TypeIdx, WideTy); + case TargetOpcode::G_SADDO: + case TargetOpcode::G_SSUBO: { +if (TypeIdx == 1) + return UnableToLegalize; // TODO +auto LHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(2)); +auto RHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(3)); +unsigned Opcode = MI.getOpcode() == TargetOpcode::G_SADDO + ? TargetOpcode::G_ADD + : TargetOpcode::G_SUB; +auto NewOp = MIRBuilder.buildInstr(Opcode, {WideTy}, {LHSExt, RHSExt}); +LLT OrigTy = MRI.getType(MI.getOperand(0).getReg()); +auto TruncOp = MIRBuilder.buildTrunc(OrigTy, NewOp); +auto ExtOp = MIRBuilder.buildSExt(WideTy, TruncOp); +// There is no overflow if the re-extended result is the same as NewOp. +MIRBuilder.buildICmp(CmpInst::ICMP_NE, MI.getOperand(1), NewOp, ExtOp); +// Now trunc the NewOp to the original result. +MIRBuilder.buildTrunc(MI.getOperand(0), NewOp); +MI.eraseFromParent(); +return Legalized; + } case TargetOpcode::G_UADDO: case TargetOpcode::G_USUBO: { if (TypeIdx == 1) diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir index 4f97d153d28b..f3564a950310 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir @@ -73,6 +73,66 @@ body: | %5:_(s64) = G_ANYEXT %4(s8) $x0 = COPY %5(s64) +... 
+--- +name:test_scalar_uaddo_small +body: | + bb.0.entry: +; CHECK-LABEL: name: test_scalar_uaddo_small +; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0 +; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1 +; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255 +; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64) +; CHECK: [[AND:%[0-9]+]]:_(s32) = G_AND [[TRUNC]], [[C]] +; CHECK: [[TRUNC1:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64) +; CHECK: [[AND1:%[0-9]+]]:_(s32) = G_AND [[TRUNC1]], [[C]] +; CHECK: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[AND]], [[AND1]] +; CHECK: [[AND2:%[0-9]+]]:_(s32) = G_AND [[ADD]], [[C]] +; CHECK: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(ne), [[ADD]](s32), [[AND2]] +; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[ADD]](s32) +; CHECK: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[ICMP]](s32) +; CHECK: $x0 = COPY [[ANYEXT]](s64) +; CHECK: $x1 = COPY [[ANYEXT1]](s64) +%0:_(s64) = COPY $x0 +%1:_(s64) = COPY $x1 +%2:_(s8) = G_TRUNC %0(s64) +%3:_(s8) = G_TRUNC %1(s64) +%4:_(s8), %5:_(s1) = G_UADDO %2, %3 +%6:_(s64) = G_ANYEXT %4(s8) +%7:_(s64) = G_ANYEXT %5(s1) +$x0 = COPY %6(s64) +$x1 = COPY %7(s64) + +... +--- +name:test_scalar_saddo_small +body: | + bb.0.entry: +; CHECK-LABEL: name: test_scalar_saddo_small +; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0 +; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1 +; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64) +; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[TRUNC]], 8 +; CHECK: [[TRUNC1:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64) +; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[TRUNC1]], 8 +; CHECK: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[SEXT_INREG]], [[SEXT_INREG1]] +; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[ADD]](s32) +; CHECK: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8 +; CHECK: [[ICMP:%[0-9]+]]:_(s32) = G_ICMP intpred(ne), [[ADD]](s32), [[SEXT_INREG2]] +; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[ADD]](s32) +; C
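A standalone model of the widening (illustrative code, not LLVM source): an 8-bit G_SADDO widened to 32 bits. The overflow bit falls out of a truncate/sign-extend round trip — the narrow result was representable iff re-extending it reproduces the wide sum.

#include <cstdint>
#include <utility>

std::pair<int8_t, bool> saddo8Via32(int8_t A, int8_t B) {
  int32_t Wide = int32_t(A) + int32_t(B);  // G_SEXT both operands, G_ADD wide
  int8_t Narrow = int8_t(Wide);            // G_TRUNC
  bool Overflow = int32_t(Narrow) != Wide; // G_SEXT + G_ICMP ne
  return {Narrow, Overflow};
}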
[llvm-branch-commits] [llvm] 2bb92bf - [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method
Author: Cassie Jones Date: 2021-01-22T14:08:46-08:00 New Revision: 2bb92bf451d7eb2c817f3e5403353e7c0c14d350 URL: https://github.com/llvm/llvm-project/commit/2bb92bf451d7eb2c817f3e5403353e7c0c14d350 DIFF: https://github.com/llvm/llvm-project/commit/2bb92bf451d7eb2c817f3e5403353e7c0c14d350.diff LOG: [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method The widenScalar implementation for signed and unsigned overflowing operations were very similar: both are checked by truncating the result and then re-sign/zero-extending it and checking that it matches the computed operation. Using a truncate + zero-extend for the unsigned case instead of manually producing the AND instruction like before leads to an extra copy instruction during legalization, but this should be harmless. Differential Revision: https://reviews.llvm.org/D95035 Added: Modified: llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-add.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-sub.mir llvm/test/CodeGen/AArch64/legalize-uaddo.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-uaddo.mir llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-usubo.mir llvm/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp Removed: diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h index 2e9c7d8250ba..c3b494e94ff1 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h @@ -170,8 +170,10 @@ class LegalizerHelper { widenScalarExtract(MachineInstr &MI, unsigned TypeIdx, LLT WideTy); LegalizeResult widenScalarInsert(MachineInstr &MI, unsigned TypeIdx, LLT WideTy); - LegalizeResult - widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx, LLT WideTy); + LegalizeResult widenScalarAddoSubo(MachineInstr &MI, unsigned TypeIdx, + LLT WideTy); + LegalizeResult widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx, + LLT WideTy); /// Helper function to split a wide generic register into bitwise blocks with /// the given Type (which implies the number of blocks needed). The generic diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp index aef9e6f70c65..e7f40523efaf 100644 --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -1757,6 +1757,34 @@ LegalizerHelper::widenScalarInsert(MachineInstr &MI, unsigned TypeIdx, return Legalized; } +LegalizerHelper::LegalizeResult +LegalizerHelper::widenScalarAddoSubo(MachineInstr &MI, unsigned TypeIdx, + LLT WideTy) { + if (TypeIdx == 1) +return UnableToLegalize; // TODO + unsigned Op = MI.getOpcode(); + unsigned Opcode = Op == TargetOpcode::G_UADDO || Op == TargetOpcode::G_SADDO +? TargetOpcode::G_ADD +: TargetOpcode::G_SUB; + unsigned ExtOpcode = + Op == TargetOpcode::G_UADDO || Op == TargetOpcode::G_USUBO + ? TargetOpcode::G_ZEXT + : TargetOpcode::G_SEXT; + auto LHSExt = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {MI.getOperand(2)}); + auto RHSExt = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {MI.getOperand(3)}); + // Do the arithmetic in the larger type. 
+ auto NewOp = MIRBuilder.buildInstr(Opcode, {WideTy}, {LHSExt, RHSExt}); + LLT OrigTy = MRI.getType(MI.getOperand(0).getReg()); + auto TruncOp = MIRBuilder.buildTrunc(OrigTy, NewOp); + auto ExtOp = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {TruncOp}); + // There is no overflow if the ExtOp is the same as NewOp. + MIRBuilder.buildICmp(CmpInst::ICMP_NE, MI.getOperand(1), NewOp, ExtOp); + // Now trunc the NewOp to the original result. + MIRBuilder.buildTrunc(MI.getOperand(0), NewOp); + MI.eraseFromParent(); + return Legalized; +} + LegalizerHelper::LegalizeResult LegalizerHelper::widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { @@ -1815,48 +1843,10 @@ LegalizerHelper::widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { case TargetOpcode::G_UNMERGE_VALUES: return widenScalarUnmergeValues(MI, TypeIdx, WideTy); case TargetOpcode::G_SADDO: - case TargetOpcode::G_SSUBO: { -if (TypeIdx == 1) - return UnableToLegalize; // TODO -auto LHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(2)); -auto RHSExt = MIRBuilder.buildSExt(WideTy, MI.getOperand(3)); -unsigned Opcode = MI.getOpcode() == TargetOpcode::G_SADDO - ? TargetOpcode::G_ADD - : TargetOpcode::G_SUB; -auto NewOp = MIRBuilder.buildInstr(Op
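For symmetry, the unsigned counterpart under the same unified shape (again illustrative, not LLVM source): identical structure, with zero-extension in place of sign-extension — this is the trunc + zext re-check that replaces the previous hand-built G_AND mask, at the cost of the extra copy the log mentions.

#include <cstdint>
#include <utility>

std::pair<uint8_t, bool> uaddo8Via32(uint8_t A, uint8_t B) {
  uint32_t Wide = uint32_t(A) + uint32_t(B);  // G_ZEXT both operands, G_ADD wide
  uint8_t Narrow = uint8_t(Wide);             // G_TRUNC
  bool Overflow = uint32_t(Narrow) != Wide;   // G_ZEXT + G_ICMP ne
  return {Narrow, Overflow};
}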
[llvm-branch-commits] [llvm] a126569 - Fix failing triple test for macOS 11 with non-zero minor versions.
Author: Amara Emerson Date: 2021-01-06T14:57:37-08:00 New Revision: a1265690cf614bde8a7fd1d503c5f13c184dc786 URL: https://github.com/llvm/llvm-project/commit/a1265690cf614bde8a7fd1d503c5f13c184dc786 DIFF: https://github.com/llvm/llvm-project/commit/a1265690cf614bde8a7fd1d503c5f13c184dc786.diff LOG: Fix failing triple test for macOS 11 with non-zero minor versions. Differential Revision: https://reviews.llvm.org/D94197 Added: Modified: llvm/unittests/ADT/TripleTest.cpp llvm/unittests/Support/Host.cpp Removed: diff --git a/llvm/unittests/ADT/TripleTest.cpp b/llvm/unittests/ADT/TripleTest.cpp index ffce07ba2b12..ff6c2dde4b16 100644 --- a/llvm/unittests/ADT/TripleTest.cpp +++ b/llvm/unittests/ADT/TripleTest.cpp @@ -1264,6 +1264,14 @@ TEST(TripleTest, getOSVersion) { EXPECT_EQ((unsigned)0, Minor); EXPECT_EQ((unsigned)0, Micro); + // For darwin triples on macOS 11, only compare the major version. + T = Triple("x86_64-apple-darwin20.2"); + EXPECT_TRUE(T.isMacOSX()); + T.getMacOSXVersion(Major, Minor, Micro); + EXPECT_EQ((unsigned)11, Major); + EXPECT_EQ((unsigned)0, Minor); + EXPECT_EQ((unsigned)0, Micro); + T = Triple("armv7-apple-ios"); EXPECT_FALSE(T.isMacOSX()); EXPECT_TRUE(T.isiOS()); diff --git a/llvm/unittests/Support/Host.cpp b/llvm/unittests/Support/Host.cpp index 8029bb5830fc..b452048361db 100644 --- a/llvm/unittests/Support/Host.cpp +++ b/llvm/unittests/Support/Host.cpp @@ -348,9 +348,15 @@ TEST_F(HostTest, getMacOSHostVersion) { unsigned HostMajor, HostMinor, HostMicro; ASSERT_EQ(HostTriple.getMacOSXVersion(HostMajor, HostMinor, HostMicro), true); - // Don't compare the 'Micro' version, as it's always '0' for the 'Darwin' - // triples. - ASSERT_EQ(std::tie(SystemMajor, SystemMinor), std::tie(HostMajor, HostMinor)); + if (SystemMajor > 10) { +// Don't compare the 'Minor' and 'Micro' versions, as they're always '0' for +// the 'Darwin' triples on 11.x. +ASSERT_EQ(SystemMajor, HostMajor); + } else { +// Don't compare the 'Micro' version, as it's always '0' for the 'Darwin' +// triples. +ASSERT_EQ(std::tie(SystemMajor, SystemMinor), std::tie(HostMajor, HostMinor)); + } } #endif ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
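The darwin-to-macOS mapping behind the new test case, summarized as a hedged note (the exact handling of very old versions is omitted):

// Triple::getMacOSXVersion maps Darwin kernel majors to macOS releases:
//   darwin5 .. darwin19 -> macOS 10.1 .. 10.15 (Major = 10, Minor = D - 4)
//   darwin20 and later  -> macOS (D - 9).0     (e.g. darwin20.2 -> 11.0.0)
// For darwin20+ the Darwin minor (the ".2" above) does not encode a macOS
// minor version, which is why both the triple test and the host test only
// trust the major number on 11.x.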
[llvm-branch-commits] [llvm] a69b76c - [GlobalISel][IRTranslator] Ensure branch probabilities are added when translating invoke edges.
Author: Amara Emerson Date: 2020-12-14T23:36:54-08:00 New Revision: a69b76c500849bacc0ba494df03b465e4bcff0ef URL: https://github.com/llvm/llvm-project/commit/a69b76c500849bacc0ba494df03b465e4bcff0ef DIFF: https://github.com/llvm/llvm-project/commit/a69b76c500849bacc0ba494df03b465e4bcff0ef.diff LOG: [GlobalISel][IRTranslator] Ensure branch probabilities are added when translating invoke edges. This uses a straightforward port of findUnwindDestinations() from SelectionDAG. Differential Revision: https://reviews.llvm.org/D93256 Added: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-invoke-probabilities.ll Modified: llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Removed: diff --git a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h index 37c94ccbbd20..8eab8a5846a7 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h @@ -260,6 +260,19 @@ class IRTranslator : public MachineFunctionPass { /// \pre \p U is a call instruction. bool translateCall(const User &U, MachineIRBuilder &MIRBuilder); + /// When an invoke or a cleanupret unwinds to the next EH pad, there are + /// many places it could ultimately go. In the IR, we have a single unwind + /// destination, but in the machine CFG, we enumerate all the possible blocks. + /// This function skips over imaginary basic blocks that hold catchswitch + /// instructions, and finds all the "real" machine + /// basic block destinations. As those destinations may not be successors of + /// EHPadBB, here we also calculate the edge probability to those + /// destinations. The passed-in Prob is the edge probability to EHPadBB. + bool findUnwindDestinations( + const BasicBlock *EHPadBB, BranchProbability Prob, + SmallVectorImpl> + &UnwindDests); + bool translateInvoke(const User &U, MachineIRBuilder &MIRBuilder); bool translateCallBr(const User &U, MachineIRBuilder &MIRBuilder); @@ -659,8 +672,9 @@ class IRTranslator : public MachineFunctionPass { BranchProbability getEdgeProbability(const MachineBasicBlock *Src, const MachineBasicBlock *Dst) const; - void addSuccessorWithProb(MachineBasicBlock *Src, MachineBasicBlock *Dst, -BranchProbability Prob); + void addSuccessorWithProb( + MachineBasicBlock *Src, MachineBasicBlock *Dst, + BranchProbability Prob = BranchProbability::getUnknown()); public: IRTranslator(CodeGenOpt::Level OptLevel = CodeGenOpt::None); diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp index 202163ff9507..dde97ba599b9 100644 --- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp +++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp @@ -2347,6 +2347,62 @@ bool IRTranslator::translateCall(const User &U, MachineIRBuilder &MIRBuilder) { return true; } +bool IRTranslator::findUnwindDestinations( +const BasicBlock *EHPadBB, +BranchProbability Prob, +SmallVectorImpl> +&UnwindDests) { + EHPersonality Personality = classifyEHPersonality( + EHPadBB->getParent()->getFunction().getPersonalityFn()); + bool IsMSVCCXX = Personality == EHPersonality::MSVC_CXX; + bool IsCoreCLR = Personality == EHPersonality::CoreCLR; + bool IsWasmCXX = Personality == EHPersonality::Wasm_CXX; + bool IsSEH = isAsynchronousEHPersonality(Personality); + + if (IsWasmCXX) { +// Ignore this for now. 
+return false; + } + + while (EHPadBB) { +const Instruction *Pad = EHPadBB->getFirstNonPHI(); +BasicBlock *NewEHPadBB = nullptr; +if (isa(Pad)) { + // Stop on landingpads. They are not funclets. + UnwindDests.emplace_back(&getMBB(*EHPadBB), Prob); + break; +} +if (isa(Pad)) { + // Stop on cleanup pads. Cleanups are always funclet entries for all known + // personalities. + UnwindDests.emplace_back(&getMBB(*EHPadBB), Prob); + UnwindDests.back().first->setIsEHScopeEntry(); + UnwindDests.back().first->setIsEHFuncletEntry(); + break; +} +if (auto *CatchSwitch = dyn_cast(Pad)) { + // Add the catchpad handlers to the possible destinations. + for (const BasicBlock *CatchPadBB : CatchSwitch->handlers()) { +UnwindDests.emplace_back(&getMBB(*CatchPadBB), Prob); +// For MSVC++ and the CLR, catchblocks are funclets and need prologues. +if (IsMSVCCXX || IsCoreCLR) + UnwindDests.back().first->setIsEHFuncletEntry(); +if (!IsSEH) + UnwindDests.back().first->setIsEHScopeEntry(); + } + NewEHPadBB = CatchSwitch->getUnwindDest(); +} else { + continue; +} + +BranchProbabilityInfo *BPI = FuncInfo.BPI; +if (BPI && NewEHPadBB) + Prob *= BPI->g
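The shape of the ported walk is easier to see stripped of the funclet flags and probability bookkeeping; a simplified, hedged skeleton follows (the real code also scales Prob by the BPI edge probability at each hop and bails out for the Wasm personality):

#include "llvm/IR/Instructions.h"
#include <vector>
using namespace llvm;

// Collect the "real" unwind destinations reachable from EHPadBB.
static void walkUnwindDests(const BasicBlock *EHPadBB,
                            std::vector<const BasicBlock *> &Dests) {
  while (EHPadBB) {
    const Instruction *Pad = EHPadBB->getFirstNonPHI();
    if (isa<LandingPadInst>(Pad) || isa<CleanupPadInst>(Pad)) {
      Dests.push_back(EHPadBB); // a real destination: stop the walk here
      return;
    }
    if (auto *CS = dyn_cast<CatchSwitchInst>(Pad)) {
      for (const BasicBlock *Handler : CS->handlers())
        Dests.push_back(Handler);      // every handler is a possible target
      EHPadBB = CS->getUnwindDest();   // skip the "imaginary" catchswitch block
    } else {
      return;
    }
  }
}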
[llvm-branch-commits] [llvm] 9caca72 - [AArch64][GlobalISel] Use the look-through constant helper for the shift s32->s64 custom legalization.
Author: Amara Emerson Date: 2020-12-18T11:57:24-08:00 New Revision: 9caca7241d447266a23a99ea0536f30faaf19694 URL: https://github.com/llvm/llvm-project/commit/9caca7241d447266a23a99ea0536f30faaf19694 DIFF: https://github.com/llvm/llvm-project/commit/9caca7241d447266a23a99ea0536f30faaf19694.diff LOG: [AArch64][GlobalISel] Use the look-through constant helper for the shift s32->s64 custom legalization. Almost NFC, except it catches more cases and gives a 0.1% CTMark -O0 size win. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 2eaec0b970fa..3dcc244a08fa 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -710,16 +710,14 @@ bool AArch64LegalizerInfo::legalizeShlAshrLshr( // If the shift amount is a G_CONSTANT, promote it to a 64 bit type so the // imported patterns can select it later. Either way, it will be legal. Register AmtReg = MI.getOperand(2).getReg(); - auto *CstMI = MRI.getVRegDef(AmtReg); - assert(CstMI && "expected to find a vreg def"); - if (CstMI->getOpcode() != TargetOpcode::G_CONSTANT) + auto VRegAndVal = getConstantVRegValWithLookThrough(AmtReg, MRI); + if (!VRegAndVal) return true; // Check the shift amount is in range for an immediate form. - unsigned Amount = CstMI->getOperand(1).getCImm()->getZExtValue(); + int64_t Amount = VRegAndVal->Value; if (Amount > 31) return true; // This will have to remain a register variant. - assert(MRI.getType(AmtReg).getSizeInBits() == 32); - auto ExtCst = MIRBuilder.buildZExt(LLT::scalar(64), AmtReg); + auto ExtCst = MIRBuilder.buildConstant(LLT::scalar(64), Amount); MI.getOperand(2).setReg(ExtCst.getReg(0)); return true; } diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir index 56c5b8a8f1e2..9c1f6fc6f41b 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir @@ -24,9 +24,10 @@ body: | ; CHECK-LABEL: name: test_unmerge_s4 ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0 ; CHECK: [[UV:%[0-9]+]]:_(s8), [[UV1:%[0-9]+]]:_(s8), [[UV2:%[0-9]+]]:_(s8), [[UV3:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[COPY]](s32) -; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4 +; CHECK: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 4 ; CHECK: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[UV]](s8) -; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[ZEXT]], [[C]](s32) +; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 4 +; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[ZEXT]], [[C1]](s64) ; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[UV]](s8) ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[LSHR]](s32) ; CHECK: $x0 = COPY [[ANYEXT]](s64) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
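What the look-through buys, in concrete terms (a hedged MIR illustration, not taken from the commit):

// The shift amount often reaches legalization behind a COPY or G_TRUNC:
//   %c:_(s64)   = G_CONSTANT i64 4
//   %amt:_(s32) = G_TRUNC %c(s64)
//   %res:_(s32) = G_ASHR %x, %amt(s32)
// MRI.getVRegDef(%amt) returns the G_TRUNC, so the old check missed the
// constant and kept the register-variant shift. The helper walks through
// truncs, extensions, and copies:
//   auto VRegAndVal = getConstantVRegValWithLookThrough(AmtReg, MRI);
// recovering Value = 4, so the in-range immediate form can be built directly.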
[llvm-branch-commits] [llvm] 43ff75f - [AArch64][GlobalISel] Promote scalar G_SHL constant shift amounts to s64.
Author: Amara Emerson Date: 2020-12-18T11:57:38-08:00 New Revision: 43ff75f2c3feef64f9d73328230d34dac8832a91 URL: https://github.com/llvm/llvm-project/commit/43ff75f2c3feef64f9d73328230d34dac8832a91 DIFF: https://github.com/llvm/llvm-project/commit/43ff75f2c3feef64f9d73328230d34dac8832a91.diff LOG: [AArch64][GlobalISel] Promote scalar G_SHL constant shift amounts to s64. This was supposed to be done in the first place as is currently the case for G_ASHR and G_LSHR but was forgotten when the original shift legalization overhaul was done last year. This was exposed because we started falling back on s32 = s32, s64 SHLs due to a recent combiner change. Gives a very minor (0.1%) code size -O0 improvement on consumer-typeset. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir llvm/test/CodeGen/AArch64/arm64-clrsb.ll Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 3dcc244a08fa..4ffde2a7e3c4 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -97,15 +97,25 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) .moreElementsToNextPow2(0); getActionDefinitionsBuilder(G_SHL) -.legalFor({{s32, s32}, {s64, s64}, - {v2s32, v2s32}, {v4s32, v4s32}, {v2s64, v2s64}}) -.clampScalar(1, s32, s64) -.clampScalar(0, s32, s64) -.widenScalarToNextPow2(0) -.clampNumElements(0, v2s32, v4s32) -.clampNumElements(0, v2s64, v2s64) -.moreElementsToNextPow2(0) -.minScalarSameAs(1, 0); + .customIf([=](const LegalityQuery &Query) { +const auto &SrcTy = Query.Types[0]; +const auto &AmtTy = Query.Types[1]; +return !SrcTy.isVector() && SrcTy.getSizeInBits() == 32 && + AmtTy.getSizeInBits() == 32; + }) + .legalFor({{s32, s32}, + {s64, s64}, + {s32, s64}, + {v2s32, v2s32}, + {v4s32, v4s32}, + {v2s64, v2s64}}) + .clampScalar(1, s32, s64) + .clampScalar(0, s32, s64) + .widenScalarToNextPow2(0) + .clampNumElements(0, v2s32, v4s32) + .clampNumElements(0, v2s64, v2s64) + .moreElementsToNextPow2(0) + .minScalarSameAs(1, 0); getActionDefinitionsBuilder(G_PTR_ADD) .legalFor({{p0, s64}, {v2p0, v2s64}}) diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir index 09ae228b4f1d..a802baca4c8d 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-merge-values.mir @@ -6,11 +6,12 @@ name:test_merge_s4 body: | bb.0: ; CHECK-LABEL: name: test_merge_s4 -; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4 +; CHECK: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 4 ; CHECK: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 15 ; CHECK: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 ; CHECK: [[AND:%[0-9]+]]:_(s32) = G_AND [[C2]], [[C1]] -; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND]], [[C]](s32) +; CHECK: [[C3:%[0-9]+]]:_(s64) = G_CONSTANT i64 4 +; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND]], [[C3]](s64) ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY [[C2]](s32) ; CHECK: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY]], [[C1]] ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY [[SHL]](s32) diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir index 
7d7b77aa7535..6dc28e738dbc 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-non-pow2-load-store.mir @@ -28,12 +28,11 @@ body: | ; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 2 ; CHECK: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[C1]](s64) ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD]](p0) :: (load 1 from %ir.ptr + 2, align 4) -; CHECK: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 16 -; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[LOAD]], [[C2]](s32) +; CHECK: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 16 +; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[LOAD]], [[C2]](s64) ; CHECK: [[OR:%[0-9]+]]:_(s32) = G_OR [[SHL]], [[ZEXTLOAD]] ; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[OR]](s32) -; CHECK: [[C3:%[0-9]+]]:_(s64) = G_CONSTANT i64 16 -; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY2]], [[C3]](s64) +; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY
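In MIR terms the new custom rule performs this rewrite (hedged before/after sketch):

// Before: a 32-bit amount, a form the imported AArch64 patterns reject.
//   %amt:_(s32) = G_CONSTANT i32 16
//   %r:_(s32)   = G_SHL %x, %amt(s32)
// After legalizeShlAshrLshr rebuilds the constant at 64 bits:
//   %amt64:_(s64) = G_CONSTANT i64 16
//   %r:_(s32)     = G_SHL %x, %amt64(s64)
// matching the {s32, s64} row in legalFor() and selecting to an
// immediate-form shift, as the updated CHECK lines show.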
[llvm-branch-commits] [llvm] e0721a0 - [AArch64][GlobalISel] Notify observer of mutated instruction for shift custom legalization.
Author: Amara Emerson Date: 2020-12-25T00:31:47-08:00 New Revision: e0721a0992288122d62940f622b4c2127098a2da URL: https://github.com/llvm/llvm-project/commit/e0721a0992288122d62940f622b4c2127098a2da DIFF: https://github.com/llvm/llvm-project/commit/e0721a0992288122d62940f622b4c2127098a2da.diff LOG: [AArch64][GlobalISel] Notify observer of mutated instruction for shift custom legalization. No test for this because it's a CSE verifier failure that's only exposed in a WIP patch for enabling CSE throughout the AArch64 GISel pipeline. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 0774f7b02dd2..a611d68cb2e5 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -841,7 +841,9 @@ bool AArch64LegalizerInfo::legalizeShlAshrLshr( if (Amount > 31) return true; // This will have to remain a register variant. auto ExtCst = MIRBuilder.buildConstant(LLT::scalar(64), Amount); + Observer.changingInstr(MI); MI.getOperand(2).setReg(ExtCst.getReg(0)); + Observer.changedInstr(MI); return true; } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
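The two added calls are the standard GlobalISel contract for mutating an instruction in place; a hedged summary of why CSE specifically needs them:

// Bracket every in-place edit made while an observer is attached:
//   Observer.changingInstr(MI);                 // MI is about to change
//   MI.getOperand(2).setReg(ExtCst.getReg(0));
//   Observer.changedInstr(MI);                  // MI is done changing
// The CSE machinery keeps instructions in a hash table keyed by opcode and
// operands; skipping the notifications leaves a stale entry for MI's old
// form, which is the CSE verifier failure the log mentions.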
[llvm-branch-commits] [llvm] 7df3544 - [GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from getConstantVRegVal" landed.
Author: Amara Emerson Date: 2020-12-26T23:51:44-08:00 New Revision: 7df3544e80fb40c742707613cd39ca77f7fea558 URL: https://github.com/llvm/llvm-project/commit/7df3544e80fb40c742707613cd39ca77f7fea558 DIFF: https://github.com/llvm/llvm-project/commit/7df3544e80fb40c742707613cd39ca77f7fea558.diff LOG: [GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from getConstantVRegVal" landed. APInt binary ops don't promote types but instead assert, which a combine was relying on. Added: llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir Modified: llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Removed: diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp index 90b1dcea2648..abc23da3d418 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp @@ -1570,7 +1570,8 @@ bool CombinerHelper::matchShiftImmedChain(MachineInstr &MI, return false; // Pass the combined immediate to the apply function. - MatchInfo.Imm = (MaybeImmVal->Value + MaybeImm2Val->Value).getSExtValue(); + MatchInfo.Imm = + (MaybeImmVal->Value.getSExtValue() + MaybeImm2Val->Value).getSExtValue(); MatchInfo.Reg = Base; // There is no simple replacement for a saturating unsigned left shift that diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir new file mode 100644 index ..481c71fbed60 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shift-immed-mismatch-crash.mir @@ -0,0 +1,58 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py +# RUN: llc -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs -mtriple aarch64-unknown-unknown %s -o - | FileCheck %s +--- +name:shift_immed_chain_mismatch_size_crash +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$x0' } +body: | + ; CHECK-LABEL: name: shift_immed_chain_mismatch_size_crash + ; CHECK: bb.0: + ; CHECK: successors: %bb.1(0x4000), %bb.2(0x4000) + ; CHECK: liveins: $x0 + ; CHECK: [[DEF:%[0-9]+]]:_(p0) = G_IMPLICIT_DEF + ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16 + ; CHECK: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 9 + ; CHECK: [[DEF1:%[0-9]+]]:_(s1) = G_IMPLICIT_DEF + ; CHECK: G_BRCOND [[DEF1]](s1), %bb.2 + ; CHECK: G_BR %bb.1 + ; CHECK: bb.1: + ; CHECK: successors: + ; CHECK: bb.2: + ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[DEF]](p0) :: (load 4 from `i32* undef`, align 8) + ; CHECK: [[MUL:%[0-9]+]]:_(s32) = nsw G_MUL [[C]], [[LOAD]] + ; CHECK: [[MUL1:%[0-9]+]]:_(s32) = nsw G_MUL [[MUL]], [[C1]] + ; CHECK: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 2 + ; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[MUL1]], [[C2]](s64) + ; CHECK: $w0 = COPY [[SHL]](s32) + ; CHECK: RET_ReallyLR implicit $w0 + bb.1: +liveins: $x0 + +%0:_(p0) = COPY $x0 +%1:_(s1) = G_IMPLICIT_DEF +%3:_(p0) = G_IMPLICIT_DEF +%4:_(s32) = G_CONSTANT i32 16 +%6:_(s32) = G_CONSTANT i32 9 +%8:_(s32) = G_CONSTANT i32 2 +%11:_(s64) = G_CONSTANT i64 2 +G_BRCOND %1(s1), %bb.2 +G_BR %bb.3 + + bb.2: +successors: + + + bb.3: +%2:_(s32) = G_LOAD %3(p0) :: (load 4 from `i32* undef`, align 8) +%5:_(s32) = nsw G_MUL %4, %2 +%7:_(s32) = nsw G_MUL %5, %6 +%9:_(s32) = nsw G_MUL %7, %8 +%10:_(s64) = G_SEXT %9(s32) +%12:_(s64) = G_MUL %10, %11 +%13:_(s32) = G_TRUNC %12(s64) +$w0 = COPY %13(s32) +RET_ReallyLR implicit $w0 + +... 
___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
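The APInt rule behind the crash, restated as a small runnable illustration (hedged; values and names are arbitrary):

#include "llvm/ADT/APInt.h"
#include <cstdint>

// APInt binary operators require equal bit widths; unlike built-in integers
// they assert on a mismatch instead of promoting the narrower operand.
int64_t combineImms(const llvm::APInt &Imm1, const llvm::APInt &Imm2) {
  // Imm1 + Imm2 asserts if, say, Imm1 is a 32-bit shift immediate and Imm2
  // is a 64-bit one -- exactly the shapes in the new .mir test. Going
  // through a plain integer first, as the fixed combine does, is safe:
  return Imm1.getSExtValue() + Imm2.getSExtValue();
}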
[llvm-branch-commits] [llvm] 4ab45cc - [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.
Author: Amara Emerson Date: 2020-09-30T17:25:33-07:00 New Revision: 4ab45cc2260d87f18e1b05517d5d366b2e754b72 URL: https://github.com/llvm/llvm-project/commit/4ab45cc2260d87f18e1b05517d5d366b2e754b72 DIFF: https://github.com/llvm/llvm-project/commit/4ab45cc2260d87f18e1b05517d5d366b2e754b72.diff LOG: [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE. Also use this opportunity start to clean up the mess of vector type lists we have in the LegalizerInfo. Unfortunately since the legalizer rule builders require std::initializer_list objects as parameters we can't programmatically generate the type lists. Added: Modified: llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir Removed: diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 7d013c439883..206e40999224 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -54,6 +54,12 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) const LLT v2s64 = LLT::vector(2, 64); const LLT v2p0 = LLT::vector(2, p0); + const auto PackedVectorAllTypeList = {/* Begin 128bit types */ +v16s8, v8s16, v4s32, v2s64, v2p0, +/* End 128bit types */ +/* Begin 64bit types */ +v8s8, v4s16, v2s32}; + const TargetMachine &TM = ST.getTargetLowering()->getTargetMachine(); // FIXME: support subtargets which have neon/fp-armv8 disabled. @@ -63,7 +69,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) } getActionDefinitionsBuilder({G_IMPLICIT_DEF, G_FREEZE}) - .legalFor({p0, s1, s8, s16, s32, s64, v2s32, v4s32, v2s64, v16s8, v8s16}) + .legalFor({p0, s1, s8, s16, s32, s64}) + .legalFor(PackedVectorAllTypeList) .clampScalar(0, s1, s64) .widenScalarToNextPow2(0, 8) .fewerElementsIf( @@ -79,8 +86,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) return std::make_pair(0, EltTy); }); - getActionDefinitionsBuilder(G_PHI) - .legalFor({p0, s16, s32, s64, v2s32, v4s32, v2s64}) + getActionDefinitionsBuilder(G_PHI).legalFor({p0, s16, s32, s64}) + .legalFor(PackedVectorAllTypeList) .clampScalar(0, s16, s64) .widenScalarToNextPow2(0); diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir index 9417df066a46..f6c15ec4925d 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir @@ -1,5 +1,5 @@ # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py -# RUN: llc -march=aarch64 -run-pass=legalizer -O0 %s -o - | FileCheck %s +# RUN: llc -march=aarch64 -run-pass=legalizer -global-isel-abort=1 -O0 %s -o - | FileCheck %s --- name:test_freeze_s64 body: | @@ -67,3 +67,21 @@ body: | $w0 = COPY %1 $w1 = COPY %2 ... +--- +name: test_freeze_v8s8 +body: | + bb.0: +liveins: $d0 + +; CHECK-LABEL: name: test_freeze_v8s8 +; CHECK: %d0:_(<8 x s8>) = COPY $d0 +; CHECK: [[FREEZE:%[0-9]+]]:_(<8 x s8>) = G_FREEZE %d0 +; CHECK: [[UV:%[0-9]+]]:_(<4 x s8>), [[UV1:%[0-9]+]]:_(<4 x s8>) = G_UNMERGE_VALUES [[FREEZE]](<8 x s8>) +; CHECK: $w0 = COPY [[UV]](<4 x s8>) +; CHECK: $w1 = COPY [[UV1]](<4 x s8>) +%d0:_(<8 x s8>) = COPY $d0 +%0:_(<8 x s8>) = G_FREEZE %d0 +%1:_(<4 x s8>), %2:_(<4 x s8>) = G_UNMERGE_VALUES %0 +$w0 = COPY %1 +$w1 = COPY %2 +... 
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir index c909b27b83cc..b9fbd17c07da 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir @@ -1,51 +1,5 @@ # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py -# RUN: llc -O0 -mtriple=aarch64-unknown-unknown -verify-machineinstrs -run-pass=legalizer %s -o - | FileCheck %s | - ; ModuleID = '/tmp/test.ll' - source_filename = "/tmp/test.ll" - target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" - target triple = "aarch64-unknown-unknown" - - define i32 @legalize_phi(i32 %argc) { - entry: -ret i32 0 - } - - define i64* @legalize_phi_ptr(i64* %a, i64* %b, i1 %cond) { - entry: -ret i64* null - } - - define i32 @legalize_phi_empty(i32 %argc) { - entry: -ret i32 0 - } - - define i32 @legalize_phi_loop(i32 %argc) { - entry: -
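On the log's point about not generating the type lists, the constraint is a C++ one (hedged illustration; enumerateVectorTypes is hypothetical):

// A braced literal deduces to a std::initializer_list and can be shared:
//   const auto PackedVectorAllTypeList = {v16s8, v8s16, v4s32, v2s64, v2p0,
//                                         v8s8, v4s16, v2s32};
// but an initializer_list cannot be built at runtime, so
//   std::vector<LLT> Generated = enumerateVectorTypes();
//   getActionDefinitionsBuilder(G_PHI).legalFor(Generated);
// has no matching overload -- hence one hand-written literal reused across
// the G_IMPLICIT_DEF/G_FREEZE and G_PHI rules.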
[llvm-branch-commits] [llvm] 87ff156 - [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask.
Author: Amara Emerson Date: 2020-11-30T16:37:49-08:00 New Revision: 87ff156414370043cf149e0c77513c5227b336b1 URL: https://github.com/llvm/llvm-project/commit/87ff156414370043cf149e0c77513c5227b336b1 DIFF: https://github.com/llvm/llvm-project/commit/87ff156414370043cf149e0c77513c5227b336b1.diff LOG: [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask. The lowering of vector selects needs to first splat the scalar mask into a vector first. This was causing a crash when building oggenc in the test suite. Differential Revision: https://reviews.llvm.org/D91655 Added: Modified: llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp llvm/test/CodeGen/AArch64/GlobalISel/legalize-select.mir Removed: diff --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h index 0ce40e60e6fc..739600ead21a 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h @@ -958,6 +958,23 @@ class MachineIRBuilder { MachineInstrBuilder buildBuildVectorTrunc(const DstOp &Res, ArrayRef Ops); + /// Build and insert a vector splat of a scalar \p Src using a + /// G_INSERT_VECTOR_ELT and G_SHUFFLE_VECTOR idiom. + /// + /// \pre setBasicBlock or setMI must have been called. + /// \pre \p Src must have the same type as the element type of \p Dst + /// + /// \return a MachineInstrBuilder for the newly created instruction. + MachineInstrBuilder buildShuffleSplat(const DstOp &Res, const SrcOp &Src); + + /// Build and insert \p Res = G_SHUFFLE_VECTOR \p Src1, \p Src2, \p Mask + /// + /// \pre setBasicBlock or setMI must have been called. + /// + /// \return a MachineInstrBuilder for the newly created instruction. + MachineInstrBuilder buildShuffleVector(const DstOp &Res, const SrcOp &Src1, + const SrcOp &Src2, ArrayRef Mask); + /// Build and insert \p Res = G_CONCAT_VECTORS \p Op0, ... /// /// G_CONCAT_VECTORS creates a vector from the concatenation of 2 or more diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp index 1ad6109a65be..7b346a1bbbec 100644 --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -6217,8 +6217,23 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerSelect(MachineInstr &MI) { if (!DstTy.isVector()) return UnableToLegalize; - if (MaskTy.getSizeInBits() != Op1Ty.getSizeInBits()) + // Vector selects can have a scalar predicate. If so, splat into a vector and + // finish for later legalization attempts to try again. + if (MaskTy.isScalar()) { +Register MaskElt = MaskReg; +if (MaskTy.getSizeInBits() < DstTy.getScalarSizeInBits()) + MaskElt = MIRBuilder.buildSExt(DstTy.getElementType(), MaskElt).getReg(0); +// Generate a vector splat idiom to be pattern matched later. 
+auto ShufSplat = MIRBuilder.buildShuffleSplat(DstTy, MaskElt); +Observer.changingInstr(MI); +MI.getOperand(1).setReg(ShufSplat.getReg(0)); +Observer.changedInstr(MI); +return Legalized; + } + + if (MaskTy.getSizeInBits() != Op1Ty.getSizeInBits()) { return UnableToLegalize; + } auto NotMask = MIRBuilder.buildNot(MaskTy, MaskReg); auto NewOp1 = MIRBuilder.buildAnd(MaskTy, Op1Reg, MaskReg); diff --git a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp index 4a0f70811057..1e39605c90be 100644 --- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp +++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp @@ -635,6 +635,33 @@ MachineIRBuilder::buildBuildVectorTrunc(const DstOp &Res, return buildInstr(TargetOpcode::G_BUILD_VECTOR_TRUNC, Res, TmpVec); } +MachineInstrBuilder MachineIRBuilder::buildShuffleSplat(const DstOp &Res, +const SrcOp &Src) { + LLT DstTy = Res.getLLTTy(*getMRI()); + LLT SrcTy = Src.getLLTTy(*getMRI()); + assert(SrcTy == DstTy.getElementType() && "Expected Src to match Dst elt ty"); + auto UndefVec = buildUndef(DstTy); + auto Zero = buildConstant(LLT::scalar(64), 0); + auto InsElt = buildInsertVectorElement(DstTy, UndefVec, Src, Zero); + SmallVector ZeroMask(DstTy.getNumElements()); + return buildShuffleVector(DstTy, InsElt, UndefVec, ZeroMask); +} + +MachineInstrBuilder MachineIRBuilder::buildShuffleVector(const DstOp &Res, + const SrcOp &Src1, + const SrcOp &Src2, +
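The splat idiom buildShuffleSplat emits, spelled out for a <4 x s32> destination (hedged sketch of the generated MIR):

//   %undef:_(<4 x s32>) = G_IMPLICIT_DEF
//   %zero:_(s64)        = G_CONSTANT i64 0
//   %ins:_(<4 x s32>)   = G_INSERT_VECTOR_ELT %undef, %mask, %zero(s64)
//   %splat:_(<4 x s32>) = G_SHUFFLE_VECTOR %ins, %undef, shufflemask(0, 0, 0, 0)
// Insert the (sign-extended) scalar mask into lane 0 of an undef vector, then
// broadcast lane 0 with an all-zeros mask -- the canonical splat pattern the
// selectors already match. lowerSelect swaps this splat in as the predicate
// and returns Legalized so a later round can finish the now vector-mask
// G_SELECT.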
[llvm-branch-commits] [llvm] 2ac4d0f - [AArch64] Fix some minor coding style issues in AArch64CompressJumpTables
Author: Amara Emerson Date: 2020-12-07T12:48:09-08:00 New Revision: 2ac4d0f45a2a301163ca53f3e23e675f4f5bdbd3 URL: https://github.com/llvm/llvm-project/commit/2ac4d0f45a2a301163ca53f3e23e675f4f5bdbd3 DIFF: https://github.com/llvm/llvm-project/commit/2ac4d0f45a2a301163ca53f3e23e675f4f5bdbd3.diff LOG: [AArch64] Fix some minor coding style issues in AArch64CompressJumpTables Added: Modified: llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp Removed: diff --git a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp index 57dc8a4061f1..c265592d05a7 100644 --- a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp +++ b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp @@ -59,7 +59,7 @@ class AArch64CompressJumpTables : public MachineFunctionPass { } }; char AArch64CompressJumpTables::ID = 0; -} +} // namespace INITIALIZE_PASS(AArch64CompressJumpTables, DEBUG_TYPE, "AArch64 compress jump tables pass", false, false) @@ -104,7 +104,7 @@ bool AArch64CompressJumpTables::compressJumpTable(MachineInstr &MI, int MaxOffset = std::numeric_limits::min(), MinOffset = std::numeric_limits::max(); MachineBasicBlock *MinBlock = nullptr; - for (auto Block : JT.MBBs) { + for (auto *Block : JT.MBBs) { int BlockOffset = BlockInfo[Block->getNumber()]; assert(BlockOffset % 4 == 0 && "misaligned basic block"); @@ -124,13 +124,14 @@ bool AArch64CompressJumpTables::compressJumpTable(MachineInstr &MI, } int Span = MaxOffset - MinOffset; - auto AFI = MF->getInfo(); + auto *AFI = MF->getInfo(); if (isUInt<8>(Span / 4)) { AFI->setJumpTableEntryInfo(JTIdx, 1, MinBlock->getSymbol()); MI.setDesc(TII->get(AArch64::JumpTableDest8)); ++NumJT8; return true; - } else if (isUInt<16>(Span / 4)) { + } + if (isUInt<16>(Span / 4)) { AFI->setJumpTableEntryInfo(JTIdx, 2, MinBlock->getSymbol()); MI.setDesc(TII->get(AArch64::JumpTableDest16)); ++NumJT16; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] c29af37 - [AArch64] Don't try to compress jump tables if there are any inline asm instructions.
Author: Amara Emerson Date: 2020-12-10T12:20:02-08:00 New Revision: c29af37c6c9d74ca330bd7f1d084f1f676ba2824 URL: https://github.com/llvm/llvm-project/commit/c29af37c6c9d74ca330bd7f1d084f1f676ba2824 DIFF: https://github.com/llvm/llvm-project/commit/c29af37c6c9d74ca330bd7f1d084f1f676ba2824.diff LOG: [AArch64] Don't try to compress jump tables if there are any inline asm instructions. Inline asm can contain constructs like .bytes which may have arbitrary size. In some cases, this causes us to miscalculate the size of blocks and therefore offsets, causing us to incorrectly compress a JT. To be safe, just bail out of the whole thing if we find any inline asm. Fixes PR48255 Differential Revision: https://reviews.llvm.org/D92865 Added: Modified: llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp llvm/test/CodeGen/AArch64/jump-table-compress.mir llvm/test/CodeGen/AArch64/jump-table.ll Removed: diff --git a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp index c265592d05a7..2328a8b4deb8 100644 --- a/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp +++ b/llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp @@ -37,8 +37,13 @@ class AArch64CompressJumpTables : public MachineFunctionPass { MachineFunction *MF; SmallVector BlockInfo; - int computeBlockSize(MachineBasicBlock &MBB); - void scanFunction(); + /// Returns the size in instructions of the block \p MBB, or None if we + /// couldn't get a safe upper bound. + Optional computeBlockSize(MachineBasicBlock &MBB); + + /// Gather information about the function, returns false if we can't perform + /// this optimization for some reason. + bool scanFunction(); bool compressJumpTable(MachineInstr &MI, int Offset); @@ -64,14 +69,22 @@ char AArch64CompressJumpTables::ID = 0; INITIALIZE_PASS(AArch64CompressJumpTables, DEBUG_TYPE, "AArch64 compress jump tables pass", false, false) -int AArch64CompressJumpTables::computeBlockSize(MachineBasicBlock &MBB) { +Optional +AArch64CompressJumpTables::computeBlockSize(MachineBasicBlock &MBB) { int Size = 0; - for (const MachineInstr &MI : MBB) + for (const MachineInstr &MI : MBB) { +// Inline asm may contain some directives like .bytes which we don't +// currently have the ability to parse accurately. To be safe, just avoid +// computing a size and bail out. 
+if (MI.getOpcode() == AArch64::INLINEASM || +MI.getOpcode() == AArch64::INLINEASM_BR) + return None; Size += TII->getInstSizeInBytes(MI); + } return Size; } -void AArch64CompressJumpTables::scanFunction() { +bool AArch64CompressJumpTables::scanFunction() { BlockInfo.clear(); BlockInfo.resize(MF->getNumBlockIDs()); @@ -84,8 +97,12 @@ void AArch64CompressJumpTables::scanFunction() { else AlignedOffset = alignTo(Offset, Alignment); BlockInfo[MBB.getNumber()] = AlignedOffset; -Offset = AlignedOffset + computeBlockSize(MBB); +auto BlockSize = computeBlockSize(MBB); +if (!BlockSize) + return false; +Offset = AlignedOffset + *BlockSize; } + return true; } bool AArch64CompressJumpTables::compressJumpTable(MachineInstr &MI, @@ -152,7 +169,8 @@ bool AArch64CompressJumpTables::runOnMachineFunction(MachineFunction &MFIn) { if (ST.force32BitJumpTables() && !MF->getFunction().hasMinSize()) return false; - scanFunction(); + if (!scanFunction()) +return false; for (MachineBasicBlock &MBB : *MF) { int Offset = BlockInfo[MBB.getNumber()]; diff --git a/llvm/test/CodeGen/AArch64/jump-table-compress.mir b/llvm/test/CodeGen/AArch64/jump-table-compress.mir index 272de36f8b6e..a46b7c6ac9c0 100644 --- a/llvm/test/CodeGen/AArch64/jump-table-compress.mir +++ b/llvm/test/CodeGen/AArch64/jump-table-compress.mir @@ -4,6 +4,8 @@ unreachable } + define void @test_inline_asm_no_compress() { ret void } + ... --- name:test_jumptable @@ -110,3 +112,88 @@ body: | early-clobber $x10, dead early-clobber $x11 = JumpTableDest32 undef killed $x9, undef killed $x8, %jump-table.5 BR killed $x10 ... +--- +name:test_inline_asm_no_compress +alignment: 4 +tracksRegLiveness: true +liveins: + - { reg: '$w0' } + - { reg: '$w1' } + - { reg: '$w2' } +frameInfo: + maxAlignment:1 + maxCallFrameSize: 0 +machineFunctionInfo: + hasRedZone: false +jumpTable: + kind:label- diff erence32 + entries: +- id: 0 + blocks: [ '%bb.2', '%bb.4', '%bb.5', '%bb.6', '%bb.7', '%bb.8' ] +body: | + bb.0: +successors: %bb.3(0x12492492), %bb.1(0x6db6db6e) +liveins: $w0, $w1, $w2 + +dead $wzr = SUBSWri renamable $w0, 5, 0, implicit-def $nzcv +Bcc 8, %bb.3, implicit $nzcv + + bb.1: +successors: %bb.2, %bb.4, %bb.5, %bb.6, %bb.7, %bb.8
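Why a single inline-asm instruction poisons the size scan (hedged illustration; the directive below stands for any size-bearing asm):

// getInstSizeInBytes() models target instructions, but one INLINEASM
// MachineInstr can expand to arbitrarily many bytes of data, e.g.:
//   asm volatile(".rept 256\n\t.byte 0\n\t.endr"); // 256 bytes, a single MI
// After such a block every later BlockInfo offset may be wrong, and an 8- or
// 16-bit compressed jump-table entry computed from those offsets could
// silently go out of range. Having computeBlockSize() return None and
// abandoning the whole function is the conservative fix.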
[llvm-branch-commits] [llvm] 21de99d - [GlobalISel][IRTranslator] Fix a crash when the use of an extractvalue is a non-dominated metadata use.
Author: Amara Emerson Date: 2020-12-12T14:58:54-08:00 New Revision: 21de99d43c88c00c007a2b3e350d56328f26660e URL: https://github.com/llvm/llvm-project/commit/21de99d43c88c00c007a2b3e350d56328f26660e DIFF: https://github.com/llvm/llvm-project/commit/21de99d43c88c00c007a2b3e350d56328f26660e.diff LOG: [[GlobalISel][IRTranslator] Fix a crash when the use of an extractvalue is a non-dominated metadata use. We don't expect uses to come before defs in the CFG, so allocateVRegs() asserted. Fixes PR48211 Added: llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll Modified: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Removed: diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp index a912b9c1bd00..202163ff9507 100644 --- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp +++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp @@ -170,7 +170,9 @@ void IRTranslator::getAnalysisUsage(AnalysisUsage &AU) const { IRTranslator::ValueToVRegInfo::VRegListT & IRTranslator::allocateVRegs(const Value &Val) { - assert(!VMap.contains(Val) && "Value already allocated in VMap"); + auto VRegsIt = VMap.findVRegs(Val); + if (VRegsIt != VMap.vregs_end()) +return *VRegsIt->second; auto *Regs = VMap.getVRegs(Val); auto *Offsets = VMap.getOffsets(Val); SmallVector SplitTys; diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll new file mode 100644 index ..dae85e6404b2 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-extract-used-by-dbg.ll @@ -0,0 +1,400 @@ +; RUN: llc -O0 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 | FileCheck %s + +target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" +target triple = "aarch64-unknown-fuchsia" + +declare void @llvm.dbg.value(metadata, metadata, metadata) #0 +; Check that we don't crash when we have a metadata use of %i not being dominated by the def. 
+; CHECK-LABEL: @foo +; CHECK: DBG_VALUE %1:_(p0), $noreg, !370, !DIExpression(DW_OP_LLVM_fragment, 0, 64) +define hidden void @foo() unnamed_addr #1 !dbg !230 { + br i1 undef, label %bb4, label %bb5 + +bb4: ; preds = %bb3 + %i = extractvalue { i8*, i64 } undef, 0 + ret void + +bb5: ; preds = %bb3 + call void @llvm.dbg.value(metadata i8* %i, metadata !370, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !372 + ret void +} + +attributes #0 = { nofree nosync nounwind readnone speculatable willreturn } +attributes #1 = { "target-cpu"="generic" } + +!llvm.dbg.cu = !{!0} +!llvm.module.flags = !{!229} + +!0 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !1, producer: "rustc", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !228) +!1 = !DIFile(filename: "library/std/src/lib.rs", directory: "/b/s/w/ir/x/w/rust") +!2 = !{!3, !11, !16, !25, !31, !36, !45, !68, !75, !83, !90, !97, !106, !115, !121, !131, !153, !159, !163, !168, !179, !184, !189, !192, !194, !210} +!3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "c_void", scope: !5, file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, elements: !8) +!4 = !DIFile(filename: "", directory: "") +!5 = !DINamespace(name: "ffi", scope: !6) +!6 = !DINamespace(name: "core", scope: null) +!7 = !DIBasicType(name: "u8", size: 8, encoding: DW_ATE_unsigned) +!8 = !{!9, !10} +!9 = !DIEnumerator(name: "__variant1", value: 0, isUnsigned: true) +!10 = !DIEnumerator(name: "__variant2", value: 1, isUnsigned: true) +!11 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Option", scope: !12, file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, elements: !13) +!12 = !DINamespace(name: "option", scope: !6) +!13 = !{!14, !15} +!14 = !DIEnumerator(name: "None", value: 0) +!15 = !DIEnumerator(name: "Some", value: 1) +!16 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "EscapeUnicodeState", scope: !17, file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, elements: !18) +!17 = !DINamespace(name: "char", scope: !6) +!18 = !{!19, !20, !21, !22, !23, !24} +!19 = !DIEnumerator(name: "Done", value: 0) +!20 = !DIEnumerator(name: "RightBrace", value: 1) +!21 = !DIEnumerator(name: "Value", value: 2) +!22 = !DIEnumerator(name: "LeftBrace", value: 3) +!23 = !DIEnumerator(name: "Type", value: 4) +!24 = !DIEnumerator(name: "Backslash", value: 5) +!25 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Format", scope: !26, file: !4, baseType: !7, size: 8, align: 8, flags: DIFlagEnumClass, elements: !28) +!26 = !DINamespace(name: "common", scope: !27) +!27 = !DINamespace(name: "gimli", scope: null) +!28 = !{!29, !30} +!29 = !DIEnumerator(name: "Dwarf6
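A note on why the test's IR is valid in the first place (hedged): the IR verifier exempts metadata uses from SSA dominance, so IR like this is legal even though %i does not dominate bb5:

//   bb4:  %i = extractvalue { i8*, i64 } undef, 0
//         ret void
//   bb5:  call void @llvm.dbg.value(metadata i8* %i, ...)  ; non-dominated use
// The IRTranslator can therefore reach a use of %i before its def; the
// lookup-before-allocate change makes allocateVRegs() idempotent instead of
// asserting that the value is new.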
[llvm-branch-commits] [llvm] GlobalISel: Fix combine duplicating atomic loads (PR #111730)
https://github.com/aemerson approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/111730 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/121169 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
@@ -4142,9 +4142,40 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerStore(GStore &StoreMI) { } if (MemTy.isVector()) { -// TODO: Handle vector trunc stores -if (MemTy != SrcTy) +LLT MemScalarTy = MemTy.getElementType(); +if (MemTy != SrcTy) { + if (!MemScalarTy.isByteSized()) { +// We need to build an integer scalar of the vector bit pattern. +// It's not legal for us to add padding when storing a vector. +unsigned NumBits = MemTy.getSizeInBits(); +LLT IntTy = LLT::scalar(NumBits); +auto CurrVal = MIRBuilder.buildConstant(IntTy, 0); +LLT IdxTy = getLLTForMVT(TLI.getVectorIdxTy(MF.getDataLayout())); aemerson wrote: I'd rather make that change separately, since it would then make sense to convert all of our existing uses of the MVT version over. https://github.com/llvm/llvm-project/pull/121169 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121171 >From b9214baba592d4c7860d714b6d0dffd519a48400 Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Fri, 27 Dec 2024 17:34:25 -0800 Subject: [PATCH] Factor out into funct. Created using spr 1.3.5 --- .../llvm/CodeGen/GlobalISel/LegalizerHelper.h | 3 + .../CodeGen/GlobalISel/LegalizerHelper.cpp| 47 +- .../AArch64/GISel/AArch64LegalizerInfo.cpp| 3 +- .../AArch64/GlobalISel/legalize-bitcast.mir | 59 +- .../legalize-store-vector-bools.mir | 68 +- .../AArch64/vec-combine-compare-to-bitmask.ll | 605 ++ 6 files changed, 640 insertions(+), 145 deletions(-) diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h index fac059803b9489..4e18f5cc913a7e 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h @@ -302,6 +302,9 @@ class LegalizerHelper { /// same type as \p Res. MachineInstrBuilder createStackStoreLoad(const DstOp &Res, const SrcOp &Val); + /// Given a store of a boolean vector, scalarize it. + LegalizeResult scalarizeVectorBooleanStore(GStore &MI); + /// Get a pointer to vector element \p Index located in memory for a vector of /// type \p VecTy starting at a base address of \p VecPtr. If \p Index is out /// of bounds the returned pointer is unspecified, but will be within the diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp index 7dece931e8e0eb..0bfa897ecf4047 100644 --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -4143,9 +4143,8 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerStore(GStore &StoreMI) { } if (MemTy.isVector()) { -// TODO: Handle vector trunc stores if (MemTy != SrcTy) - return UnableToLegalize; + return scalarizeVectorBooleanStore(StoreMI); // TODO: We can do better than scalarizing the vector and at least split it // in half. @@ -4200,6 +4199,50 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerStore(GStore &StoreMI) { return Legalized; } +LegalizerHelper::LegalizeResult +LegalizerHelper::scalarizeVectorBooleanStore(GStore &StoreMI) { + Register SrcReg = StoreMI.getValueReg(); + Register PtrReg = StoreMI.getPointerReg(); + LLT SrcTy = MRI.getType(SrcReg); + MachineMemOperand &MMO = **StoreMI.memoperands_begin(); + LLT MemTy = MMO.getMemoryType(); + LLT MemScalarTy = MemTy.getElementType(); + MachineFunction &MF = MIRBuilder.getMF(); + + assert(SrcTy.isVector() && "Expect a vector store type"); + + if (!MemScalarTy.isByteSized()) { +// We need to build an integer scalar of the vector bit pattern. +// It's not legal for us to add padding when storing a vector. +unsigned NumBits = MemTy.getSizeInBits(); +LLT IntTy = LLT::scalar(NumBits); +auto CurrVal = MIRBuilder.buildConstant(IntTy, 0); +LLT IdxTy = getLLTForMVT(TLI.getVectorIdxTy(MF.getDataLayout())); + +for (unsigned I = 0, E = MemTy.getNumElements(); I < E; ++I) { + auto Elt = MIRBuilder.buildExtractVectorElement( + SrcTy.getElementType(), SrcReg, MIRBuilder.buildConstant(IdxTy, I)); + auto Trunc = MIRBuilder.buildTrunc(MemScalarTy, Elt); + auto ZExt = MIRBuilder.buildZExt(IntTy, Trunc); + unsigned ShiftIntoIdx = MF.getDataLayout().isBigEndian() + ? 
(MemTy.getNumElements() - 1) - I + : I; + auto ShiftAmt = MIRBuilder.buildConstant( + IntTy, ShiftIntoIdx * MemScalarTy.getSizeInBits()); + auto Shifted = MIRBuilder.buildShl(IntTy, ZExt, ShiftAmt); + CurrVal = MIRBuilder.buildOr(IntTy, CurrVal, Shifted); +} +auto PtrInfo = MMO.getPointerInfo(); +auto *NewMMO = MF.getMachineMemOperand(&MMO, PtrInfo, IntTy); +MIRBuilder.buildStore(CurrVal, PtrReg, *NewMMO); +StoreMI.eraseFromParent(); +return Legalized; + } + + // TODO: implement simple scalarization. + return UnableToLegalize; +} + LegalizerHelper::LegalizeResult LegalizerHelper::bitcast(MachineInstr &MI, unsigned TypeIdx, LLT CastTy) { switch (MI.getOpcode()) { diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 2fac100f81519a..641f06530a5c23 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -474,7 +474,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) }) .customIf(IsPtrVecPred) .scalarizeIf(typeInSet(0, {v2s16, v2s8}), 0) - .scalarizeIf(scalarOrEltWiderThan(0, 64), 0); + .scalarizeIf(scalarOrEltWiderThan(0, 64), 0) + .lower(); getActionDefinitionsBuilder(G_INDEXED_STORE) // Idx 0
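A host-side model of the bit-packing loop in scalarizeVectorBooleanStore, for an <8 x s1> store on a little-endian target (hedged sketch, not the upstream code):

#include <cstdint>

// Lane I is zero-extended and shifted to bit I * EltBits (EltBits = 1 here),
// then OR'd into the accumulator, mirroring the buildShl/buildOr chain above.
uint8_t packBoolVector(const bool (&Lanes)[8]) {
  uint8_t Cur = 0;                          // buildConstant(IntTy, 0)
  for (unsigned I = 0; I < 8; ++I) {
    uint8_t ZExt = Lanes[I] ? 1 : 0;        // extract element + trunc + zext
    Cur |= static_cast<uint8_t>(ZExt << I); // shl by ShiftIntoIdx, then or
  }
  return Cur; // stored as one integer: no per-lane padding ever hits memory
}
// E.g. lanes {1,0,1,1,0,0,0,0} pack to 0b00001101 = 13; a big-endian target
// uses ShiftIntoIdx = (NumElts - 1) - I instead, per the code above.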
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121169 >From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Thu, 26 Dec 2024 23:50:07 -0800 Subject: [PATCH 1/2] Add -aarch64-enable-collect-loh torun line to remove unnecessary LOH labels. Created using spr 1.3.5 --- .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +- 1 file changed, 172 insertions(+), 455 deletions(-) diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll index 496f7ebf300e50..1fa96979f45530 100644 --- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll +++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL ; Basic tests from input vector to bitmask ; IR generated from clang for: @@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; Bits used in mask ; SDAG-LABEL: convert_to_bitmask16: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh0: ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE ; SDAG-NEXT:cmeq.16b v0, v0, #0 -; SDAG-NEXT: Lloh1: ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:ext.16b v1, v0, v0, #8 @@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w0, s0 ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1 ; ; GISEL-LABEL: convert_to_bitmask16: ; GISEL: ; %bb.0: @@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { define i16 @convert_to_bitmask8(<8 x i16> %vec) { ; SDAG-LABEL: convert_to_bitmask8: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh2: ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE ; SDAG-NEXT:cmeq.8h v0, v0, #0 -; SDAG-NEXT: Lloh3: ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w8, s0 ; SDAG-NEXT:and w0, w8, #0xff ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3 ; ; GISEL-LABEL: convert_to_bitmask8: ; GISEL: ; %bb.0: @@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) { } define i4 @convert_to_bitmask4(<4 x i32> %vec) { -; SDAG-LABEL: convert_to_bitmask4: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh4: -; SDAG-NEXT:adrp x8, lCPI2_0@PAGE -; SDAG-NEXT:cmeq.4s v0, v0, #0 -; SDAG-NEXT: Lloh5: -; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addv.4s s0, v0 -; SDAG-NEXT:fmov w0, s0 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5 -; -; GISEL-LABEL: convert_to_bitmask4: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh0: -; GISEL-NEXT:adrp x8, lCPI2_0@PAGE -; GISEL-NEXT:cmeq.4s v0, v0, #0 -; GISEL-NEXT: Lloh1: -; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0, v1, v0 -; GISEL-NEXT:addv.4s s0, v0 -; GISEL-NEXT:fmov w0, s0 -; GISEL-NEXT:ret -; GISEL-NEXT:.loh AdrpLdr 
Lloh0, Lloh1 +; CHECK-LABEL: convert_to_bitmask4: +; CHECK: ; %bb.0: +; CHECK-NEXT:adrp x8, lCPI2_0@PAGE +; CHECK-NEXT:cmeq.4s v0, v0, #0 +; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] +; CHECK-NEXT:bic.16b v0, v1, v0 +; CHECK-NEXT:addv.4s s0, v0 +; CHECK-NEXT:fmov w0, s0 +; CHECK-NEXT:ret %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer @@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) { } define i8 @convert_to_bitmask2(<2 x i64> %vec) { -; SDAG-LABEL: convert_to_bitmask2: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh6: -; SDAG-NEXT:adrp x8, lCPI3_0@PAGE -; SDAG-NEXT:cmeq.2d v0, v0, #0 -; SDAG-NEXT: Lloh7: -; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addp.2d d0, v0 -; SDAG-NEXT:fmov w8, s0 -; SDAG-NEXT:and w0, w8, #0x3 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7 -; -; GISEL-LABEL: convert_to_bitmask2: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh2: -; GISEL-NEXT:adrp x8, lCPI3_0@PAGE -; GISEL-NEXT:cmeq.2d v0, v0, #0 -; GISEL-NEXT: Lloh3: -; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121185 >From 3efe80b9457a33c68362489fc8c946d51113856a Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Fri, 27 Dec 2024 00:06:30 -0800 Subject: [PATCH] Fix remark checks in test. Created using spr 1.3.5 --- .../AArch64/vec-combine-compare-to-bitmask.ll | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll index cbb90c52835df8..7f3c1fdc93380e 100644 --- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll +++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll @@ -6,21 +6,10 @@ ; IR generated from clang for: ; __builtin_convertvector + reinterpret_cast -; GISEL: warning: Instruction selection used fallback path for convert_to_bitmask4 -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask2 -; GISEL-NEXT: warning: Instruction selection used fallback path for clang_builtins_undef_concat_convert_to_bitmask4 -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_no_compare -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_with_compare_chain -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_with_trunc_in_chain -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_with_unknown_type_in_long_chain -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_with_different_types_in_chain +; GISEL: warning: Instruction selection used fallback path for clang_builtins_undef_concat_convert_to_bitmask4 ; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_2xi32 -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_4xi8 ; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_8xi2 -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_to_bitmask_float -; GISEL-NEXT: warning: Instruction selection used fallback path for convert_legalized_illegal_element_size ; GISEL-NEXT: warning: Instruction selection used fallback path for no_direct_convert_for_bad_concat -; GISEL-NEXT: warning: Instruction selection used fallback path for no_combine_illegal_num_elements define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; Bits used in mask ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121169 >From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Thu, 26 Dec 2024 23:50:07 -0800 Subject: [PATCH 1/2] Add -aarch64-enable-collect-loh torun line to remove unnecessary LOH labels. Created using spr 1.3.5 --- .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +- 1 file changed, 172 insertions(+), 455 deletions(-) diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll index 496f7ebf300e50..1fa96979f45530 100644 --- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll +++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL ; Basic tests from input vector to bitmask ; IR generated from clang for: @@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; Bits used in mask ; SDAG-LABEL: convert_to_bitmask16: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh0: ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE ; SDAG-NEXT:cmeq.16b v0, v0, #0 -; SDAG-NEXT: Lloh1: ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:ext.16b v1, v0, v0, #8 @@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w0, s0 ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1 ; ; GISEL-LABEL: convert_to_bitmask16: ; GISEL: ; %bb.0: @@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { define i16 @convert_to_bitmask8(<8 x i16> %vec) { ; SDAG-LABEL: convert_to_bitmask8: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh2: ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE ; SDAG-NEXT:cmeq.8h v0, v0, #0 -; SDAG-NEXT: Lloh3: ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w8, s0 ; SDAG-NEXT:and w0, w8, #0xff ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3 ; ; GISEL-LABEL: convert_to_bitmask8: ; GISEL: ; %bb.0: @@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) { } define i4 @convert_to_bitmask4(<4 x i32> %vec) { -; SDAG-LABEL: convert_to_bitmask4: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh4: -; SDAG-NEXT:adrp x8, lCPI2_0@PAGE -; SDAG-NEXT:cmeq.4s v0, v0, #0 -; SDAG-NEXT: Lloh5: -; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addv.4s s0, v0 -; SDAG-NEXT:fmov w0, s0 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5 -; -; GISEL-LABEL: convert_to_bitmask4: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh0: -; GISEL-NEXT:adrp x8, lCPI2_0@PAGE -; GISEL-NEXT:cmeq.4s v0, v0, #0 -; GISEL-NEXT: Lloh1: -; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0, v1, v0 -; GISEL-NEXT:addv.4s s0, v0 -; GISEL-NEXT:fmov w0, s0 -; GISEL-NEXT:ret -; GISEL-NEXT:.loh AdrpLdr 
Lloh0, Lloh1 +; CHECK-LABEL: convert_to_bitmask4: +; CHECK: ; %bb.0: +; CHECK-NEXT:adrp x8, lCPI2_0@PAGE +; CHECK-NEXT:cmeq.4s v0, v0, #0 +; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] +; CHECK-NEXT:bic.16b v0, v1, v0 +; CHECK-NEXT:addv.4s s0, v0 +; CHECK-NEXT:fmov w0, s0 +; CHECK-NEXT:ret %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer @@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) { } define i8 @convert_to_bitmask2(<2 x i64> %vec) { -; SDAG-LABEL: convert_to_bitmask2: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh6: -; SDAG-NEXT:adrp x8, lCPI3_0@PAGE -; SDAG-NEXT:cmeq.2d v0, v0, #0 -; SDAG-NEXT: Lloh7: -; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addp.2d d0, v0 -; SDAG-NEXT:fmov w8, s0 -; SDAG-NEXT:and w0, w8, #0x3 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7 -; -; GISEL-LABEL: convert_to_bitmask2: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh2: -; GISEL-NEXT:adrp x8, lCPI3_0@PAGE -; GISEL-NEXT:cmeq.2d v0, v0, #0 -; GISEL-NEXT: Lloh3: -; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)
aemerson wrote: > I think this sounds OK. LGTM > > (Which of bitcast or load/store is considered the most fundamental for > v4i1/v8i1? I think I would have expected in GISel the loads to be converted > to an i4/i8 load with bitcast, and the bitcast legalizes however it does. It > could obviously go the other way where a bitcast is just legalized to > load+store. I wasn't sure why the v4i1 load needed to produce an extending > load just to scalarize again, but perhaps it is necessary to get past > legalization successfully, I haven't looked into it a lot lately. ) I think for loads of v4i1 we should do as you say: bitcast to i4 and then legalize the bitcast. It looks like we currently don't do that; instead we try to lower it, which ends up failing. https://github.com/llvm/llvm-project/pull/121185 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
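As a sketch of the bitcast-first idea for v4i1 loads, something along the following lines could live in the legalizer. This is a hypothetical illustration, not the patch: the function name is invented, and a real implementation would additionally need to round the s4 memory type up to a byte-sized load before selection.

// Hypothetical sketch: legalize a vector-of-i1 load by loading the raw
// bit pattern as a scalar integer and bitcasting it back to the vector:
//   %v:_(<4 x s1>) = G_LOAD %p :: (load (<4 x s1>))
// becomes
//   %i:_(s4) = G_LOAD %p :: (load (s4))
//   %v:_(<4 x s1>) = G_BITCAST %i(s4)
#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
#include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

static LegalizerHelper::LegalizeResult
bitcastBoolVectorLoad(GLoad &LoadMI, MachineIRBuilder &MIRBuilder,
                      MachineRegisterInfo &MRI) {
  Register Dst = LoadMI.getDstReg();
  LLT VecTy = MRI.getType(Dst);                   // e.g. <4 x s1>
  LLT IntTy = LLT::scalar(VecTy.getSizeInBits()); // e.g. s4
  MachineFunction &MF = MIRBuilder.getMF();
  // Rewrite the MMO to load the integer type instead of the bool vector.
  MachineMemOperand &MMO = **LoadMI.memoperands_begin();
  auto *NewMMO = MF.getMachineMemOperand(&MMO, MMO.getPointerInfo(), IntTy);
  auto IntLoad = MIRBuilder.buildLoad(IntTy, LoadMI.getPointerReg(), *NewMMO);
  MIRBuilder.buildBitcast(Dst, IntLoad);
  LoadMI.eraseFromParent();
  return LegalizerHelper::Legalized;
}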
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121171 >From 0be38ccf5c865b4fddc357b33c378c70a20532b9 Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Thu, 26 Dec 2024 16:13:55 -0800 Subject: [PATCH 1/4] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20ch?= =?UTF-8?q?anges=20to=20main=20this=20commit=20is=20based=20on?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Created using spr 1.3.5 [skip ci] --- .../CodeGen/GlobalISel/LegalizerHelper.cpp| 14 ++-- .../AArch64/GISel/AArch64LegalizerInfo.cpp| 1 + .../legalize-store-vector-bools.mir | 32 +++ 3 files changed, 45 insertions(+), 2 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp index e2247f76098e97..a931123638ffb9 100644 --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -3022,8 +3022,18 @@ LegalizerHelper::widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { return UnableToLegalize; LLT Ty = MRI.getType(MI.getOperand(0).getReg()); -if (!Ty.isScalar()) - return UnableToLegalize; +if (!Ty.isScalar()) { + // We need to widen the vector element type. + Observer.changingInstr(MI); + widenScalarSrc(MI, WideTy, 0, TargetOpcode::G_ANYEXT); + // We also need to adjust the MMO to turn this into a truncating store. + MachineMemOperand &MMO = **MI.memoperands_begin(); + MachineFunction &MF = MIRBuilder.getMF(); + auto *NewMMO = MF.getMachineMemOperand(&MMO, MMO.getPointerInfo(), Ty); + MI.setMemRefs(MF, {NewMMO}); + Observer.changedInstr(MI); + return Legalized; +} Observer.changingInstr(MI); diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 4b7d4158faf069..2c35482b7c9e5f 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -454,6 +454,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) {nxv2s64, p0, nxv2s64, 8}, }) .clampScalar(0, s8, s64) + .minScalarOrElt(0, s8) .lowerIf([=](const LegalityQuery &Query) { return Query.Types[0].isScalar() && Query.Types[0] != Query.MMODescrs[0].MemoryTy; diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir new file mode 100644 index 00..de70f89461780b --- /dev/null +++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-store-vector-bools.mir @@ -0,0 +1,32 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 +# RUN: llc -O0 -mtriple=aarch64 -run-pass=legalizer -global-isel-abort=2 %s -o - | FileCheck %s +# This test currently is expected to fall back after reaching truncstore of <8 x s8> as <8 x s1>. 
+--- +name:store_8xs1 +tracksRegLiveness: true +body: | + bb.1: +liveins: $q0, $q1, $x0 +; CHECK-LABEL: name: store_8xs1 +; CHECK: liveins: $q0, $q1, $x0 +; CHECK-NEXT: {{ $}} +; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0 +; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1 +; CHECK-NEXT: %ptr:_(p0) = COPY $x0 +; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<8 x s32>) = G_CONCAT_VECTORS [[COPY]](<4 x s32>), [[COPY1]](<4 x s32>) +; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 +; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32), [[C]](s32) +; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(<8 x s1>) = G_ICMP intpred(slt), [[CONCAT_VECTORS]](<8 x s32>), [[BUILD_VECTOR]] +; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(<8 x s8>) = G_ANYEXT [[ICMP]](<8 x s1>) +; CHECK-NEXT: G_STORE [[ANYEXT]](<8 x s8>), %ptr(p0) :: (store (<8 x s1>)) +; CHECK-NEXT: RET_ReallyLR +%1:_(<4 x s32>) = COPY $q0 +%2:_(<4 x s32>) = COPY $q1 +%ptr:_(p0) = COPY $x0 +%0:_(<8 x s32>) = G_CONCAT_VECTORS %1(<4 x s32>), %2(<4 x s32>) +%4:_(s32) = G_CONSTANT i32 0 +%3:_(<8 x s32>) = G_BUILD_VECTOR %4(s32), %4(s32), %4(s32), %4(s32), %4(s32), %4(s32), %4(s32), %4(s32) +%5:_(<8 x s1>) = G_ICMP intpred(slt), %0(<8 x s32>), %3 +G_STORE %5(<8 x s1>), %ptr(p0) :: (store (<8 x s1>)) +RET_ReallyLR +... >From 18da0bff65252d4ef62f7dcefa73b7b508d10bec Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Fri, 27 Dec 2024 10:49:17 -0800 Subject: [PATCH 2/4] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20ch?= =?UTF-8?q?anges=20introduced=20through=20rebase?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-T
[llvm-branch-commits] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121171 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121169 >From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Thu, 26 Dec 2024 23:50:07 -0800 Subject: [PATCH] Add -aarch64-enable-collect-loh torun line to remove unnecessary LOH labels. Created using spr 1.3.5 --- .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +- 1 file changed, 172 insertions(+), 455 deletions(-) diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll index 496f7ebf300e50..1fa96979f45530 100644 --- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll +++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL ; Basic tests from input vector to bitmask ; IR generated from clang for: @@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; Bits used in mask ; SDAG-LABEL: convert_to_bitmask16: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh0: ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE ; SDAG-NEXT:cmeq.16b v0, v0, #0 -; SDAG-NEXT: Lloh1: ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:ext.16b v1, v0, v0, #8 @@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w0, s0 ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1 ; ; GISEL-LABEL: convert_to_bitmask16: ; GISEL: ; %bb.0: @@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { define i16 @convert_to_bitmask8(<8 x i16> %vec) { ; SDAG-LABEL: convert_to_bitmask8: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh2: ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE ; SDAG-NEXT:cmeq.8h v0, v0, #0 -; SDAG-NEXT: Lloh3: ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w8, s0 ; SDAG-NEXT:and w0, w8, #0xff ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3 ; ; GISEL-LABEL: convert_to_bitmask8: ; GISEL: ; %bb.0: @@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) { } define i4 @convert_to_bitmask4(<4 x i32> %vec) { -; SDAG-LABEL: convert_to_bitmask4: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh4: -; SDAG-NEXT:adrp x8, lCPI2_0@PAGE -; SDAG-NEXT:cmeq.4s v0, v0, #0 -; SDAG-NEXT: Lloh5: -; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addv.4s s0, v0 -; SDAG-NEXT:fmov w0, s0 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5 -; -; GISEL-LABEL: convert_to_bitmask4: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh0: -; GISEL-NEXT:adrp x8, lCPI2_0@PAGE -; GISEL-NEXT:cmeq.4s v0, v0, #0 -; GISEL-NEXT: Lloh1: -; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0, v1, v0 -; GISEL-NEXT:addv.4s s0, v0 -; GISEL-NEXT:fmov w0, s0 -; GISEL-NEXT:ret -; GISEL-NEXT:.loh AdrpLdr 
Lloh0, Lloh1 +; CHECK-LABEL: convert_to_bitmask4: +; CHECK: ; %bb.0: +; CHECK-NEXT:adrp x8, lCPI2_0@PAGE +; CHECK-NEXT:cmeq.4s v0, v0, #0 +; CHECK-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] +; CHECK-NEXT:bic.16b v0, v1, v0 +; CHECK-NEXT:addv.4s s0, v0 +; CHECK-NEXT:fmov w0, s0 +; CHECK-NEXT:ret %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer @@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) { } define i8 @convert_to_bitmask2(<2 x i64> %vec) { -; SDAG-LABEL: convert_to_bitmask2: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh6: -; SDAG-NEXT:adrp x8, lCPI3_0@PAGE -; SDAG-NEXT:cmeq.2d v0, v0, #0 -; SDAG-NEXT: Lloh7: -; SDAG-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addp.2d d0, v0 -; SDAG-NEXT:fmov w8, s0 -; SDAG-NEXT:and w0, w8, #0x3 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh6, Lloh7 -; -; GISEL-LABEL: convert_to_bitmask2: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh2: -; GISEL-NEXT:adrp x8, lCPI3_0@PAGE -; GISEL-NEXT:cmeq.2d v0, v0, #0 -; GISEL-NEXT: Lloh3: -; GISEL-NEXT:ldr q1, [x8, lCPI3_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0, v1
[llvm-branch-commits] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)
https://github.com/aemerson created https://github.com/llvm/llvm-project/pull/121185 This case differs from the <8 x i1> case handled earlier because it triggers a legalization failure in lowerStore() on a path intended for scalar code. It was also triggering incorrect bitcast actions in the AArch64 rules, which weren't expecting truncating stores. With these two issues fixed, more cases are handled. The generated code is still poor, including missing load promotion in our combiners that leaves dead stores hanging around at the end of codegen. Again, we can fix these in separate changes. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
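To make the "rules that weren't expecting truncating stores" point concrete, a legality predicate along the following lines identifies the problematic form. This is a hedged sketch with an invented name, not the code from the patch:

#include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"

using namespace llvm;

// A store is "truncating" when the in-memory type is narrower than the
// register type, e.g. a <4 x s1> memory type for a wider vector value.
// Rules written with scalar stores in mind must not fire on this form.
static bool isTruncatingVectorStore(const LegalityQuery &Query) {
  const LLT ValTy = Query.Types[0];
  const LLT MemTy = Query.MMODescrs[0].MemoryTy;
  return ValTy.isVector() && MemTy.isVector() && MemTy != ValTy;
}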
[llvm-branch-commits] [AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. (PR #121185)
aemerson wrote: Depends on #121169 https://github.com/llvm/llvm-project/pull/121185 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)
aemerson wrote: Depends on #121170 https://github.com/llvm/llvm-project/pull/121171 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
aemerson wrote: Depends on #121171 https://github.com/llvm/llvm-project/pull/121169 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
https://github.com/aemerson created https://github.com/llvm/llvm-project/pull/121169 This is essentially a port of TargetLowering::scalarizeVectorStore(), which handles cases such as a store of <8 x s8> truncating to <8 x s1> in memory. The naive lowering is a sequence of extracts that packs the lanes into a single scalar value to store. AArch64's DAG implementation has additional smarts that improve on this, which we can port later. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
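For intuition about the lowering, the scalar value it computes is just the lane bits packed into an integer. Here is a host-level C++ model of the same computation, assuming little-endian lane order (the legalizer code reverses the shift index on big-endian targets):

#include <array>
#include <cstdint>

// Model of the <8 x s1> truncating-store lowering: truncate each lane to
// one bit, shift it into its lane index, and OR it into an accumulator;
// the resulting i8 is what actually gets stored. This mirrors the
// extract/trunc/zext/shl/or sequence the legalizer emits.
uint8_t packBoolLanes(const std::array<bool, 8> &Lanes) {
  uint8_t Packed = 0;
  for (unsigned I = 0; I < 8; ++I)
    Packed |= static_cast<uint8_t>(Lanes[I] ? 1 : 0) << I;
  return Packed;
}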
[llvm-branch-commits] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)
https://github.com/aemerson created https://github.com/llvm/llvm-project/pull/121171 None ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
aemerson wrote: Eh, what the heck... I somehow ended up folding the factoring-out change from this PR into #121171 ... some weird `spr` bug? https://github.com/llvm/llvm-project/pull/121169 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. (PR #121171)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121171 >From b9214baba592d4c7860d714b6d0dffd519a48400 Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Fri, 27 Dec 2024 17:34:25 -0800 Subject: [PATCH 1/2] Factor out into funct. Created using spr 1.3.5 --- .../llvm/CodeGen/GlobalISel/LegalizerHelper.h | 3 + .../CodeGen/GlobalISel/LegalizerHelper.cpp| 47 +- .../AArch64/GISel/AArch64LegalizerInfo.cpp| 3 +- .../AArch64/GlobalISel/legalize-bitcast.mir | 59 +- .../legalize-store-vector-bools.mir | 68 +- .../AArch64/vec-combine-compare-to-bitmask.ll | 605 ++ 6 files changed, 640 insertions(+), 145 deletions(-) diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h index fac059803b9489..4e18f5cc913a7e 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h @@ -302,6 +302,9 @@ class LegalizerHelper { /// same type as \p Res. MachineInstrBuilder createStackStoreLoad(const DstOp &Res, const SrcOp &Val); + /// Given a store of a boolean vector, scalarize it. + LegalizeResult scalarizeVectorBooleanStore(GStore &MI); + /// Get a pointer to vector element \p Index located in memory for a vector of /// type \p VecTy starting at a base address of \p VecPtr. If \p Index is out /// of bounds the returned pointer is unspecified, but will be within the diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp index 7dece931e8e0eb..0bfa897ecf4047 100644 --- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp +++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp @@ -4143,9 +4143,8 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerStore(GStore &StoreMI) { } if (MemTy.isVector()) { -// TODO: Handle vector trunc stores if (MemTy != SrcTy) - return UnableToLegalize; + return scalarizeVectorBooleanStore(StoreMI); // TODO: We can do better than scalarizing the vector and at least split it // in half. @@ -4200,6 +4199,50 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerStore(GStore &StoreMI) { return Legalized; } +LegalizerHelper::LegalizeResult +LegalizerHelper::scalarizeVectorBooleanStore(GStore &StoreMI) { + Register SrcReg = StoreMI.getValueReg(); + Register PtrReg = StoreMI.getPointerReg(); + LLT SrcTy = MRI.getType(SrcReg); + MachineMemOperand &MMO = **StoreMI.memoperands_begin(); + LLT MemTy = MMO.getMemoryType(); + LLT MemScalarTy = MemTy.getElementType(); + MachineFunction &MF = MIRBuilder.getMF(); + + assert(SrcTy.isVector() && "Expect a vector store type"); + + if (!MemScalarTy.isByteSized()) { +// We need to build an integer scalar of the vector bit pattern. +// It's not legal for us to add padding when storing a vector. +unsigned NumBits = MemTy.getSizeInBits(); +LLT IntTy = LLT::scalar(NumBits); +auto CurrVal = MIRBuilder.buildConstant(IntTy, 0); +LLT IdxTy = getLLTForMVT(TLI.getVectorIdxTy(MF.getDataLayout())); + +for (unsigned I = 0, E = MemTy.getNumElements(); I < E; ++I) { + auto Elt = MIRBuilder.buildExtractVectorElement( + SrcTy.getElementType(), SrcReg, MIRBuilder.buildConstant(IdxTy, I)); + auto Trunc = MIRBuilder.buildTrunc(MemScalarTy, Elt); + auto ZExt = MIRBuilder.buildZExt(IntTy, Trunc); + unsigned ShiftIntoIdx = MF.getDataLayout().isBigEndian() + ? 
(MemTy.getNumElements() - 1) - I + : I; + auto ShiftAmt = MIRBuilder.buildConstant( + IntTy, ShiftIntoIdx * MemScalarTy.getSizeInBits()); + auto Shifted = MIRBuilder.buildShl(IntTy, ZExt, ShiftAmt); + CurrVal = MIRBuilder.buildOr(IntTy, CurrVal, Shifted); +} +auto PtrInfo = MMO.getPointerInfo(); +auto *NewMMO = MF.getMachineMemOperand(&MMO, PtrInfo, IntTy); +MIRBuilder.buildStore(CurrVal, PtrReg, *NewMMO); +StoreMI.eraseFromParent(); +return Legalized; + } + + // TODO: implement simple scalarization. + return UnableToLegalize; +} + LegalizerHelper::LegalizeResult LegalizerHelper::bitcast(MachineInstr &MI, unsigned TypeIdx, LLT CastTy) { switch (MI.getOpcode()) { diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp index 2fac100f81519a..641f06530a5c23 100644 --- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp +++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp @@ -474,7 +474,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST) }) .customIf(IsPtrVecPred) .scalarizeIf(typeInSet(0, {v2s16, v2s8}), 0) - .scalarizeIf(scalarOrEltWiderThan(0, 64), 0); + .scalarizeIf(scalarOrEltWiderThan(0, 64), 0) + .lower(); getActionDefinitionsBuilder(G_INDEXED_STORE) // Id
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
aemerson wrote: OK, should be fixed now. The factoring-out change is now in this PR, where it belongs. https://github.com/llvm/llvm-project/pull/121169 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. (PR #121169)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/121169 >From a1c545bab55b0e9329044f469507149718a1d36f Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Thu, 26 Dec 2024 23:50:07 -0800 Subject: [PATCH 1/2] Add -aarch64-enable-collect-loh torun line to remove unnecessary LOH labels. Created using spr 1.3.5 --- .../AArch64/vec-combine-compare-to-bitmask.ll | 627 +- 1 file changed, 172 insertions(+), 455 deletions(-) diff --git a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll index 496f7ebf300e50..1fa96979f45530 100644 --- a/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll +++ b/llvm/test/CodeGen/AArch64/vec-combine-compare-to-bitmask.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG -; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,SDAG +; RUN: llc -mtriple=aarch64-apple-darwin -mattr=+neon -aarch64-enable-collect-loh=false -global-isel -global-isel-abort=2 -verify-machineinstrs < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,GISEL ; Basic tests from input vector to bitmask ; IR generated from clang for: @@ -26,10 +26,8 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; Bits used in mask ; SDAG-LABEL: convert_to_bitmask16: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh0: ; SDAG-NEXT:adrp x8, lCPI0_0@PAGE ; SDAG-NEXT:cmeq.16b v0, v0, #0 -; SDAG-NEXT: Lloh1: ; SDAG-NEXT:ldr q1, [x8, lCPI0_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:ext.16b v1, v0, v0, #8 @@ -37,7 +35,6 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w0, s0 ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh0, Lloh1 ; ; GISEL-LABEL: convert_to_bitmask16: ; GISEL: ; %bb.0: @@ -106,17 +103,14 @@ define i16 @convert_to_bitmask16(<16 x i8> %vec) { define i16 @convert_to_bitmask8(<8 x i16> %vec) { ; SDAG-LABEL: convert_to_bitmask8: ; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh2: ; SDAG-NEXT:adrp x8, lCPI1_0@PAGE ; SDAG-NEXT:cmeq.8h v0, v0, #0 -; SDAG-NEXT: Lloh3: ; SDAG-NEXT:ldr q1, [x8, lCPI1_0@PAGEOFF] ; SDAG-NEXT:bic.16b v0, v1, v0 ; SDAG-NEXT:addv.8h h0, v0 ; SDAG-NEXT:fmov w8, s0 ; SDAG-NEXT:and w0, w8, #0xff ; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh2, Lloh3 ; ; GISEL-LABEL: convert_to_bitmask8: ; GISEL: ; %bb.0: @@ -160,31 +154,15 @@ define i16 @convert_to_bitmask8(<8 x i16> %vec) { } define i4 @convert_to_bitmask4(<4 x i32> %vec) { -; SDAG-LABEL: convert_to_bitmask4: -; SDAG: ; %bb.0: -; SDAG-NEXT: Lloh4: -; SDAG-NEXT:adrp x8, lCPI2_0@PAGE -; SDAG-NEXT:cmeq.4s v0, v0, #0 -; SDAG-NEXT: Lloh5: -; SDAG-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; SDAG-NEXT:bic.16b v0, v1, v0 -; SDAG-NEXT:addv.4s s0, v0 -; SDAG-NEXT:fmov w0, s0 -; SDAG-NEXT:ret -; SDAG-NEXT:.loh AdrpLdr Lloh4, Lloh5 -; -; GISEL-LABEL: convert_to_bitmask4: -; GISEL: ; %bb.0: -; GISEL-NEXT: Lloh0: -; GISEL-NEXT:adrp x8, lCPI2_0@PAGE -; GISEL-NEXT:cmeq.4s v0, v0, #0 -; GISEL-NEXT: Lloh1: -; GISEL-NEXT:ldr q1, [x8, lCPI2_0@PAGEOFF] -; GISEL-NEXT:bic.16b v0, v1, v0 -; GISEL-NEXT:addv.4s s0, v0 -; GISEL-NEXT:fmov w0, s0 -; GISEL-NEXT:ret -; GISEL-NEXT:.loh AdrpLdr 
+; CHECK-LABEL: convert_to_bitmask4:
+; CHECK: ; %bb.0:
+; CHECK-NEXT:    adrp x8, lCPI2_0@PAGE
+; CHECK-NEXT:    cmeq.4s v0, v0, #0
+; CHECK-NEXT:    ldr q1, [x8, lCPI2_0@PAGEOFF]
+; CHECK-NEXT:    bic.16b v0, v1, v0
+; CHECK-NEXT:    addv.4s s0, v0
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
 %cmp_result = icmp ne <4 x i32> %vec, zeroinitializer
@@ -193,33 +171,16 @@ define i4 @convert_to_bitmask4(<4 x i32> %vec) {
 }

 define i8 @convert_to_bitmask2(<2 x i64> %vec) {
-; SDAG-LABEL: convert_to_bitmask2:
-; SDAG: ; %bb.0:
-; SDAG-NEXT: Lloh6:
-; SDAG-NEXT:    adrp x8, lCPI3_0@PAGE
-; SDAG-NEXT:    cmeq.2d v0, v0, #0
-; SDAG-NEXT: Lloh7:
-; SDAG-NEXT:    ldr q1, [x8, lCPI3_0@PAGEOFF]
-; SDAG-NEXT:    bic.16b v0, v1, v0
-; SDAG-NEXT:    addp.2d d0, v0
-; SDAG-NEXT:    fmov w8, s0
-; SDAG-NEXT:    and w0, w8, #0x3
-; SDAG-NEXT:    ret
-; SDAG-NEXT:    .loh AdrpLdr Lloh6, Lloh7
-;
-; GISEL-LABEL: convert_to_bitmask2:
-; GISEL: ; %bb.0:
-; GISEL-NEXT: Lloh2:
-; GISEL-NEXT:    adrp x8, lCPI3_0@PAGE
-; GISEL-NEXT:    cmeq.2d v0, v0, #0
-; GISEL-NEXT: Lloh3:
-; GISEL-NEXT:    ldr q1, [x8, lCPI3_0@PAGEOFF]
-; GISEL-NEXT:    bic.16b v0
[llvm-branch-commits] [clang] release/20.x: [AArch64] Enable vscale_range with +sme (#124466) (PR #125386)
https://github.com/aemerson approved this pull request.

https://github.com/llvm/llvm-project/pull/125386
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
   return Changed;
 }

+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+    Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+    function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+  if (AllocasToPromote.empty())
+    return;
+
+  SmallVector<AllocaInst *> PromotableAllocas;
+  llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+                isAllocaPromotable);
+  if (PromotableAllocas.empty())
+    return;
+
+  DominatorTree DT(*Caller);
+  AssumptionCache &AC = GetAssumptionCache(*Caller);
+  PromoteMemToReg(PromotableAllocas, DT, &AC);
+  NumAllocasPromoted += PromotableAllocas.size();
+  // Emit a remark for the promotion.
+  OptimizationRemarkEmitter ORE(Caller);
+  DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+  ORE.emit([&]() {
+    return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+                              &Caller->getEntryBlock())
+           << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+           << " allocas to SSA registers in function '"
+           << ore::NV("Function", Caller) << "'";
+  });
+  LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+                    << " allocas to registers in function "
+                    << Caller->getName() << "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee and
+/// that are now promotable to registers, using SROA/mem2reg. However, if we
+/// just let the AlwaysInliner continue inlining everything at once, the
+/// later SROA pass in the pipeline will end up placing phis for these
+/// allocas into blocks along the dominance frontier, which may extend
+/// further than desired (e.g. loop headers). This can happen when the caller
+/// is then inlined into another caller, and the allocas end up hoisted
+/// further before SROA is run.
+///
+/// Instead, what we want to do, as best as we can, is inline leaf functions
+/// into callers, and then run PromoteMemToReg() on the allocas that were
+/// passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends
+/// too far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+///   for (...) {
+///     middle_caller();
+///   }
+/// }
+///
+/// middle_caller() {
+///   int stack_var;
+///   inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+///   // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller()
+/// into outermost_caller(). The regular always_inliner would inline
+/// everything at once, and then SROA/mem2reg would promote stack_var to a
+/// register, but in the context of outermost_caller(), which is not what we
+/// want.

aemerson wrote:

Sure. The problem is that mem2reg promotion has to place phi nodes for the value along the dominance frontier, and this frontier is different depending on inlining order. For allocas, what you want is to insert phis when the size of the dominance frontier is as small as possible. The motivation is that allocas inside nested loops can "leak" phis beyond the innermost loop header, and that's bad for register pressure.
The main inliner already handles this because the pass manager interleaves optimizations with inlining, but the always-inliner doesn't have that capability.

https://github.com/llvm/llvm-project/pull/145613
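To make the intended order concrete, here is a minimal sketch (not the PR's actual code) of inlining interleaved with promotion: walk the call graph in post-order so leaf callees are inlined into their callers first, then run mem2reg on each caller before it is itself inlined further up. collectInlinedByRefAllocas() is a hypothetical helper standing in for the PR's bookkeeping; promoteAllocas() is the helper quoted in the diff above.

#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper: inlines every always_inline call site in F and
// returns the allocas whose addresses were passed to the inlined callees.
SmallPtrSet<AllocaInst *, 8> collectInlinedByRefAllocas(Function &F);

// From the PR's diff above.
static void promoteAllocas(Function *Caller,
                           SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
                           function_ref<AssumptionCache &(Function &)> &GetAC);

static void inlineBottomUp(CallGraph &CG,
                           function_ref<AssumptionCache &(Function &)> GetAC) {
  // post_order visits callees before their callers, so leaves come first.
  for (CallGraphNode *Node : post_order(&CG)) {
    Function *Caller = Node->getFunction();
    if (!Caller || Caller->isDeclaration())
      continue;
    // Inline leaf callees into Caller first...
    SmallPtrSet<AllocaInst *, 8> Allocas = collectInlinedByRefAllocas(*Caller);
    // ...then run mem2reg while the dominance frontier of those allocas is
    // still small, i.e. before Caller is inlined into its own callers.
    promoteAllocas(Caller, Allocas, GetAC);
  }
}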
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
aemerson wrote:

> ⚠️ undef deprecator found issues in your code. ⚠️

This looks to be just the IR output containing undef, not the input.

https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
https://github.com/aemerson edited https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
aemerson wrote:

Yes, the traversal order matters here, because for optimal codegen we want mem2reg to happen between the inner->middle and middle->outer inlines. If you do it the other way around, mem2reg can't do anything until the final inner->outer inline, and by that point it's too late. For now I think only this promotion is a known issue; I don't know of general issues with simplification.
https://github.com/llvm/llvm-project/pull/145613
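For concreteness, the doc comment's pseudo-example can be written as real, compilable C++ (a hypothetical test case, not from the PR; the attribute spelling is clang/GCC's):

__attribute__((always_inline)) static inline void inner_callee(int *x) {
  *x += 1; // do something with x
}

__attribute__((always_inline)) static inline void middle_caller() {
  int stack_var = 0;        // the alloca passed by reference
  inner_callee(&stack_var); // after this inline, stack_var is promotable
}

void outermost_caller() {
  for (int i = 0; i < 100; ++i)
    middle_caller(); // promoting stack_var before this inline keeps its
                     // phi nodes out of the loop header
}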
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
aemerson wrote:

> In that context, could the problem addressed here be decoupled from inlining order? It seems like it'd result in a more robust system.

I don't *think* so, unless there's something I've missed. Before doing this I tried other approaches, such as:
- Trying to detect these over-extended PHIs and then demoting them back to allocas. This didn't work, as we ended up pessimizing codegen.
- Avoiding hoisting large vector allocas to the entry block, in order to block mem2reg. This works, but is conceptually the wrong place to do it (no other heuristics code exists there).
I wasn't aware of ModuleInliner. Is the long-term plan for it to replace the existing inliner? If so, we could in future merge it with AlwaysInliner, and if we interleave optimization as the current SCC manager does, then this should fix the problem.

https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
aemerson wrote:

ping

https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
aemerson wrote:

I managed to reduce the original SME test down to `Transforms/PhaseOrdering/always-inline-alloca-promotion.ll`. Compiling that to assembly with clang, with and without the change, shows the difference in codegen quality, while the IR shows the kind of scenario this patch is meant to handle.

https://github.com/llvm/llvm-project/pull/145613
[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
https://github.com/aemerson updated https://github.com/llvm/llvm-project/pull/145613