[llvm-branch-commits] [clang] PR for llvm/llvm-project#79564 (PR #79566)
https://github.com/david-arm approved this pull request. LGTM! https://github.com/llvm/llvm-project/pull/79566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80140 (PR #80141)
https://github.com/david-arm approved this pull request. LGTM. This is a critical fix for SME to ensure correct behaviour and prevent stack corruption. https://github.com/llvm/llvm-project/pull/80141
[llvm-branch-commits] [clang] 6584a9a - [release][docs] Update contributions to LLVM 12 for scalable vectors.
Author: David Sherwood
Date: 2021-02-18T09:07:28Z
New Revision: 6584a9a4c55e10c055f9f450798b826a9624d82f

URL: https://github.com/llvm/llvm-project/commit/6584a9a4c55e10c055f9f450798b826a9624d82f
DIFF: https://github.com/llvm/llvm-project/commit/6584a9a4c55e10c055f9f450798b826a9624d82f.diff

LOG: [release][docs] Update contributions to LLVM 12 for scalable vectors.

Differential Revision: https://reviews.llvm.org/D96270

Added:

Modified:
    clang/docs/ReleaseNotes.rst

Removed:

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index f4ca8a855142..a43cc33988ab 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -144,6 +144,18 @@ New Pragmas in Clang

 - ...

+Modified Pragmas in Clang
+-------------------------
+
+- The "#pragma clang loop vectorize_width" has been extended to support an
+  optional 'fixed|scalable' argument, which can be used to indicate that the
+  compiler should use fixed-width or scalable vectorization. Fixed-width is
+  assumed by default.
+
+  Scalable or vector length agnostic vectorization is an experimental feature
+  for targets that support scalable vectors. For more information please refer
+  to the Clang Language Extensions documentation.
+
 Attribute Changes in Clang
 --------------------------
[llvm-branch-commits] [llvm] 4cd4853 - [NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp
Author: David Sherwood
Date: 2021-01-13T09:42:58Z
New Revision: 4cd48535eca06245c89a9158844bb177c6f8eb63

URL: https://github.com/llvm/llvm-project/commit/4cd48535eca06245c89a9158844bb177c6f8eb63
DIFF: https://github.com/llvm/llvm-project/commit/4cd48535eca06245c89a9158844bb177c6f8eb63.diff

LOG: [NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp

In places where we calculate costs using TTI.getXXXCost() interfaces I
have changed the code to use InstructionCost instead of unsigned. The
change is non functional since InstructionCost behaves in the same way
as an integer for valid costs. Currently the getXXXCost() functions
used in this file do not return invalid costs.

See this patch for the introduction of the type:
https://reviews.llvm.org/D91174

See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential revision: https://reviews.llvm.org/D94484

Added:

Modified:
    llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp

Removed:

diff --git a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
index 68ddebf113d1..6a95ec3a6576 100644
--- a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
+++ b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
@@ -2110,10 +2110,10 @@ static Value* findRematerializableChainToBasePointer(

 // Helper function for the "rematerializeLiveValues". Compute cost of the use
 // chain we are going to rematerialize.
-static unsigned
-chainToBasePointerCost(SmallVectorImpl<Instruction*> &Chain,
+static InstructionCost
+chainToBasePointerCost(SmallVectorImpl<Instruction *> &Chain,
                        TargetTransformInfo &TTI) {
-  unsigned Cost = 0;
+  InstructionCost Cost = 0;

   for (Instruction *Instr : Chain) {
     if (CastInst *CI = dyn_cast<CastInst>(Instr)) {
@@ -2220,7 +2220,7 @@ static void rematerializeLiveValues(CallBase *Call,
       assert(Info.LiveSet.count(AlternateRootPhi));
     }
     // Compute cost of this chain
-    unsigned Cost = chainToBasePointerCost(ChainToBase, TTI);
+    InstructionCost Cost = chainToBasePointerCost(ChainToBase, TTI);
     // TODO: We can also account for cases when we will be able to remove some
     // of the rematerialized values by later optimization passes. I.e if
     // we rematerialized several intersecting chains. Or if original values
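The commit message's statement that InstructionCost "behaves in the same way as an integer for valid costs" can be pictured with a toy model. The sketch below is not LLVM's class: `ToyCost`, `getInvalid`, `sumCosts` and the rule that invalidity is sticky under addition are all names and assumptions introduced here for illustration only.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a cost type that carries a validity
// flag next to an integer value. It is NOT LLVM's InstructionCost.
class ToyCost {
  long long Value = 0;
  bool Valid = true;

public:
  ToyCost() = default;
  ToyCost(long long V) : Value(V) {} // implicit, so `ToyCost C = 0;` compiles

  static ToyCost getInvalid() {
    ToyCost C;
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // Only meaningful for valid costs, hence the assert.
  long long getValue() const {
    assert(Valid && "querying the value of an invalid cost");
    return Value;
  }

  // For valid operands this is ordinary integer addition. Assumption: once
  // either side is invalid, the result stays invalid.
  ToyCost &operator+=(const ToyCost &RHS) {
    Valid = Valid && RHS.Valid;
    Value += RHS.Value;
    return *this;
  }
};

// Accumulate a list of per-instruction costs. Only the accumulator type
// differs from an `unsigned`-based version of the same loop.
inline ToyCost sumCosts(const long long *Costs, int N) {
  ToyCost Total = 0;
  for (int I = 0; I < N; ++I)
    Total += Costs[I];
  return Total;
}
```

While every input is valid, a caller sees the same totals as with a plain integer accumulator, which is why a type swap of this kind can be labelled NFC.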
[llvm-branch-commits] [llvm] c3ce262 - [NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost
Author: David Sherwood Date: 2021-01-19T09:08:40Z New Revision: c3ce2627949eee3b5d3012db78f670919a49b35d URL: https://github.com/llvm/llvm-project/commit/c3ce2627949eee3b5d3012db78f670919a49b35d DIFF: https://github.com/llvm/llvm-project/commit/c3ce2627949eee3b5d3012db78f670919a49b35d.diff LOG: [NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 5ae400fb5dc9..50e4ef01b616 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1385,7 +1385,7 @@ class LoopVectorizationCostModel { /// Save vectorization decision \p W and \p Cost taken by the cost model for /// instruction \p I and vector width \p VF. void setWideningDecision(Instruction *I, ElementCount VF, InstWidening W, - unsigned Cost) { + InstructionCost Cost) { assert(VF.isVector() && "Expected VF >=2"); WideningDecisions[std::make_pair(I, VF)] = std::make_pair(W, Cost); } @@ -1393,7 +1393,8 @@ class LoopVectorizationCostModel { /// Save vectorization decision \p W and \p Cost taken by the cost model for /// interleaving group \p Grp and vector width \p VF. 
void setWideningDecision(const InterleaveGroup *Grp, - ElementCount VF, InstWidening W, unsigned Cost) { + ElementCount VF, InstWidening W, + InstructionCost Cost) { assert(VF.isVector() && "Expected VF >=2"); /// Broadcast this decicion to all instructions inside the group. /// But the cost will be assigned to one instruction only. @@ -1426,7 +1427,7 @@ class LoopVectorizationCostModel { /// Return the vectorization cost for the given instruction \p I and vector /// width \p VF. - unsigned getWideningCost(Instruction *I, ElementCount VF) { + InstructionCost getWideningCost(Instruction *I, ElementCount VF) { assert(VF.isVector() && "Expected VF >=2"); std::pair InstOnVF = std::make_pair(I, VF); assert(WideningDecisions.find(InstOnVF) != WideningDecisions.end() && @@ -1604,15 +1605,15 @@ class LoopVectorizationCostModel { /// Estimate cost of an intrinsic call instruction CI if it were vectorized /// with factor VF. Return the cost of the instruction, including /// scalarization overhead if it's needed. - unsigned getVectorIntrinsicCost(CallInst *CI, ElementCount VF); + InstructionCost getVectorIntrinsicCost(CallInst *CI, ElementCount VF); /// Estimate cost of a call instruction CI if it were vectorized with factor /// VF. Return the cost of the instruction, including scalarization overhead /// if it's needed. The flag NeedToScalarize shows if the call needs to be /// scalarized - /// i.e. either vector version isn't available, or is too expensive. - unsigned getVectorCallCost(CallInst *CI, ElementCount VF, - bool &NeedToScalarize); + InstructionCost getVectorCallCost(CallInst *CI, ElementCount VF, +bool &NeedToScalarize); /// Invalidates decisions already taken by the cost model. void invalidateCostModelingDecisions() { @@ -1655,30 +1656,30 @@ class LoopVectorizationCostModel { Type *&VectorTy); /// Calculate vectorization cost of memory instruction \p I. 
- unsigned getMemoryInstructionCost(Instruction *I, ElementCount VF); + InstructionCost getMemoryInstructionCost(Instruction *I, ElementCount VF); /// The cost computation for scalarized memory instruction. - unsigned getMemInstScalarizationCost(Instruction *I, ElementCount VF); + InstructionCost getMemInstScalarizationCost(Instruction *I, ElementCount VF); /// The cost computation for interleaving group of memory instructions. - unsigned getInterleaveGroupCost(Instruction *I, ElementCount VF); + InstructionCost getInterleaveGroupCost(Instruction *I, ElementCount VF); /// The cost computation for Gather/Scatter instruction. -
[llvm-branch-commits] [llvm] 255a507 - [NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp
Author: David Sherwood Date: 2021-01-20T08:33:59Z New Revision: 255a507716bca63a375f3b8a379ccbbc58cb40da URL: https://github.com/llvm/llvm-project/commit/255a507716bca63a375f3b8a379ccbbc58cb40da DIFF: https://github.com/llvm/llvm-project/commit/255a507716bca63a375f3b8a379ccbbc58cb40da.diff LOG: [NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427 Added: Modified: llvm/include/llvm/Transforms/IPO/IROutliner.h llvm/lib/Transforms/IPO/IROutliner.cpp Removed: diff --git a/llvm/include/llvm/Transforms/IPO/IROutliner.h b/llvm/include/llvm/Transforms/IPO/IROutliner.h index 0346803e9ad7..eefcbe5235c1 100644 --- a/llvm/include/llvm/Transforms/IPO/IROutliner.h +++ b/llvm/include/llvm/Transforms/IPO/IROutliner.h @@ -44,6 +44,7 @@ #include "llvm/Analysis/IRSimilarityIdentifier.h" #include "llvm/IR/PassManager.h" #include "llvm/IR/ValueMap.h" +#include "llvm/Support/InstructionCost.h" #include "llvm/Transforms/Utils/CodeExtractor.h" #include @@ -150,7 +151,7 @@ struct OutlinableRegion { /// /// \param [in] TTI - The TargetTransformInfo for the parent function. /// \returns the code size of the region - unsigned getBenefit(TargetTransformInfo &TTI); + InstructionCost getBenefit(TargetTransformInfo &TTI); }; /// This class is a pass that identifies similarity in a Module, extracts @@ -214,14 +215,14 @@ class IROutliner { /// \param [in] CurrentGroup - The collection of OutlinableRegions to be /// analyzed. /// \returns the number of outlined instructions across all regions. 
- unsigned findBenefitFromAllRegions(OutlinableGroup &CurrentGroup); + InstructionCost findBenefitFromAllRegions(OutlinableGroup &CurrentGroup); /// Find the number of instructions that will be added by reloading arguments. /// /// \param [in] CurrentGroup - The collection of OutlinableRegions to be /// analyzed. /// \returns the number of added reload instructions across all regions. - unsigned findCostOutputReloads(OutlinableGroup &CurrentGroup); + InstructionCost findCostOutputReloads(OutlinableGroup &CurrentGroup); /// Find the cost and the benefit of \p CurrentGroup and save it back to /// \p CurrentGroup. diff --git a/llvm/lib/Transforms/IPO/IROutliner.cpp b/llvm/lib/Transforms/IPO/IROutliner.cpp index 909e26b9a6e1..4b6a4f3d8fc4 100644 --- a/llvm/lib/Transforms/IPO/IROutliner.cpp +++ b/llvm/lib/Transforms/IPO/IROutliner.cpp @@ -86,10 +86,10 @@ struct OutlinableGroup { /// The number of instructions that will be outlined by extracting \ref /// Regions. - unsigned Benefit = 0; + InstructionCost Benefit = 0; /// The number of added instructions needed for the outlining of the \ref /// Regions. - unsigned Cost = 0; + InstructionCost Cost = 0; /// The argument that needs to be marked with the swifterr attribute. If not /// needed, there is no value. @@ -243,8 +243,8 @@ constantMatches(Value *V, unsigned GVN, return false; } -unsigned OutlinableRegion::getBenefit(TargetTransformInfo &TTI) { - InstructionCost Benefit(0); +InstructionCost OutlinableRegion::getBenefit(TargetTransformInfo &TTI) { + InstructionCost Benefit = 0; // Estimate the benefit of outlining a specific sections of the program. 
We // delegate mostly this task to the TargetTransformInfo so that if the target @@ -274,7 +274,7 @@ unsigned OutlinableRegion::getBenefit(TargetTransformInfo &TTI) { } } - return *Benefit.getValue(); + return Benefit; } /// Find whether \p Region matches the global value numbering to Constant @@ -1287,8 +1287,9 @@ void IROutliner::pruneIncompatibleRegions( } } -unsigned IROutliner::findBenefitFromAllRegions(OutlinableGroup &CurrentGroup) { - unsigned RegionBenefit = 0; +InstructionCost +IROutliner::findBenefitFromAllRegions(OutlinableGroup &CurrentGroup) { + InstructionCost RegionBenefit = 0; for (OutlinableRegion *Region : CurrentGroup.Regions) { TargetTransformInfo &TTI = getTTI(*Region->StartBB->getParent()); // We add the number of instructions in the region to the benefit as an @@ -1301,8 +1302,9 @@ unsigned IROutliner::findBenefitFromAllRegions(OutlinableGroup &CurrentGroup) { return RegionBenefit; } -unsigned IROutliner::findCostOutputReloads(OutlinableGroup &CurrentGroup) { - unsigned OverallCost = 0; +InstructionCost +IROutliner::findCostOutputReloads(OutlinableGroup &CurrentGroup) { + InstructionCost OverallCo
[llvm-branch-commits] [llvm] 2e080eb - [SVE] Add support for scalable vectorization of loops with selects and cmps
Author: David Sherwood Date: 2021-01-22T09:48:13Z New Revision: 2e080eb00ad76654313e0e119bb7fa0ffe2f9866 URL: https://github.com/llvm/llvm-project/commit/2e080eb00ad76654313e0e119bb7fa0ffe2f9866 DIFF: https://github.com/llvm/llvm-project/commit/2e080eb00ad76654313e0e119bb7fa0ffe2f9866.diff LOG: [SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039 Added: llvm/test/Analysis/CostModel/sve-cmpsel.ll llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Modified: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index ffa045846e59..7fda6b8fb602 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -707,7 +707,7 @@ int AArch64TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, int ISD = TLI->InstructionOpcodeToISD(Opcode); // We don't lower some vector selects well that are wider than the register // width. - if (ValTy->isVectorTy() && ISD == ISD::SELECT) { + if (isa(ValTy) && ISD == ISD::SELECT) { // We would need this many instructions to hide the scalarization happening. 
const int AmortizationCost = 20; @@ -749,6 +749,8 @@ int AArch64TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, return Entry->Cost; } } + // The base case handles scalable vectors fine for now, since it treats the + // cost as 1 * legalization cost. return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I); } diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 1bc4afeae5f9..9e157f3061b6 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -7334,10 +7334,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF, const SCEV *CondSCEV = SE->getSCEV(SI->getCondition()); bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop)); Type *CondTy = SI->getCondition()->getType(); -if (!ScalarCond) { - assert(!VF.isScalable() && "VF is assumed to be non scalable."); +if (!ScalarCond) CondTy = VectorType::get(CondTy, VF); -} return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, CmpInst::BAD_ICMP_PREDICATE, CostKind, I); } diff --git a/llvm/test/Analysis/CostModel/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/sve-cmpsel.ll new file mode 100644 index ..163c863c1ea3 --- /dev/null +++ b/llvm/test/Analysis/CostModel/sve-cmpsel.ll @@ -0,0 +1,146 @@ +; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s + +; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t + +; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it. +; WARN-NOT: warning + +; Check icmp for legal integer vectors. 
+define void @cmp_legal_int() { +; CHECK-LABEL: 'cmp_legal_int' +; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = icmp ne undef, undef +; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = icmp ne undef, undef +; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = icmp ne undef, undef +; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = icmp ne undef, undef + %1 = icmp ne undef, undef + %2 = icmp ne undef, undef + %3 = icmp ne undef, undef + %4 = icmp ne undef, undef + ret void +} + +; Check icmp for an illegal integer vector. +define @cmp_nxv4i64() { +; CHECK-LABEL: 'cmp_nxv4i64' +; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = icmp ne undef, undef +; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret %res + %res = icmp ne undef, undef + ret %res +} + +; Check icmp for legal predicate vectors. +define void @cmp_legal_pred() { +; CHECK-LABEL: 'cmp_legal_pred' +; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = icmp ne und
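The expected costs in this test (1 for the legal cases, 2 for `cmp_nxv4i64`) fit the comment in the patch that the base implementation treats the cost as "1 * legalization cost". The following is a rough sketch of that arithmetic, not the code in BasicTTIImpl. It assumes a 128-bit minimum register size and that a wider scalable type is simply split into equal legal parts; `numLegalParts` and `toyCmpCost` are hypothetical names.

```cpp
#include <cassert>

// Assumption: a scalable vector with MinElts elements of EltBits bits each is
// legal when MinElts * EltBits fits in one 128-bit granule.
constexpr unsigned GranuleBits = 128;

// Number of legal registers needed under that assumption, rounding up.
inline unsigned numLegalParts(unsigned MinElts, unsigned EltBits) {
  unsigned Bits = MinElts * EltBits;
  return (Bits + GranuleBits - 1) / GranuleBits;
}

// Toy compare cost: one instruction per legal part.
inline unsigned toyCmpCost(unsigned MinElts, unsigned EltBits) {
  return 1 * numLegalParts(MinElts, EltBits);
}
```

With these assumptions a type with a minimum of 4 x i64 needs two parts and costs 2, while 2 x i64, 4 x i32 and 16 x i8 each fit in one part and cost 1. Real type legalization also covers promotion and other cases that this model ignores.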
[llvm-branch-commits] [llvm] 83e7a96 - Fix build failure caused by 2e080eb00ad76654313e0e119bb7fa0ffe2f9866
Author: David Sherwood
Date: 2021-01-22T09:56:53Z
New Revision: 83e7a96c06835eb37416ffdc463edc7ddd18656c

URL: https://github.com/llvm/llvm-project/commit/83e7a96c06835eb37416ffdc463edc7ddd18656c
DIFF: https://github.com/llvm/llvm-project/commit/83e7a96c06835eb37416ffdc463edc7ddd18656c.diff

LOG: Fix build failure caused by 2e080eb00ad76654313e0e119bb7fa0ffe2f9866

Added:
    llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll

Modified:

Removed:
    llvm/test/Analysis/CostModel/sve-cmpsel.ll

diff --git a/llvm/test/Analysis/CostModel/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
similarity index 100%
rename from llvm/test/Analysis/CostModel/sve-cmpsel.ll
rename to llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
[llvm-branch-commits] [llvm] a650920 - [SVE] Fix inline assembly parsing crash
Author: David Sherwood
Date: 2021-01-04T09:11:05Z
New Revision: a65092040ad4fefcdad18382781090839cad3b67

URL: https://github.com/llvm/llvm-project/commit/a65092040ad4fefcdad18382781090839cad3b67
DIFF: https://github.com/llvm/llvm-project/commit/a65092040ad4fefcdad18382781090839cad3b67.diff

LOG: [SVE] Fix inline assembly parsing crash

This patch fixes a crash encountered when compiling this code:

...
float16_t a;
__asm__("fminv %h[a], %[b], %[c].h"
        : [a] "=r" (a)
        : [b] "Upl" (b), [c] "w" (c))

The issue here is when using the 'h' modifier for a register
constraint 'r'.

Differential Revision: https://reviews.llvm.org/D93537

Added:

Modified:
    llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
    llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll

Removed:

diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index c18e9a4e6db1..c7fa49c965a8 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -647,7 +647,8 @@ bool AArch64AsmPrinter::printAsmRegInClass(const MachineOperand &MO,
   const TargetRegisterInfo *RI = STI->getRegisterInfo();
   Register Reg = MO.getReg();
   unsigned RegToPrint = RC->getRegister(RI->getEncodingValue(Reg));
-  assert(RI->regsOverlap(RegToPrint, Reg));
+  if (!RI->regsOverlap(RegToPrint, Reg))
+    return true;
   O << AArch64InstPrinter::getRegisterName(RegToPrint, AltName);
   return false;
 }
diff --git a/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll b/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll
index 5a2f4746af87..aa25d118c9b5 100644
--- a/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll
+++ b/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll
@@ -6,6 +6,7 @@ target triple = "aarch64-unknown-linux-gnu"
 ; CHECK: error: couldn't allocate input reg for constraint 'Upa'
 ; CHECK: error: couldn't allocate input reg for constraint 'r'
 ; CHECK: error: couldn't allocate output register for constraint 'w'
+; CHECK: error: unknown token in expression

 define @foo1(i32 *%in) {
 entry:
@@ -27,3 +28,11 @@ entry:
   %1 = call asm sideeffect "mov $0.b, $1.b \0A", "=&w,w"( %0)
   ret %1
 }
+
+define half @foo4( *%inp, *%inv) {
+entry:
+  %0 = load , * %inp, align 2
+  %1 = load , * %inv, align 16
+  %2 = call half asm "fminv ${0:h}, $1, $2.h", "=r,@3Upl,w"( %0, %1)
+  ret half %2
+}
[llvm-branch-commits] [llvm] d1bf26f - [AArch64][SVE] Add lowering for llvm abs intrinsic
Author: David Sherwood Date: 2021-01-08T08:55:25Z New Revision: d1bf26fd943e39a4e3bb55bdaeec5559e74dee99 URL: https://github.com/llvm/llvm-project/commit/d1bf26fd943e39a4e3bb55bdaeec5559e74dee99 DIFF: https://github.com/llvm/llvm-project/commit/d1bf26fd943e39a4e3bb55bdaeec5559e74dee99.diff LOG: [AArch64][SVE] Add lowering for llvm abs intrinsic Add functionality to permit lowering of the abs and neg intrinsics using the passthru variants. Differential Revision: https://reviews.llvm.org/D94160 Added: Modified: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td llvm/lib/Target/AArch64/SVEInstrFormats.td llvm/test/CodeGen/AArch64/sve-fixed-length-int-arith.ll llvm/test/CodeGen/AArch64/sve-int-arith.ll Removed: diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index fdf3acfe68c5..926d952425d0 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -187,6 +187,8 @@ static bool isMergePassthruOpcode(unsigned Opc) { case AArch64ISD::CTLZ_MERGE_PASSTHRU: case AArch64ISD::CTPOP_MERGE_PASSTHRU: case AArch64ISD::DUP_MERGE_PASSTHRU: + case AArch64ISD::ABS_MERGE_PASSTHRU: + case AArch64ISD::NEG_MERGE_PASSTHRU: case AArch64ISD::FNEG_MERGE_PASSTHRU: case AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU: case AArch64ISD::ZERO_EXTEND_INREG_MERGE_PASSTHRU: @@ -1097,6 +1099,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM, setOperationAction(ISD::SHL, VT, Custom); setOperationAction(ISD::SRL, VT, Custom); setOperationAction(ISD::SRA, VT, Custom); + setOperationAction(ISD::ABS, VT, Custom); setOperationAction(ISD::VECREDUCE_ADD, VT, Custom); setOperationAction(ISD::VECREDUCE_AND, VT, Custom); setOperationAction(ISD::VECREDUCE_OR, VT, Custom); @@ -1345,6 +1348,7 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) { 
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom); // Lower fixed length vector operations to scalable equivalents. + setOperationAction(ISD::ABS, VT, Custom); setOperationAction(ISD::ADD, VT, Custom); setOperationAction(ISD::AND, VT, Custom); setOperationAction(ISD::ANY_EXTEND, VT, Custom); @@ -1743,6 +1747,8 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const { MAKE_CASE(AArch64ISD::FSQRT_MERGE_PASSTHRU) MAKE_CASE(AArch64ISD::FRECPX_MERGE_PASSTHRU) MAKE_CASE(AArch64ISD::FABS_MERGE_PASSTHRU) +MAKE_CASE(AArch64ISD::ABS_MERGE_PASSTHRU) +MAKE_CASE(AArch64ISD::NEG_MERGE_PASSTHRU) MAKE_CASE(AArch64ISD::SETCC_MERGE_ZERO) MAKE_CASE(AArch64ISD::ADC) MAKE_CASE(AArch64ISD::SBC) @@ -3661,6 +3667,12 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, case Intrinsic::aarch64_sve_fabs: return DAG.getNode(AArch64ISD::FABS_MERGE_PASSTHRU, dl, Op.getValueType(), Op.getOperand(2), Op.getOperand(3), Op.getOperand(1)); + case Intrinsic::aarch64_sve_abs: +return DAG.getNode(AArch64ISD::ABS_MERGE_PASSTHRU, dl, Op.getValueType(), + Op.getOperand(2), Op.getOperand(3), Op.getOperand(1)); + case Intrinsic::aarch64_sve_neg: +return DAG.getNode(AArch64ISD::NEG_MERGE_PASSTHRU, dl, Op.getValueType(), + Op.getOperand(2), Op.getOperand(3), Op.getOperand(1)); case Intrinsic::aarch64_sve_convert_to_svbool: { EVT OutVT = Op.getValueType(); EVT InVT = Op.getOperand(1).getValueType(); @@ -4163,9 +4175,12 @@ SDValue AArch64TargetLowering::LowerSTORE(SDValue Op, } // Generate SUBS and CSEL for integer abs. 
-static SDValue LowerABS(SDValue Op, SelectionDAG &DAG) { +SDValue AArch64TargetLowering::LowerABS(SDValue Op, SelectionDAG &DAG) const { MVT VT = Op.getSimpleValueType(); + if (VT.isVector()) +return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABS_MERGE_PASSTHRU); + SDLoc DL(Op); SDValue Neg = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Op.getOperand(0)); diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h index 96aaf40250e5..23d5ce91b3e3 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h @@ -114,6 +114,8 @@ enum NodeType : unsigned { FCVTZS_MERGE_PASSTHRU, SIGN_EXTEND_INREG_MERGE_PASSTHRU, ZERO_EXTEND_INREG_MERGE_PASSTHRU, + ABS_MERGE_PASSTHRU, + NEG_MERGE_PASSTHRU, SETCC_MERGE_ZERO, @@ -812,6 +814,7 @@ class AArch64TargetLowering : public TargetLowering { SDValue ThisVal) const; SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const; + SDValue LowerABS(SDValue Op, SelectionDAG &DAG
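As a reading aid for the "merge passthru" nodes added in this patch, here is a lane-by-lane sketch in plain C++. It only models the semantics suggested by the node names and by the "SUBS and CSEL" comment in the diff. The function names, the array-based interface and the operand order (predicate, source, passthru) are assumptions for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar integer abs written as "negate, then select", in the spirit of the
// SUBS + CSEL lowering mentioned in the diff. The negation is done in
// unsigned arithmetic so that INT32_MIN does not trigger signed overflow.
inline int32_t absViaSelect(int32_t X) {
  int32_t Neg = static_cast<int32_t>(0u - static_cast<uint32_t>(X));
  return X < 0 ? Neg : X;
}

// Lane-wise model of a merging predicated abs: active lanes receive
// abs(Src[i]), inactive lanes keep the corresponding pass-through value.
inline void absMergePassthru(const bool *Pred, const int32_t *Src,
                             const int32_t *Passthru, int32_t *Out,
                             std::size_t N) {
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = Pred[I] ? absViaSelect(Src[I]) : Passthru[I];
}
```

Calling `absMergePassthru` with an all-true predicate gives a plain vector abs, which matches how the generic ISD::ABS case is routed through the predicated node for vector types in the patch.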
[llvm-branch-commits] [clang] 38d18d9 - [SVE] Add support to vectorize_width loop pragma for scalable vectors
Author: David Sherwood Date: 2021-01-08T11:37:27Z New Revision: 38d18d93534d290d045bbbfa86337e70f1139dc2 URL: https://github.com/llvm/llvm-project/commit/38d18d93534d290d045bbbfa86337e70f1139dc2 DIFF: https://github.com/llvm/llvm-project/commit/38d18d93534d290d045bbbfa86337e70f1139dc2.diff LOG: [SVE] Add support to vectorize_width loop pragma for scalable vectors This patch adds support for two new variants of the vectorize_width pragma: 1. vectorize_width(X[, fixed|scalable]) where an optional second parameter is passed to the vectorize_width pragma, which indicates if the user wishes to use fixed width or scalable vectorization. For example the user can now write something like: #pragma clang loop vectorize_width(4, fixed) or #pragma clang loop vectorize_width(4, scalable) In the absence of a second parameter it is assumed the user wants fixed width vectorization, in order to maintain compatibility with existing code. 2. vectorize_width(fixed|scalable) where the width is left unspecified, but the user hints what type of vectorization they prefer, either fixed width or scalable. I have implemented this by making use of the LLVM loop hint attribute: llvm.loop.vectorize.scalable.enable Tests were added to clang/test/CodeGenCXX/pragma-loop.cpp for both the 'fixed' and 'scalable' optional parameter. 
See this thread for context: http://lists.llvm.org/pipermail/cfe-dev/2020-November/067262.html Differential Revision: https://reviews.llvm.org/D89031 Added: Modified: clang/docs/LanguageExtensions.rst clang/include/clang/Basic/Attr.td clang/include/clang/Basic/DiagnosticParseKinds.td clang/lib/AST/AttrImpl.cpp clang/lib/CodeGen/CGLoopInfo.cpp clang/lib/CodeGen/CGLoopInfo.h clang/lib/Parse/ParsePragma.cpp clang/lib/Sema/SemaStmtAttr.cpp clang/test/AST/ast-print-pragmas.cpp clang/test/CodeGenCXX/pragma-loop-pr27643.cpp clang/test/CodeGenCXX/pragma-loop.cpp clang/test/Parser/pragma-loop.cpp Removed: diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst index 0c01a2bbc52b..6fa6c55b15fc 100644 --- a/clang/docs/LanguageExtensions.rst +++ b/clang/docs/LanguageExtensions.rst @@ -3107,8 +3107,18 @@ manually enable vectorization or interleaving. ... } -The vector width is specified by ``vectorize_width(_value_)`` and the interleave -count is specified by ``interleave_count(_value_)``, where +The vector width is specified by +``vectorize_width(_value_[, fixed|scalable])``, where _value_ is a positive +integer and the type of vectorization can be specified with an optional +second parameter. The default for the second parameter is 'fixed' and +refers to fixed width vectorization, whereas 'scalable' indicates the +compiler should use scalable vectors instead. Another use of vectorize_width +is ``vectorize_width(fixed|scalable)`` where the user can hint at the type +of vectorization to use without specifying the exact width. In both variants +of the pragma the vectorizer may decide to fall back on fixed width +vectorization if the target does not support scalable vectors. + +The interleave count is specified by ``interleave_count(_value_)``, where _value_ is a positive integer. This is useful for specifying the optimal width/count of the set of target architectures supported by your application. 
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index b84e6a14f371..248409946123 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -3375,8 +3375,10 @@ def LoopHint : Attr { "PipelineDisabled", "PipelineInitiationInterval", "Distribute", "VectorizePredicate"]>, EnumArgument<"State", "LoopHintState", - ["enable", "disable", "numeric", "assume_safety", "full"], - ["Enable", "Disable", "Numeric", "AssumeSafety", "Full"]>, + ["enable", "disable", "numeric", "fixed_width", +"scalable_width", "assume_safety", "full"], + ["Enable", "Disable", "Numeric", "FixedWidth", +"ScalableWidth", "AssumeSafety", "Full"]>, ExprArgument<"Value">]; let AdditionalMembers = [{ diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td index 8f78bbfc4e70..0ed80a481e78 100644 --- a/clang/include/clang/Basic/DiagnosticParseKinds.td +++ b/clang/include/clang/Basic/DiagnosticParseKinds.td @@ -1396,6 +1396,12 @@ def err_pragma_loop_invalid_option : Error< "%select{invalid|missing}0 option%select{ %1|}0; expected vectorize, " "vectorize_width, interleave, interleave_count, unroll, unroll_count, " "pipeline, pipeline_initiation_interval, vectorize_predicate, or
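A short sketch of how the pragma variants described in this commit might be written in user code. The function names are hypothetical, the loops are deliberately trivial, and the spelling assumes a Clang that includes this patch. The pragma is only a hint, so the results are the same whichever form of vectorization the compiler picks, and other compilers are expected to ignore the unknown pragma (possibly with a warning).

```cpp
#include <cassert>
#include <cstddef>

// a[i] += b[i], hinting scalable vectorization with a minimum width of 4.
// On targets without scalable vectors the vectorizer may fall back to
// fixed-width vectorization.
inline void addScalableHint(float *a, const float *b, std::size_t n) {
#pragma clang loop vectorize_width(4, scalable)
  for (std::size_t i = 0; i < n; ++i)
    a[i] += b[i];
}

// Same loop, explicitly requesting fixed-width vectors of 4 elements. This is
// also what vectorize_width(4) means when the second argument is omitted.
inline void addFixedHint(float *a, const float *b, std::size_t n) {
#pragma clang loop vectorize_width(4, fixed)
  for (std::size_t i = 0; i < n; ++i)
    a[i] += b[i];
}

// Width left unspecified: only the kind of vectorization is hinted.
inline void addScalableNoWidth(float *a, const float *b, std::size_t n) {
#pragma clang loop vectorize_width(scalable)
  for (std::size_t i = 0; i < n; ++i)
    a[i] += b[i];
}
```

Whether a given loop is actually vectorized with scalable vectors still depends on the target (for example AArch64 with SVE enabled) and on the cost model.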
[llvm-branch-commits] [llvm] b7ccaca - [NFC] Remove min/max functions from InstructionCost
Author: David Sherwood Date: 2021-01-11T09:00:12Z New Revision: b7ccaca53700fce21b0e8e5d7bd2a956bd391fee URL: https://github.com/llvm/llvm-project/commit/b7ccaca53700fce21b0e8e5d7bd2a956bd391fee DIFF: https://github.com/llvm/llvm-project/commit/b7ccaca53700fce21b0e8e5d7bd2a956bd391fee.diff LOG: [NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301 Added: Modified: llvm/include/llvm/Support/InstructionCost.h llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/unittests/Support/InstructionCostTest.cpp Removed: diff --git a/llvm/include/llvm/Support/InstructionCost.h b/llvm/include/llvm/Support/InstructionCost.h index fe56d49b4174..725f8495ac09 100644 --- a/llvm/include/llvm/Support/InstructionCost.h +++ b/llvm/include/llvm/Support/InstructionCost.h @@ -196,14 +196,6 @@ class InstructionCost { return *this >= RHS2; } - static InstructionCost min(InstructionCost LHS, InstructionCost RHS) { -return LHS < RHS ? LHS : RHS; - } - - static InstructionCost max(InstructionCost LHS, InstructionCost RHS) { -return LHS > RHS ? 
LHS : RHS; - } - - void print(raw_ostream &OS) const; }; diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index 5b91495bd844..bd673d112b3a 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -6305,7 +6305,7 @@ bool SLPVectorizerPass::tryToVectorizeList(ArrayRef VL, BoUpSLP &R, Cost -= UserCost; } - MinCost = InstructionCost::min(MinCost, Cost); + MinCost = std::min(MinCost, Cost); if (Cost.isValid() && Cost < -SLPCostThreshold) { LLVM_DEBUG(dbgs() << "SLP: Vectorizing list at cost:" << Cost << ".\n"); diff --git a/llvm/unittests/Support/InstructionCostTest.cpp b/llvm/unittests/Support/InstructionCostTest.cpp index da3d3f47a212..8ba9f990f027 100644 --- a/llvm/unittests/Support/InstructionCostTest.cpp +++ b/llvm/unittests/Support/InstructionCostTest.cpp @@ -59,6 +59,6 @@ TEST_F(CostTest, Operators) { EXPECT_EQ(*(VThree.getValue()), 3); EXPECT_EQ(IThreeA.getValue(), None); - EXPECT_EQ(InstructionCost::min(VThree, VNegTwo), -2); - EXPECT_EQ(InstructionCost::max(VThree, VSix), 6); + EXPECT_EQ(std::min(VThree, VNegTwo), -2); + EXPECT_EQ(std::max(VThree, VSix), 6); }
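A minimal sketch of why the dedicated helpers are redundant: `std::min` and `std::max` work on any type that provides `operator<`. The names `Cost`, `cheaper` and `dearer` are hypothetical and are not part of LLVM.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a cost type; this is not llvm::InstructionCost.
struct Cost {
  int Value;
  // std::min and std::max only require operator<.
  friend bool operator<(const Cost &LHS, const Cost &RHS) {
    return LHS.Value < RHS.Value;
  }
};

// Same behaviour as hand-written min/max helpers, using the standard library.
inline Cost cheaper(Cost A, Cost B) { return std::min(A, B); }
inline Cost dearer(Cost A, Cost B) { return std::max(A, B); }
```

For example, `cheaper(Cost{3}, Cost{-2})` yields the cost holding -2, which matches the intent of the updated unit test in the diff.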
[llvm-branch-commits] [llvm] 40abeb1 - [NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost
Author: David Sherwood Date: 2021-01-11T09:22:37Z New Revision: 40abeb11f4584e8a07163d6c7e24011ac45f104c URL: https://github.com/llvm/llvm-project/commit/40abeb11f4584e8a07163d6c7e24011ac45f104c DIFF: https://github.com/llvm/llvm-project/commit/40abeb11f4584e8a07163d6c7e24011ac45f104c.diff LOG: [NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost This patch is part of a series of patches that migrate integer instruction costs to use InstructionCost. In the function selectVectorizationFactor I have simply asserted that the cost is valid and extracted the value as is. In future we expect to encounter invalid costs, but we should filter out those vectorization factors that lead to such invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D92178 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 180cbb8ef847..e6cadf8f8796 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -130,6 +130,7 @@ #include "llvm/Support/Compiler.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/InstructionCost.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" @@ -1635,7 +1636,7 @@ class LoopVectorizationCostModel { /// is /// false, then all operations will be scalarized (i.e. no vectorization has /// actually taken place). - using VectorizationCostTy = std::pair; + using VectorizationCostTy = std::pair; /// Returns the expected execution cost. 
The unit of the cost does /// not matter because we use the 'cost' units to compare different @@ -1649,7 +1650,8 @@ class LoopVectorizationCostModel { /// The cost-computation logic from getInstructionCost which provides /// the vector type as an output parameter. - unsigned getInstructionCost(Instruction *I, ElementCount VF, Type *&VectorTy); + InstructionCost getInstructionCost(Instruction *I, ElementCount VF, + Type *&VectorTy); /// Calculate vectorization cost of memory instruction \p I. unsigned getMemoryInstructionCost(Instruction *I, ElementCount VF); @@ -1693,7 +1695,7 @@ class LoopVectorizationCostModel { /// A type representing the costs for instructions if they were to be /// scalarized rather than vectorized. The entries are Instruction-Cost /// pairs. - using ScalarCostsTy = DenseMap<Instruction *, unsigned>; + using ScalarCostsTy = DenseMap<Instruction *, InstructionCost>; /// A set containing all BasicBlocks that are known to present after /// vectorization as a predicated block. @@ -5759,10 +5761,13 @@ LoopVectorizationCostModel::selectVectorizationFactor(ElementCount MaxVF) { // vectors when the loop has a hint to enable vectorization for a given VF. 
assert(!MaxVF.isScalable() && "scalable vectors not yet supported"); - float Cost = expectedCost(ElementCount::getFixed(1)).first; - const float ScalarCost = Cost; + InstructionCost ExpectedCost = expectedCost(ElementCount::getFixed(1)).first; + LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << ExpectedCost << ".\n"); + assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop"); + unsigned Width = 1; - LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << (int)ScalarCost << ".\n"); + const float ScalarCost = *ExpectedCost.getValue(); + float Cost = ScalarCost; bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled; if (ForceVectorization && MaxVF.isVector()) { @@ -5777,7 +5782,8 @@ LoopVectorizationCostModel::selectVectorizationFactor(ElementCount MaxVF) { // we need to divide the cost of the vector loops by the width of // the vector elements. VectorizationCostTy C = expectedCost(ElementCount::getFixed(i)); -float VectorCost = C.first / (float)i; +assert(C.first.isValid() && "Unexpected invalid cost for vector loop"); +float VectorCost = *C.first.getValue() / (float)i; LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i << " costs: " << (int)VectorCost << ".\n"); if (!C.second && !ForceVectorization) { @@ -6119,8 +6125,10 @@ unsigned LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF, // If we did not calculate the cost for VF (because the user selected the VF) // then we calculate the cost of VF here. - if (LoopCost == 0) -LoopCost = expectedCost(VF).first; + if (LoopCost == 0) { +assert(expectedCost(VF).first.isValid() && "Expected a valid cost"); +L
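The log message says that invalid costs should eventually be filtered out when choosing a vectorization factor, rather than asserted on. A rough sketch of that idea follows. `Candidate` and `pickWidth` are hypothetical names, and the selection rule (lowest cost per lane, falling back to the scalar loop) is a simplification of what `selectVectorizationFactor` does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical candidate: a vector width and the total loop cost at that
// width. Valid == false models an invalid cost.
struct Candidate {
  unsigned Width;
  int Cost;
  bool Valid;
};

// Returns the width with the lowest cost per lane, or 1 if no valid vector
// width beats the scalar loop.
inline unsigned pickWidth(int ScalarCost,
                          const std::vector<Candidate> &Candidates) {
  unsigned BestWidth = 1;
  float BestCost = static_cast<float>(ScalarCost);
  for (const Candidate &C : Candidates) {
    if (!C.Valid)
      continue; // Skip widths whose cost is invalid instead of asserting.
    // Divide by the width so that different widths are comparable.
    float PerLane = static_cast<float>(C.Cost) / static_cast<float>(C.Width);
    if (PerLane < BestCost) {
      BestCost = PerLane;
      BestWidth = C.Width;
    }
  }
  return BestWidth;
}
```

Skipping invalid candidates keeps them from ever being compared numerically, so a meaningless stored value cannot win the comparison.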
[llvm-branch-commits] [llvm] 3bf7d47 - [NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp
Author: David Sherwood Date: 2020-12-21T09:12:28Z New Revision: 3bf7d47a977d463940f558259d24d43d76d50e6f URL: https://github.com/llvm/llvm-project/commit/3bf7d47a977d463940f558259d24d43d76d50e6f DIFF: https://github.com/llvm/llvm-project/commit/3bf7d47a977d463940f558259d24d43d76d50e6f.diff LOG: [NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp An earlier patch introduced asserts that the InstructionCost is valid because at that time the ReuseShuffleCost variable was an unsigned. However, now that the variable is an InstructionCost instance the asserts can be removed. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 Added: Modified: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index 80d510185470..b03fb203c6d7 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -3806,23 +3806,17 @@ InstructionCost BoUpSLP::getEntryCost(TreeEntry *E) { if (NeedToShuffleReuses) { for (unsigned Idx : E->ReuseShuffleIndices) { Instruction *I = cast(VL[Idx]); - InstructionCost Cost = TTI->getInstructionCost(I, CostKind); - assert(Cost.isValid() && "Invalid instruction cost"); - ReuseShuffleCost -= *(Cost.getValue()); + ReuseShuffleCost -= TTI->getInstructionCost(I, CostKind); } for (Value *V : VL) { Instruction *I = cast(V); - InstructionCost Cost = TTI->getInstructionCost(I, CostKind); - assert(Cost.isValid() && "Invalid instruction cost"); - ReuseShuffleCost += *(Cost.getValue()); + ReuseShuffleCost += TTI->getInstructionCost(I, CostKind); } } for (Value *V : VL) { Instruction *I = cast(V); assert(E->isOpcodeOrAlt(I) && "Unexpected main/alternate opcode"); -InstructionCost Cost = TTI->getInstructionCost(I, CostKind); -assert(Cost.isValid() && "Invalid 
instruction cost"); -ScalarCost += *(Cost.getValue()); +ScalarCost += TTI->getInstructionCost(I, CostKind); } // VecCost is equal to sum of the cost of creating 2 vectors // and the cost of creating shuffle.
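The asserts can go because the accumulator itself now carries the validity state. A minimal sketch of that propagation, assuming a hypothetical `SketchCost` type (this is not the LLVM class, and the real operators may differ in detail):

```cpp
#include <cassert>
#include <vector>

// Hypothetical cost type: a value plus a validity flag.
struct SketchCost {
  int Value;
  bool Valid;

  // Once either operand is invalid, the running total stays invalid.
  SketchCost &operator+=(const SketchCost &RHS) {
    Valid = Valid && RHS.Valid;
    Value += RHS.Value;
    return *this;
  }
};

// Sum a list of costs without checking validity on every element.
inline SketchCost sumCosts(const std::vector<SketchCost> &Costs) {
  SketchCost Total{0, true};
  for (const SketchCost &C : Costs)
    Total += C;
  return Total;
}
```

A caller then checks `Valid` once on the final total, instead of asserting at each addition.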
[llvm-branch-commits] [llvm] 71bd59f - [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute
Author: David Sherwood Date: 2020-12-02T13:23:43Z New Revision: 71bd59f0cb6db0211418b07127e8b311d944e2c2 URL: https://github.com/llvm/llvm-project/commit/71bd59f0cb6db0211418b07127e8b311d944e2c2 DIFF: https://github.com/llvm/llvm-project/commit/71bd59f0cb6db0211418b07127e8b311d944e2c2.diff LOG: [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2 ... !2 = !{!2, !3, !4} !3 = !{!"llvm.loop.vectorize.width", i32 8} !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true} Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962 Added: llvm/test/Transforms/LoopVectorize/no_array_bounds_scalable.ll Modified: llvm/docs/LangRef.rst llvm/include/llvm/Transforms/Utils/LoopUtils.h llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h llvm/lib/Transforms/Scalar/WarnMissedTransforms.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/metadata-width.ll Removed: diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 1964f2416b8f..03a8387d9d1f 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -5956,6 +5956,21 @@ vectorization: !0 = !{!"llvm.loop.vectorize.predicate.enable", i1 0} !1 = !{!"llvm.loop.vectorize.predicate.enable", i1 1} +'``llvm.loop.vectorize.scalable.enable``' Metadata +^^^ + +This metadata selectively enables or disables scalable vectorization for the +loop, and only has any effect if vectorization for the loop is already enabled. 
+The first operand is the string ``llvm.loop.vectorize.scalable.enable`` +and the second operand is a bit. If the bit operand value is 1 scalable +vectorization is enabled, whereas a value of 0 reverts to the default fixed +width vectorization: + +.. code-block:: llvm + + !0 = !{!"llvm.loop.vectorize.scalable.enable", i1 0} + !1 = !{!"llvm.loop.vectorize.scalable.enable", i1 1} + '``llvm.loop.vectorize.width``' Metadata diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h index 360e262e8ae0..ef348ed56129 100644 --- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h +++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h @@ -213,6 +213,13 @@ Optional findStringMetadataForLoop(const Loop *TheLoop, /// Find named metadata for a loop with an integer value. llvm::Optional getOptionalIntLoopAttribute(Loop *TheLoop, StringRef Name); +/// Find a combination of metadata ("llvm.loop.vectorize.width" and +/// "llvm.loop.vectorize.scalable.enable") for a loop and use it to construct a +/// ElementCount. If the metadata "llvm.loop.vectorize.width" cannot be found +/// then None is returned. +Optional +getOptionalElementCountLoopAttribute(Loop *TheLoop); + /// Create a new loop identifier for a loop created from a loop transformation. /// /// @param OrigLoopID The loop ID of the loop before the transformation. 
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h index 2be9ef10ac4f..f701e08961a0 100644 --- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h +++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h @@ -29,6 +29,7 @@ #include "llvm/ADT/MapVector.h" #include "llvm/Analysis/LoopAccessAnalysis.h" #include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Support/TypeSize.h" #include "llvm/Transforms/Utils/LoopUtils.h" namespace llvm { @@ -43,8 +44,14 @@ namespace llvm { /// for example 'force', means a decision has been made. So, we need to be /// careful NOT to add them if the user hasn't specifically asked so. class LoopVectorizeHints { - enum HintKind { HK_WIDTH, HK_UNROLL, HK_FORCE, HK_ISVECTORIZED, - HK_PREDICATE }; + enum HintKind { +HK_WIDTH, +HK_UNROLL, +HK_FORCE, +HK_ISVECTORIZED, +HK_PREDICATE, +HK_SCALABLE + }; /// Hint - associates name and validation with the hint value. struct Hint { @@ -73,6 +80,9 @@ class LoopVectorizeHints { /// Vector Predicate Hint Predicate; + /// Says whether we should use fixed width or scalable vectorization. + Hint Scalable; + /// Return the loop metadata prefix. static StringRef Prefi
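The doc comment for `getOptionalElementCountLoopAttribute` describes combining the width hint with the scalable flag. A small sketch of that combination is shown here. `SketchHints`, `SketchElementCount` and `getElementCountFromHints` are hypothetical stand-ins, and a `bool` return replaces the `Optional` result.

```cpp
#include <cassert>

// Hypothetical view of the two loop hints.
struct SketchHints {
  bool HasWidth;    // Is llvm.loop.vectorize.width present?
  unsigned Width;   // Its value, if present.
  bool HasScalable; // Is llvm.loop.vectorize.scalable.enable present?
  bool Scalable;    // Its value, if present.
};

// Hypothetical stand-in for an element count: a minimum lane count plus a
// flag saying whether it is a multiple of the runtime vector length.
struct SketchElementCount {
  unsigned MinElements;
  bool Scalable;
};

// Returns false when there is no width hint, mirroring the "None" case.
// A missing scalable hint defaults to fixed-width vectorization.
inline bool getElementCountFromHints(const SketchHints &H,
                                     SketchElementCount &Out) {
  if (!H.HasWidth)
    return false;
  Out.MinElements = H.Width;
  Out.Scalable = H.HasScalable && H.Scalable;
  return true;
}
```

With the metadata from the log message (width 8, scalable enabled), the sketch yields a scalable count with a minimum of 8 elements.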
[llvm-branch-commits] [llvm] 59f17b5 - [SVE] Fix crashes with inline assembly
Author: David Sherwood Date: 2020-12-08T13:48:43Z New Revision: 59f17b57d9c9abf86d8dcc05c49d3bbd807e29c7 URL: https://github.com/llvm/llvm-project/commit/59f17b57d9c9abf86d8dcc05c49d3bbd807e29c7 DIFF: https://github.com/llvm/llvm-project/commit/59f17b57d9c9abf86d8dcc05c49d3bbd807e29c7.diff LOG: [SVE] Fix crashes with inline assembly All the crashes found compiling inline assembly are fixed in this patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint to be more resilient to mismatched value and register types. For example, it makes no sense to request a predicate register for a nxv2i64 type and so on. Tests have been added here: test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll Differential Revision: https://reviews.llvm.org/D92554 Added: llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll Modified: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index cca31a701d56..700c281cdaa9 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -7511,23 +7511,30 @@ AArch64TargetLowering::getRegForInlineAsmConstraint( if (Constraint.size() == 1) { switch (Constraint[0]) { case 'r': - if (VT.getSizeInBits() == 64) + if (VT.isScalableVector()) +return std::make_pair(0U, nullptr); + if (VT.getFixedSizeInBits() == 64) return std::make_pair(0U, &AArch64::GPR64commonRegClass); return std::make_pair(0U, &AArch64::GPR32commonRegClass); -case 'w': +case 'w': { if (!Subtarget->hasFPARMv8()) break; - if (VT.isScalableVector()) -return std::make_pair(0U, &AArch64::ZPRRegClass); - if (VT.getSizeInBits() == 16) + if (VT.isScalableVector()) { +if (VT.getVectorElementType() != MVT::i1) + return std::make_pair(0U, &AArch64::ZPRRegClass); +return std::make_pair(0U, nullptr); + } + uint64_t VTSize = VT.getFixedSizeInBits(); + if (VTSize == 16) return std::make_pair(0U, 
&AArch64::FPR16RegClass); - if (VT.getSizeInBits() == 32) + if (VTSize == 32) return std::make_pair(0U, &AArch64::FPR32RegClass); - if (VT.getSizeInBits() == 64) + if (VTSize == 64) return std::make_pair(0U, &AArch64::FPR64RegClass); - if (VT.getSizeInBits() == 128) + if (VTSize == 128) return std::make_pair(0U, &AArch64::FPR128RegClass); break; +} // The instructions that this constraint is designed for can // only take 128-bit registers so just use that regclass. case 'x': @@ -7548,10 +7555,11 @@ AArch64TargetLowering::getRegForInlineAsmConstraint( } else { PredicateConstraint PC = parsePredicateConstraint(Constraint); if (PC != PredicateConstraint::Invalid) { - assert(VT.isScalableVector()); + if (!VT.isScalableVector() || VT.getVectorElementType() != MVT::i1) +return std::make_pair(0U, nullptr); bool restricted = (PC == PredicateConstraint::Upl); return restricted ? std::make_pair(0U, &AArch64::PPR_3bRegClass) - : std::make_pair(0U, &AArch64::PPRRegClass); +: std::make_pair(0U, &AArch64::PPRRegClass); } } if (StringRef("{cc}").equals_lower(Constraint)) diff --git a/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll b/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll new file mode 100644 index ..5a2f4746af87 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll @@ -0,0 +1,29 @@ +; RUN: not llc -mtriple=aarch64-none-linux-gnu -mattr=+sve -o - %s 2>&1 | FileCheck %s + +target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" +target triple = "aarch64-unknown-linux-gnu" + +; CHECK: error: couldn't allocate input reg for constraint 'Upa' +; CHECK: error: couldn't allocate input reg for constraint 'r' +; CHECK: error: couldn't allocate output register for constraint 'w' + +define @foo1(i32 *%in) { +entry: + %0 = load i32, i32* %in, align 4 + %1 = call asm sideeffect "mov $0.b, $1.b \0A", "=@3Upa,@3Upa"(i32 %0) + ret %1 +} + +define @foo2( *%in) { +entry: + %0 = load , * %in, align 16 + %1 = call asm 
sideeffect "ptrue p0.s, #1 \0Afabs $0.s, p0/m, $1.s \0A", "=w,r"( %0) + ret %1 +} + +define @foo3( *%in) { +entry: + %0 = load , * %in, align 2 + %1 = call asm sideeffect "mov $0.b, $1.b \0A", "=&w,w"( %0) + ret %1 +}
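The decision logic in the patch can be summarised as a small lookup: reject combinations of constraint and value type that make no sense instead of asserting. The sketch below is a simplification under stated assumptions. `RegClass`, `SketchVT` and `pickRegClass` are hypothetical, the subtarget check is omitted, and `RegClass::None` stands in for returning a null register class (the real code falls through to the generic handler in some cases).

```cpp
#include <cassert>
#include <string>

// Hypothetical register classes; None means "no register can be allocated".
enum class RegClass { None, GPR32, GPR64, FPR16, FPR32, FPR64, FPR128, ZPR, PPR, PPR_3b };

// Hypothetical summary of a value type.
struct SketchVT {
  bool Scalable;      // Scalable vector type?
  bool ElementIsI1;   // Predicate-like element type?
  unsigned FixedBits; // Size in bits; only meaningful when !Scalable.
};

inline RegClass pickRegClass(const std::string &Constraint, const SketchVT &VT) {
  if (Constraint == "r") {
    // General-purpose registers cannot hold scalable vectors.
    if (VT.Scalable)
      return RegClass::None;
    return VT.FixedBits == 64 ? RegClass::GPR64 : RegClass::GPR32;
  }
  if (Constraint == "w") {
    // Scalable data vectors use Z registers; predicates do not fit here.
    if (VT.Scalable)
      return VT.ElementIsI1 ? RegClass::None : RegClass::ZPR;
    switch (VT.FixedBits) {
    case 16:  return RegClass::FPR16;
    case 32:  return RegClass::FPR32;
    case 64:  return RegClass::FPR64;
    case 128: return RegClass::FPR128;
    default:  return RegClass::None;
    }
  }
  if (Constraint == "Upa" || Constraint == "Upl") {
    // Predicate constraints only make sense for scalable i1 vectors.
    if (!VT.Scalable || !VT.ElementIsI1)
      return RegClass::None;
    return Constraint == "Upl" ? RegClass::PPR_3b : RegClass::PPR;
  }
  return RegClass::None;
}
```

Returning "no register class" lets the caller emit an ordinary diagnostic, which appears to be what the `CHECK: error: couldn't allocate ...` lines in the new test expect.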
[llvm-branch-commits] [llvm] e22259f - [SVE] Remove duplicate assert in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR
Author: David Sherwood
Date: 2020-12-08T14:41:14Z
New Revision: e22259fafe5e2f5e0219366ff92bba15ec70ff56

URL: https://github.com/llvm/llvm-project/commit/e22259fafe5e2f5e0219366ff92bba15ec70ff56
DIFF: https://github.com/llvm/llvm-project/commit/e22259fafe5e2f5e0219366ff92bba15ec70ff56.diff

LOG: [SVE] Remove duplicate assert in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR

Added:

Modified:
    llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Removed:

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 9a0925061105..1525543a60b6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -2281,10 +2281,6 @@ SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR(SDNode *N) {
   // We know that the extracted result type is legal.
   EVT SubVT = N->getValueType(0);

-  if (SubVT.isScalableVector() !=
-      N->getOperand(0).getValueType().isScalableVector())
-    report_fatal_error("Extracting fixed from scalable not implemented");
-
   SDValue Idx = N->getOperand(1);
   SDLoc dl(N);
   SDValue Lo, Hi;
[llvm-branch-commits] [llvm] 9b76160 - [Support] Introduce a new InstructionCost class
Author: David Sherwood Date: 2020-12-11T08:12:54Z New Revision: 9b76160e53f67008ff21095098129a2949595a06 URL: https://github.com/llvm/llvm-project/commit/9b76160e53f67008ff21095098129a2949595a06 DIFF: https://github.com/llvm/llvm-project/commit/9b76160e53f67008ff21095098129a2949595a06.diff LOG: [Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. 
Differential Revision: https://reviews.llvm.org/D91174 Added: llvm/include/llvm/Support/InstructionCost.h llvm/lib/Support/InstructionCost.cpp llvm/unittests/Support/InstructionCostTest.cpp Modified: llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/IR/DiagnosticInfo.h llvm/lib/Analysis/CostModel.cpp llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp llvm/lib/IR/DiagnosticInfo.cpp llvm/lib/Support/CMakeLists.txt llvm/lib/Transforms/IPO/HotColdSplitting.cpp llvm/lib/Transforms/Scalar/CallSiteSplitting.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/unittests/Support/CMakeLists.txt Removed: diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index af57176401b4..abaf07fad3d4 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -27,6 +27,7 @@ #include "llvm/Pass.h" #include "llvm/Support/AtomicOrdering.h" #include "llvm/Support/DataTypes.h" +#include "llvm/Support/InstructionCost.h" #include namespace llvm { @@ -231,19 +232,26 @@ class TargetTransformInfo { /// /// Note, this method does not cache the cost calculation and it /// can be expensive in some cases. 
- int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const { + InstructionCost getInstructionCost(const Instruction *I, + enum TargetCostKind kind) const { +InstructionCost Cost; switch (kind) { case TCK_RecipThroughput: - return getInstructionThroughput(I); - + Cost = getInstructionThroughput(I); + break; case TCK_Latency: - return getInstructionLatency(I); - + Cost = getInstructionLatency(I); + break; case TCK_CodeSize: case TCK_SizeAndLatency: - return getUserCost(I, kind); + Cost = getUserCost(I, kind); + break; +default: + llvm_unreachable("Unknown instruction cost kind"); } -llvm_unreachable("Unknown instruction cost kind"); +if (Cost == -1) + Cost.setInvalid(); +return Cost; } /// Underlying constants for 'cost' values in this interface. diff --git a/llvm/include/llvm/IR/DiagnosticInfo.h b/llvm/include/llvm/IR/DiagnosticInfo.h index 644d853b9b0d..c457072d50f1 100644 --- a/llvm/include/llvm/IR/DiagnosticInfo.h +++ b/llvm/include/llvm/IR/DiagnosticInfo.h @@ -35,6 +35,7 @@ namespace llvm { class DiagnosticPrinter; class Function; class Instruction; +class InstructionCost; class LLVMContext; class Module; class SMDiagnostic; @@ -437,6 +438,7 @@ class DiagnosticInfoOptimizationBase : public DiagnosticInfoWithLocationBase { Argument(StringRef Key, ElementCount EC); Argument(StringRef Key, bool B) : Key(Key), Val(B ? "true" : "false") {} Argument(StringRef Key, DebugLoc dl); +Argument(StringRef Key, InstructionCost C); }; /// \p PassName is the name of the pass emitting this diagnostic. \p diff --git a/llvm/include/llvm/Support/InstructionCost.h b/llvm/include/llvm/Support/InstructionCost.h new file mode 100644 index ..fe56d49b4174 --- /dev/nul
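The `getInstructionCost` change above converts a sentinel value into an explicit state: the diff marks a cost of -1 as invalid. A minimal sketch of that boundary conversion follows, with hypothetical names (`CostState`, `WrappedCost`, `wrapLegacyCost`) that are not the LLVM API.

```cpp
#include <cassert>

// The log message describes two states, Normal and Invalid.
enum class CostState { Normal, Invalid };

// Hypothetical wrapper pairing a numeric cost with its state.
struct WrappedCost {
  int Value;
  CostState State;
  bool isValid() const { return State == CostState::Normal; }
};

// Translate the -1 sentinel into an explicit Invalid state once, at the
// interface boundary, so later code never has to compare against -1.
inline WrappedCost wrapLegacyCost(int Legacy) {
  WrappedCost C{Legacy, CostState::Normal};
  if (Legacy == -1)
    C.State = CostState::Invalid;
  return C;
}
```

Keeping the state separate from the value is what would allow further states to be added later without reserving more magic numbers.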
[llvm-branch-commits] [llvm] 616f978 - Fix build issue caused by 9b76160e53f67008ff21095098129a2949595a06
Author: David Sherwood
Date: 2020-12-11T09:43:55Z
New Revision: 616f9781af076942c177abcb7041761924757ea6

URL: https://github.com/llvm/llvm-project/commit/616f9781af076942c177abcb7041761924757ea6
DIFF: https://github.com/llvm/llvm-project/commit/616f9781af076942c177abcb7041761924757ea6.diff

LOG: Fix build issue caused by 9b76160e53f67008ff21095098129a2949595a06

Added:

Modified:
    llvm/include/llvm/Analysis/TargetTransformInfo.h

Removed:

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index abaf07fad3d4..3ba77c9a8dc9 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -246,8 +246,6 @@ class TargetTransformInfo {
     case TCK_SizeAndLatency:
       Cost = getUserCost(I, kind);
       break;
-    default:
-      llvm_unreachable("Unknown instruction cost kind");
     }
     if (Cost == -1)
       Cost.setInvalid();
[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
david-arm wrote:

> @david-arm Should this be merged?

Hi yes I think it should be merged. It's a fairly serious bug fix.

https://github.com/llvm/llvm-project/pull/117154
[llvm-branch-commits] [llvm] release/20.x: [LoopVectorize] Fix cost model assert when vectorising calls (#125716) (PR #126209)
david-arm wrote:

Also needs a build error fix - 3872e55758a5de035c032a975f244302c3ddacc3. Not sure the best way to do this - should I backport two commits or create a new PR with a joint patch?

https://github.com/llvm/llvm-project/pull/126209
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -2,6 +2,7 @@
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false -mattr=+neon,+dotprod -force-vector-interleave=1 -S < %s | FileCheck %s --check-prefixes=CHECK-INTERLEAVE1
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false -mattr=+neon,+dotprod -S < %s | FileCheck %s --check-prefixes=CHECK-INTERLEAVED
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false -mattr=+neon,+dotprod -force-vector-interleave=1 -vectorizer-maximize-bandwidth -S < %s | FileCheck %s --check-prefixes=CHECK-MAXBW
+; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize --disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-REGS

david-arm wrote:

Still missing a `REQUIRES: asserts`

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -46,6 +46,11 @@ define i1 @select_exit_cond(ptr %start, ptr %end, i64 %N) {
 ; CHECK-NEXT:    [[STEP_ADD_5:%.*]] = add <2 x i64> [[STEP_ADD_4]], splat (i64 2)
 ; CHECK-NEXT:    [[STEP_ADD_6:%.*]] = add <2 x i64> [[STEP_ADD_5]], splat (i64 2)
 ; CHECK-NEXT:    [[STEP_ADD_7:%.*]] = add <2 x i64> [[STEP_ADD_6]], splat (i64 2)
+; CHECK-NEXT:    [[STEP_ADD_8:%.*]] = add <2 x i64> [[STEP_ADD_7]], splat (i64 2)

david-arm wrote:

I'm a bit surprised these are the only CHECK lines that have changed.

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -2033,6 +2033,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   /// Generate the phi/select nodes.
   void execute(VPTransformState &State) override;
+  unsigned getVFScaleFactor() const { return VFScaleFactor; }

david-arm wrote:

Perhaps good to have comments on both new functions added?

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -45,9 +45,9 @@ define void @load_and_compare_only_used_by_assume(ptr %a, ptr noalias %b) {
 ; CHECK-LABEL: LV: Checking a loop in 'load_and_compare_only_used_by_assume'
 ; CHECK: LV(REG): VF = vscale x 4
 ; CHECK-NEXT: LV(REG): Found max usage: 2 item
-; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
-; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 1 registers
-; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
+; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 3 registers
+; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 2 registers

david-arm wrote:

Do you know why this has changed? It doesn't look like a partial reduction.

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5039,10 +5039,25 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
           // even in the scalar case.
           RegUsage[ClassID] += 1;
         } else {
+          // The output from scaled phis and scaled reductions actually have
+          // fewer lanes than the VF.
+          ElementCount VF = VFs[J];
+          if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R))

david-arm wrote:

I realise it may be less efficient, but perhaps it's better to commonise these into the same block? If for some reason we need to update this logic in future it's easier to fix it only once, i.e.

```
if (isa<VPReductionPHIRecipe, VPPartialReductionRecipe>(R)) {
  auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R);
  auto *PartialReductionR = dyn_cast<VPPartialReductionRecipe>(R);
  unsigned ScaleFactor = ReductionR ? ReductionR->getVFScaleFactor()
                                    : PartialReductionR->getVFScaleFactor();
  VF = VF.divideCoefficientBy(ScaleFactor);
}
```

If `getVFScaleFactor` becomes available to a common base class then it should simplify further. What do you think?

https://github.com/llvm/llvm-project/pull/133090
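The commonised form suggested in this review can be tried outside LLVM with plain `dynamic_cast`. This is only a sketch: `Recipe`, `ReductionPhi`, `PartialReduction`, `OtherRecipe` and `lanesFor` are hypothetical stand-ins for the VPlan recipe classes, and integer division stands in for `divideCoefficientBy`.

```cpp
#include <cassert>

// Hypothetical recipe hierarchy; only the two reduction kinds have a scale.
struct Recipe {
  virtual ~Recipe() = default;
};

struct ReductionPhi : Recipe {
  unsigned Scale;
  explicit ReductionPhi(unsigned S) : Scale(S) {}
  unsigned getVFScaleFactor() const { return Scale; }
};

struct PartialReduction : Recipe {
  unsigned Scale;
  explicit PartialReduction(unsigned S) : Scale(S) {}
  unsigned getVFScaleFactor() const { return Scale; }
};

struct OtherRecipe : Recipe {};

// One block handles both scaled kinds, so the division lives in one place.
inline unsigned lanesFor(unsigned VF, const Recipe &R) {
  const auto *Phi = dynamic_cast<const ReductionPhi *>(&R);
  const auto *Partial = dynamic_cast<const PartialReduction *>(&R);
  if (Phi || Partial) {
    unsigned ScaleFactor =
        Phi ? Phi->getVFScaleFactor() : Partial->getVFScaleFactor();
    assert(ScaleFactor != 0 && "scale factor is assumed to be non-zero");
    VF /= ScaleFactor;
  }
  return VF;
}
```

If both classes shared a base that exposed `getVFScaleFactor`, the two casts and the conditional would collapse into a single cast, which is the simplification the comment anticipates.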
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
david-arm wrote:

I tried downloading this patch and applying to the HEAD of LLVM and `patch` said this diff had already been applied. Does the PR need rebasing?

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
david-arm wrote:

Ah perhaps this is my mistake. You did say it depends upon https://github.com/llvm/llvm-project/pull/113903. :)

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -5538,7 +5538,7 @@ LoopVectorizationCostModel::getReductionPatternCost(Instruction *I,
                                  TTI::CastContextHint::None, CostKind, RedOp);
       InstructionCost RedCost = TTI.getMulAccReductionCost(
-          IsUnsigned, RdxDesc.getRecurrenceType(), ExtType, CostKind);
+          IsUnsigned, RdxDesc.getRecurrenceType(), ExtType, false, CostKind);

david-arm wrote:

nit: `/*Negated=*/false` and same for other below.

https://github.com/llvm/llvm-project/pull/147255
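The nit refers to the argument-comment convention for literal arguments. A tiny sketch of what it buys at a call site, using a hypothetical function `describeCall` that simply reports which flags were passed:

```cpp
#include <cassert>

// Hypothetical function with two adjacent bool parameters, which are easy
// to swap by accident at the call site.
inline int describeCall(bool IsUnsigned, bool Negated) {
  // Bit 0 records IsUnsigned, bit 1 records Negated.
  return (IsUnsigned ? 1 : 0) | (Negated ? 2 : 0);
}

// With argument comments the meaning of each literal is visible without
// looking up the declaration.
inline int exampleCall() {
  return describeCall(/*IsUnsigned=*/true, /*Negated=*/false);
}
```

Compared with a bare `describeCall(true, false)`, the commented form makes it obvious which literal is the negation flag.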
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase {
   InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy,
                                          VectorType *Ty,
+                                         bool Negated,
                                          TTI::TargetCostKind CostKind) const override {
+    if (Negated)

david-arm wrote:

Why can't we add a cost for this?

https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -2757,6 +2757,12 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
     /// vector operands, performing a reduction.add on the result, and adding
     /// the scalar result to a chain.
     MulAccReduction,
+    /// Represent an inloop multiply-accumulate reduction, multiplying the
+    /// extended vector operands, negating the multiplication, performing a
+    /// reduction.add
+    /// on the result, and adding

david-arm wrote:

Formatting of the comment looks a bit odd - can you fix it?

https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -1401,8 +1401,8 @@ static void analyzeCostOfVecReduction(const IntrinsicInst &II,
                               TTI::CastContextHint::None, CostKind, RedOp);
     CostBeforeReduction = ExtCost * 2 + MulCost + Ext2Cost;
-    CostAfterReduction =
-        TTI.getMulAccReductionCost(IsUnsigned, II.getType(), ExtType, CostKind);
+    CostAfterReduction = TTI.getMulAccReductionCost(IsUnsigned, II.getType(),
+                                                    ExtType, false, CostKind);

david-arm wrote:

nit: Probably better written as `/*Negated=*/false`

https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -2725,6 +2729,31 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent,
     O << ")";
     break;
   }
+  case ExpressionTypes::ExtNegatedMulAccReduction: {

david-arm wrote:

Is there a way to commonise this with the ExtMulAccReduction case if the only difference is a negate?

https://github.com/llvm/llvm-project/pull/147255
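One way the two cases might be commonised is to let them share a single `case` body and derive a flag from the enumerator. The sketch below assumes the negate is the only difference between the two printers. The `ExpressionKind` enum, the `printExpression` helper and the printed text format are all invented for illustration and do not reproduce the actual VPlan printer output.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical expression kinds mirroring the two cases under discussion.
enum class ExpressionKind { ExtMulAccReduction, ExtNegatedMulAccReduction };

// Prints both kinds through one shared case body. The output format here is
// made up for this sketch.
std::string printExpression(ExpressionKind Kind, const std::string &A,
                            const std::string &B) {
  std::ostringstream OS;
  switch (Kind) {
  case ExpressionKind::ExtMulAccReduction:
  case ExpressionKind::ExtNegatedMulAccReduction: {
    bool IsNegated = Kind == ExpressionKind::ExtNegatedMulAccReduction;
    OS << "reduce.add(";
    if (IsNegated)
      OS << "sub (0, "; // only the negated form wraps the multiply
    OS << "mul (ext " << A << ", ext " << B << ")";
    if (IsNegated)
      OS << ")";
    OS << ")";
    break;
  }
  }
  return OS.str();
}
```

The trade-off is a couple of extra branches inside the shared body in exchange for not duplicating the rest of the printing logic.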
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -1645,8 +1645,10 @@ class TargetTransformInfo {
   /// extensions. This is the cost of as:
   /// ResTy vecreduce.add(mul (A, B)).
   /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)).
+  /// The multiply can optionally be negated, which signifies that it is a sub
+  /// reduction.
   LLVM_ABI InstructionCost getMulAccReductionCost(
-      bool IsUnsigned, Type *ResTy, VectorType *Ty,
+      bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated,

david-arm wrote:

Is it worth keeping the booleans together, i.e. next to `IsUnsigned`?

https://github.com/llvm/llvm-project/pull/147255