[llvm-branch-commits] [compiler-rt] release/19.x: [AArch64][SME] Rewrite __arm_get_current_vg to preserve required registers (#100143) (PR #100546)
sdesmalen-arm wrote:

It would be great if we could merge this fix into the release branch!

https://github.com/llvm/llvm-project/pull/100546
[llvm-branch-commits] [clang] release/19.x: [Clang] Demote always_inline error to warning for mismatching SME attrs (#100740) (PR #100987)
https://github.com/sdesmalen-arm requested changes to this pull request.

For some odd reason, `clang/test/CodeGen/aarch64-sme-inline-streaming-attrs.c` seems to be failing on some buildbots with an error that says:

> unable to create target: No available targets are compatible with triple "aarch64-none-linux-gnu"

I suspect the test is missing a `REQUIRES: aarch64-registered-target`, but I'm puzzled why that test didn't fail before, because my patch doesn't introduce this test and doesn't change the RUN lines; all this patch does is change one of the diagnostic messages.

In any case, I seem to have jumped the gun creating this cherry-pick in the first place; I thought the change was trivial enough, especially after testing locally. My apologies for the noise. I'll revert the patch, fix the issue, and then create another cherry-pick.

https://github.com/llvm/llvm-project/pull/100987
[llvm-branch-commits] [clang] [clang][FMV][AArch64] Improve streaming mode compatibility (PR #101007)
https://github.com/sdesmalen-arm approved this pull request.

It would be good if this could make it into the LLVM 19 release.

https://github.com/llvm/llvm-project/pull/101007
[llvm-branch-commits] [clang] release/19.x: Reland: "[Clang] Demote always_inline error to warning for mismatching SME attrs" (#100991) (#100996) (PR #101303)
sdesmalen-arm wrote:

> Is this safe enough to reland? Has it lived without a problem in main for a bit?

Thanks for checking. The only failures I would have expected are from lit tests, but the PR was merged on Monday and I've not seen any buildbot failures, so I believe it is safe. There should also be no impact on code that doesn't explicitly use these target-specific attributes. For the case where the attributes are used, the behaviour seems correct (e.g. when I try some examples in godbolt), and the change is specific and trivial enough not to cause any unrelated/unexpected issues either. I hope that answers your question.

https://github.com/llvm/llvm-project/pull/101303
[llvm-branch-commits] [llvm] release/19.x: [LLVM][TTI][SME] Allow optional auto-vectorisation for streaming functions. (#101679) (PR #101959)
https://github.com/sdesmalen-arm approved this pull request.

https://github.com/llvm/llvm-project/pull/101959
[llvm-branch-commits] [clang] [llvm] [AArch64] Adopt updated B16B16 target flags (PR #104602)
https://github.com/sdesmalen-arm approved this pull request.

Thanks for making this change! The Clang/driver behaviour looks correct to me now. The patch is a lot smaller than the patches that went into main, so hopefully this can still make it into LLVM 19!

https://github.com/llvm/llvm-project/pull/104602
[llvm-branch-commits] [clang] [llvm] release/18.x: [AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276) (PR #81593)
https://github.com/sdesmalen-arm approved this pull request.

Looks pretty low-risk to me and would be nice to get into the release if we can.

https://github.com/llvm/llvm-project/pull/81593
[llvm-branch-commits] [clang] [llvm] release/18.x: [AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276) (PR #81616)
https://github.com/sdesmalen-arm approved this pull request.

Looks pretty low-risk to me and would be nice to get into the release if we can. (How is this PR different from #81593?)

https://github.com/llvm/llvm-project/pull/81616
[llvm-branch-commits] [llvm] [AArch64] Improve cost model for legal subvec insert/extract (PR #81135)
```
@@ -568,6 +568,32 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     }
     return Cost;
   }
+  case Intrinsic::vector_extract: {
+    // If both the vector argument and the return type are legal types, then
+    // this should be a no-op or simple operation; return a relatively low cost.
+    LLVMContext &C = RetTy->getContext();
+    EVT MRTy = getTLI()->getValueType(DL, RetTy);
+    EVT MPTy = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
+    TargetLoweringBase::LegalizeKind RLK = getTLI()->getTypeConversion(C, MRTy);
+    TargetLoweringBase::LegalizeKind PLK = getTLI()->getTypeConversion(C, MPTy);
+    if (RLK.first == TargetLoweringBase::TypeLegal &&
+        PLK.first == TargetLoweringBase::TypeLegal)
+      return InstructionCost(1);
```

sdesmalen-arm wrote:

Just pointing out that the code isn't updated yet to handle predicates differently, as those inserts/extracts are indeed not free.

https://github.com/llvm/llvm-project/pull/81135
[llvm-branch-commits] [llvm] [AArch64] Improve cost model for legal subvec insert/extract (PR #81135)
```
@@ -568,6 +568,48 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     }
     return Cost;
   }
+  case Intrinsic::vector_extract: {
+    // If both the vector argument and the return type are legal types and the
+    // index is 0, then this should be a no-op or simple operation; return a
+    // relatively low cost.
+
+    // If arguments aren't actually supplied, then we cannot determine the
+    // value of the index.
+    if (ICA.getArgs().size() < 2)
```

sdesmalen-arm wrote:

nit:
```suggestion
    if (ICA.getArgs().size() != 2)
```

https://github.com/llvm/llvm-project/pull/81135
[llvm-branch-commits] [llvm] [AArch64] Improve cost model for legal subvec insert/extract (PR #81135)
```
@@ -568,6 +568,48 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     }
     return Cost;
   }
+  case Intrinsic::vector_extract: {
+    // If both the vector argument and the return type are legal types and the
+    // index is 0, then this should be a no-op or simple operation; return a
+    // relatively low cost.
+
+    // If arguments aren't actually supplied, then we cannot determine the
+    // value of the index.
+    if (ICA.getArgs().size() < 2)
+      break;
+    LLVMContext &C = RetTy->getContext();
+    EVT MRTy = getTLI()->getValueType(DL, RetTy);
+    EVT MPTy = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
+    TargetLoweringBase::LegalizeKind RLK = getTLI()->getTypeConversion(C, MRTy);
+    TargetLoweringBase::LegalizeKind PLK = getTLI()->getTypeConversion(C, MPTy);
+    const ConstantInt *Idx = dyn_cast<ConstantInt>(ICA.getArgs()[1]);
+    if (RLK.first == TargetLoweringBase::TypeLegal &&
+        PLK.first == TargetLoweringBase::TypeLegal && Idx &&
+        Idx->getZExtValue() == 0)
+      return InstructionCost(1);
```

sdesmalen-arm wrote:

Is there a reason this wouldn't be zero-cost? Also, stylistically to match the rest of this file, maybe return `TTI::TCC_Free` (if this is considered a cost of 0) or `TTI::TCC_Basic` (if this is considered a cost of 1) instead?

https://github.com/llvm/llvm-project/pull/81135
[llvm-branch-commits] [clang] release/19.x: [clang][AArch64] Add SME2.1 feature macros (#105657) (PR #106135)
sdesmalen-arm wrote:

Rationale: this helps people who use LLVM 19 to write code for the SME2.1 intrinsics, which the compiler already supports, but without the macros set a user couldn't write compliant code, e.g. `#if defined __ARM_FEATURE_SME_B16B16 ... #endif`, because the macro would not be set.

https://github.com/llvm/llvm-project/pull/106135
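For context, a guard like the one described would look like this (a minimal sketch; `do_b16b16` and `do_fallback` are made-up names — only the macro itself comes from the message above):

```cpp
// Sketch only: how a user might gate SME B16B16 code on the ACLE feature macro.
void do_b16b16();   // hypothetical B16B16 code path
void do_fallback(); // hypothetical portable code path

void run_kernel() {
#if defined __ARM_FEATURE_SME_B16B16
  do_b16b16();   // compiled only when the compiler advertises B16B16 support
#else
  do_fallback(); // taken on toolchains that don't set the macro
#endif
}
```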
[llvm-branch-commits] [clang] release/19.x: [clang][AArch64] Add SME2.1 feature macros (#105657) (PR #106135)
https://github.com/sdesmalen-arm approved this pull request.

https://github.com/llvm/llvm-project/pull/106135
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/sdesmalen-arm milestoned https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
sdesmalen-arm wrote:

> Hi, since we are wrapping up LLVM 19.1.0 we are very strict with the fixes we pick at this point. Can you please respond to the following questions to help me understand if this has to be included in the final release or not.

Sure, I appreciate your diligence!

> Is this PR a fix for a regression or a critical issue?

A critical issue.

> What is the risk of accepting this into the release branch?

There shouldn't be any risk of accepting this patch into the release branch. With this patch, the code reverts back to the behaviour that we've had for AArch64's SVE/SVE2 extensions, which has been deployed for 5+ years. This was basically a feature that went in and broke things, and this patch reverts that feature.

> What is the risk of NOT accepting this into the release branch?

Without this change, practically any code that targets AArch64's SME2 or SVE2p1 extensions will be broken.

https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] d196f9e - [InstructionCost] Prevent InstructionCost being created with CostState.
Author: Sander de Smalen
Date: 2021-01-25T11:26:56Z
New Revision: d196f9e2fca3ff767aa7d2dcaf4654724a79e18c

URL: https://github.com/llvm/llvm-project/commit/d196f9e2fca3ff767aa7d2dcaf4654724a79e18c
DIFF: https://github.com/llvm/llvm-project/commit/d196f9e2fca3ff767aa7d2dcaf4654724a79e18c.diff

LOG: [InstructionCost] Prevent InstructionCost being created with CostState.

For a function that returns InstructionCost, it is very tempting to write:

  return InstructionCost::Invalid;

But that actually returns InstructionCost(1 /* int value of Invalid */),
which has a totally different meaning. By marking this constructor as
`delete`, this can no longer happen.

Added:


Modified:
    llvm/include/llvm/Support/InstructionCost.h

Removed:


diff --git a/llvm/include/llvm/Support/InstructionCost.h b/llvm/include/llvm/Support/InstructionCost.h
index 725f8495ac09..fbc898b878bb 100644
--- a/llvm/include/llvm/Support/InstructionCost.h
+++ b/llvm/include/llvm/Support/InstructionCost.h
@@ -47,6 +47,7 @@ class InstructionCost {

 public:
   InstructionCost() = default;
+  InstructionCost(CostState) = delete;
   InstructionCost(CostType Val) : Value(Val), State(Valid) {}

   static InstructionCost getInvalid(CostType Val = 0) {
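A minimal sketch of the pitfall the commit closes off (simplified stand-in types, not the real `llvm/Support/InstructionCost.h`):

```cpp
// Simplified model of InstructionCost, showing why Cost(CostState) = delete helps.
class Cost {
public:
  enum CostState { Valid, Invalid };
  Cost() = default;
  Cost(CostState) = delete; // the fix: makes `return Cost::Invalid;` a compile error
  Cost(int Val) : Value(Val) {}
  static Cost getInvalid() {
    Cost C;
    C.State = Invalid;
    return C;
  }
  bool isValid() const { return State == Valid; }

private:
  int Value = 0;
  CostState State = Valid;
};

Cost compute() {
  // return Cost::Invalid;   // without the deleted overload, the enum would
  //                         // implicitly convert to int, yielding Cost(1) --
  //                         // a *valid* cost of 1, not an invalid cost.
  return Cost::getInvalid(); // the intended spelling
}
```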
[llvm-branch-commits] [llvm] 171d124 - [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.
Author: Sander de Smalen
Date: 2021-01-25T12:27:01Z
New Revision: 171d12489f20818e292362342b5665c689073ad2

URL: https://github.com/llvm/llvm-project/commit/171d12489f20818e292362342b5665c689073ad2
DIFF: https://github.com/llvm/llvm-project/commit/171d12489f20818e292362342b5665c689073ad2.diff

LOG: [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.

This change also changes getReductionCost to return InstructionCost,
and it simplifies two expressions by removing a redundant 'isValid' check.

Added:


Modified:
    llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Removed:


diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 7114b4d412fd..0b630197911a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -3411,21 +3411,21 @@ bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {
   });
 }

-static std::pair<int, int>
+static std::pair<InstructionCost, InstructionCost>
 getVectorCallCosts(CallInst *CI, FixedVectorType *VecTy,
                    TargetTransformInfo *TTI, TargetLibraryInfo *TLI) {
   Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

   // Calculate the cost of the scalar and vector calls.
   IntrinsicCostAttributes CostAttrs(ID, *CI, VecTy->getElementCount());
-  int IntrinsicCost =
+  auto IntrinsicCost =
       TTI->getIntrinsicInstrCost(CostAttrs, TTI::TCK_RecipThroughput);

   auto Shape = VFShape::get(*CI, ElementCount::getFixed(static_cast<unsigned>(
                                      VecTy->getNumElements())),
                             false /*HasGlobalPred*/);
   Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);
-  int LibCost = IntrinsicCost;
+  auto LibCost = IntrinsicCost;
   if (!CI->isNoBuiltin() && VecFunc) {
     // Calculate the cost of the vector library call.
     SmallVector<Type *, 4> VecTys;
@@ -5994,7 +5994,7 @@ bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
   InstructionCost Cost = R.getTreeCost();

   LLVM_DEBUG(dbgs() << "SLP: Found cost = " << Cost << " for VF =" << VF << "\n");
-  if (Cost.isValid() && Cost < -SLPCostThreshold) {
+  if (Cost < -SLPCostThreshold) {
     LLVM_DEBUG(dbgs() << "SLP: Decided to vectorize cost = " << Cost << "\n");

     using namespace ore;
@@ -6295,7 +6295,7 @@ bool SLPVectorizerPass::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,

       MinCost = std::min(MinCost, Cost);

-      if (Cost.isValid() && Cost < -SLPCostThreshold) {
+      if (Cost < -SLPCostThreshold) {
         LLVM_DEBUG(dbgs() << "SLP: Vectorizing list at cost:" << Cost << ".\n");
         R.getORE()->emit(OptimizationRemark(SV_NAME, "VectorizedList",
                                             cast<Instruction>(Ops[0]))
@@ -7007,11 +7007,12 @@ class HorizontalReduction {

 private:
   /// Calculate the cost of a reduction.
-  int getReductionCost(TargetTransformInfo *TTI, Value *FirstReducedVal,
-                       unsigned ReduxWidth) {
+  InstructionCost getReductionCost(TargetTransformInfo *TTI,
+                                   Value *FirstReducedVal,
+                                   unsigned ReduxWidth) {
     Type *ScalarTy = FirstReducedVal->getType();
     FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);
-    int VectorCost, ScalarCost;
+    InstructionCost VectorCost, ScalarCost;
     switch (RdxKind) {
     case RecurKind::Add:
     case RecurKind::Mul:
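The removed `isValid()` checks were redundant because of how InstructionCost orders invalid values; a simplified model of that ordering (the exact real semantics are an assumption here):

```cpp
#include <cassert>
#include <tuple>

// Stand-in for InstructionCost: an invalid cost is assumed to sort after
// every valid cost, so it can never be below a negative threshold.
struct Cost {
  long Value = 0;
  enum State { Valid = 0, Invalid = 1 } St = Valid;
  bool operator<(const Cost &RHS) const {
    return std::tie(St, Value) < std::tie(RHS.St, RHS.Value);
  }
};

int main() {
  Cost InvalidCost{0, Cost::Invalid};
  Cost Threshold{-8, Cost::Valid};
  // The comparison already rejects invalid costs, so the extra
  // `Cost.isValid() &&` guard added nothing.
  assert(!(InvalidCost < Threshold));
}
```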
[llvm-branch-commits] [llvm] a9f5e43 - [AArch64] Use faddp to implement fadd reductions.
Author: Sander de Smalen
Date: 2021-01-06T09:36:51Z
New Revision: a9f5e4375b36e5316b8d6f9731be6bfa5a70e276

URL: https://github.com/llvm/llvm-project/commit/a9f5e4375b36e5316b8d6f9731be6bfa5a70e276
DIFF: https://github.com/llvm/llvm-project/commit/a9f5e4375b36e5316b8d6f9731be6bfa5a70e276.diff

LOG: [AArch64] Use faddp to implement fadd reductions.

Custom-expand legal VECREDUCE_FADD SDNodes to benefit from pair-wise
faddp instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D59259

Added:


Modified:
    llvm/include/llvm/Target/TargetSelectionDAG.td
    llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
    llvm/lib/Target/AArch64/AArch64InstrInfo.td
    llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
    llvm/test/CodeGen/AArch64/vecreduce-fadd.ll

Removed:


diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index d5b8aeb1055d..0c6eef939ea4 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -250,6 +250,10 @@ def SDTVecInsert : SDTypeProfile<1, 3, [   // vector insert
 def SDTVecReduce : SDTypeProfile<1, 1, [   // vector reduction
   SDTCisInt<0>, SDTCisVec<1>
 ]>;
+def SDTFPVecReduce : SDTypeProfile<1, 1, [ // FP vector reduction
+  SDTCisFP<0>, SDTCisVec<1>
+]>;
+
 def SDTSubVecExtract : SDTypeProfile<1, 2, [ // subvector extract
   SDTCisSubVecOfVec<0,1>, SDTCisInt<2>
 ]>;
@@ -439,6 +443,7 @@ def vecreduce_smax : SDNode<"ISD::VECREDUCE_SMAX", SDTVecReduce>;
 def vecreduce_umax : SDNode<"ISD::VECREDUCE_UMAX", SDTVecReduce>;
 def vecreduce_smin : SDNode<"ISD::VECREDUCE_SMIN", SDTVecReduce>;
 def vecreduce_umin : SDNode<"ISD::VECREDUCE_UMIN", SDTVecReduce>;
+def vecreduce_fadd : SDNode<"ISD::VECREDUCE_FADD", SDTFPVecReduce>;

 def fadd       : SDNode<"ISD::FADD"       , SDTFPBinOp, [SDNPCommutative]>;
 def fsub       : SDNode<"ISD::FSUB"       , SDTFPBinOp>;

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index faed7c64a15e..2b9dc84a06cc 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -999,6 +999,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
                     MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {
       setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);
       setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
+
+      if (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16())
+        setOperationAction(ISD::VECREDUCE_FADD, VT, Legal);
     }
     for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32,
                     MVT::v16i8, MVT::v8i16, MVT::v4i32 }) {

diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 4d70fb334828..7e9f2fb95188 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -4989,6 +4989,26 @@ defm FMAXNMP : SIMDFPPairwiseScalar<0, 0b01100, "fmaxnmp">;
 defm FMAXP   : SIMDFPPairwiseScalar<0, 0b01111, "fmaxp">;
 defm FMINNMP : SIMDFPPairwiseScalar<1, 0b01100, "fminnmp">;
 defm FMINP   : SIMDFPPairwiseScalar<1, 0b01111, "fminp">;
+
+let Predicates = [HasFullFP16] in {
+def : Pat<(f16 (vecreduce_fadd (v8f16 V128:$Rn))),
+          (FADDPv2i16p
+            (EXTRACT_SUBREG
+              (FADDPv8f16 (FADDPv8f16 V128:$Rn, (v8f16 (IMPLICIT_DEF))), (v8f16 (IMPLICIT_DEF))),
+              dsub))>;
+def : Pat<(f16 (vecreduce_fadd (v4f16 V64:$Rn))),
+          (FADDPv2i16p (FADDPv4f16 V64:$Rn, (v4f16 (IMPLICIT_DEF))))>;
+}
+def : Pat<(f32 (vecreduce_fadd (v4f32 V128:$Rn))),
+          (FADDPv2i32p
+            (EXTRACT_SUBREG
+              (FADDPv4f32 V128:$Rn, (v4f32 (IMPLICIT_DEF))),
+              dsub))>;
+def : Pat<(f32 (vecreduce_fadd (v2f32 V64:$Rn))),
+          (FADDPv2i32p V64:$Rn)>;
+def : Pat<(f64 (vecreduce_fadd (v2f64 V128:$Rn))),
+          (FADDPv2i64p V128:$Rn)>;
+
 def : Pat<(v2i64 (AArch64saddv V128:$Rn)),
           (INSERT_SUBREG (v2i64 (IMPLICIT_DEF)), (ADDPv2i64p V128:$Rn), dsub)>;
 def : Pat<(v2i64 (AArch64uaddv V128:$Rn)),

diff --git a/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll b/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
index 69b9c3e22d7a..2b5fcd4b839a 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
@@ -51,8 +51,7 @@ define float @test_v3f32(<3 x float> %a) nounwind {
 ; CHECK-NEXT:    mov w8, #-2147483648
 ; CHECK-NEXT:    fmov s1, w8
 ; CHECK-NEXT:    mov v0.s[3], v1.s[0]
-; CHECK-NEXT:    ext v1.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT:    fadd v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    faddp v0.4s, v0.4s, v0.4s
 ; CHECK-NEXT:    faddp s0, v0.2s
 ; CHECK-NEXT:    ret
   %b = call reassoc float @l
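To see the new lowering in action, source like the following (an illustration, not from the commit) produces an `ISD::VECREDUCE_FADD` node when the additions may be reassociated, which the new patterns select to pair-wise `faddp`:

```cpp
// Compile with something like: clang -O2 -ffast-math --target=aarch64-linux-gnu
// (flags are an assumption; any mode that permits FP reassociation should do).
float sum4(const float *p) {
  float s = 0.0f;
  for (int i = 0; i < 4; ++i) // small fixed trip count vectorizes to <4 x float>
    s += p[i];
  return s; // the final horizontal reduction is the vecreduce_fadd
}
```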
[llvm-branch-commits] [llvm] a7e3339 - [AArch64][SVE] Emit DWARF location expression for SVE stack objects.
Author: Sander de Smalen
Date: 2021-01-06T09:40:53Z
New Revision: a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf

URL: https://github.com/llvm/llvm-project/commit/a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf
DIFF: https://github.com/llvm/llvm-project/commit/a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf.diff

LOG: [AArch64][SVE] Emit DWARF location expression for SVE stack objects.

Extend PEI to emit a DWARF expression for StackOffsets that have a fixed
and scalable component. This means the expression that needs to be added
is either:

  <base> + offset

or:

  <base> + offset + scalable_offset * scalereg

where for SVE, the scale reg is the Vector Granule Dwarf register, which
encodes the number of 64bit 'granules' in an SVE vector and which the
debugger can evaluate at runtime.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90020

Added:
    llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir

Modified:
    llvm/include/llvm/CodeGen/TargetRegisterInfo.h
    llvm/lib/CodeGen/PrologEpilogInserter.cpp
    llvm/lib/CodeGen/TargetRegisterInfo.cpp
    llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
    llvm/lib/Target/AArch64/AArch64RegisterInfo.h

Removed:


diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index de2c1b069784..6f32729a1e83 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -34,6 +34,7 @@
 namespace llvm {

 class BitVector;
+class DIExpression;
 class LiveRegMatrix;
 class MachineFunction;
 class MachineInstr;
@@ -923,6 +924,15 @@ class TargetRegisterInfo : public MCRegisterInfo {
     llvm_unreachable("isFrameOffsetLegal does not exist on this target");
   }

+  /// Gets the DWARF expression opcodes for \p Offset.
+  virtual void getOffsetOpcodes(const StackOffset &Offset,
+                                SmallVectorImpl<uint64_t> &Ops) const;
+
+  /// Prepends a DWARF expression for \p Offset to DIExpression \p Expr.
+  DIExpression *
+  prependOffsetExpression(const DIExpression *Expr, unsigned PrependFlags,
+                          const StackOffset &Offset) const;
+
   /// Spill the register so it can be used by the register scavenger.
   /// Return true if the register was spilled, false otherwise.
   /// If this function does not spill the register, the scavenger

diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index 7c38b193a980..65b2165bf2a0 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -1211,8 +1211,6 @@ void PEI::replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &MF,
         StackOffset Offset =
             TFI->getFrameIndexReference(MF, FrameIdx, Reg);
-        assert(!Offset.getScalable() &&
-               "Frame offsets with a scalable component are not supported");
         MI.getOperand(0).ChangeToRegister(Reg, false /*isDef*/);
         MI.getOperand(0).setIsDebug();

@@ -1238,7 +1236,8 @@ void PEI::replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &MF,
           // Make the DBG_VALUE direct.
           MI.getDebugOffset().ChangeToRegister(0, false);
         }
-        DIExpr = DIExpression::prepend(DIExpr, PrependFlags, Offset.getFixed());
+
+        DIExpr = TRI.prependOffsetExpression(DIExpr, PrependFlags, Offset);
         MI.getDebugExpressionOp().setMetadata(DIExpr);
         continue;
       }

diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index e89353c9ad27..4a190c9f50af 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -26,6 +26,7 @@
 #include "llvm/CodeGen/VirtRegMap.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/IR/Attributes.h"
+#include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/Function.h"
 #include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/CommandLine.h"
@@ -532,6 +533,31 @@ TargetRegisterInfo::lookThruCopyLike(Register SrcReg,
   }
 }

+void TargetRegisterInfo::getOffsetOpcodes(
+    const StackOffset &Offset, SmallVectorImpl<uint64_t> &Ops) const {
+  assert(!Offset.getScalable() && "Scalable offsets are not handled");
+  DIExpression::appendOffset(Ops, Offset.getFixed());
+}
+
+DIExpression *
+TargetRegisterInfo::prependOffsetExpression(const DIExpression *Expr,
+                                            unsigned PrependFlags,
+                                            const StackOffset &Offset) const {
+  assert((PrependFlags &
+          ~(DIExpression::DerefBefore | DIExpression::DerefAfter |
+            DIExpression::StackValue | DIExpression::EntryValue)) == 0 &&
+         "Unsupported prepend flag");
+  SmallVector<uint64_t, 16> OffsetExpr;
+  if (PrependFlags & DIExpression::DerefBefore)
+    OffsetExpr.push_back(dwarf::DW_OP_deref);
+  getOffsetOpcodes(Offset, OffsetExpr);
+  if (PrependFlags & D
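A rough sketch of what a target override of `getOffsetOpcodes` can emit for the scalable case, following the `<base> + offset + scalable_offset * scalereg` shape from the log message (illustrative, not the verbatim AArch64 implementation; the VG DWARF register number 46 is an AArch64-specific assumption):

```cpp
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/Support/TypeSize.h"
#include <cstdlib>
using namespace llvm;

static void getScalableOffsetOpcodes(const StackOffset &Offset,
                                     SmallVectorImpl<uint64_t> &Ops) {
  // <base> + offset
  DIExpression::appendOffset(Ops, Offset.getFixed());
  // ... + scalable_offset * VG
  if (int64_t Scalable = Offset.getScalable()) {
    Ops.push_back(dwarf::DW_OP_constu);
    Ops.push_back(std::abs(Scalable));
    Ops.push_back(dwarf::DW_OP_bregx); // debugger reads VG at runtime
    Ops.push_back(46);                 // assumed: AArch64 VG dwarf register
    Ops.push_back(0);
    Ops.push_back(dwarf::DW_OP_mul);
    Ops.push_back(Scalable > 0 ? dwarf::DW_OP_plus : dwarf::DW_OP_minus);
  }
}
```

This matches the shape of the expressions checked in the SVE debug-info tests, e.g. `DW_OP_fbreg +0, DW_OP_lit8, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus`.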
[llvm-branch-commits] [llvm] e4cda13 - Fix test failure in a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf
Author: Sander de Smalen
Date: 2021-01-06T10:43:48Z
New Revision: e4cda13d5a54a8c6366e4ca82d74265e68bbb3f5

URL: https://github.com/llvm/llvm-project/commit/e4cda13d5a54a8c6366e4ca82d74265e68bbb3f5
DIFF: https://github.com/llvm/llvm-project/commit/e4cda13d5a54a8c6366e4ca82d74265e68bbb3f5.diff

LOG: Fix test failure in a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf

Set the target-triple to aarch64 in debug-info-sve-dbg-value.mir to avoid
"'+sve' is not a recognized feature for this target" diagnostic.

Added:


Modified:
    llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir

Removed:


diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
index ffce40c9c4f4..84d34ce3d2ac 100644
--- a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
@@ -27,6 +27,7 @@
 --- |
   ; ModuleID = 'bla.mir'
   source_filename = "bla.mir"
+  target triple = "aarch64-unknown-linux-gnu"
   target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

   define void @foo() #0 !dbg !5 {
[llvm-branch-commits] [llvm] 84a1120 - [LiveDebugValues] Handle spill locations with a fixed and scalable component.
Author: Sander de Smalen
Date: 2021-01-06T11:30:13Z
New Revision: 84a1120943a651184bae507fed5d648fee381ae4

URL: https://github.com/llvm/llvm-project/commit/84a1120943a651184bae507fed5d648fee381ae4
DIFF: https://github.com/llvm/llvm-project/commit/84a1120943a651184bae507fed5d648fee381ae4.diff

LOG: [LiveDebugValues] Handle spill locations with a fixed and scalable component.

This patch fixes the two LiveDebugValues implementations
((InstrRef|VarLoc)BasedImpl) to handle cases where the StackOffset contains
both a fixed and scalable component.

This depends on the `TargetRegisterInfo::prependOffsetExpression` being
added in D90020. Feel free to leave comments on that patch if you have them.

Reviewed By: djtodoro, jmorse

Differential Revision: https://reviews.llvm.org/D90046

Added:
    llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir

Modified:
    llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
    llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp

Removed:


diff --git a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
index 04ead18cc3de2..b6f46daf8bba9 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
@@ -182,6 +182,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/TypeSize.h"
 #include "llvm/Support/raw_ostream.h"
 #include
 #include
@@ -221,14 +222,16 @@ namespace {
 // an offset.
 struct SpillLoc {
   unsigned SpillBase;
-  int SpillOffset;
+  StackOffset SpillOffset;
   bool operator==(const SpillLoc &Other) const {
-    return std::tie(SpillBase, SpillOffset) ==
-           std::tie(Other.SpillBase, Other.SpillOffset);
+    return std::make_pair(SpillBase, SpillOffset) ==
+           std::make_pair(Other.SpillBase, Other.SpillOffset);
   }
   bool operator<(const SpillLoc &Other) const {
-    return std::tie(SpillBase, SpillOffset) <
-           std::tie(Other.SpillBase, Other.SpillOffset);
+    return std::make_tuple(SpillBase, SpillOffset.getFixed(),
+                           SpillOffset.getScalable()) <
+           std::make_tuple(Other.SpillBase, Other.SpillOffset.getFixed(),
+                           Other.SpillOffset.getScalable());
   }
 };
@@ -769,8 +772,10 @@ class MLocTracker {
     } else if (LocIdxToLocID[*MLoc] >= NumRegs) {
       unsigned LocID = LocIdxToLocID[*MLoc];
       const SpillLoc &Spill = SpillLocs[LocID - NumRegs + 1];
-      Expr = DIExpression::prepend(Expr, DIExpression::ApplyOffset,
-                                   Spill.SpillOffset);
+
+      auto *TRI = MF.getSubtarget().getRegisterInfo();
+      Expr = TRI->prependOffsetExpression(Expr, DIExpression::ApplyOffset,
+                                          Spill.SpillOffset);
       unsigned Base = Spill.SpillBase;
       MIB.addReg(Base, RegState::Debug);
       MIB.addImm(0);
@@ -1579,9 +1584,7 @@ InstrRefBasedLDV::extractSpillBaseRegAndOffset(const MachineInstr &MI) {
   const MachineBasicBlock *MBB = MI.getParent();
   Register Reg;
   StackOffset Offset = TFI->getFrameIndexReference(*MBB->getParent(), FI, Reg);
-  assert(!Offset.getScalable() &&
-         "Frame offsets with a scalable component are not supported");
-  return {Reg, static_cast<int>(Offset.getFixed())};
+  return {Reg, Offset};
 }

 /// End all previous ranges related to @MI and start a new range from @MI

diff --git a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
index ed7f04e571acc..4811b80467973 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
@@ -145,6 +145,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/TypeSize.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
 #include
@@ -292,7 +293,7 @@ class VarLocBasedLDV : public LDVImpl {
   // register and an offset.
   struct SpillLoc {
     unsigned SpillBase;
-    int SpillOffset;
+    StackOffset SpillOffset;
     bool operator==(const SpillLoc &Other) const {
       return SpillBase == Other.SpillBase && SpillOffset == Other.SpillOffset;
     }
@@ -323,21 +324,20 @@ class VarLocBasedLDV : public LDVImpl {

     /// The value location. Stored separately to avoid repeatedly
     /// extracting it from MI.
-    union {
+    union LocUnion {
       uint64_t RegNo;
       SpillLoc SpillLocation;
       uint64_t Hash;
       int64_t Immediate;
       const ConstantFP *FPImm;
       const ConstantInt *CImm;
+      LocUnion() : Hash(0) {}
     } Loc;

     VarLoc(const MachineInstr &MI, LexicalScopes &LS)
         : Var(MI.getDebugVariable(), MI.getDebugExpression(),
               MI.getDebugLoc()->getInlinedAt()),
[llvm-branch-commits] [llvm] aa280c9 - [AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare)
Author: Sander de Smalen
Date: 2021-01-06T11:45:05Z
New Revision: aa280c99f708dca9dea96bc9070d6194d2622529

URL: https://github.com/llvm/llvm-project/commit/aa280c99f708dca9dea96bc9070d6194d2622529
DIFF: https://github.com/llvm/llvm-project/commit/aa280c99f708dca9dea96bc9070d6194d2622529.diff

LOG: [AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare)

When using dbg.declare, the debug-info is generated from a list of
locals rather than through DBG_VALUE instructions in the MIR.

This patch is different from D90020 because it emits the DWARF
location expressions from that list of locals directly.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90044

Added:
    llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir

Modified:
    llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp

Removed:


diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index 02791f2280d2..ea279e4914b0 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -739,11 +739,10 @@ DIE *DwarfCompileUnit::constructVariableDIEImpl(const DbgVariable &DV,
             TFI->getFrameIndexReference(*Asm->MF, Fragment.FI, FrameReg);
         DwarfExpr.addFragmentOffset(Expr);

-        assert(!Offset.getScalable() &&
-               "Frame offsets with a scalable component are not supported");
-
+        auto *TRI = Asm->MF->getSubtarget().getRegisterInfo();
         SmallVector<uint64_t, 8> Ops;
-        DIExpression::appendOffset(Ops, Offset.getFixed());
+        TRI->getOffsetOpcodes(Offset, Ops);
+
         // According to
         // https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf
         // cuda-gdb requires DW_AT_address_class for all variables to be able to

diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
new file mode 100644
index 0000000000..39b11ef7bfea
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
@@ -0,0 +1,222 @@
+# RUN: llc -o %t -filetype=obj -start-before=prologepilog %s
+# RUN: llvm-dwarfdump --name="z0" %t | FileCheck %s --check-prefix=CHECKZ0
+# RUN: llvm-dwarfdump --name="z1" %t | FileCheck %s --check-prefix=CHECKZ1
+# RUN: llvm-dwarfdump --name="p0" %t | FileCheck %s --check-prefix=CHECKP0
+# RUN: llvm-dwarfdump --name="p1" %t | FileCheck %s --check-prefix=CHECKP1
+# RUN: llvm-dwarfdump --name="localv0" %t | FileCheck %s --check-prefix=CHECKLV0
+# RUN: llvm-dwarfdump --name="localv1" %t | FileCheck %s --check-prefix=CHECKLV1
+# RUN: llvm-dwarfdump --name="localp0" %t | FileCheck %s --check-prefix=CHECKLP0
+# RUN: llvm-dwarfdump --name="localp1" %t | FileCheck %s --check-prefix=CHECKLP1
+#
+# CHECKZ0:       DW_AT_location(DW_OP_fbreg +0, DW_OP_lit8, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKZ0-NEXT:  DW_AT_name("z0")
+# CHECKZ1:       DW_AT_location(DW_OP_fbreg +0, DW_OP_lit16, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKZ1-NEXT:  DW_AT_name("z1")
+# CHECKP0:       DW_AT_location(DW_OP_fbreg +0, DW_OP_lit17, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKP0-NEXT:  DW_AT_name("p0")
+# CHECKP1:       DW_AT_location(DW_OP_fbreg +0, DW_OP_lit18, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKP1-NEXT:  DW_AT_name("p1")
+# CHECKLV0:      DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x20, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLV0-NEXT: DW_AT_name("localv0")
+# CHECKLV1:      DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x28, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLV1-NEXT: DW_AT_name("localv1")
+# CHECKLP0:      DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x29, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLP0-NEXT: DW_AT_name("localp0")
+# CHECKLP1:      DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x2a, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLP1-NEXT: DW_AT_name("localp1")
+--- |
+  ; ModuleID = 't.c'
+  source_filename = "t.c"
+  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+  target triple = "aarch64-unknown-linux-gnu"
+
+  ; Function Attrs: noinline nounwind optnone
+  define dso_local <vscale x 2 x i64> @foo(<vscale x 2 x i64> %z0, <vscale x 2 x i64> %z1, <vscale x 16 x i1> %p0, <vscale x 16 x i1> %p1, i32 %w0) #0 !dbg !11 {
+  entry:
+    %z0.addr = alloca <vscale x 2 x i64>, align 16
+    %z1.addr = alloca <vscale x 2 x i64>, align 16
+    %p0.addr = alloca <vscale x 16 x i1>, align 2
+    %p1.addr = alloca <vscale x 16 x i1>, align 2
+    %w0.addr = alloca i32, align 4
+    %local_gpr0 = alloca i32, align 4
+    %localv0 = alloca <vscale x 2 x i64>, align 16
+    %localv1 = alloca <vscale x 2 x i64>, align 16
+    %localp0 = alloca <vscale x 16 x i1>, align 2
+    %localp1 = alloca <vscale x 16 x i1>, align 2
+    store <vscale x 2 x i64> %z0, <vscale x 2 x i64>* %z0.addr, align 16
+    call void @llvm.dbg.declare(metadata <vscale x 2 x i64>* %z0.addr, metadata !29, metadata !DIExpression()), !dbg !30
+    store <vscale x 2 x i64> %z1, <vscale x 2 x i64>* %z1.addr, align 16
+    call
[llvm-branch-commits] [llvm] c8a914d - [LiveDebugValues] Fix comparison operator in VarLocBasedImpl
Author: Sander de Smalen
Date: 2021-01-12T08:44:58Z
New Revision: c8a914db5c60dbeb5b638f30a9915855a67805f7

URL: https://github.com/llvm/llvm-project/commit/c8a914db5c60dbeb5b638f30a9915855a67805f7
DIFF: https://github.com/llvm/llvm-project/commit/c8a914db5c60dbeb5b638f30a9915855a67805f7.diff

LOG: [LiveDebugValues] Fix comparison operator in VarLocBasedImpl

The issue was introduced in commit rG84a1120943a651184bae507fed5d648fee381ae4
and would cause a VarLoc's StackOffset to be compared with its own,
instead of the StackOffset from the other VarLoc. This patch fixes that.

Added:


Modified:
    llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp

Removed:


diff --git a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
index 4811b8046797..e2daa46fe6b9 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
@@ -572,8 +572,9 @@ class VarLocBasedLDV : public LDVImpl {
                    Expr) <
             std::make_tuple(
                 Other.Var, Other.Kind, Other.Loc.SpillLocation.SpillBase,
-                Loc.SpillLocation.SpillOffset.getFixed(),
-                Loc.SpillLocation.SpillOffset.getScalable(), Other.Expr);
+                Other.Loc.SpillLocation.SpillOffset.getFixed(),
+                Other.Loc.SpillLocation.SpillOffset.getScalable(),
+                Other.Expr);
     case RegisterKind:
     case ImmediateKind:
     case EntryValueKind:
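The underlying bug pattern is easy to reproduce in isolation; a generic illustration (names are made up):

```cpp
#include <tuple>

struct Loc {
  int Base = 0;
  int Fixed = 0;
  bool operator<(const Loc &Other) const {
    // Buggy copy-paste slip: the right-hand tuple reads *this* object's
    // Fixed instead of Other.Fixed, so the object is partly compared
    // against itself and the ordering becomes inconsistent:
    //   return std::make_tuple(Base, Fixed) <
    //          std::make_tuple(Other.Base, Fixed);
    // Fixed version:
    return std::make_tuple(Base, Fixed) <
           std::make_tuple(Other.Base, Other.Fixed);
  }
};
```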
[llvm-branch-commits] [llvm] 329fda3 - NFC: Mention auto-vec support for SVE in release notes.
Author: Sander de Smalen
Date: 2022-03-14T09:44:55Z
New Revision: 329fda39c507e8740978d10458451dcdb21563be

URL: https://github.com/llvm/llvm-project/commit/329fda39c507e8740978d10458451dcdb21563be
DIFF: https://github.com/llvm/llvm-project/commit/329fda39c507e8740978d10458451dcdb21563be.diff

LOG: NFC: Mention auto-vec support for SVE in release notes.

Added:


Modified:
    llvm/docs/ReleaseNotes.rst

Removed:


diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index e8934f79181a7..b2d8d8c2640e2 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -82,6 +82,7 @@ Changes to the AArch64 Backend
   "tune-cpu" attribute is absent it tunes according to the "target-cpu".
 * Fixed relocations against temporary symbols (e.g. in jump tables and constant
   pools) in large COFF object files.
+* Auto-vectorization now targets SVE by default when available.

 Changes to the ARM Backend
 --------------------------
[llvm-branch-commits] [llvm] adc3714 - [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable.
Author: Sander de Smalen
Date: 2020-12-09T11:25:21Z
New Revision: adc37145dec9cadf76af05326150ed22a3cc2fdd

URL: https://github.com/llvm/llvm-project/commit/adc37145dec9cadf76af05326150ed22a3cc2fdd
DIFF: https://github.com/llvm/llvm-project/commit/adc37145dec9cadf76af05326150ed22a3cc2fdd.diff

LOG: [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable.

This patch removes a number of asserts that VF is not scalable, even though
the code where this assert lives does nothing that prevents VF being scalable.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91060

Added:


Modified:
    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed:


diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6ba14e942ff8..f504afd1ffc4 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -367,7 +367,6 @@ static Type *getMemInstValueType(Value *I) {
 /// type is irregular if its allocated size doesn't equal the store size of an
 /// element of the corresponding vector type at the given vectorization factor.
 static bool hasIrregularType(Type *Ty, const DataLayout &DL, ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   // Determine if an array of VF elements of type Ty is "bitcast compatible"
   // with a vector.
   if (VF.isVector()) {
@@ -1387,9 +1386,7 @@ class LoopVectorizationCostModel {
   /// width \p VF. Return CM_Unknown if this instruction did not pass
   /// through the cost modeling.
   InstWidening getWideningDecision(Instruction *I, ElementCount VF) {
-    assert(!VF.isScalable() && "scalable vectors not yet supported.");
-    assert(VF.isVector() && "Expected VF >=2");
-
+    assert(VF.isVector() && "Expected VF to be a vector VF");
     // Cost model is not run in the VPlan-native path - return conservative
     // result until this changes.
     if (EnableVPlanNativePath)
@@ -3902,8 +3899,10 @@ void InnerLoopVectorizer::fixVectorizedLoop() {
   // profile is not inherently precise anyway. Note also possible bypass of
   // vector code caused by legality checks is ignored, assigning all the weight
   // to the vector loop, optimistically.
-  assert(!VF.isScalable() &&
-         "cannot use scalable ElementCount to determine unroll factor");
+  //
+  // For scalable vectorization we can't know at compile time how many
+  // iterations of the loop are handled in one vector iteration, so instead
+  // assume a pessimistic vscale of '1'.
   setProfileInfoAfterUnrolling(
       LI->getLoopFor(LoopScalarBody), LI->getLoopFor(LoopVectorBody),
       LI->getLoopFor(LoopScalarBody), VF.getKnownMinValue() * UF);
@@ -4709,7 +4708,6 @@ static bool mayDivideByZero(Instruction &I) {
 void InnerLoopVectorizer::widenInstruction(Instruction &I, VPValue *Def,
                                            VPUser &User,
                                            VPTransformState &State) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   switch (I.getOpcode()) {
   case Instruction::Call:
   case Instruction::Br:
@@ -4797,7 +4795,6 @@ void InnerLoopVectorizer::widenInstruction(Instruction &I, VPValue *Def,
     setDebugLocFromInst(Builder, CI);

     /// Vectorize casts.
-    assert(!VF.isScalable() && "VF is assumed to be non scalable.");
     Type *DestTy = (VF.isScalar()) ? CI->getType()
                                    : VectorType::get(CI->getType(), VF);
@@ -5099,7 +5096,6 @@ void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) {

 bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,
                                                          ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   if (!blockNeedsPredication(I->getParent()))
     return false;
   switch(I->getOpcode()) {
@@ -6420,7 +6416,6 @@ int LoopVectorizationCostModel::computePredInstDiscount(

 LoopVectorizationCostModel::VectorizationCostTy
 LoopVectorizationCostModel::expectedCost(ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   VectorizationCostTy Cost;

   // For each block.
@@ -7935,7 +7930,6 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,
          "Must be called with either a load or store");

   auto willWiden = [&](ElementCount VF) -> bool {
-    assert(!VF.isScalable() && "unexpected scalable ElementCount");
     if (VF.isScalar())
       return false;
     LoopVectorizationCostModel::InstWidening Decision =
[llvm-branch-commits] [llvm] d568cff - [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.
Author: Sander de Smalen
Date: 2020-12-09T11:25:21Z
New Revision: d568cff696e8fb89ce1b040561c037412767af60

URL: https://github.com/llvm/llvm-project/commit/d568cff696e8fb89ce1b040561c037412767af60
DIFF: https://github.com/llvm/llvm-project/commit/d568cff696e8fb89ce1b040561c037412767af60.diff

LOG: [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.

* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary) so that the
  cost-model can be implemented separately.

This can vectorize the following loop [1]:

  void loop(int N, double *a, double *b) {
    #pragma clang loop vectorize_width(4, scalable)
    for (int i = 0; i < N; i++) {
      a[i] = b[i] + 1.0;
    }
  }

[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91077

Added:
    llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

Modified:
    llvm/include/llvm/IR/IRBuilder.h
    llvm/lib/IR/IRBuilder.cpp
    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    llvm/lib/Transforms/Vectorize/VPlan.h
    llvm/test/Transforms/LoopVectorize/metadata-width.ll

Removed:


diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index db215094a7e49..c2b3446d159f2 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -879,6 +879,10 @@ class IRBuilderBase {
                                  Type *ResultType,
                                  const Twine &Name = "");

+  /// Create a call to llvm.vscale, multiplied by \p Scaling. The type of
+  /// VScale will be the same type as that of \p Scaling.
+  Value *CreateVScale(Constant *Scaling, const Twine &Name = "");
+
   /// Create a call to intrinsic \p ID with 1 operand which is mangled on its
   /// type.
   CallInst *CreateUnaryIntrinsic(Intrinsic::ID ID, Value *V,

diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index c0e4451f52003..f936f5756b6f0 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -80,6 +80,17 @@ static CallInst *createCallHelper(Function *Callee, ArrayRef<Value *> Ops,
   return CI;
 }

+Value *IRBuilderBase::CreateVScale(Constant *Scaling, const Twine &Name) {
+  Module *M = GetInsertBlock()->getParent()->getParent();
+  assert(isa<ConstantInt>(Scaling) && "Expected constant integer");
+  Function *TheFn =
+      Intrinsic::getDeclaration(M, Intrinsic::vscale, {Scaling->getType()});
+  CallInst *CI = createCallHelper(TheFn, {}, this, Name);
+  return cast<ConstantInt>(Scaling)->getSExtValue() == 1
+             ? CI
+             : CreateMul(CI, Scaling);
+}
+
 CallInst *IRBuilderBase::CreateMemSet(Value *Ptr, Value *Val, Value *Size,
                                       MaybeAlign Align, bool isVolatile,
                                       MDNode *TBAATag, MDNode *ScopeTag,

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f504afd1ffc41..a91fb988badf6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1121,6 +1121,15 @@ static OptimizationRemarkAnalysis createLVAnalysis(const char *PassName,
   return R;
 }

+/// Return a value for Step multiplied by VF.
+static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {
+  assert(isa<ConstantInt>(Step) && "Expected an integer step");
+  Constant *StepVal = ConstantInt::get(
+      Step->getType(),
+      cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());
+  return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;
+}
+
 namespace llvm {

 void reportVectorizationFailure(const StringRef DebugMsg,
@@ -2277,8 +2286,6 @@ void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
                                            const InductionDescriptor &ID) {
   // We shouldn't have to build scalar steps if we aren't vectorizing.
   assert(VF.isVector() && "VF should be greater than one");
-  assert(!VF.isScalable() &&
-         "the code below assumes a fixed number of elements at compile time");
   // Get the value type and ensure it and the step have the same integer type.
   Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
   assert(ScalarIVTy == Step->getType() &&
@@ -2303,11 +2310,24 @@ void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
       Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF)
           ? 1
           : VF.getKnownMinValue();
+  assert((!VF.isScalable() || Lanes == 1) &&
+         "Should never scalarize a scalable vector");
   // Compute the scalar steps and save the results in VectorLoopValueMap.
   for (unsigned Part = 0; Part < UF; ++Part) {
     for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
-      auto *StartIdx = g
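A hedged usage sketch for the new hook (assumes the LLVM-12-era signature shown in the diff above):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Materialize `4 * vscale` as a runtime i64 value, e.g. for a scalable
// induction-variable step.
Value *emitStep(IRBuilder<> &B, LLVMContext &Ctx) {
  Constant *Scale = ConstantInt::get(Type::getInt64Ty(Ctx), 4);
  // Emits: %0    = call i64 @llvm.vscale.i64()
  //        %step = mul i64 %0, 4
  // (for Scale == 1 the multiply is folded away, per the implementation above)
  return B.CreateVScale(Scale, "step");
}
```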
[llvm-branch-commits] [llvm] [TableGen] Fix calculation of Lanemask for RCs with artificial subregs. (PR #114392)
https://github.com/sdesmalen-arm updated https://github.com/llvm/llvm-project/pull/114392

>From 303e1c87e0ea835d5892afaa04c9e72d2d1778f4 Mon Sep 17 00:00:00 2001
From: Sander de Smalen
Date: Thu, 31 Oct 2024 09:54:52 +0000
Subject: [PATCH] [TableGen] Fix calculation of Lanemask for RCs with
 artificial subregs.

TableGen builds up a map of "SubRegIdx -> Subclass" where Subclass is
the largest class where all registers have SubRegIdx as a sub-register.
When SubRegIdx (vis-a-vis the sub-register) is artificial it should
still include it in the map. This map is used in various places,
including in the calculation of the Lanemask of a register class,
which otherwise calculates an incorrect lanemask.
---
 llvm/test/TableGen/ArtificialSubregs.td         |  4 ++--
 llvm/utils/TableGen/Common/CodeGenRegisters.cpp | 10 ++++------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/llvm/test/TableGen/ArtificialSubregs.td b/llvm/test/TableGen/ArtificialSubregs.td
index dbac129fb2463b..b8d5686dee51be 100644
--- a/llvm/test/TableGen/ArtificialSubregs.td
+++ b/llvm/test/TableGen/ArtificialSubregs.td
@@ -104,7 +104,7 @@ def TestTarget : Target;
 // CHECK:      SuperClasses:
 //
 // CHECK:      RegisterClass DRegs:
-// CHECK:        LaneMask: 0004
+// CHECK:        LaneMask: 0044
 // CHECK:        HasDisjunctSubRegs: 1
 // CHECK:        CoveredBySubRegs: 1
 // CHECK:        Regs: D0 D1 D2
@@ -112,7 +112,7 @@ def TestTarget : Target;
 // CHECK:      SuperClasses:
 //
 // CHECK:      RegisterClass QRegs:
-// CHECK:        LaneMask: 0044
+// CHECK:        LaneMask: 0045
 // CHECK:        HasDisjunctSubRegs: 1
 // CHECK:        CoveredBySubRegs: 1
 // CHECK:        Regs: Q0 Q1 Q2

diff --git a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
index 2bf6a3740c486b..2dbee94d7e5406 100644
--- a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
@@ -2300,10 +2300,8 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
     if (R->Artificial)
       continue;
     const CodeGenRegister::SubRegMap &SRM = R->getSubRegs();
-    for (auto I : SRM) {
-      if (!I.first->Artificial)
-        SRSets[I.first].push_back(R);
-    }
+    for (auto I : SRM)
+      SRSets[I.first].push_back(R);
   }

   for (auto I : SRSets)
@@ -2312,8 +2310,6 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
   // Find matching classes for all SRSets entries. Iterate in SubRegIndex
   // numerical order to visit synthetic indices last.
   for (const auto &SubIdx : SubRegIndices) {
-    if (SubIdx.Artificial)
-      continue;
     SubReg2SetMap::const_iterator I = SRSets.find(&SubIdx);
     // Unsupported SubRegIndex. Skip it.
     if (I == SRSets.end())
@@ -2323,6 +2319,8 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
       RC->setSubClassWithSubReg(&SubIdx, RC);
       continue;
     }
+    if (SubIdx.Artificial)
+      continue;
     // This is a real subset. See if we have a matching class.
     CodeGenRegisterClass *SubRC = getOrCreateSubClass(
         RC, &I->second, RC->getName() + "_with_" + I->first->getName());
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
https://github.com/sdesmalen-arm closed https://github.com/llvm/llvm-project/pull/114263
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
sdesmalen-arm wrote:

Trying to reopen..

https://github.com/llvm/llvm-project/pull/114263
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
sdesmalen-arm wrote:

I think I need to create a new PR for this as GitHub doesn't allow me to reopen and choose a different branch to merge into.

https://github.com/llvm/llvm-project/pull/114263
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
sdesmalen-arm wrote:

It wasn't, but I also didn't realise that I closed it. Could GitHub have done this automatically after the branch it was based off was deleted? (I was about to push the rebased branch of this PR after merging #114391 and #114392.)

https://github.com/llvm/llvm-project/pull/114263
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
```
@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }

+static SmallVector ReservedHi = {
```

sdesmalen-arm wrote:

Without marking the registers as reserved, then for the example below:

```
---
name: sv2i64
tracksRegLiveness: true
body: |
  bb.0.entry:
    liveins: $q0, $q1

    %0:fpr128 = COPY $q0
    %1:fpr128 = COPY $q1
    %35:gpr64 = COPY %0.dsub
    %36:gpr64 = COPY %1.dsub
    %9:gpr64 = SDIVXr %35, %36
    %37:gpr64 = UMOVvi64 %0, 1
    %38:gpr64 = UMOVvi64 %1, 1
    %10:gpr64 = SDIVXr %37, %38
    %19:fpr128 = INSvi64gpr undef %19, 0, %9
    %19:fpr128 = INSvi64gpr %19, 1, %10
    %39:gpr64 = COPY %19.dsub
    %24:gpr64 = MADDXrrr %39, %36, $xzr
    %41:gpr64 = UMOVvi64 %19, 1
    %25:gpr64 = MADDXrrr %41, %38, $xzr
    %34:fpr128 = INSvi64gpr undef %34, 0, %24
    %34:fpr128 = INSvi64gpr %34, 1, %25
    %2:fpr128 = SUBv2i64 %0, %34
    $q0 = COPY %2
    RET_ReallyLR implicit $q0
...
```

When I run this with:

```
llc -global-isel -verify-machineinstrs -run-pass=machine-scheduler
```

It fails with:

```
Use of $xzr does not have a corresponding definition on every path:
216r  %10:gpr64 = MADDXrrr %9:gpr64, %3:gpr64, $xzr
LLVM ERROR: Use not jointly dominated by defs.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./bin/llc -global-isel -verify-machineinstrs -run-pass=machine-scheduler /tmp/t.mir -o -
1.  Running pass 'Function Pass Manager' on module '/tmp/t.mir'.
2.  Running pass 'Machine Instruction Scheduler' on function '@sv2i64'
...
 #8 llvm::LiveRangeCalc::findReachingDefs(llvm::LiveRange&, llvm::MachineBasicBlock&, llvm::SlotIndex, unsigned int, llvm::ArrayRef)
 #9 llvm::LiveRangeCalc::extend(llvm::LiveRange&, llvm::SlotIndex, unsigned int, llvm::ArrayRef)
#10 llvm::LiveIntervalCalc::extendToUses(llvm::LiveRange&, llvm::Register, llvm::LaneBitmask, llvm::LiveInterval*)
#11 llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)
#12 llvm::LiveIntervals::HMEditor::updateAllRanges(llvm::MachineInstr*)
#13 llvm::LiveIntervals::handleMove(llvm::MachineInstr&, bool)
#14 llvm::ScheduleDAGMI::moveInstruction(llvm::MachineInstr*, llvm::MachineInstrBundleIterator)
#15 llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit*, bool)
#16 llvm::ScheduleDAGMILive::schedule()
#17 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) (.isra.0) MachineScheduler.cpp:0:0
```

https://github.com/llvm/llvm-project/pull/114263
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/114263
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
```
@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }

+static SmallVector ReservedHi = {
```

sdesmalen-arm wrote:

I don't think there is a bug; the code for moving an instruction goes through the list of operands to update the register's liverange. For each physreg it then goes through the regunits to calculate/update the liverange for that regunit, but only if the regunit is not reserved. The code that determines if the register is reserved says:

```
// A register unit is considered reserved if all its roots and all their
// super registers are reserved.
```

Without this change to AArch64RegisterInfo.cpp, WZR and XZR are marked as reserved, but WZR_HI isn't (because WZR_HI is a sibling of WZR, and `markSuperRegs` marks only XZR as reserved), and so `IsReserved` is `false` for the WZR_HI regunit.

Why this doesn't fail for AMDGPU I don't know; perhaps these registers are always virtual and they never go down this path.

https://github.com/llvm/llvm-project/pull/114263
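For reference, the check being paraphrased looks roughly like this (modeled on `MachineRegisterInfo::isReservedRegUnit`; treat the details as approximate):

```cpp
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// A regunit is reserved iff some root register of the unit, together with
// all of that root's super-registers, is reserved. With only XZR marked
// reserved, the WZR_HI root fails this test, hence IsReserved == false.
bool isReservedRegUnitSketch(const MachineRegisterInfo &MRI,
                             const TargetRegisterInfo &TRI, unsigned Unit) {
  for (MCRegUnitRootIterator Root(Unit, &TRI); Root.isValid(); ++Root) {
    bool IsRootReserved = true;
    for (MCRegister Super : TRI.superregs_inclusive(*Root)) {
      if (!MRI.isReserved(Super)) {
        IsRootReserved = false;
        break;
      }
    }
    if (IsRootReserved)
      return true;
  }
  return false;
}
```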
[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)
sdesmalen-arm wrote: > Hi! How important are these backports for the release branch? Are they > bugfixes? What's the risk of them? Hi @tru, these are all bugfixes for things that would somehow lead to a compilation failure when targeting AArch64 SME instructions. I don't see any risks in cherry-picking them onto the release branch. https://github.com/llvm/llvm-project/pull/117695 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)
https://github.com/sdesmalen-arm created https://github.com/llvm/llvm-project/pull/117695 f314e12 uses `requiresSaveVG` which was introduced in 334a366ba792, which is also missing from the release/19.x branch. I figured it made sense to cherry-pick that one as well. >From de526e5893e901c350fc9bd6d8013d7d1dbd42c6 Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Tue, 13 Aug 2024 00:39:14 -0700 Subject: [PATCH 1/2] [AArch64][Darwin][SME] Don't try to save VG to the stack for unwinding. On Darwin we don't have any hardware that has SVE support, only SME. Therefore we don't need to save VG for unwinders and can safely omit it. This also fixes crashes introduced since this feature landed since Darwin's compact unwind code can't handle the presence of VG anyway. rdar://131072344 --- .../Target/AArch64/AArch64FrameLowering.cpp | 24 ++- .../Target/AArch64/AArch64ISelLowering.cpp| 19 ++- .../CodeGen/AArch64/sme-darwin-no-sve-vg.ll | 161 ++ 3 files changed, 189 insertions(+), 15 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/sme-darwin-no-sve-vg.ll diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 87e057a468afd6..83d9dd17259733 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -1394,6 +1394,18 @@ bool requiresGetVGCall(MachineFunction &MF) { !MF.getSubtarget().hasSVE(); } +static bool requiresSaveVG(MachineFunction &MF) { + AArch64FunctionInfo *AFI = MF.getInfo(); + // For Darwin platforms we don't save VG for non-SVE functions, even if SME + // is enabled with streaming mode changes. + if (!AFI->hasStreamingModeChanges()) +return false; + auto &ST = MF.getSubtarget(); + if (ST.isTargetDarwin()) +return ST.hasSVE(); + return true; +} + bool isVGInstruction(MachineBasicBlock::iterator MBBI) { unsigned Opc = MBBI->getOpcode(); if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI || @@ -1430,8 +1442,7 @@ static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec( // functions, we need to do this for both the streaming and non-streaming // vector length. Move past these instructions if necessary. MachineFunction &MF = *MBB.getParent(); - AArch64FunctionInfo *AFI = MF.getInfo(); - if (AFI->hasStreamingModeChanges()) + if (requiresSaveVG(MF)) while (isVGInstruction(MBBI)) ++MBBI; @@ -1937,7 +1948,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF, while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) && !IsSVECalleeSave(MBBI)) { // Move past instructions generated to calculate VG -if (AFI->hasStreamingModeChanges()) +if (requiresSaveVG(MF)) while (isVGInstruction(MBBI)) ++MBBI; @@ -3720,7 +3731,7 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, // non-streaming VG value. const Function &F = MF.getFunction(); SMEAttrs Attrs(F); - if (AFI->hasStreamingModeChanges()) { + if (requiresSaveVG(MF)) { if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface()) CSStackSize += 16; else @@ -3873,7 +3884,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots( } // Insert VG into the list of CSRs, immediately before LR if saved. 
- if (AFI->hasStreamingModeChanges()) { + if (requiresSaveVG(MF)) { std::vector VGSaves; SMEAttrs Attrs(MF.getFunction()); @@ -4602,10 +4613,9 @@ MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II, void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced( MachineFunction &MF, RegScavenger *RS = nullptr) const { - AArch64FunctionInfo *AFI = MF.getInfo(); for (auto &BB : MF) for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) { - if (AFI->hasStreamingModeChanges()) + if (requiresSaveVG(MF)) II = emitVGSaveRestore(II, this); if (StackTaggingMergeSetTag) II = tryMergeAdjacentSTG(II, this, RS); diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 62078822c89b18..ef2789e96213b5 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -8732,10 +8732,11 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI, SDValue InGlue; if (RequiresSMChange) { - -Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL, -DAG.getVTList(MVT::Other, MVT::Glue), Chain); -InGlue = Chain.getValue(1); +if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) { + Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL, + DAG.getVTList(MVT::Other, MVT::Glue), Chain); + InGlue = Chain.getValue(1); +} SDValue NewChain = changeStreamingMode( DAG, DL, CalleeAttrs.hasStreamingInterface(), Chain, InGlue, @@ -8914,11 +891
[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/117695 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)
https://github.com/sdesmalen-arm milestoned https://github.com/llvm/llvm-project/pull/117695 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)
https://github.com/sdesmalen-arm updated https://github.com/llvm/llvm-project/pull/117695 >From de526e5893e901c350fc9bd6d8013d7d1dbd42c6 Mon Sep 17 00:00:00 2001 From: Amara Emerson Date: Tue, 13 Aug 2024 00:39:14 -0700 Subject: [PATCH 1/3] [AArch64][Darwin][SME] Don't try to save VG to the stack for unwinding. On Darwin we don't have any hardware that has SVE support, only SME. Therefore we don't need to save VG for unwinders and can safely omit it. This also fixes crashes introduced since this feature landed since Darwin's compact unwind code can't handle the presence of VG anyway. rdar://131072344 --- .../Target/AArch64/AArch64FrameLowering.cpp | 24 ++- .../Target/AArch64/AArch64ISelLowering.cpp| 19 ++- .../CodeGen/AArch64/sme-darwin-no-sve-vg.ll | 161 ++ 3 files changed, 189 insertions(+), 15 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/sme-darwin-no-sve-vg.ll diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 87e057a468afd6..83d9dd17259733 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -1394,6 +1394,18 @@ bool requiresGetVGCall(MachineFunction &MF) { !MF.getSubtarget().hasSVE(); } +static bool requiresSaveVG(MachineFunction &MF) { + AArch64FunctionInfo *AFI = MF.getInfo(); + // For Darwin platforms we don't save VG for non-SVE functions, even if SME + // is enabled with streaming mode changes. + if (!AFI->hasStreamingModeChanges()) +return false; + auto &ST = MF.getSubtarget(); + if (ST.isTargetDarwin()) +return ST.hasSVE(); + return true; +} + bool isVGInstruction(MachineBasicBlock::iterator MBBI) { unsigned Opc = MBBI->getOpcode(); if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI || @@ -1430,8 +1442,7 @@ static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec( // functions, we need to do this for both the streaming and non-streaming // vector length. Move past these instructions if necessary. MachineFunction &MF = *MBB.getParent(); - AArch64FunctionInfo *AFI = MF.getInfo(); - if (AFI->hasStreamingModeChanges()) + if (requiresSaveVG(MF)) while (isVGInstruction(MBBI)) ++MBBI; @@ -1937,7 +1948,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF, while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) && !IsSVECalleeSave(MBBI)) { // Move past instructions generated to calculate VG -if (AFI->hasStreamingModeChanges()) +if (requiresSaveVG(MF)) while (isVGInstruction(MBBI)) ++MBBI; @@ -3720,7 +3731,7 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, // non-streaming VG value. const Function &F = MF.getFunction(); SMEAttrs Attrs(F); - if (AFI->hasStreamingModeChanges()) { + if (requiresSaveVG(MF)) { if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface()) CSStackSize += 16; else @@ -3873,7 +3884,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots( } // Insert VG into the list of CSRs, immediately before LR if saved. 
- if (AFI->hasStreamingModeChanges()) { + if (requiresSaveVG(MF)) { std::vector VGSaves; SMEAttrs Attrs(MF.getFunction()); @@ -4602,10 +4613,9 @@ MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II, void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced( MachineFunction &MF, RegScavenger *RS = nullptr) const { - AArch64FunctionInfo *AFI = MF.getInfo(); for (auto &BB : MF) for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) { - if (AFI->hasStreamingModeChanges()) + if (requiresSaveVG(MF)) II = emitVGSaveRestore(II, this); if (StackTaggingMergeSetTag) II = tryMergeAdjacentSTG(II, this, RS); diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 62078822c89b18..ef2789e96213b5 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -8732,10 +8732,11 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI, SDValue InGlue; if (RequiresSMChange) { - -Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL, -DAG.getVTList(MVT::Other, MVT::Glue), Chain); -InGlue = Chain.getValue(1); +if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) { + Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL, + DAG.getVTList(MVT::Other, MVT::Glue), Chain); + InGlue = Chain.getValue(1); +} SDValue NewChain = changeStreamingMode( DAG, DL, CalleeAttrs.hasStreamingInterface(), Chain, InGlue, @@ -8914,11 +8915,13 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI, Result = changeStreamingMode( DAG, DL, !CalleeAttrs.hasStreamingInterface(), Result, InGlue,
[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/117695 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Coalescer] Consider NewMI's subreg index when updating lanemask. (PR #121780)
https://github.com/sdesmalen-arm created https://github.com/llvm/llvm-project/pull/121780

The code added in #116191 that updated the lanemasks for rematerialized values checked if `DefMI`'s destination register had a subreg index. This seems to have missed the following case:
```
%0:gpr32 = MOVi32imm 1
%1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32
```
which during rematerialization would have the following variables set:
```
DefMI = %0:gpr32 = MOVi32imm 1
NewMI = %3.sub_32:gpr64 = MOVi32imm 1 (rematerialized value)
```
When checking whether the lanemasks need to be generated, considering whether DefMI's destination has a subreg index is insufficient; we should look at NewMI's subreg index instead.

The added tests are a bit more involved, because I was not able to reconstruct the issue without having some control flow in the test. These tests come from actual reproducers.

>From 5babad18cb92f708f93c5a0512918ff5e7368fba Mon Sep 17 00:00:00 2001
From: Sander de Smalen
Date: Mon, 6 Jan 2025 14:31:49 +
Subject: [PATCH 1/2] Pre-commit tests

---
 ...gister-coalesce-update-subranges-remat.mir | 91 ++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
index b61fa4be040070..524f51a2da5af2 100644
--- a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -1,5 +1,5 @@
+# RUN: llc -mtriple=aarch64 -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
-# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # REQUIRES: asserts
 # CHECK-DBG: ** REGISTER COALESCER **
@@ -36,3 +36,92 @@ body: |
     RET_ReallyLR
 ...
+# CHECK-DBG: ** REGISTER COALESCER **
+# CHECK-DBG: ** Function: reproducer
+# CHECK-DBG: ** JOINING INTERVALS ***
+# CHECK-DBG: ** INTERVALS **
+# CHECK-DBG: %1 [32r,48B:2)[48B,320r:0)[320r,368B:1) 0@48B-phi 1@320r 2@32r
+# CHECK-DBG-SAME: weight:0.00e+00
+# CHECK-DBG: %3 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: L0080 [80r,160B:2)[288r,304B:0)[304B,320r:3) 0@288r 1@x 2@80r 3@304B-phi
+# CHECK-DBG-SAME: L0040 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: weight:0.00e+00
+---
+name: reproducer
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:gpr32 = MOVi32imm 1
+    %1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+    %3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+    $nzcv = IMPLICIT_DEF
+    %4:gpr64 = COPY killed %3
+    Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+    $nzcv = IMPLICIT_DEF
+    Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+    %5:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+    %4:gpr64 = COPY killed %5
+    B %bb.7
+
+  bb.6:
+    %4:gpr64 = COPY $xzr
+
+  bb.7:
+    %7:gpr64 = ADDXrs killed %1, killed %4, 1
+    %1:gpr64 = COPY killed %7
+    B %bb.1
+
+...
+# CHECK-DBG: ** REGISTER COALESCER ** +# CHECK-DBG: ** Function: reproducer2 +# CHECK-DBG: ** JOINING INTERVALS *** +# CHECK-DBG: ** INTERVALS ** +# CHECK-DBG: %1 [32r,48B:2)[48B,304r:0)[304r,352B:1) 0@48B-phi 1@304r 2@32r +# CHECK-DBG-SAME: weight:0.00e+00 +# CHECK-DBG: %3 [80r,160B:2)[224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@80r 3@288B-phi +# CHECK-DBG-SAME: weight:0.00e+00 +--- +name: reproducer2 +tracksRegLiveness: true +body: | + bb.0: +%0:gpr32 = MOVi32imm 1 +%1:gpr64 = IMPLICIT_DEF + + bb.1: + + bb.2: +%3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32 + + bb.3: +$nzcv = IMPLICIT_DEF +%4:gpr64 = COPY killed %3 +Bcc 1, %bb.7, implicit killed $nzcv + + bb.4: +$nzcv = IMPLICIT_DEF +Bcc 1, %bb.6, implicit killed $nzcv + + bb.5: +%4:gpr64 = IMPLICIT_DEF +B %bb.7 + + bb.6: +%4:gpr64 = COPY $xzr + + bb.7: +%5:gpr64 = ADDXrs killed %1, killed %4, 1 +%1:gpr64 = COPY killed %5 +B %bb.1 + +... >From b7afd88e94e1b80d35b4eb7b712976200857ad3a Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Tue, 17 Dec 2024 21:01:20 + Subject: [PATCH 2/2] [Coalescer] Consider NewMI's subreg index when updating lanemask. The code added in #116191 that updated the lanemasks for rematerialized
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (PR #126503)
https://github.com/sdesmalen-arm approved this pull request. There is no risk in adding this to the release branch, because all functionality is hidden behind a flag. The TableGen/SubtargetEmitter.cpp change should not affect anything, because it merely emits an extra `enum class` to the `*GenSubtargetInfo.cpp` file. https://github.com/llvm/llvm-project/pull/126503 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -2,6 +2,7 @@
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false -mattr=+neon,+dotprod -force-vector-interleave=1 -S < %s | FileCheck %s --check-prefixes=CHECK-INTERLEAVE1
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false -mattr=+neon,+dotprod -S < %s | FileCheck %s --check-prefixes=CHECK-INTERLEAVED
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false -mattr=+neon,+dotprod -force-vector-interleave=1 -vectorizer-maximize-bandwidth -S < %s | FileCheck %s --check-prefixes=CHECK-MAXBW
+; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize --disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-REGS

sdesmalen-arm wrote:

This now needs a `REQUIRES: asserts` to guard it, because `-debug-only=` output is only available in asserts builds.

https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
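Concretely, that means adding the directive next to the new RUN line, along these lines (a sketch; the exact placement within the existing RUN block may differ):

```
; REQUIRES: asserts
; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize --disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-REGS
```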
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5039,10 +5039,25 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
         // even in the scalar case.
         RegUsage[ClassID] += 1;
       } else {
+        // The output from scaled phis and scaled reductions actually have
+        // fewer lanes than the VF.
+        ElementCount VF = VFs[J];
+        if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R))
+          VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor());
+        else if (auto *PartialReductionR =
+                     dyn_cast<VPPartialReductionRecipe>(R))
+          VF = VF.divideCoefficientBy(PartialReductionR->getVFScaleFactor());
+
+        LLVM_DEBUG(if (VF != VFs[J]) {
+          dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to "
+                 << VF << " for ";
+          R->dump();

sdesmalen-arm wrote:

minor nit: Is it worth creating an `operator<<` for VPDef, so that you can write:
```suggestion
                 << VF << " for " << *R << "\n";
```
?

https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5039,10 +5039,25 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +ElementCount VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getVFScaleFactor()); + +LLVM_DEBUG(if (VF != VFs[J]) { + dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to " + << VF << " for "; + R->dump(); sdesmalen-arm wrote: I would expect it to be a pretty small and uncontroversial change (it would merely add an extra function) and doesn't require any particular tests. Are you sure you don't want to include it in this PR? https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
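For what it's worth, such an operator could look roughly like the sketch below. This is hypothetical: it assumes the `print(raw_ostream &, const Twine &, VPSlotTracker &)` interface that VPlan recipes already expose, and the real change may hang the operator off `VPDef` rather than `VPRecipeBase`.

```cpp
// Stream a recipe by reusing its existing print() method.
inline raw_ostream &operator<<(raw_ostream &OS, const VPRecipeBase &R) {
  VPSlotTracker SlotTracker(R.getParent() ? R.getParent()->getPlan() : nullptr);
  R.print(OS, "", SlotTracker);
  return OS;
}
```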
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/sdesmalen-arm commented: You'll probably want to rebase on top of #126437 again, as some of the code has changed and will cause a merge conflict. https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5039,10 +5039,26 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
         // even in the scalar case.
         RegUsage[ClassID] += 1;
       } else {
+        ElementCount VF = VFs[J];
+        // The output from scaled phis and scaled reductions actually has
+        // fewer lanes than the VF.
+        if (isa<VPReductionPHIRecipe, VPPartialReductionRecipe>(R)) {
+          auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R);
+          auto *PartialReductionR = ReductionR ? nullptr : dyn_cast<VPPartialReductionRecipe>(R);
+          unsigned ScaleFactor = ReductionR ? ReductionR->getVFScaleFactor() : PartialReductionR->getVFScaleFactor();
+          VF = VF.divideCoefficientBy(ScaleFactor);
+        }

sdesmalen-arm wrote:

Maybe create a utility function that returns the scaling factor of a `Recipe`, which returns `1` for any recipe other than the `VPPartialReductionRecipe`/`VPReductionPHIRecipe`. Also, please run clang-format on your code.

https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
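A minimal sketch of such a helper, with a hypothetical name; it simply centralises the casts so the caller reduces to `VF.divideCoefficientBy(getVFScaleFactor(R))`:

```cpp
/// Return the factor by which a recipe's output is scaled down relative to
/// the VF, or 1 for recipes that are not scaled.
static unsigned getVFScaleFactor(const VPRecipeBase *R) {
  if (auto *RR = dyn_cast<VPReductionPHIRecipe>(R))
    return RR->getVFScaleFactor();
  if (auto *PR = dyn_cast<VPPartialReductionRecipe>(R))
    return PR->getVFScaleFactor();
  return 1;
}
```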
[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)
sdesmalen-arm wrote: @arsenm are you happy for me to reland this? I've done better testing this time around; doing a two-stage build with sanitisers enabled and running LNT on both X86 and AArch64 platforms. https://github.com/llvm/llvm-project/pull/134408 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Invalid;
     break;
   case 16:
-    if (AccumEVT == MVT::i64)
-      Cost *= 2;
-    else if (AccumEVT != MVT::i32)
+    if (AccumEVT != MVT::i32)

sdesmalen-arm wrote:

```
llvm.partial.reduce(nxv2i64 %acc, mul(zext(nxv16i8 %a) -> nxv16i64, zext(nxv16i8 %b) -> nxv16i64))
```
This case can be lowered with the correct support in SelectionDAG (see https://github.com/llvm/llvm-project/pull/130935#discussion_r2068162662). I'd argue that this function will need a bit of a rewrite anyway once we get proper support in SelectionDAG for legalising these operations, so I wouldn't bother changing that in this PR.

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };
-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

(We discussed this offline.) Swapping the operands in the debug-print function of the recipe is not something that really concerns me, and I think there's still value in making this a functionally NFC change (from an end-user perspective).

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+    Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
     return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+    return PR_SignExtend;
+  default:
+    return PR_None;

sdesmalen-arm wrote:

I think the default case should be an `llvm_unreachable()`, because zero- and sign-extend are the only supported partial-reduction extends. The `PR_None` case is handled by the function above that takes an `Instruction*`.

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
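Spelled out, the suggestion would make the opcode overload look something like this sketch:

```cpp
TargetTransformInfo::PartialReductionExtendKind
TargetTransformInfo::getPartialReductionExtendKind(
    Instruction::CastOps ExtOpcode) {
  switch (ExtOpcode) {
  case Instruction::CastOps::ZExt:
    return PR_ZeroExtend;
  case Instruction::CastOps::SExt:
    return PR_SignExtend;
  default:
    // Callers are expected to have filtered out non-extend casts already.
    llvm_unreachable("unexpected extend opcode for a partial reduction");
  }
}
```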
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

sdesmalen-arm wrote:

```suggestion
  if (auto *Cast = dyn_cast<CastInst>(I))
    return getPartialReductionExtendKind(Cast->getOpcode());
  return PR_None;
```

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red, Red->replaceAllUsesWith(AbstractR); } +/// This function tries to create an abstract recipe from a partial reduction to +/// hide its mul and extends from cost estimation. +static void +tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) { sdesmalen-arm wrote: Does this need to be given the `Range &` and for that range to be clamped if it doesn't match or if the cost is higher than the individual operations (similar to what happens in `tryToCreateAbstractReductionRecipe`) ? (note that the cost part is still missing) https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)
https://github.com/sdesmalen-arm approved this pull request. https://github.com/llvm/llvm-project/pull/137683 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
https://github.com/sdesmalen-arm deleted https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost( return Invalid; break; case 16: - if (AccumEVT == MVT::i64) -Cost *= 2; - else if (AccumEVT != MVT::i32) + if (AccumEVT != MVT::i32) sdesmalen-arm wrote: Why are you making this change? https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -219,6 +219,8 @@ class TargetTransformInfo {
   /// Get the kind of extension that an instruction represents.
   static PartialReductionExtendKind
   getPartialReductionExtendKind(Instruction *I);
+  static PartialReductionExtendKind
+  getPartialReductionExtendKind(Instruction::CastOps ExtOpcode);

sdesmalen-arm wrote:

What about either replacing `getPartialReductionExtendKind(Instruction *I)` with this one rather than adding a new interface, or otherwise changing the implementation of `getPartialReductionExtendKind(Instruction *I)` to use `getPartialReductionExtendKind(Instruction::CastOps)`?

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };
-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

Would it be possible to make the change from `VPPartialReductionRecipe : public VPSingleDefRecipe` to `VPPartialReductionRecipe : public VPReductionRecipe` as a separate NFC change? (For cases around VPMulAccumulateReductionRecipes you can initially add some asserts that the recipe isn't a partial reduction, because that won't be supported until this PR lands.)

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2438,14 +2438,14 @@ VPMulAccumulateReductionRecipe::computeCost(ElementCount VF, return Ctx.TTI.getPartialReductionCost( Instruction::Add, Ctx.Types.inferScalarType(getVecOp0()), Ctx.Types.inferScalarType(getVecOp1()), getResultType(), VF, -TTI::getPartialReductionExtendKind(getExtOpcode()), -TTI::getPartialReductionExtendKind(getExtOpcode()), Instruction::Mul); +TTI::getPartialReductionExtendKind(getExt0Opcode()), +TTI::getPartialReductionExtendKind(getExt1Opcode()), Instruction::Mul); } Type *RedTy = Ctx.Types.inferScalarType(this); auto *SrcVecTy = cast(toVectorTy(Ctx.Types.inferScalarType(getVecOp0()), VF)); - return Ctx.TTI.getMulAccReductionCost(isZExt(), RedTy, SrcVecTy, + return Ctx.TTI.getMulAccReductionCost(isZExt0(), RedTy, SrcVecTy, sdesmalen-arm wrote: The TTI hook also needs updating to reflect the separate extends. https://github.com/llvm/llvm-project/pull/136997 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2496,6 +2501,9 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { Type *ResultTy; + /// If the reduction this is based on is a partial reduction. sdesmalen-arm wrote: This comment makes no sense. https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)
@@ -329,11 +329,10 @@ define <2 x half> @chain_hi_to_lo_global() { ; GFX11-TRUE16: ; %bb.0: ; %bb ; GFX11-TRUE16-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GFX11-TRUE16-NEXT:v_mov_b32_e32 v0, 2 -; GFX11-TRUE16-NEXT:v_mov_b32_e32 v1, 0 +; GFX11-TRUE16-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v2, 0 +; GFX11-TRUE16-NEXT:v_mov_b32_e32 v3, 0 ; GFX11-TRUE16-NEXT:global_load_d16_b16 v0, v[0:1], off -; GFX11-TRUE16-NEXT:v_mov_b32_e32 v1, 0 -; GFX11-TRUE16-NEXT:v_mov_b32_e32 v2, 0 -; GFX11-TRUE16-NEXT:global_load_d16_hi_b16 v0, v[1:2], off +; GFX11-TRUE16-NEXT:global_load_d16_hi_b16 v0, v[2:3], off sdesmalen-arm wrote: @broxigarchen I noticed these tests changed, but I couldn't really tell whether these changes are functionally equivalent. https://github.com/llvm/llvm-project/pull/134408 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }
+/// This function tries to create an abstract recipe from a partial reduction to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

sdesmalen-arm wrote:

The way I read the code, by the time we get to this point it has already recognised a reduction, so there is a `VP[Partial]ReductionRecipe`. It then tries to analyse whether that recipe can be transformed into a `VPMulAccumulateReductionRecipe`. For a `VPReductionRecipe` it will clamp the range to all the VFs that can be turned into a `VPMulAccumulateReductionRecipe`, but for a `VPPartialReductionRecipe` it doesn't do that. I don't see why we'd do something different for partial reductions.

In fact, why wouldn't the `tryToMatchAndCreateMulAccumulateReduction` code be sufficient here? Now that you've made `VPPartialReductionRecipe` a subclass of `VPReductionRecipe`, I'd expect that code to function roughly the same.

https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
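For reference, the range clamping mentioned above is normally expressed with `LoopVectorizationPlanner::getDecisionAndClampRange`. A hedged sketch of the idiom follows; the predicate body is illustrative only and not taken from the patch:

```cpp
// Clamp Range to the sub-range of VFs that share the decision made for
// Range.Start, and return that decision.
bool ShouldTransform = LoopVectorizationPlanner::getDecisionAndClampRange(
    [&](ElementCount VF) {
      // Illustrative predicate, e.g. "bundling is legal and profitable at
      // this VF"; a real check would compare the costs at each VF.
      return isBundleProfitableAtVF(VF); // hypothetical helper
    },
    Range);
```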
[llvm-branch-commits] [llvm] ARM: Remove fake entries for divrem libcalls (PR #143832)
https://github.com/sdesmalen-arm approved this pull request. https://github.com/llvm/llvm-project/pull/143832 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
@@ -2744,6 +2702,12 @@ class VPSingleDefBundleRecipe : public VPSingleDefRecipe {
     /// vector operands, performing a reduction.add on the result, and adding
     /// the scalar result to a chain.
     MulAccumulateReduction,
+    /// Represent an inloop multiply-accumulate reduction, multiplying the
+    /// extended vector operands, negating the multiplication, performing a
+    /// reduction.add on the result, and adding the scalar result to a chain.
+    ExtNegatedMulAccumulateReduction,

sdesmalen-arm wrote:

This looks like something that is not yet supported for in-loop reductions. Because you're trying to align the partial reductions with other reductions, we need to add support for `sub` reductions to in-loop reductions as well. I'd suggest pulling that out into a separate PR (and rebasing this one on top of it). This will require a few changes:

* `RecurrenceDescriptor::getReductionOpChain` does not recognise sub-reductions at the moment, which will need fixing. The reason for that is described in one of the comments:
```
// [..] Subs are also currently not allowed (which are usually
// treated as part of a add reduction) as they are expected to generally be
// more expensive than out-of-loop reductions, and need to be costed more
// carefully.
```
Because you're adding a cost model for those here, we can now handle sub-reductions as well.
* When the `VPReductionRecipe` is created in `LoopVectorizationPlanner::adjustRecipesForReductions`, for in-loop sub-reductions we'd need to add a `sub(0, VecOp)`, similar to how we do this for partial reductions. When we finally properly merge the two implementations, the special code we currently have for partial reductions to insert the `sub` can be removed, in favour of the code that we'd need to add for in-loop reductions. (FWIW, that is not something I'm suggesting to do as part of this PR, but rather as a future follow-up.)

https://github.com/llvm/llvm-project/pull/146073 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -784,8 +785,8 @@ AArch64RegisterInfo::useFPForScavengingIndex(const MachineFunction &MF) const { assert((!MF.getSubtarget().hasSVE() || AFI->hasCalculatedStackSizeSVE()) && "Expected SVE area to be calculated by this point"); - return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeSVE() && - !AFI->hasStackHazardSlotIndex(); + return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeZPR() && + !AFI->getStackSizePPR() && !AFI->hasStackHazardSlotIndex(); sdesmalen-arm wrote: nit: ```suggestion return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->hasSVEStackSize() && !AFI->hasStackHazardSlotIndex(); ``` https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -299,14 +297,20 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
     TailCallReservedStack = bytes;
   }
-  bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; }
+  void setStackSizeZPR(uint64_t S) {
+    HasCalculatedStackSizeSVE = true;

sdesmalen-arm wrote:

nit: this function sets `HasCalculatedStackSizeSVE` even when only one of the two values has been set. Is it worth making this `setStackSizeSVE(uint64_t ZPR, uint64_t PPR = 0)` such that `HasCalculatedStackSizeSVE` is set only once?

https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
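A minimal sketch of that combined setter, with field names taken from the surrounding patch context (the `PPR` default argument is the reviewer's suggestion):

```cpp
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR = 0) {
  StackSizeZPR = ZPR;
  StackSizePPR = PPR;
  HasCalculatedStackSizeSVE = true;
}
```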
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -299,14 +297,20 @@ class AArch64FunctionInfo final : public MachineFunctionInfo { TailCallReservedStack = bytes; } - bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; } + void setStackSizeZPR(uint64_t S) { +HasCalculatedStackSizeSVE = true; +StackSizeZPR = S; + } - void setStackSizeSVE(uint64_t S) { + void setStackSizePPR(uint64_t S) { HasCalculatedStackSizeSVE = true; -StackSizeSVE = S; +StackSizePPR = S; } - uint64_t getStackSizeSVE() const { return StackSizeSVE; } + uint64_t getStackSizeZPR() const { return StackSizeZPR; } sdesmalen-arm wrote: not related to your PR, but I think we should add an assert that `HasCalculatedStackSizeSVE` is true (same for CalleeSavedStackSize), although unfortunately that currently leads to some failures where they're used. https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -451,10 +454,36 @@ static unsigned getFixedObjectSize(const MachineFunction &MF,
   }
 }
-/// Returns the size of the entire SVE stackframe (calleesaves + spills).
+static unsigned getStackHazardSize(const MachineFunction &MF) {

sdesmalen-arm wrote:

nit: maybe just move the implementations to where they are declared?

https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -19,6 +19,11 @@ namespace llvm { +struct SVEStackSizes { sdesmalen-arm wrote: Should this be named `SVEStackOffsets` (given that they're used as signed offsets)? https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -644,7 +644,8 @@ bool AArch64RegisterInfo::hasBasePointer(const MachineFunction &MF) const { if (ST.hasSVE() || ST.isStreaming()) { // Frames that have variable sized objects and scalable SVE objects, // should always use a basepointer. - if (!AFI->hasCalculatedStackSizeSVE() || AFI->getStackSizeSVE()) + if (!AFI->hasCalculatedStackSizeSVE() || AFI->getStackSizeZPR() || + AFI->getStackSizePPR()) sdesmalen-arm wrote: nit: ```suggestion if (!AFI->hasCalculatedStackSizeSVE() || AFI->hasSVEStackSize()) ``` https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -1605,25 +1634,19 @@ static bool isTargetWindows(const MachineFunction &MF) { return MF.getSubtarget().isTargetWindows(); } -static unsigned getStackHazardSize(const MachineFunction &MF) { - return MF.getSubtarget().getStreamingHazardSize(); -} - // Convenience function to determine whether I is an SVE callee save. -static bool IsSVECalleeSave(MachineBasicBlock::iterator I) { +static bool IsZPRCalleeSave(MachineBasicBlock::iterator I) { sdesmalen-arm wrote: nit: given that you're renaming these, what about calling them `isPartOfZPRCalleeSave` (because a `PTRUE_B` instruction is not a callee-save in itself) https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4294,24 +4396,32 @@ static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, report_fatal_error( "Alignment of scalable vectors > 16 bytes is not yet supported"); +int64_t &Offset = OffsetForObject(FI, ZPROffset, PPROffset); Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment); if (AssignOffsets) Assign(FI, -Offset); } - return Offset; + PPROffset = alignTo(PPROffset, Align(16U)); + ZPROffset = alignTo(ZPROffset, Align(16U)); + + if (&ZPROffset != &PPROffset) { +// SplitSVEObjects (PPRs and ZPRs allocated to separate areas). +return SVEStackSizes{ZPROffset, PPROffset}; + } + // When SplitSVEObjects is disabled just attribute all the stack to ZPRs. + // Determining the split is not necessary. + return SVEStackSizes{ZPROffset, 0}; sdesmalen-arm wrote: When you use an instance of the return type (`SVEStackSizes`) instead of `ZPRStack` and `PPRStack`, then you can just return that struct at the end of this function. https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4227,10 +4310,20 @@ static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, // Fills in the first and last callee-saved frame indices into // Min/MaxCSFrameIndex, respectively. // Returns the size of the stack. -static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, - int &MinCSFrameIndex, - int &MaxCSFrameIndex, - bool AssignOffsets) { +static SVEStackSizes +determineSVEStackObjectOffsets(MachineFunction &MF, bool AssignOffsets, + bool SplitSVEObjects = false) { + MachineFrameInfo &MFI = MF.getFrameInfo(); + + int64_t ZPRStack = 0; + int64_t PPRStack = 0; + + auto [ZPROffset, PPROffset] = [&] { +if (SplitSVEObjects) + return std::tie(ZPRStack, PPRStack); +return std::tie(ZPRStack, ZPRStack); + }(); sdesmalen-arm wrote: This seems a lot more readable: ```suggestion int64_t &ZPROffset = ZPRStack; int64_t &PPROffset = SplitSVEObjects ? PPRStack : ZPRStack; ``` Also, can you add a brief comment describing why you create two aliases? https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -299,14 +297,20 @@ class AArch64FunctionInfo final : public MachineFunctionInfo { TailCallReservedStack = bytes; } - bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; } + void setStackSizeZPR(uint64_t S) { +HasCalculatedStackSizeSVE = true; +StackSizeZPR = S; + } - void setStackSizeSVE(uint64_t S) { + void setStackSizePPR(uint64_t S) { HasCalculatedStackSizeSVE = true; -StackSizeSVE = S; +StackSizePPR = S; } - uint64_t getStackSizeSVE() const { return StackSizeSVE; } + uint64_t getStackSizeZPR() const { return StackSizeZPR; } sdesmalen-arm wrote: Yeah, I wasn't necessarily suggesting that to be done as part of this PR. I just spotted it and I'm worried this hides some latent bugs. https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -19,6 +19,11 @@ namespace llvm { +struct SVEStackSizes { sdesmalen-arm wrote: Okay, could you then change `int64_t` to an unsigned type? https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/134408 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)
sdesmalen-arm wrote: My apologies for taking a while to get back to this; I had been lacking focus time, and it also took me a while to get the changes right. This time I think it's in better shape, partly because I've got a better understanding of how things work, and partly because I've done more testing. (a) I've built and run the LLVM test-suite with subreg-liveness-tracking enabled on AArch64. (b) I've done stage-2 builds with sanitizers (address + memory) on both AArch64 and X86 with subreg liveness tracking enabled for AArch64. https://github.com/llvm/llvm-project/pull/134408 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4363,24 +4458,25 @@ static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, report_fatal_error( "Alignment of scalable vectors > 16 bytes is not yet supported"); +int64_t &Offset = OffsetForObject(FI, ZPROffset, PPROffset); Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment); if (AssignOffsets) Assign(FI, -Offset); } - return Offset; + PPROffset = alignTo(PPROffset, Align(16U)); + ZPROffset = alignTo(ZPROffset, Align(16U)); + return SVEStack; } -int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets( -MachineFrameInfo &MFI) const { - int MinCSFrameIndex, MaxCSFrameIndex; - return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false); +SVEStackSizes +AArch64FrameLowering::estimateSVEStackObjectOffsets(MachineFunction &MF) const { + return determineSVEStackObjectOffsets(MF, false); sdesmalen-arm wrote: ```suggestion return determineSVEStackObjectOffsets(MF, /*AssignOffsets=*/ false); ``` (same for the one below) https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4296,10 +4372,20 @@ static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, // Fills in the first and last callee-saved frame indices into // Min/MaxCSFrameIndex, respectively. // Returns the size of the stack. -static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, - int &MinCSFrameIndex, - int &MaxCSFrameIndex, - bool AssignOffsets) { +static SVEStackSizes +determineSVEStackObjectOffsets(MachineFunction &MF, bool AssignOffsets, + bool SplitSVEObjects = false) { sdesmalen-arm wrote: nit: maybe move the parameter to the other PR, because at the moment it is never called with `SplitSVEObjects = true`, and so this code cannot be tested. https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4694,12 +4790,8 @@ void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
   assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
          "Upwards growing stack unsupported");
-  int MinCSFrameIndex, MaxCSFrameIndex;
-  int64_t SVEStackSize =
-      assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
-
-  AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
-  AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
+  auto [ZPRStackSize, PPRStackSize] = assignSVEStackObjectOffsets(MF);

sdesmalen-arm wrote:

nit: There's little value in returning the sizes from `assignSVEStackObjectOffsets` if their only use is being passed to `setStackSizeSVE`, so I think the assignment can now probably be moved into `determineSVEStackObjectOffsets` (predicated on `AssignOffsets`), removing the need for the aliases that then just call `determineSVEStackObjectOffsets`.

https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -19,6 +19,11 @@ namespace llvm {
+struct SVEStackSizes {

sdesmalen-arm wrote:

The only places where `SVEStackSizes` is used are in the context of unsigned values (the function `setStackSizeSVE` and the variables `SVELocals` and `SVEStackSize`), except for the case where you chose to use `int64_t` for `SVELocals` and `SVEStackSize` even though their sizes are always non-negative. If the value can never be negative and it's only ever used in contexts where the derived value is never negative, then I think it should be an unsigned value. The size will at some point need converting to a signed offset, but that's already the case, and the code that does the conversion from a size to an offset will need to guarantee the value does not overflow.

Which brings me to the data type `uint64_t`: other places use `unsigned` for these sizes at the moment, so if you change this to `unsigned`, the conversion to `int64_t` (as used in `StackOffset`) will never result in a different signedness.

https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
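To illustrate the suggestion, here is a sketch only; the field names are assumed from the rest of the patch and may differ, and the reviewer also floats plain `unsigned` as an alternative to `uint64_t`:

```cpp
// Unsigned sizes; converted to a signed offset only at the point where the
// offset is actually materialised (e.g. when building a StackOffset).
struct SVEStackSizes {
  uint64_t ZPRStackSize = 0; // assumed field name
  uint64_t PPRStackSize = 0; // assumed field name
};
```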
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -19,6 +19,11 @@ namespace llvm { +struct SVEStackSizes { sdesmalen-arm wrote: Ah yes, those use the same data storage so there will be a conversion required in that function. In that case, I'm happy with an explicit cast as in: ``` unsigned &Offset = OffsetForObject(FI, ZPROffset, PPROffset); Offset += MFI.getObjectSize(FI); Offset = alignTo(Offset, MFI.getObjectAlign(FI)); if (AssignOffsets) { LLVM_DEBUG(dbgs() << "FI: " << FI << ", Offset: " << -int64_t(Offset) << "\n"); Assign(FI, -int64_t(Offset)); } ``` (tangentially related; the double LLVM_DEBUG (the one above and the one in `Assign`) look almost identical, can we remove one?) https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4308,26 +4398,33 @@ static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, "reference."); #endif - auto Assign = [&MFI](int FI, int64_t Offset) { + auto StackForObject = [&](int FI, uint64_t &ZPRStackTop, +uint64_t &PPRStackTop) -> uint64_t & { +return MFI.getStackID(FI) == TargetStackID::ScalableVector ? ZPRStackTop + : PPRStackTop; + }; + + auto Assign = [&MFI, AssignOffsets](int FI, int64_t Offset) { +if (AssignOffsets == AssignObjectOffsets::No) + return; LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n"); MFI.setObjectOffset(FI, Offset); }; - int64_t Offset = 0; - // Then process all callee saved slots. + int MinCSFrameIndex, MaxCSFrameIndex; if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) { -// Assign offsets to the callee save slots. -for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) { - Offset += MFI.getObjectSize(I); - Offset = alignTo(Offset, MFI.getObjectAlign(I)); - if (AssignOffsets) -Assign(I, -Offset); +for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI) { + uint64_t &StackTop = StackForObject(FI, ZPRStackTop, PPRStackTop); + StackTop += MFI.getObjectSize(FI); + StackTop = alignTo(StackTop, MFI.getObjectAlign(FI)); + Assign(FI, -int64_t(StackTop)); sdesmalen-arm wrote: nit: maybe add an assert that `int64_t(StackTop) == StackTop` in case of insanely big stacks? or move this into the `Assign` function so not to have to add that twice. https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
https://github.com/sdesmalen-arm approved this pull request. LGTM with nits addressed. https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
@@ -4042,8 +4124,11 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, }); // If any callee-saved registers are used, the frame cannot be eliminated. + auto [ZPRLocalStackSize, PPRLocalStackSize] = + determineSVEStackSizes(MF, AssignObjectOffsets::No); + int64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize; sdesmalen-arm wrote: ```suggestion uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize; ``` (same for `SVEStackSize` below) https://github.com/llvm/llvm-project/pull/142391 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits