[llvm-branch-commits] [compiler-rt] release/19.x: [AArch64][SME] Rewrite __arm_get_current_vg to preserve required registers (#100143) (PR #100546)

2024-07-25 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

It would be great if we could merge this fix into the release branch!

https://github.com/llvm/llvm-project/pull/100546
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [Clang] Demote always_inline error to warning for mismatching SME attrs (#100740) (PR #100987)

2024-07-29 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm requested changes to this pull request.

For some odd reason, `clang/test/CodeGen/aarch64-sme-inline-streaming-attrs.c`
seems to be failing on some buildbots with an error that says:
> `unable to create target: No available targets are compatible with triple "aarch64-none-linux-gnu"`

I suspect this is because the test is missing a `REQUIRES:
aarch64-registered-target` line, but I'm puzzled as to why the test didn't
fail before: my patch doesn't introduce the test and doesn't change its RUN
lines; all it does is change one of the diagnostic messages.
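
For reference, the shape of the fix I have in mind is just a gating directive
at the top of the test (the RUN line below is illustrative, not the test's
actual one):

```c
// REQUIRES: aarch64-registered-target
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -verify %s
```

With that directive, lit skips the test on hosts whose LLVM build doesn't
register the AArch64 backend, instead of failing to create the target.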

In any case, I seem to have jumped the gun creating this cherry-pick in the
first place; I thought the change was trivial enough to do so, especially after
testing locally. My apologies for the noise.

I'll revert the patch and fix the issue, and will then create another 
cherry-pick.

https://github.com/llvm/llvm-project/pull/100987
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][FMV][AArch64] Improve streaming mode compatibility (PR #101007)

2024-07-31 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

It would be good if this could make it into the LLVM 19 release.

https://github.com/llvm/llvm-project/pull/101007
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: Reland: "[Clang] Demote always_inline error to warning for mismatching SME attrs" (#100991) (#100996) (PR #101303)

2024-08-01 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

> Is this safe enough to reland? Has it lived without a problem in main for a
> bit?

Thanks for checking. The only failures I would have expected are from lit 
tests, but the PR was merged on Monday and I've not seen any buildbot failures, 
so I believe it is safe.

There should also be no impact on code that doesn't explicitly use these
target-specific attributes. For the case where the attributes are used, the
behaviour seems correct (e.g. when I try some examples in Godbolt), and the
change is specific and trivial enough not to cause any unrelated/unexpected
issues either.

I hope that answers your question.

https://github.com/llvm/llvm-project/pull/101303
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [LLVM][TTI][SME] Allow optional auto-vectorisation for streaming functions. (#101679) (PR #101959)

2024-08-05 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/101959
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [AArch64] Adopt updated B16B16 target flags (PR #104602)

2024-08-16 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

Thanks for making this change! The Clang/driver behaviour looks correct to me 
now. The patch is a lot smaller than the patches that went into main, so 
hopefully this can still make it into LLVM 19!

https://github.com/llvm/llvm-project/pull/104602
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] release/18.x: [AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276) (PR #81593)

2024-02-13 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

Looks pretty low-risk to me and would be nice to get into the release if we can.

https://github.com/llvm/llvm-project/pull/81593
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] release/18.x: [AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276) (PR #81616)

2024-02-14 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

Looks pretty low-risk to me and would be nice to get into the release if we can.

(how is this PR different from #81593?)

https://github.com/llvm/llvm-project/pull/81616
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Improve cost model for legal subvec insert/extract (PR #81135)

2024-02-15 Thread Sander de Smalen via llvm-branch-commits


@@ -568,6 +568,32 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 }
 return Cost;
   }
+  case Intrinsic::vector_extract: {
+// If both the vector argument and the return type are legal types, then
+// this should be a no-op or simple operation; return a relatively low cost.
+LLVMContext &C = RetTy->getContext();
+EVT MRTy = getTLI()->getValueType(DL, RetTy);
+EVT MPTy = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
+TargetLoweringBase::LegalizeKind RLK = getTLI()->getTypeConversion(C, MRTy);
+TargetLoweringBase::LegalizeKind PLK = getTLI()->getTypeConversion(C, MPTy);
+if (RLK.first == TargetLoweringBase::TypeLegal &&
+PLK.first == TargetLoweringBase::TypeLegal)
+  return InstructionCost(1);

sdesmalen-arm wrote:

Just pointing out that the code isn't updated yet to handle predicates 
differently, as those inserts/extracts are indeed not free.

https://github.com/llvm/llvm-project/pull/81135
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Improve cost model for legal subvec insert/extract (PR #81135)

2024-02-15 Thread Sander de Smalen via llvm-branch-commits


@@ -568,6 +568,48 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 }
 return Cost;
   }
+  case Intrinsic::vector_extract: {
+// If both the vector argument and the return type are legal types and the
+// index is 0, then this should be a no-op or simple operation; return a
+// relatively low cost.
+
+// If arguments aren't actually supplied, then we cannot determine the
+// value of the index.
+if (ICA.getArgs().size() < 2)

sdesmalen-arm wrote:

nit:
```suggestion
if (ICA.getArgs().size() != 2)
```

https://github.com/llvm/llvm-project/pull/81135
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Improve cost model for legal subvec insert/extract (PR #81135)

2024-02-15 Thread Sander de Smalen via llvm-branch-commits


@@ -568,6 +568,48 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 }
 return Cost;
   }
+  case Intrinsic::vector_extract: {
+// If both the vector argument and the return type are legal types and the
+// index is 0, then this should be a no-op or simple operation; return a
+// relatively low cost.
+
+// If arguments aren't actually supplied, then we cannot determine the
+// value of the index.
+if (ICA.getArgs().size() < 2)
+  break;
+LLVMContext &C = RetTy->getContext();
+EVT MRTy = getTLI()->getValueType(DL, RetTy);
+EVT MPTy = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
+TargetLoweringBase::LegalizeKind RLK = getTLI()->getTypeConversion(C, MRTy);
+TargetLoweringBase::LegalizeKind PLK = getTLI()->getTypeConversion(C, MPTy);
+const ConstantInt *Idx = dyn_cast<ConstantInt>(ICA.getArgs()[1]);
+if (RLK.first == TargetLoweringBase::TypeLegal &&
+PLK.first == TargetLoweringBase::TypeLegal && Idx &&
+Idx->getZExtValue() == 0)
+  return InstructionCost(1);

sdesmalen-arm wrote:

Is there a reason this wouldn't be zero-cost?

Also, stylistically to match the rest of this file, maybe return 
`TTI::TCC_Free` (if this is considered a cost of 0) or `TTI::TCC_Basic` (if 
this is considered a cost of 1) instead?

https://github.com/llvm/llvm-project/pull/81135
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang][AArch64] Add SME2.1 feature macros (#105657) (PR #106135)

2024-08-26 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

Rationale: this helps people who use LLVM 19 to write code for the SME2.1
intrinsics, which the compiler already supports. Without the macros set, a
user couldn't write compliant code such as `#if defined
__ARM_FEATURE_SME_B16B16 ... #endif`, because the macro would not be defined.
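
As a sketch of the guard pattern this unblocks (the body of each branch is
illustrative):

```c
#if defined(__ARM_FEATURE_SME_B16B16)
  /* Code that uses the corresponding intrinsics goes here. */
#else
  /* Portable fallback for toolchains/targets without the extension. */
#endif
```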

https://github.com/llvm/llvm-project/pull/106135
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang][AArch64] Add SME2.1 feature macros (#105657) (PR #106135)

2024-08-26 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/106135
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)

2024-09-05 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm milestoned 
https://github.com/llvm/llvm-project/pull/107406
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)

2024-09-05 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/107406
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)

2024-09-10 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

> Hi, since we are wrapping up LLVM 19.1.0 we are very strict with the fixes we 
> pick at this point. Can you please respond to the following questions to help 
> me understand if this has to be included in the final release or not.

Sure, I appreciate your diligence!

> Is this PR a fix for a regression or a critical issue?

A critical issue.

> What is the risk of accepting this into the release branch?

There shouldn't be any risk of accepting this patch into the release branch.
With this patch, the code reverts back to the behaviour that we've had for
AArch64's SVE/SVE2 extensions, which has been deployed for 5+ years. This was
basically a feature that went in and broke things, and this patch reverts that
feature.

> What is the risk of NOT accepting this into the release branch?

Without this change, practically any code that targets AArch64's SME2 or SVE2p1 
extensions will be broken.

https://github.com/llvm/llvm-project/pull/107406
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] d196f9e - [InstructionCost] Prevent InstructionCost being created with CostState.

2021-01-25 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-25T11:26:56Z
New Revision: d196f9e2fca3ff767aa7d2dcaf4654724a79e18c

URL: 
https://github.com/llvm/llvm-project/commit/d196f9e2fca3ff767aa7d2dcaf4654724a79e18c
DIFF: 
https://github.com/llvm/llvm-project/commit/d196f9e2fca3ff767aa7d2dcaf4654724a79e18c.diff

LOG: [InstructionCost] Prevent InstructionCost being created with CostState.

For a function that returns InstructionCost, it is very tempting to write:

  return InstructionCost::Invalid;

But that actually returns InstructionCost(1 /* int value of Invalid */),
which has a totally different meaning. By marking this constructor as
`delete`, this can no longer happen.
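
A minimal standalone sketch of the idiom (simplified; not the actual header):

```c++
class Cost {
public:
  enum CostState { Valid, Invalid }; // Invalid has the int value 1

  Cost() = default;
  Cost(CostState) = delete; // makes `return Cost::Invalid;` a compile error
  Cost(int Val) : Value(Val), State(Valid) {}

  static Cost getInvalid() {
    Cost C;
    C.State = Invalid;
    return C;
  }
  bool isValid() const { return State == Valid; }

private:
  int Value = 0;
  CostState State = Valid;
};

Cost compute() {
  // return Cost::Invalid;   // without the deleted overload, this would
  //                         // silently convert the enum to int -> Cost(1)
  return Cost::getInvalid(); // the intended way to signal an invalid cost
}
```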

Added: 


Modified: 
llvm/include/llvm/Support/InstructionCost.h

Removed: 




diff --git a/llvm/include/llvm/Support/InstructionCost.h b/llvm/include/llvm/Support/InstructionCost.h
index 725f8495ac09..fbc898b878bb 100644
--- a/llvm/include/llvm/Support/InstructionCost.h
+++ b/llvm/include/llvm/Support/InstructionCost.h
@@ -47,6 +47,7 @@ class InstructionCost {
 public:
   InstructionCost() = default;
 
+  InstructionCost(CostState) = delete;
   InstructionCost(CostType Val) : Value(Val), State(Valid) {}
 
   static InstructionCost getInvalid(CostType Val = 0) {



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 171d124 - [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.

2021-01-25 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-25T12:27:01Z
New Revision: 171d12489f20818e292362342b5665c689073ad2

URL: 
https://github.com/llvm/llvm-project/commit/171d12489f20818e292362342b5665c689073ad2
DIFF: 
https://github.com/llvm/llvm-project/commit/171d12489f20818e292362342b5665c689073ad2.diff

LOG: [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.

This change also changes getReductionCost to return InstructionCost,
and it simplifies two expressions by removing a redundant 'isValid' check.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Removed: 




diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 7114b4d412fd..0b630197911a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -3411,21 +3411,21 @@ bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {
  });
 }
 
-static std::pair<int, int>
+static std::pair<InstructionCost, InstructionCost>
 getVectorCallCosts(CallInst *CI, FixedVectorType *VecTy,
TargetTransformInfo *TTI, TargetLibraryInfo *TLI) {
   Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
 
   // Calculate the cost of the scalar and vector calls.
   IntrinsicCostAttributes CostAttrs(ID, *CI, VecTy->getElementCount());
-  int IntrinsicCost =
+  auto IntrinsicCost =
 TTI->getIntrinsicInstrCost(CostAttrs, TTI::TCK_RecipThroughput);
 
   auto Shape = VFShape::get(*CI, ElementCount::getFixed(static_cast<unsigned>(
  VecTy->getNumElements())),
 false /*HasGlobalPred*/);
   Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);
-  int LibCost = IntrinsicCost;
+  auto LibCost = IntrinsicCost;
   if (!CI->isNoBuiltin() && VecFunc) {
 // Calculate the cost of the vector library call.
 SmallVector<Type *, 4> VecTys;
@@ -5994,7 +5994,7 @@ bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
   InstructionCost Cost = R.getTreeCost();
 
   LLVM_DEBUG(dbgs() << "SLP: Found cost = " << Cost << " for VF =" << VF << "\n");
-  if (Cost.isValid() && Cost < -SLPCostThreshold) {
+  if (Cost < -SLPCostThreshold) {
 LLVM_DEBUG(dbgs() << "SLP: Decided to vectorize cost = " << Cost << "\n");
 
 using namespace ore;
@@ -6295,7 +6295,7 @@ bool SLPVectorizerPass::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,
 
   MinCost = std::min(MinCost, Cost);
 
-  if (Cost.isValid() && Cost < -SLPCostThreshold) {
+  if (Cost < -SLPCostThreshold) {
 LLVM_DEBUG(dbgs() << "SLP: Vectorizing list at cost:" << Cost << ".\n");
 R.getORE()->emit(OptimizationRemark(SV_NAME, "VectorizedList",
 cast<Instruction>(Ops[0]))
@@ -7007,11 +7007,12 @@ class HorizontalReduction {
 
 private:
   /// Calculate the cost of a reduction.
-  int getReductionCost(TargetTransformInfo *TTI, Value *FirstReducedVal,
-   unsigned ReduxWidth) {
+  InstructionCost getReductionCost(TargetTransformInfo *TTI,
+   Value *FirstReducedVal,
+   unsigned ReduxWidth) {
 Type *ScalarTy = FirstReducedVal->getType();
 FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);
-int VectorCost, ScalarCost;
+InstructionCost VectorCost, ScalarCost;
 switch (RdxKind) {
 case RecurKind::Add:
 case RecurKind::Mul:



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] a9f5e43 - [AArch64] Use faddp to implement fadd reductions.

2021-01-06 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-06T09:36:51Z
New Revision: a9f5e4375b36e5316b8d6f9731be6bfa5a70e276

URL: 
https://github.com/llvm/llvm-project/commit/a9f5e4375b36e5316b8d6f9731be6bfa5a70e276
DIFF: 
https://github.com/llvm/llvm-project/commit/a9f5e4375b36e5316b8d6f9731be6bfa5a70e276.diff

LOG: [AArch64] Use faddp to implement fadd reductions.

Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.
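
For illustration (not part of the patch), a reduction like the following forms
a VECREDUCE_FADD node when reassociation is allowed, and now lowers to a chain
of faddp instructions instead of shuffles and fadds:

```c++
// Sketch: compile for AArch64 with -O2 -ffast-math so the loop-exit
// horizontal reduction may be vectorized and reassociated.
float sum(const float *a, int n) {
  float s = 0.0f;
  for (int i = 0; i < n; ++i)
    s += a[i]; // vectorized: the final horizontal add becomes faddp pairs
  return s;
}
```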

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D59259

Added: 


Modified: 
llvm/include/llvm/Target/TargetSelectionDAG.td
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Target/AArch64/AArch64InstrInfo.td
llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
llvm/test/CodeGen/AArch64/vecreduce-fadd.ll

Removed: 




diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index d5b8aeb1055d..0c6eef939ea4 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -250,6 +250,10 @@ def SDTVecInsert : SDTypeProfile<1, 3, [// vector insert
 def SDTVecReduce : SDTypeProfile<1, 1, [// vector reduction
   SDTCisInt<0>, SDTCisVec<1>
 ]>;
+def SDTFPVecReduce : SDTypeProfile<1, 1, [  // FP vector reduction
+  SDTCisFP<0>, SDTCisVec<1>
+]>;
+
 
 def SDTSubVecExtract : SDTypeProfile<1, 2, [// subvector extract
   SDTCisSubVecOfVec<0,1>, SDTCisInt<2>
@@ -439,6 +443,7 @@ def vecreduce_smax  : SDNode<"ISD::VECREDUCE_SMAX", SDTVecReduce>;
 def vecreduce_umax  : SDNode<"ISD::VECREDUCE_UMAX", SDTVecReduce>;
 def vecreduce_smin  : SDNode<"ISD::VECREDUCE_SMIN", SDTVecReduce>;
 def vecreduce_umin  : SDNode<"ISD::VECREDUCE_UMIN", SDTVecReduce>;
+def vecreduce_fadd  : SDNode<"ISD::VECREDUCE_FADD", SDTFPVecReduce>;
 
 def fadd   : SDNode<"ISD::FADD"   , SDTFPBinOp, [SDNPCommutative]>;
 def fsub   : SDNode<"ISD::FSUB"   , SDTFPBinOp>;

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index faed7c64a15e..2b9dc84a06cc 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -999,6 +999,9 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
 MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {
   setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);
   setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
+
+  if (VT.getVectorElementType() != MVT::f16 || Subtarget->hasFullFP16())
+setOperationAction(ISD::VECREDUCE_FADD, VT, Legal);
 }
 for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32,
 MVT::v16i8, MVT::v8i16, MVT::v4i32 }) {

diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 4d70fb334828..7e9f2fb95188 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -4989,6 +4989,26 @@ defm FMAXNMP : SIMDFPPairwiseScalar<0, 0b01100, "fmaxnmp">;
 defm FMAXP   : SIMDFPPairwiseScalar<0, 0b01111, "fmaxp">;
 defm FMINNMP : SIMDFPPairwiseScalar<1, 0b01100, "fminnmp">;
 defm FMINP   : SIMDFPPairwiseScalar<1, 0b01111, "fminp">;
+
+let Predicates = [HasFullFP16] in {
+def : Pat<(f16 (vecreduce_fadd (v8f16 V128:$Rn))),
+(FADDPv2i16p
+  (EXTRACT_SUBREG
+ (FADDPv8f16 (FADDPv8f16 V128:$Rn, (v8f16 (IMPLICIT_DEF))), (v8f16 (IMPLICIT_DEF))),
+   dsub))>;
+def : Pat<(f16 (vecreduce_fadd (v4f16 V64:$Rn))),
+  (FADDPv2i16p (FADDPv4f16 V64:$Rn, (v4f16 (IMPLICIT_DEF))))>;
+}
+def : Pat<(f32 (vecreduce_fadd (v4f32 V128:$Rn))),
+  (FADDPv2i32p
+(EXTRACT_SUBREG
+  (FADDPv4f32 V128:$Rn, (v4f32 (IMPLICIT_DEF))),
+ dsub))>;
+def : Pat<(f32 (vecreduce_fadd (v2f32 V64:$Rn))),
+  (FADDPv2i32p V64:$Rn)>;
+def : Pat<(f64 (vecreduce_fadd (v2f64 V128:$Rn))),
+  (FADDPv2i64p V128:$Rn)>;
+
 def : Pat<(v2i64 (AArch64saddv V128:$Rn)),
   (INSERT_SUBREG (v2i64 (IMPLICIT_DEF)), (ADDPv2i64p V128:$Rn), dsub)>;
 def : Pat<(v2i64 (AArch64uaddv V128:$Rn)),

diff --git a/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll b/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
index 69b9c3e22d7a..2b5fcd4b839a 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization.ll
@@ -51,8 +51,7 @@ define float @test_v3f32(<3 x float> %a) nounwind {
 ; CHECK-NEXT:mov w8, #-2147483648
 ; CHECK-NEXT:fmov s1, w8
 ; CHECK-NEXT:mov v0.s[3], v1.s[0]
-; CHECK-NEXT:ext v1.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT:fadd v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:faddp v0.4s, v0.4s, v0.4s
 ; CHECK-NEXT:faddp s0, v0.2s
 ; CHECK-NEXT:ret
   %b = call reassoc float @l

[llvm-branch-commits] [llvm] a7e3339 - [AArch64][SVE] Emit DWARF location expression for SVE stack objects.

2021-01-06 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-06T09:40:53Z
New Revision: a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf

URL: 
https://github.com/llvm/llvm-project/commit/a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf
DIFF: 
https://github.com/llvm/llvm-project/commit/a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf.diff

LOG: [AArch64][SVE] Emit DWARF location expression for SVE stack objects.

Extend PEI to emit a DWARF expression for StackOffsets that have
a fixed and scalable component. This means the expression that needs
to be added is either:
   + offset
or:
   + offset + scalable_offset * scalereg

where for SVE, the scale reg is the Vector Granule Dwarf register, which
encodes the number of 64bit 'granules' in an SVE vector and which
the debugger can evaluate at runtime.
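
As a rough sketch of what a target hook emitting such an expression could look
like (the function name and VG register number here are placeholders, not the
actual API):

```c++
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include <cstdlib>

using namespace llvm;

// Append "offset + scalable_offset * VG" as DWARF expression opcodes.
static void appendScalableOffset(SmallVectorImpl<uint64_t> &Ops, int64_t Fixed,
                                 int64_t Scalable, unsigned DwarfVGRegNum) {
  if (Fixed)
    DIExpression::appendOffset(Ops, Fixed); // ... + offset
  if (Scalable) {
    Ops.push_back(dwarf::DW_OP_constu);     // |scalable_offset|
    Ops.push_back(std::abs(Scalable));
    Ops.push_back(dwarf::DW_OP_bregx);      // runtime value of VG + 0
    Ops.push_back(DwarfVGRegNum);
    Ops.push_back(0);
    Ops.push_back(dwarf::DW_OP_mul);        // scalable_offset * scalereg
    Ops.push_back(Scalable < 0 ? dwarf::DW_OP_minus : dwarf::DW_OP_plus);
  }
}
```

This mirrors the expression shape seen in the SVE debug-info tests, e.g.
`DW_OP_fbreg +0, DW_OP_lit8, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus`.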

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90020

Added: 
llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir

Modified: 
llvm/include/llvm/CodeGen/TargetRegisterInfo.h
llvm/lib/CodeGen/PrologEpilogInserter.cpp
llvm/lib/CodeGen/TargetRegisterInfo.cpp
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
llvm/lib/Target/AArch64/AArch64RegisterInfo.h

Removed: 




diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index de2c1b069784..6f32729a1e83 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -34,6 +34,7 @@
 namespace llvm {
 
 class BitVector;
+class DIExpression;
 class LiveRegMatrix;
 class MachineFunction;
 class MachineInstr;
@@ -923,6 +924,15 @@ class TargetRegisterInfo : public MCRegisterInfo {
 llvm_unreachable("isFrameOffsetLegal does not exist on this target");
   }
 
+  /// Gets the DWARF expression opcodes for \p Offset.
+  virtual void getOffsetOpcodes(const StackOffset &Offset,
+SmallVectorImpl<uint64_t> &Ops) const;
+
+  /// Prepends a DWARF expression for \p Offset to DIExpression \p Expr.
+  DIExpression *
+  prependOffsetExpression(const DIExpression *Expr, unsigned PrependFlags,
+  const StackOffset &Offset) const;
+
   /// Spill the register so it can be used by the register scavenger.
   /// Return true if the register was spilled, false otherwise.
   /// If this function does not spill the register, the scavenger

diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index 7c38b193a980..65b2165bf2a0 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -1211,8 +1211,6 @@ void PEI::replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &MF,
 
 StackOffset Offset =
 TFI->getFrameIndexReference(MF, FrameIdx, Reg);
-assert(!Offset.getScalable() &&
-   "Frame offsets with a scalable component are not supported");
 MI.getOperand(0).ChangeToRegister(Reg, false /*isDef*/);
 MI.getOperand(0).setIsDebug();
 
@@ -1238,7 +1236,8 @@ void PEI::replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &MF,
   // Make the DBG_VALUE direct.
   MI.getDebugOffset().ChangeToRegister(0, false);
 }
-DIExpr = DIExpression::prepend(DIExpr, PrependFlags, Offset.getFixed());
+
+DIExpr = TRI.prependOffsetExpression(DIExpr, PrependFlags, Offset);
 MI.getDebugExpressionOp().setMetadata(DIExpr);
 continue;
   }

diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index e89353c9ad27..4a190c9f50af 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -26,6 +26,7 @@
 #include "llvm/CodeGen/VirtRegMap.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/IR/Attributes.h"
+#include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/Function.h"
 #include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/CommandLine.h"
@@ -532,6 +533,31 @@ TargetRegisterInfo::lookThruCopyLike(Register SrcReg,
   }
 }
 
+void TargetRegisterInfo::getOffsetOpcodes(
+const StackOffset &Offset, SmallVectorImpl<uint64_t> &Ops) const {
+  assert(!Offset.getScalable() && "Scalable offsets are not handled");
+  DIExpression::appendOffset(Ops, Offset.getFixed());
+}
+
+DIExpression *
+TargetRegisterInfo::prependOffsetExpression(const DIExpression *Expr,
+unsigned PrependFlags,
+const StackOffset &Offset) const {
+  assert((PrependFlags &
+  ~(DIExpression::DerefBefore | DIExpression::DerefAfter |
+DIExpression::StackValue | DIExpression::EntryValue)) == 0 &&
+ "Unsupported prepend flag");
+  SmallVector<uint64_t, 16> OffsetExpr;
+  if (PrependFlags & DIExpression::DerefBefore)
+OffsetExpr.push_back(dwarf::DW_OP_deref);
+  getOffsetOpcodes(Offset, OffsetExpr);
+  if (PrependFlags & D

[llvm-branch-commits] [llvm] e4cda13 - Fix test failure in a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf

2021-01-06 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-06T10:43:48Z
New Revision: e4cda13d5a54a8c6366e4ca82d74265e68bbb3f5

URL: 
https://github.com/llvm/llvm-project/commit/e4cda13d5a54a8c6366e4ca82d74265e68bbb3f5
DIFF: 
https://github.com/llvm/llvm-project/commit/e4cda13d5a54a8c6366e4ca82d74265e68bbb3f5.diff

LOG: Fix test failure in a7e3339f3b0eb71e43d44e6f59cc8db6a7b110bf

Set the target-triple to aarch64 in debug-info-sve-dbg-value.mir
to avoid "'+sve' is not a recognized feature for this target"
diagnostic.

Added: 


Modified: 
llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir

Removed: 




diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
index ffce40c9c4f4..84d34ce3d2ac 100644
--- a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
@@ -27,6 +27,7 @@
 --- |
   ; ModuleID = 'bla.mir'
   source_filename = "bla.mir"
+  target triple = "aarch64-unknown-linux-gnu"
   target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
   define void @foo() #0 !dbg !5 {



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 84a1120 - [LiveDebugValues] Handle spill locations with a fixed and scalable component.

2021-01-06 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-06T11:30:13Z
New Revision: 84a1120943a651184bae507fed5d648fee381ae4

URL: 
https://github.com/llvm/llvm-project/commit/84a1120943a651184bae507fed5d648fee381ae4
DIFF: 
https://github.com/llvm/llvm-project/commit/84a1120943a651184bae507fed5d648fee381ae4.diff

LOG: [LiveDebugValues] Handle spill locations with a fixed and scalable 
component.

This patch fixes the two LiveDebugValues implementations
(InstrRef/VarLoc)Based to handle cases where the StackOffset contains
both a fixed and scalable component.

This depends on the `TargetRegisterInfo::prependOffsetExpression` being
added in D90020. Feel free to leave comments on that patch if you have them.

Reviewed By: djtodoro, jmorse

Differential Revision: https://reviews.llvm.org/D90046

Added: 
llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir

Modified: 
llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp

Removed: 




diff --git a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
index 04ead18cc3de2..b6f46daf8bba9 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
@@ -182,6 +182,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/TypeSize.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 #include 
@@ -221,14 +222,16 @@ namespace {
 // an offset.
 struct SpillLoc {
   unsigned SpillBase;
-  int SpillOffset;
+  StackOffset SpillOffset;
   bool operator==(const SpillLoc &Other) const {
-return std::tie(SpillBase, SpillOffset) ==
-   std::tie(Other.SpillBase, Other.SpillOffset);
+return std::make_pair(SpillBase, SpillOffset) ==
+   std::make_pair(Other.SpillBase, Other.SpillOffset);
   }
   bool operator<(const SpillLoc &Other) const {
-return std::tie(SpillBase, SpillOffset) <
-   std::tie(Other.SpillBase, Other.SpillOffset);
+return std::make_tuple(SpillBase, SpillOffset.getFixed(),
+SpillOffset.getScalable()) <
+   std::make_tuple(Other.SpillBase, Other.SpillOffset.getFixed(),
+Other.SpillOffset.getScalable());
   }
 };
 
@@ -769,8 +772,10 @@ class MLocTracker {
 } else if (LocIdxToLocID[*MLoc] >= NumRegs) {
   unsigned LocID = LocIdxToLocID[*MLoc];
   const SpillLoc &Spill = SpillLocs[LocID - NumRegs + 1];
-  Expr = DIExpression::prepend(Expr, DIExpression::ApplyOffset,
-   Spill.SpillOffset);
+
+  auto *TRI = MF.getSubtarget().getRegisterInfo();
+  Expr = TRI->prependOffsetExpression(Expr, DIExpression::ApplyOffset,
+  Spill.SpillOffset);
   unsigned Base = Spill.SpillBase;
   MIB.addReg(Base, RegState::Debug);
   MIB.addImm(0);
@@ -1579,9 +1584,7 @@ InstrRefBasedLDV::extractSpillBaseRegAndOffset(const MachineInstr &MI) {
   const MachineBasicBlock *MBB = MI.getParent();
   Register Reg;
   StackOffset Offset = TFI->getFrameIndexReference(*MBB->getParent(), FI, Reg);
-  assert(!Offset.getScalable() &&
- "Frame offsets with a scalable component are not supported");
-  return {Reg, static_cast<int>(Offset.getFixed())};
+  return {Reg, Offset};
 }
 
 /// End all previous ranges related to @MI and start a new range from @MI

diff --git a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
index ed7f04e571acc..4811b80467973 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
@@ -145,6 +145,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/TypeSize.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
 #include 
@@ -292,7 +293,7 @@ class VarLocBasedLDV : public LDVImpl {
 // register and an offset.
 struct SpillLoc {
   unsigned SpillBase;
-  int SpillOffset;
+  StackOffset SpillOffset;
   bool operator==(const SpillLoc &Other) const {
 return SpillBase == Other.SpillBase && SpillOffset == Other.SpillOffset;
   }
@@ -323,21 +324,20 @@ class VarLocBasedLDV : public LDVImpl {
 
 /// The value location. Stored separately to avoid repeatedly
 /// extracting it from MI.
-union {
+union LocUnion {
   uint64_t RegNo;
   SpillLoc SpillLocation;
   uint64_t Hash;
   int64_t Immediate;
   const ConstantFP *FPImm;
   const ConstantInt *CImm;
+  LocUnion() : Hash(0) {}
 } Loc;
 
 VarLoc(const MachineInstr &MI, LexicalScopes &LS)
 : Var(MI.getDebugVariable(), MI.getDebugExpression(),
   MI.getDebugLoc()->getInlinedAt()),

[llvm-branch-commits] [llvm] aa280c9 - [AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare)

2021-01-06 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-06T11:45:05Z
New Revision: aa280c99f708dca9dea96bc9070d6194d2622529

URL: 
https://github.com/llvm/llvm-project/commit/aa280c99f708dca9dea96bc9070d6194d2622529
DIFF: 
https://github.com/llvm/llvm-project/commit/aa280c99f708dca9dea96bc9070d6194d2622529.diff

LOG: [AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare)

When using dbg.declare, the debug-info is generated from a list of
locals rather than through DBG_VALUE instructions in the MIR.
This patch is different from D90020 because it emits the DWARF
location expressions from that list of locals directly.
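
For context, a hypothetical example of code that takes this path (clang emits
dbg.declare rather than dbg.value; the names are illustrative):

```c++
extern void use(int *);

// At -O0, Clang describes `local` with llvm.dbg.declare, which points at the
// variable's stack slot; the DWARF location is then produced from the list of
// locals (frame indices) rather than from DBG_VALUE instructions in the MIR.
void f() {
  int local = 42;
  use(&local); // address taken: the variable stays in memory
}
```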

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90044

Added: 
llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir

Modified: 
llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp

Removed: 




diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
index 02791f2280d2..ea279e4914b0 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
@@ -739,11 +739,10 @@ DIE *DwarfCompileUnit::constructVariableDIEImpl(const DbgVariable &DV,
 TFI->getFrameIndexReference(*Asm->MF, Fragment.FI, FrameReg);
 DwarfExpr.addFragmentOffset(Expr);
 
-assert(!Offset.getScalable() &&
-   "Frame offsets with a scalable component are not supported");
-
+auto *TRI = Asm->MF->getSubtarget().getRegisterInfo();
 SmallVector<uint64_t, 8> Ops;
-DIExpression::appendOffset(Ops, Offset.getFixed());
+TRI->getOffsetOpcodes(Offset, Ops);
+
 // According to
 // https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf
 // cuda-gdb requires DW_AT_address_class for all variables to be able to

diff  --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir 
b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
new file mode 100644
index ..39b11ef7bfea
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
@@ -0,0 +1,222 @@
+# RUN: llc -o %t -filetype=obj -start-before=prologepilog %s
+# RUN: llvm-dwarfdump --name="z0" %t | FileCheck %s --check-prefix=CHECKZ0
+# RUN: llvm-dwarfdump --name="z1" %t | FileCheck %s --check-prefix=CHECKZ1
+# RUN: llvm-dwarfdump --name="p0" %t | FileCheck %s --check-prefix=CHECKP0
+# RUN: llvm-dwarfdump --name="p1" %t | FileCheck %s --check-prefix=CHECKP1
+# RUN: llvm-dwarfdump --name="localv0" %t | FileCheck %s --check-prefix=CHECKLV0
+# RUN: llvm-dwarfdump --name="localv1" %t | FileCheck %s --check-prefix=CHECKLV1
+# RUN: llvm-dwarfdump --name="localp0" %t | FileCheck %s --check-prefix=CHECKLP0
+# RUN: llvm-dwarfdump --name="localp1" %t | FileCheck %s --check-prefix=CHECKLP1
+#
+# CHECKZ0:   DW_AT_location(DW_OP_fbreg +0, DW_OP_lit8, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKZ0-NEXT:  DW_AT_name("z0")
+# CHECKZ1:   DW_AT_location(DW_OP_fbreg +0, DW_OP_lit16, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKZ1-NEXT:  DW_AT_name("z1")
+# CHECKP0:   DW_AT_location(DW_OP_fbreg +0, DW_OP_lit17, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKP0-NEXT:  DW_AT_name("p0")
+# CHECKP1:   DW_AT_location(DW_OP_fbreg +0, DW_OP_lit18, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKP1-NEXT:  DW_AT_name("p1")
+# CHECKLV0:  DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x20, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLV0-NEXT: DW_AT_name("localv0")
+# CHECKLV1:  DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x28, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLV1-NEXT: DW_AT_name("localv1")
+# CHECKLP0:  DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x29, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLP0-NEXT: DW_AT_name("localp0")
+# CHECKLP1:  DW_AT_location(DW_OP_fbreg +0, DW_OP_constu 0x2a, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_minus)
+# CHECKLP1-NEXT: DW_AT_name("localp1")
+--- |
+  ; ModuleID = 't.c'
+  source_filename = "t.c"
+  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+  target triple = "aarch64-unknown-linux-gnu"
+
+  ; Function Attrs: noinline nounwind optnone
+  define dso_local <vscale x 2 x i64> @foo(<vscale x 2 x i64> %z0, <vscale x 2 x i64> %z1, <vscale x 16 x i1> %p0, <vscale x 16 x i1> %p1, i32 %w0) #0 !dbg !11 {
+  entry:
+%z0.addr = alloca <vscale x 2 x i64>, align 16
+%z1.addr = alloca <vscale x 2 x i64>, align 16
+%p0.addr = alloca <vscale x 16 x i1>, align 2
+%p1.addr = alloca <vscale x 16 x i1>, align 2
+%w0.addr = alloca i32, align 4
+%local_gpr0 = alloca i32, align 4
+%localv0 = alloca <vscale x 2 x i64>, align 16
+%localv1 = alloca <vscale x 2 x i64>, align 16
+%localp0 = alloca <vscale x 16 x i1>, align 2
+%localp1 = alloca <vscale x 16 x i1>, align 2
+store <vscale x 2 x i64> %z0, <vscale x 2 x i64>* %z0.addr, align 16
+call void @llvm.dbg.declare(metadata <vscale x 2 x i64>* %z0.addr, metadata !29, metadata !DIExpression()), !dbg !30
+store <vscale x 2 x i64> %z1, <vscale x 2 x i64>* %z1.addr, align 16
+call

[llvm-branch-commits] [llvm] c8a914d - [LiveDebugValues] Fix comparison operator in VarLocBasedImpl

2021-01-12 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2021-01-12T08:44:58Z
New Revision: c8a914db5c60dbeb5b638f30a9915855a67805f7

URL: 
https://github.com/llvm/llvm-project/commit/c8a914db5c60dbeb5b638f30a9915855a67805f7
DIFF: 
https://github.com/llvm/llvm-project/commit/c8a914db5c60dbeb5b638f30a9915855a67805f7.diff

LOG: [LiveDebugValues] Fix comparison operator in VarLocBasedImpl

The issue was introduced in commit rG84a1120943a651184bae507fed5d648fee381ae4
and would cause a VarLoc's StackOffset to be compared with its own, instead of
the StackOffset from the other VarLoc. This patch fixes that.
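
A minimal standalone sketch of this bug class and its fix (not the actual
code):

```c++
#include <tuple>

struct Loc {
  int Base;
  long Fixed;
  long Scalable;

  bool operator<(const Loc &Other) const {
    // Buggy variant: the right-hand side accidentally reads *this* object's
    // fields, so the comparison is not a valid ordering:
    //   return std::make_tuple(Base, Fixed, Scalable) <
    //          std::make_tuple(Other.Base, Fixed, Scalable);

    // Fixed variant: compare against Other's fields throughout.
    return std::make_tuple(Base, Fixed, Scalable) <
           std::make_tuple(Other.Base, Other.Fixed, Other.Scalable);
  }
};
```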

Added: 


Modified: 
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp

Removed: 




diff --git a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
index 4811b8046797..e2daa46fe6b9 100644
--- a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
@@ -572,8 +572,9 @@ class VarLocBasedLDV : public LDVImpl {
Expr) <
std::make_tuple(
Other.Var, Other.Kind, Other.Loc.SpillLocation.SpillBase,
-   Loc.SpillLocation.SpillOffset.getFixed(),
-   Loc.SpillLocation.SpillOffset.getScalable(), Other.Expr);
+   Other.Loc.SpillLocation.SpillOffset.getFixed(),
+   Other.Loc.SpillLocation.SpillOffset.getScalable(),
+   Other.Expr);
   case RegisterKind:
   case ImmediateKind:
   case EntryValueKind:



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 329fda3 - NFC: Mention auto-vec support for SVE in release notes.

2022-03-15 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2022-03-14T09:44:55Z
New Revision: 329fda39c507e8740978d10458451dcdb21563be

URL: 
https://github.com/llvm/llvm-project/commit/329fda39c507e8740978d10458451dcdb21563be
DIFF: 
https://github.com/llvm/llvm-project/commit/329fda39c507e8740978d10458451dcdb21563be.diff

LOG: NFC: Mention auto-vec support for SVE in release notes.

Added: 


Modified: 
llvm/docs/ReleaseNotes.rst

Removed: 




diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -82,6 +82,7 @@ Changes to the AArch64 Backend
   "tune-cpu" attribute is absent it tunes according to the "target-cpu".
 * Fixed relocations against temporary symbols (e.g. in jump tables and
   constant pools) in large COFF object files.
+* Auto-vectorization now targets SVE by default when available.
 
 Changes to the ARM Backend
 --



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] adc3714 - [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable.

2020-12-09 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2020-12-09T11:25:21Z
New Revision: adc37145dec9cadf76af05326150ed22a3cc2fdd

URL: 
https://github.com/llvm/llvm-project/commit/adc37145dec9cadf76af05326150ed22a3cc2fdd
DIFF: 
https://github.com/llvm/llvm-project/commit/adc37145dec9cadf76af05326150ed22a3cc2fdd.diff

LOG: [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be 
scalable.

This patch removes a number of asserts that VF is not scalable, even though
the code where this assert lives does nothing that prevents VF being scalable.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91060

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed: 




diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6ba14e942ff8..f504afd1ffc4 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -367,7 +367,6 @@ static Type *getMemInstValueType(Value *I) {
 /// type is irregular if its allocated size doesn't equal the store size of an
 /// element of the corresponding vector type at the given vectorization factor.
 static bool hasIrregularType(Type *Ty, const DataLayout &DL, ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   // Determine if an array of VF elements of type Ty is "bitcast compatible"
   // with a  vector.
   if (VF.isVector()) {
@@ -1387,9 +1386,7 @@ class LoopVectorizationCostModel {
   /// width \p VF. Return CM_Unknown if this instruction did not pass
   /// through the cost modeling.
   InstWidening getWideningDecision(Instruction *I, ElementCount VF) {
-assert(!VF.isScalable() && "scalable vectors not yet supported.");
-assert(VF.isVector() && "Expected VF >=2");
-
+assert(VF.isVector() && "Expected VF to be a vector VF");
 // Cost model is not run in the VPlan-native path - return conservative
 // result until this changes.
 if (EnableVPlanNativePath)
@@ -3902,8 +3899,10 @@ void InnerLoopVectorizer::fixVectorizedLoop() {
   // profile is not inherently precise anyway. Note also possible bypass of
   // vector code caused by legality checks is ignored, assigning all the weight
   // to the vector loop, optimistically.
-  assert(!VF.isScalable() &&
- "cannot use scalable ElementCount to determine unroll factor");
+  //
+  // For scalable vectorization we can't know at compile time how many iterations
+  // of the loop are handled in one vector iteration, so instead assume a pessimistic
+  // vscale of '1'.
   setProfileInfoAfterUnrolling(
   LI->getLoopFor(LoopScalarBody), LI->getLoopFor(LoopVectorBody),
   LI->getLoopFor(LoopScalarBody), VF.getKnownMinValue() * UF);
@@ -4709,7 +4708,6 @@ static bool mayDivideByZero(Instruction &I) {
 void InnerLoopVectorizer::widenInstruction(Instruction &I, VPValue *Def,
VPUser &User,
VPTransformState &State) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   switch (I.getOpcode()) {
   case Instruction::Call:
   case Instruction::Br:
@@ -4797,7 +4795,6 @@ void InnerLoopVectorizer::widenInstruction(Instruction &I, VPValue *Def,
 setDebugLocFromInst(Builder, CI);
 
 /// Vectorize casts.
-assert(!VF.isScalable() && "VF is assumed to be non scalable.");
 Type *DestTy =
 (VF.isScalar()) ? CI->getType() : VectorType::get(CI->getType(), VF);
 
@@ -5099,7 +5096,6 @@ void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) {
 
 bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,
  ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   if (!blockNeedsPredication(I->getParent()))
 return false;
   switch(I->getOpcode()) {
@@ -6420,7 +6416,6 @@ int LoopVectorizationCostModel::computePredInstDiscount(
 
 LoopVectorizationCostModel::VectorizationCostTy
 LoopVectorizationCostModel::expectedCost(ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   VectorizationCostTy Cost;
 
   // For each block.
@@ -7935,7 +7930,6 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,
  "Must be called with either a load or store");
 
   auto willWiden = [&](ElementCount VF) -> bool {
-assert(!VF.isScalable() && "unexpected scalable ElementCount");
 if (VF.isScalar())
   return false;
 LoopVectorizationCostModel::InstWidening Decision =



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] d568cff - [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.

2020-12-09 Thread Sander de Smalen via llvm-branch-commits

Author: Sander de Smalen
Date: 2020-12-09T11:25:21Z
New Revision: d568cff696e8fb89ce1b040561c037412767af60

URL: 
https://github.com/llvm/llvm-project/commit/d568cff696e8fb89ce1b040561c037412767af60
DIFF: 
https://github.com/llvm/llvm-project/commit/d568cff696e8fb89ce1b040561c037412767af60.diff

LOG: [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.

* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
  so that the cost-model can be implemented separately.

This can vectorize the following loop [1]:

   void loop(int N, double *a, double *b) {
     #pragma clang loop vectorize_width(4, scalable)
     for (int i = 0; i < N; i++) {
       a[i] = b[i] + 1.0;
     }
   }

[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91077

Added: 

llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

Modified: 
llvm/include/llvm/IR/IRBuilder.h
llvm/lib/IR/IRBuilder.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/lib/Transforms/Vectorize/VPlan.h
llvm/test/Transforms/LoopVectorize/metadata-width.ll

Removed: 




diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index db215094a7e49..c2b3446d159f2 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -879,6 +879,10 @@ class IRBuilderBase {
  Type *ResultType,
  const Twine &Name = "");
 
+  /// Create a call to llvm.vscale, multiplied by \p Scaling. The type of VScale
+  /// will be the same type as that of \p Scaling.
+  Value *CreateVScale(Constant *Scaling, const Twine &Name = "");
+
   /// Create a call to intrinsic \p ID with 1 operand which is mangled on its
   /// type.
   CallInst *CreateUnaryIntrinsic(Intrinsic::ID ID, Value *V,

diff  --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index c0e4451f52003..f936f5756b6f0 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -80,6 +80,17 @@ static CallInst *createCallHelper(Function *Callee, ArrayRef<Value *> Ops,
   return CI;
 }
 
+Value *IRBuilderBase::CreateVScale(Constant *Scaling, const Twine &Name) {
+  Module *M = GetInsertBlock()->getParent()->getParent();
+  assert(isa<ConstantInt>(Scaling) && "Expected constant integer");
+  Function *TheFn =
+  Intrinsic::getDeclaration(M, Intrinsic::vscale, {Scaling->getType()});
+  CallInst *CI = createCallHelper(TheFn, {}, this, Name);
+  return cast<ConstantInt>(Scaling)->getSExtValue() == 1
+ ? CI
+ : CreateMul(CI, Scaling);
+}
+
 CallInst *IRBuilderBase::CreateMemSet(Value *Ptr, Value *Val, Value *Size,
   MaybeAlign Align, bool isVolatile,
   MDNode *TBAATag, MDNode *ScopeTag,

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f504afd1ffc41..a91fb988badf6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1121,6 +1121,15 @@ static OptimizationRemarkAnalysis createLVAnalysis(const char *PassName,
   return R;
 }
 
+/// Return a value for Step multiplied by VF.
+static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {
+  assert(isa<ConstantInt>(Step) && "Expected an integer step");
+  Constant *StepVal = ConstantInt::get(
+  Step->getType(),
+  cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());
+  return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;
+}
+
 namespace llvm {
 
 void reportVectorizationFailure(const StringRef DebugMsg,
@@ -2277,8 +2286,6 @@ void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
const InductionDescriptor &ID) {
   // We shouldn't have to build scalar steps if we aren't vectorizing.
   assert(VF.isVector() && "VF should be greater than one");
-  assert(!VF.isScalable() &&
- "the code below assumes a fixed number of elements at compile time");
   // Get the value type and ensure it and the step have the same integer type.
   Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
   assert(ScalarIVTy == Step->getType() &&
@@ -2303,11 +2310,24 @@ void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
   Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF)
   ? 1
   : VF.getKnownMinValue();
+  assert((!VF.isScalable() || Lanes == 1) &&
+ "Should never scalarize a scalable vector");
   // Compute the scalar steps and save the results in VectorLoopValueMap.
   for (unsigned Part = 0; Part < UF; ++Part) {
 for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
-  auto *StartIdx = g

[llvm-branch-commits] [llvm] [TableGen] Fix calculation of Lanemask for RCs with artificial subregs. (PR #114392)

2024-11-04 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm updated 
https://github.com/llvm/llvm-project/pull/114392

From 303e1c87e0ea835d5892afaa04c9e72d2d1778f4 Mon Sep 17 00:00:00 2001
From: Sander de Smalen 
Date: Thu, 31 Oct 2024 09:54:52 +
Subject: [PATCH] [TableGen] Fix calculation of Lanemask for RCs with
 artificial subregs.

TableGen builds up a map of "SubRegIdx -> Subclass" where Subclass is
the largest class where all registers have SubRegIdx as a
sub-register. When SubRegIdx (vis-a-vis the sub-register) is
artificial it should still include it in the map. This map is used in
various places, including in the calculation of the Lanemask of a
register class, which otherwise calculates an incorrect lanemask.
---
 llvm/test/TableGen/ArtificialSubregs.td |  4 ++--
 llvm/utils/TableGen/Common/CodeGenRegisters.cpp | 10 ++++------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/llvm/test/TableGen/ArtificialSubregs.td b/llvm/test/TableGen/ArtificialSubregs.td
index dbac129fb2463b..b8d5686dee51be 100644
--- a/llvm/test/TableGen/ArtificialSubregs.td
+++ b/llvm/test/TableGen/ArtificialSubregs.td
@@ -104,7 +104,7 @@ def TestTarget : Target;
 // CHECK:  SuperClasses:
 //
 // CHECK:  RegisterClass DRegs:
-// CHECK:  LaneMask: 0004
+// CHECK:  LaneMask: 0044
 // CHECK:  HasDisjunctSubRegs: 1
 // CHECK:  CoveredBySubRegs: 1
 // CHECK:  Regs: D0 D1 D2
@@ -112,7 +112,7 @@ def TestTarget : Target;
 // CHECK:  SuperClasses:
 //
 // CHECK:  RegisterClass QRegs:
-// CHECK:  LaneMask: 0044
+// CHECK:  LaneMask: 0045
 // CHECK:  HasDisjunctSubRegs: 1
 // CHECK:  CoveredBySubRegs: 1
 // CHECK:  Regs: Q0 Q1 Q2
diff --git a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
index 2bf6a3740c486b..2dbee94d7e5406 100644
--- a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
@@ -2300,10 +2300,8 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
 if (R->Artificial)
   continue;
 const CodeGenRegister::SubRegMap &SRM = R->getSubRegs();
-for (auto I : SRM) {
-  if (!I.first->Artificial)
-SRSets[I.first].push_back(R);
-}
+for (auto I : SRM)
+  SRSets[I.first].push_back(R);
   }
 
   for (auto I : SRSets)
@@ -2312,8 +2310,6 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
   // Find matching classes for all SRSets entries.  Iterate in SubRegIndex
   // numerical order to visit synthetic indices last.
   for (const auto &SubIdx : SubRegIndices) {
-if (SubIdx.Artificial)
-  continue;
 SubReg2SetMap::const_iterator I = SRSets.find(&SubIdx);
 // Unsupported SubRegIndex. Skip it.
 if (I == SRSets.end())
@@ -2323,6 +2319,8 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
   RC->setSubClassWithSubReg(&SubIdx, RC);
   continue;
 }
+if (SubIdx.Artificial)
+  continue;
 // This is a real subset.  See if we have a matching class.
 CodeGenRegisterClass *SubRC = getOrCreateSubClass(
 RC, &I->second, RC->getName() + "_with_" + I->first->getName());

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-11-04 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm closed 
https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-11-04 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

Trying to reopen..

https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-11-04 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

I think I need to create a new PR for this as Github doesn't allow me to reopen 
and choose a different branch to merge into.

https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-11-04 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

It wasn't, but I also didn't realise that I had closed it. Could GitHub have
done this automatically after the branch it was based on was deleted? (I was
about to push the rebased branch of this PR after merging #114391 and #114392.)

https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-10-31 Thread Sander de Smalen via llvm-branch-commits


@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static SmallVector ReservedHi = {

sdesmalen-arm wrote:

Without marking the registers as reserved, consider the example below:
```
---
name:sv2i64
tracksRegLiveness: true
body: |
  bb.0.entry:
liveins: $q0, $q1

%0:fpr128 = COPY $q0
%1:fpr128 = COPY $q1
%35:gpr64 = COPY %0.dsub
%36:gpr64 = COPY %1.dsub
%9:gpr64 = SDIVXr %35, %36
%37:gpr64 = UMOVvi64 %0, 1
%38:gpr64 = UMOVvi64 %1, 1
%10:gpr64 = SDIVXr %37, %38
%19:fpr128 = INSvi64gpr undef %19, 0, %9
%19:fpr128 = INSvi64gpr %19, 1, %10
%39:gpr64 = COPY %19.dsub
%24:gpr64 = MADDXrrr %39, %36, $xzr
%41:gpr64 = UMOVvi64 %19, 1
%25:gpr64 = MADDXrrr %41, %38, $xzr
%34:fpr128 = INSvi64gpr undef %34, 0, %24
%34:fpr128 = INSvi64gpr %34, 1, %25
%2:fpr128 = SUBv2i64 %0, %34
$q0 = COPY %2
RET_ReallyLR implicit $q0
...
```

When I run this with:
```
llc -global-isel -verify-machineinstrs -run-pass=machine-scheduler
```

It fails with:
```
Use of $xzr does not have a corresponding definition on every path:
216r %10:gpr64 = MADDXrrr %9:gpr64, %3:gpr64, $xzr
LLVM ERROR: Use not jointly dominated by defs.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and 
include the crash backtrace.
Stack dump:
0.  Program arguments: ./bin/llc -global-isel -verify-machineinstrs 
-run-pass=machine-scheduler /tmp/t.mir -o -
1.  Running pass 'Function Pass Manager' on module '/tmp/t.mir'.
2.  Running pass 'Machine Instruction Scheduler' on function '@sv2i64'
 ...
 #8 0x80062b7c llvm::LiveRangeCalc::findReachingDefs(llvm::LiveRange&, 
llvm::MachineBasicBlock&, llvm::SlotIndex, unsigned int, 
llvm::ArrayRef<llvm::SlotIndex>)
 #9 0x80063e94 llvm::LiveRangeCalc::extend(llvm::LiveRange&, 
llvm::SlotIndex, unsigned int, llvm::ArrayRef<llvm::SlotIndex>)
#10 0x80064a18 llvm::LiveIntervalCalc::extendToUses(llvm::LiveRange&, 
llvm::Register, llvm::LaneBitmask, llvm::LiveInterval*)
#11 0x8003e82c 
llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)
#12 0x80044cdc 
llvm::LiveIntervals::HMEditor::updateAllRanges(llvm::MachineInstr*)
#13 0x8004848c llvm::LiveIntervals::handleMove(llvm::MachineInstr&, 
bool)
#14 0x801f44ec 
llvm::ScheduleDAGMI::moveInstruction(llvm::MachineInstr*, 
llvm::MachineInstrBundleIterator)
#15 0x801fdb58 llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit*, bool)
#16 0x8020b214 llvm::ScheduleDAGMILive::schedule()
#17 0x801f0934 (anonymous 
namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, 
bool) (.isra.0) MachineScheduler.cpp:0:0
```

https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-10-31 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-11-01 Thread Sander de Smalen via llvm-branch-commits


@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const 
MachineFunction &MF,
   return {};
 }
 
+static SmallVector<MCPhysReg> ReservedHi = {

sdesmalen-arm wrote:

I don't think there is a bug; the code for moving an instruction goes through 
the list of operands to update each register's live range. For each physreg it 
then goes through the regunits to calculate/update the live range for that 
regunit, but only if the regunit is not reserved.

The code that determines if the register is reserved says:
```
// A register unit is considered reserved if all its roots and all their
// super registers are reserved.
```
Without this change to AArch64RegisterInfo.cpp, WZR and XZR are marked as 
reserved, but WZR_HI isn't (because WZR_HI is a sibling of WZR, and 
`markSuperRegs` marks only XZR as reserved), and so `IsReserved` is `false` for 
the WZR_HI regunit.

Why this doesn't fail for AMDGPU I don't know; perhaps these registers are 
always virtual and never go down this path.
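
A minimal sketch of the AArch64 side of the fix (the actual `ReservedHi` list 
in this PR covers all of the new `_HI` registers; this only shows the shape):
```cpp
// Sketch: markSuperRegs(WZR) only reserves WZR and XZR, not the artificial
// WZR_HI sibling, so the high halves have to be reserved explicitly.
BitVector
AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
  BitVector Reserved(getNumRegs());
  markSuperRegs(Reserved, AArch64::WZR); // reserves WZR and XZR only
  for (MCPhysReg R : ReservedHi)         // e.g. AArch64::WZR_HI, ...
    Reserved.set(R);
  return Reserved;
}
```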

https://github.com/llvm/llvm-project/pull/114263
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)

2024-12-02 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

> Hi! How important are these backports for the release branch? Are they 
> bugfixes? What's the risk of them?

Hi @tru, these are all bugfixes for things that would somehow lead to a 
compilation failure when targeting AArch64 SME instructions. I don't see any 
risks in cherry-picking them onto the release branch.

https://github.com/llvm/llvm-project/pull/117695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)

2024-11-26 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm created 
https://github.com/llvm/llvm-project/pull/117695

f314e12 uses `requiresSaveVG` which was introduced in 334a366ba792, which is 
also missing from the release/19.x branch.
I figured it made sense to cherry-pick that one as well.

>From de526e5893e901c350fc9bd6d8013d7d1dbd42c6 Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Tue, 13 Aug 2024 00:39:14 -0700
Subject: [PATCH 1/2] [AArch64][Darwin][SME] Don't try to save VG to the stack
 for unwinding.

On Darwin we don't have any hardware that has SVE support, only SME.
Therefore we don't need to save VG for unwinders and can safely omit it.

This also fixes crashes introduced since this feature landed since Darwin's
compact unwind code can't handle the presence of VG anyway.

rdar://131072344
---
 .../Target/AArch64/AArch64FrameLowering.cpp   |  24 ++-
 .../Target/AArch64/AArch64ISelLowering.cpp|  19 ++-
 .../CodeGen/AArch64/sme-darwin-no-sve-vg.ll   | 161 ++
 3 files changed, 189 insertions(+), 15 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/sme-darwin-no-sve-vg.ll

diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 87e057a468afd6..83d9dd17259733 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1394,6 +1394,18 @@ bool requiresGetVGCall(MachineFunction &MF) {
   !MF.getSubtarget<AArch64Subtarget>().hasSVE();
 }
 
+static bool requiresSaveVG(MachineFunction &MF) {
+  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
+  // For Darwin platforms we don't save VG for non-SVE functions, even if SME
+  // is enabled with streaming mode changes.
+  if (!AFI->hasStreamingModeChanges())
+    return false;
+  auto &ST = MF.getSubtarget<AArch64Subtarget>();
+  if (ST.isTargetDarwin())
+    return ST.hasSVE();
+  return true;
+}
+
 bool isVGInstruction(MachineBasicBlock::iterator MBBI) {
   unsigned Opc = MBBI->getOpcode();
   if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
@@ -1430,8 +1442,7 @@ static MachineBasicBlock::iterator 
convertCalleeSaveRestoreToSPPrePostIncDec(
   // functions, we need to do this for both the streaming and non-streaming
   // vector length. Move past these instructions if necessary.
   MachineFunction &MF = *MBB.getParent();
-  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
-  if (AFI->hasStreamingModeChanges())
+  if (requiresSaveVG(MF))
 while (isVGInstruction(MBBI))
   ++MBBI;
 
@@ -1937,7 +1948,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction 
&MF,
   while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
  !IsSVECalleeSave(MBBI)) {
 // Move past instructions generated to calculate VG
-if (AFI->hasStreamingModeChanges())
+if (requiresSaveVG(MF))
   while (isVGInstruction(MBBI))
 ++MBBI;
 
@@ -3720,7 +3731,7 @@ void 
AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
   // non-streaming VG value.
   const Function &F = MF.getFunction();
   SMEAttrs Attrs(F);
-  if (AFI->hasStreamingModeChanges()) {
+  if (requiresSaveVG(MF)) {
 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
   CSStackSize += 16;
 else
@@ -3873,7 +3884,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
   }
 
   // Insert VG into the list of CSRs, immediately before LR if saved.
-  if (AFI->hasStreamingModeChanges()) {
+  if (requiresSaveVG(MF)) {
 std::vector<CalleeSavedInfo> VGSaves;
 SMEAttrs Attrs(MF.getFunction());
 
@@ -4602,10 +4613,9 @@ MachineBasicBlock::iterator 
emitVGSaveRestore(MachineBasicBlock::iterator II,
 
 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
 MachineFunction &MF, RegScavenger *RS = nullptr) const {
-  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   for (auto &BB : MF)
 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
-  if (AFI->hasStreamingModeChanges())
+  if (requiresSaveVG(MF))
 II = emitVGSaveRestore(II, this);
   if (StackTaggingMergeSetTag)
 II = tryMergeAdjacentSTG(II, this, RS);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 62078822c89b18..ef2789e96213b5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8732,10 +8732,11 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 
   SDValue InGlue;
   if (RequiresSMChange) {
-
-Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL,
-DAG.getVTList(MVT::Other, MVT::Glue), Chain);
-InGlue = Chain.getValue(1);
+if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) {
+  Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL,
+  DAG.getVTList(MVT::Other, MVT::Glue), Chain);
+  InGlue = Chain.getValue(1);
+}
 
 SDValue NewChain = changeStreamingMode(
 DAG, DL, CalleeAttrs.hasStreamingInterface(), Chain, InGlue,
@@ -8914,11 +891

[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)

2024-11-26 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/117695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)

2024-11-26 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm milestoned 
https://github.com/llvm/llvm-project/pull/117695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)

2024-11-26 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm updated 
https://github.com/llvm/llvm-project/pull/117695

>From de526e5893e901c350fc9bd6d8013d7d1dbd42c6 Mon Sep 17 00:00:00 2001
From: Amara Emerson 
Date: Tue, 13 Aug 2024 00:39:14 -0700
Subject: [PATCH 1/3] [AArch64][Darwin][SME] Don't try to save VG to the stack
 for unwinding.

On Darwin we don't have any hardware that has SVE support, only SME.
Therefore we don't need to save VG for unwinders and can safely omit it.

This also fixes crashes introduced since this feature landed since Darwin's
compact unwind code can't handle the presence of VG anyway.

rdar://131072344
---
 .../Target/AArch64/AArch64FrameLowering.cpp   |  24 ++-
 .../Target/AArch64/AArch64ISelLowering.cpp|  19 ++-
 .../CodeGen/AArch64/sme-darwin-no-sve-vg.ll   | 161 ++
 3 files changed, 189 insertions(+), 15 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/sme-darwin-no-sve-vg.ll

diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 87e057a468afd6..83d9dd17259733 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1394,6 +1394,18 @@ bool requiresGetVGCall(MachineFunction &MF) {
   !MF.getSubtarget<AArch64Subtarget>().hasSVE();
 }
 
+static bool requiresSaveVG(MachineFunction &MF) {
+  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
+  // For Darwin platforms we don't save VG for non-SVE functions, even if SME
+  // is enabled with streaming mode changes.
+  if (!AFI->hasStreamingModeChanges())
+    return false;
+  auto &ST = MF.getSubtarget<AArch64Subtarget>();
+  if (ST.isTargetDarwin())
+    return ST.hasSVE();
+  return true;
+}
+
 bool isVGInstruction(MachineBasicBlock::iterator MBBI) {
   unsigned Opc = MBBI->getOpcode();
   if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
@@ -1430,8 +1442,7 @@ static MachineBasicBlock::iterator 
convertCalleeSaveRestoreToSPPrePostIncDec(
   // functions, we need to do this for both the streaming and non-streaming
   // vector length. Move past these instructions if necessary.
   MachineFunction &MF = *MBB.getParent();
-  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
-  if (AFI->hasStreamingModeChanges())
+  if (requiresSaveVG(MF))
 while (isVGInstruction(MBBI))
   ++MBBI;
 
@@ -1937,7 +1948,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction 
&MF,
   while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
  !IsSVECalleeSave(MBBI)) {
 // Move past instructions generated to calculate VG
-if (AFI->hasStreamingModeChanges())
+if (requiresSaveVG(MF))
   while (isVGInstruction(MBBI))
 ++MBBI;
 
@@ -3720,7 +3731,7 @@ void 
AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
   // non-streaming VG value.
   const Function &F = MF.getFunction();
   SMEAttrs Attrs(F);
-  if (AFI->hasStreamingModeChanges()) {
+  if (requiresSaveVG(MF)) {
 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
   CSStackSize += 16;
 else
@@ -3873,7 +3884,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
   }
 
   // Insert VG into the list of CSRs, immediately before LR if saved.
-  if (AFI->hasStreamingModeChanges()) {
+  if (requiresSaveVG(MF)) {
 std::vector<CalleeSavedInfo> VGSaves;
 SMEAttrs Attrs(MF.getFunction());
 
@@ -4602,10 +4613,9 @@ MachineBasicBlock::iterator 
emitVGSaveRestore(MachineBasicBlock::iterator II,
 
 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
 MachineFunction &MF, RegScavenger *RS = nullptr) const {
-  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   for (auto &BB : MF)
 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
-  if (AFI->hasStreamingModeChanges())
+  if (requiresSaveVG(MF))
 II = emitVGSaveRestore(II, this);
   if (StackTaggingMergeSetTag)
 II = tryMergeAdjacentSTG(II, this, RS);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 62078822c89b18..ef2789e96213b5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8732,10 +8732,11 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 
   SDValue InGlue;
   if (RequiresSMChange) {
-
-Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL,
-DAG.getVTList(MVT::Other, MVT::Glue), Chain);
-InGlue = Chain.getValue(1);
+if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) {
+  Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL,
+  DAG.getVTList(MVT::Other, MVT::Glue), Chain);
+  InGlue = Chain.getValue(1);
+}
 
 SDValue NewChain = changeStreamingMode(
 DAG, DL, CalleeAttrs.hasStreamingInterface(), Chain, InGlue,
@@ -8914,11 +8915,13 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 Result = changeStreamingMode(
 DAG, DL, !CalleeAttrs.hasStreamingInterface(), Result, InGlue,

[llvm-branch-commits] [llvm] Cherry pick f314e12 into release/19.x (PR #117695)

2024-11-26 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/117695
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Coalescer] Consider NewMI's subreg index when updating lanemask. (PR #121780)

2025-01-06 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm created 
https://github.com/llvm/llvm-project/pull/121780

The code added in #116191 that updated the lanemasks for rematerialized
values checked if `DefMI`'s destination register had a subreg index.
This seems to have missed the following case:

```
  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32
```

which during rematerialization would have the following variables set:

```
  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)
```

When checking whether the lanemasks need to be generated, considering whether 
DefMI's destination has a subreg index is insufficient; we should look at 
NewMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.

>From 5babad18cb92f708f93c5a0512918ff5e7368fba Mon Sep 17 00:00:00 2001
From: Sander de Smalen 
Date: Mon, 6 Jan 2025 14:31:49 +
Subject: [PATCH 1/2] Pre-commit tests

---
 ...gister-coalesce-update-subranges-remat.mir | 91 ++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git 
a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir 
b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
index b61fa4be040070..524f51a2da5af2 100644
--- a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -1,5 +1,5 @@
+# RUN: llc -mtriple=aarch64 -o /dev/null -run-pass=register-coalescer 
-aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | 
FileCheck %s --check-prefix=CHECK-DBG
 # RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - 
-run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | 
FileCheck %s --check-prefix=CHECK
-# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null 
-run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking 
-debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # REQUIRES: asserts
 
 # CHECK-DBG: ** REGISTER COALESCER **
@@ -36,3 +36,92 @@ body: |
 RET_ReallyLR
 
 ...
+# CHECK-DBG: ** REGISTER COALESCER **
+# CHECK-DBG: ** Function: reproducer
+# CHECK-DBG: ** JOINING INTERVALS ***
+# CHECK-DBG: ** INTERVALS **
+# CHECK-DBG: %1 [32r,48B:2)[48B,320r:0)[320r,368B:1) 0@48B-phi 1@320r 2@32r
+# CHECK-DBG-SAME: weight:0.00e+00
+# CHECK-DBG: %3 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 
1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: L0080 [80r,160B:2)[288r,304B:0)[304B,320r:3) 
0@288r 1@x 2@80r 3@304B-phi
+# CHECK-DBG-SAME: L0040 
[80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 
3@304B-phi
+# CHECK-DBG-SAME: weight:0.00e+00
+---
+name:  reproducer
+tracksRegLiveness: true
+body: |
+  bb.0:
+%0:gpr32 = MOVi32imm 1
+%1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+%3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+$nzcv = IMPLICIT_DEF
+%4:gpr64 = COPY killed %3
+Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+$nzcv = IMPLICIT_DEF
+Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+%5:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+%4:gpr64 = COPY killed %5
+B %bb.7
+
+  bb.6:
+%4:gpr64 = COPY $xzr
+
+  bb.7:
+%7:gpr64 = ADDXrs killed %1, killed %4, 1
+%1:gpr64 = COPY killed %7
+B %bb.1
+
+...
+# CHECK-DBG: ** REGISTER COALESCER **
+# CHECK-DBG: ** Function: reproducer2
+# CHECK-DBG: ** JOINING INTERVALS ***
+# CHECK-DBG: ** INTERVALS **
+# CHECK-DBG: %1 [32r,48B:2)[48B,304r:0)[304r,352B:1) 0@48B-phi 1@304r 2@32r
+# CHECK-DBG-SAME: weight:0.00e+00
+# CHECK-DBG: %3 [80r,160B:2)[224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 
1@224r 2@80r 3@288B-phi
+# CHECK-DBG-SAME: weight:0.00e+00
+---
+name:  reproducer2
+tracksRegLiveness: true
+body: |
+  bb.0:
+%0:gpr32 = MOVi32imm 1
+%1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+%3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+$nzcv = IMPLICIT_DEF
+%4:gpr64 = COPY killed %3
+Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+$nzcv = IMPLICIT_DEF
+Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+%4:gpr64 = IMPLICIT_DEF
+B %bb.7
+
+  bb.6:
+%4:gpr64 = COPY $xzr
+
+  bb.7:
+%5:gpr64 = ADDXrs killed %1, killed %4, 1
+%1:gpr64 = COPY killed %5
+B %bb.1
+
+...

>From b7afd88e94e1b80d35b4eb7b712976200857ad3a Mon Sep 17 00:00:00 2001
From: Sander de Smalen 
Date: Tue, 17 Dec 2024 21:01:20 +
Subject: [PATCH 2/2] [Coalescer] Consider NewMI's subreg index when updating
 lanemask.

The code added in #116191 that updated the lanemasks for rematerialized

[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (PR #126503)

2025-02-10 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

There is no risk in adding this to the release branch, because all 
functionality is hidden behind a flag. The TableGen/SubtargetEmitter.cpp change 
should not affect anything, because it merely emits an extra `enum class` to 
the `*GenSubtargetInfo.cpp` file.

https://github.com/llvm/llvm-project/pull/126503
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)

2025-04-05 Thread Sander de Smalen via llvm-branch-commits


@@ -2,6 +2,7 @@
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false 
-mattr=+neon,+dotprod -force-vector-interleave=1 -S < %s | FileCheck %s 
--check-prefixes=CHECK-INTERLEAVE1
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false 
-mattr=+neon,+dotprod -S < %s | FileCheck %s --check-prefixes=CHECK-INTERLEAVED
 ; RUN: opt -passes=loop-vectorize -enable-epilogue-vectorization=false 
-mattr=+neon,+dotprod -force-vector-interleave=1 -vectorizer-maximize-bandwidth 
-S < %s | FileCheck %s --check-prefixes=CHECK-MAXBW
+; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize --disable-output 
-S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-REGS

sdesmalen-arm wrote:

This now needs a `REQUIRES: asserts` guard because of the `-debug-only=` flag.
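
For reference, a minimal guard near the existing RUN lines would be:
```
; REQUIRES: asserts
```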

https://github.com/llvm/llvm-project/pull/133090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)

2025-04-07 Thread Sander de Smalen via llvm-branch-commits


@@ -5039,10 +5039,25 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
 // even in the scalar case.
 RegUsage[ClassID] += 1;
   } else {
+// The output from scaled phis and scaled reductions actually have
+// fewer lanes than the VF.
+ElementCount VF = VFs[J];
+if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R))
+  VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor());
+else if (auto *PartialReductionR =
+             dyn_cast<VPPartialReductionRecipe>(R))
+  VF = VF.divideCoefficientBy(PartialReductionR->getVFScaleFactor());
+
+LLVM_DEBUG(if (VF != VFs[J]) {
+  dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to "
+ << VF << " for ";
+  R->dump();

sdesmalen-arm wrote:

minor nit: Is it worth creating an `operator<<` for VPDef, so that you can write:
```suggestion
 << VF << " for " << *R << "\n";
```
?
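
A rough sketch of what that operator could look like, assuming `VPDef` exposes 
the usual `print(raw_ostream &, const Twine &, VPSlotTracker &)` entry point 
(the exact signature may differ):
```cpp
inline raw_ostream &operator<<(raw_ostream &OS, const VPDef &D) {
  VPSlotTracker SlotTracker; // no VPlan context; fine for debug output
  D.print(OS, "", SlotTracker);
  return OS;
}
```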

https://github.com/llvm/llvm-project/pull/133090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)

2025-04-08 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/133090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)

2025-04-08 Thread Sander de Smalen via llvm-branch-commits


@@ -5039,10 +5039,25 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
 // even in the scalar case.
 RegUsage[ClassID] += 1;
   } else {
+// The output from scaled phis and scaled reductions actually have
+// fewer lanes than the VF.
+ElementCount VF = VFs[J];
+if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R))
+  VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor());
+else if (auto *PartialReductionR =
+             dyn_cast<VPPartialReductionRecipe>(R))
+  VF = VF.divideCoefficientBy(PartialReductionR->getVFScaleFactor());
+
+LLVM_DEBUG(if (VF != VFs[J]) {
+  dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to "
+ << VF << " for ";
+  R->dump();

sdesmalen-arm wrote:

I would expect it to be a pretty small and uncontroversial change (it would 
merely add an extra function) that doesn't require any particular tests. Are 
you sure you don't want to include it in this PR?

https://github.com/llvm/llvm-project/pull/133090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)

2025-04-08 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm commented:

You'll probably want to rebase on top of #126437 again, as some of the code has 
changed and will cause a merge conflict.

https://github.com/llvm/llvm-project/pull/133090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)

2025-04-10 Thread Sander de Smalen via llvm-branch-commits


@@ -5039,10 +5039,26 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
 // even in the scalar case.
 RegUsage[ClassID] += 1;
   } else {
+ElementCount VF = VFs[J];
+// The output from scaled phis and scaled reductions actually has
+// fewer lanes than the VF.
+if (isa<VPReductionPHIRecipe, VPPartialReductionRecipe>(R)) {
+  auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R);
+  auto *PartialReductionR =
+      ReductionR ? nullptr : dyn_cast<VPPartialReductionRecipe>(R);
+  unsigned ScaleFactor = ReductionR ? ReductionR->getVFScaleFactor()
+                                    : PartialReductionR->getVFScaleFactor();
+  VF = VF.divideCoefficientBy(ScaleFactor);
+}

sdesmalen-arm wrote:

Maybe create a utility function that returns the scaling factor of a recipe, 
returning `1` for any recipe other than 
`VPPartialReductionRecipe`/`VPReductionPHIRecipe`.
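
Roughly like this (the helper name is just illustrative):
```cpp
static unsigned getVFScaleFactor(const VPRecipeBase *R) {
  if (auto *RR = dyn_cast<VPReductionPHIRecipe>(R))
    return RR->getVFScaleFactor();
  if (auto *PR = dyn_cast<VPPartialReductionRecipe>(R))
    return PR->getVFScaleFactor();
  return 1;
}
```
The code above then collapses to a single
`VF = VF.divideCoefficientBy(getVFScaleFactor(R));`.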

Also, please run clang-format on your code.

https://github.com/llvm/llvm-project/pull/133090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)

2025-04-11 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

@arsenm are you happy for me to reland this?

I've done better testing this time around; doing a two-stage build with 
sanitisers enabled and running LNT on both X86 and AArch64 platforms.

https://github.com/llvm/llvm-project/pull/134408
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
 return Invalid;
   break;
 case 16:
-  if (AccumEVT == MVT::i64)
-Cost *= 2;
-  else if (AccumEVT != MVT::i32)
+  if (AccumEVT != MVT::i32)

sdesmalen-arm wrote:

```
llvm.partial.reduce(nxv2i64 %acc, mul(zext(nxv16i8 %a) -> nxv16i64, zext(nxv16i8 %b) -> nxv16i64))
```
This case can be lowered with the correct support in SelectionDAG (see 
https://github.com/llvm/llvm-project/pull/130935#discussion_r2068162662).

I'd argue that this function will need a bit of a rewrite anyway once we get 
proper support in SelectionDAG for legalising these operations, so I wouldn't 
bother changing that in this PR.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };
 
-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

(We discussed this offline.) Swapping the operands in the debug-print function 
of the recipe is not something that really concerns me, and I think there's 
still value in making this change, which is functionally NFC from an end-user's 
perspective.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
 return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+return PR_SignExtend;
+  default:
+return PR_None;

sdesmalen-arm wrote:

I think the default case should be an `llvm_unreachable()`, because only 
zero/sign-extend are supported partial reduction extends. The `PR_None` is 
handled by the function above that takes an `Instruction*`.
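
I.e. something along these lines:
```cpp
  case Instruction::CastOps::SExt:
    return PR_SignExtend;
  default:
    llvm_unreachable("Unexpected extend opcode for partial reduction");
```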

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

sdesmalen-arm wrote:

```suggestion
  if (auto *Cast = dyn_cast<CastInst>(I))
    return getPartialReductionExtendKind(Cast->getOpcode());
  return PR_None;
```

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits


@@ -2432,12 +2437,40 @@ static void 
tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }
 
+/// This function tries to create an abstract recipe from a partial reduction
+/// to hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

sdesmalen-arm wrote:

Does this need to be given the `Range &` and for that range to be clamped if it 
doesn't match, or if the cost is higher than the individual operations (similar 
to what happens in `tryToCreateAbstractReductionRecipe`)?

(Note that the cost part is still missing.)

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)

2025-04-30 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/137683
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm deleted 
https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
 return Invalid;
   break;
 case 16:
-  if (AccumEVT == MVT::i64)
-Cost *= 2;
-  else if (AccumEVT != MVT::i32)
+  if (AccumEVT != MVT::i32)

sdesmalen-arm wrote:

Why are you making this change?

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -219,6 +219,8 @@ class TargetTransformInfo {
   /// Get the kind of extension that an instruction represents.
   static PartialReductionExtendKind
   getPartialReductionExtendKind(Instruction *I);
+  static PartialReductionExtendKind
+  getPartialReductionExtendKind(Instruction::CastOps ExtOpcode);

sdesmalen-arm wrote:

What about either replacing `getPartialReductionExtendKind(Instruction *I)` 
with this one rather than adding a new interface, or otherwise changing the 
implementation of `getPartialReductionExtendKind(Instruction *I)` to use 
`getPartialReductionExtendKind(Instruction::CastOps)`?
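
For the latter option, a sketch of the `Instruction *` overload forwarding to 
the `CastOps` one:
```cpp
TargetTransformInfo::PartialReductionExtendKind
TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
  if (auto *Cast = dyn_cast<CastInst>(I))
    return getPartialReductionExtendKind(Cast->getOpcode());
  return PR_None;
}
```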

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };
 
-/// A recipe for forming partial reductions. In the loop, an accumulator and

sdesmalen-arm wrote:

Would it be possible to make the change of `VPPartialReductionRecipe : public 
VPSingleDefRecipe` -> `VPPartialReductionRecipe : public VPReductionRecipe` as 
an NFC change? (For cases around VPMulAccumulateReductionRecipes you can 
initially add some asserts that the recipe isn't a partial reduction, because 
that won't be supported until this PR lands)

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -2438,14 +2438,14 @@ 
VPMulAccumulateReductionRecipe::computeCost(ElementCount VF,
 return Ctx.TTI.getPartialReductionCost(
 Instruction::Add, Ctx.Types.inferScalarType(getVecOp0()),
 Ctx.Types.inferScalarType(getVecOp1()), getResultType(), VF,
-TTI::getPartialReductionExtendKind(getExtOpcode()),
-TTI::getPartialReductionExtendKind(getExtOpcode()), Instruction::Mul);
+TTI::getPartialReductionExtendKind(getExt0Opcode()),
+TTI::getPartialReductionExtendKind(getExt1Opcode()), Instruction::Mul);
   }
 
   Type *RedTy = Ctx.Types.inferScalarType(this);
   auto *SrcVecTy =
   cast(toVectorTy(Ctx.Types.inferScalarType(getVecOp0()), VF));
-  return Ctx.TTI.getMulAccReductionCost(isZExt(), RedTy, SrcVecTy,
+  return Ctx.TTI.getMulAccReductionCost(isZExt0(), RedTy, SrcVecTy,

sdesmalen-arm wrote:

The TTI hook also needs updating to reflect the separate extends.
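
One possible shape for the updated hook (parameter names assumed; the current 
hook only takes a single `IsUnsigned` flag):
```cpp
InstructionCost getMulAccReductionCost(
    bool IsZExt0, bool IsZExt1, Type *ResTy, VectorType *SrcTy,
    TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;
```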

https://github.com/llvm/llvm-project/pull/136997
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-04-24 Thread Sander de Smalen via llvm-branch-commits


@@ -2496,6 +2501,9 @@ class VPMulAccumulateReductionRecipe : public 
VPReductionRecipe {
 
   Type *ResultTy;
 
+  /// If the reduction this is based on is a partial reduction.

sdesmalen-arm wrote:

This comment makes no sense.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)

2025-04-04 Thread Sander de Smalen via llvm-branch-commits


@@ -329,11 +329,10 @@ define <2 x half> @chain_hi_to_lo_global() {
 ; GFX11-TRUE16:   ; %bb.0: ; %bb
 ; GFX11-TRUE16-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-TRUE16-NEXT:v_mov_b32_e32 v0, 2
-; GFX11-TRUE16-NEXT:v_mov_b32_e32 v1, 0
+; GFX11-TRUE16-NEXT:v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v2, 0
+; GFX11-TRUE16-NEXT:v_mov_b32_e32 v3, 0
 ; GFX11-TRUE16-NEXT:global_load_d16_b16 v0, v[0:1], off
-; GFX11-TRUE16-NEXT:v_mov_b32_e32 v1, 0
-; GFX11-TRUE16-NEXT:v_mov_b32_e32 v2, 0
-; GFX11-TRUE16-NEXT:global_load_d16_hi_b16 v0, v[1:2], off
+; GFX11-TRUE16-NEXT:global_load_d16_hi_b16 v0, v[2:3], off

sdesmalen-arm wrote:

@broxigarchen I noticed these tests changed, but I couldn't really tell whether 
these changes are functionally equivalent.

https://github.com/llvm/llvm-project/pull/134408
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)

2025-05-12 Thread Sander de Smalen via llvm-branch-commits


@@ -2432,12 +2437,40 @@ static void 
tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }
 
+/// This function tries to create an abstract recipe from a partial reduction
+/// to hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

sdesmalen-arm wrote:

The way I read the code, by the time it gets here it has already recognised a 
reduction, so there is a `VP[Partial]ReductionRecipe`. 
It then tries to analyse whether that recipe can be transformed into a 
`VPMulAccumulateReductionRecipe`. For `VPReductionRecipe` it will clamp the 
range to all the VFs that can be turned into a 
`VPMulAccumulateReductionRecipe`, but for `VPPartialReductionRecipe` it doesn't 
do that. I don't see why for partial reductions we'd do something different.

In fact, why wouldn't the `tryToMatchAndCreateMulAccumulateReduction` code be 
sufficient here? Now that you've made `VPPartialReductionRecipe` a subclass of 
`VPReductionRecipe`, I'd expect that code to function roughly the same.

https://github.com/llvm/llvm-project/pull/136173
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] ARM: Remove fake entries for divrem libcalls (PR #143832)

2025-06-12 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/143832
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)

2025-07-01 Thread Sander de Smalen via llvm-branch-commits


@@ -2744,6 +2702,12 @@ class VPSingleDefBundleRecipe : public VPSingleDefRecipe 
{
 /// vector operands, performing a reduction.add on the result, and adding
 /// the scalar result to a chain.
 MulAccumulateReduction,
+/// Represent an inloop multiply-accumulate reduction, multiplying the
+/// extended vector operands, negating the multiplication, performing a
+/// reduction.add on the result, and adding the scalar result to a chain.
+ExtNegatedMulAccumulateReduction,

sdesmalen-arm wrote:

This looks like something that is not yet supported for in-loop reductions. 
Because you're trying to align partial reductions with other reductions, we 
also need to add support for `sub` reductions to in-loop reductions. I'd 
suggest pulling that out into a separate PR (and rebasing this one on top of 
it).

This will require a few changes:
* `RecurrenceDescriptor::getReductionOpChain` does not recognise sub-reductions 
at the moment, which will need fixing. The reason for that is described in one 
of the comments:
```
// [..] Subs are also currently not allowed (which are usually
// treated as part of a add reduction) as they are expected to generally be
// more expensive than out-of-loop reductions, and need to be costed more
// carefully.
```
Because you're adding a cost-model for those here, we can now handle 
sub-reductions as well.

* When the `VPReductionRecipe` is created in 
`LoopVectorizationPlanner::adjustRecipesForReductions`, for in-loop 
sub-reductions we'd need to add a `sub(0, VecOp)` similar to how we do this for 
partial reductions.
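
For that second bullet, a rough sketch of the negation, mirroring what the 
partial-reduction path does today (builder names assumed from VPlan's API):
```cpp
if (ReductionOpcode == Instruction::Sub) {
  VPValue *Zero =
      Plan.getOrAddLiveIn(ConstantInt::get(RdxDesc.getRecurrenceType(), 0));
  // Rewrite the sub-reduction as an add-reduction of the negated operand.
  VecOp = Builder.createNaryOp(Instruction::Sub, {Zero, VecOp});
  ReductionOpcode = Instruction::Add;
}
```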

When we finally properly merge the two implementations, the special code we 
currently have for partial reductions to insert the `sub` can be removed, in 
favour of the code that we'd need to add for in-loop reductions. (FWIW, that is 
not something I'm suggesting to do as part of this PR, but rather as a future 
follow-up)

https://github.com/llvm/llvm-project/pull/146073
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -784,8 +785,8 @@ AArch64RegisterInfo::useFPForScavengingIndex(const 
MachineFunction &MF) const {
   assert((!MF.getSubtarget<AArch64Subtarget>().hasSVE() ||
   AFI->hasCalculatedStackSizeSVE()) &&
  "Expected SVE area to be calculated by this point");
-  return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeSVE() 
&&
- !AFI->hasStackHazardSlotIndex();
+  return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeZPR() 
&&
+ !AFI->getStackSizePPR() && !AFI->hasStackHazardSlotIndex();

sdesmalen-arm wrote:

nit:
```suggestion
  return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->hasSVEStackSize() &&
 !AFI->hasStackHazardSlotIndex();
```

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -299,14 +297,20 @@ class AArch64FunctionInfo final : public 
MachineFunctionInfo {
 TailCallReservedStack = bytes;
   }
 
-  bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; }
+  void setStackSizeZPR(uint64_t S) {
+HasCalculatedStackSizeSVE = true;

sdesmalen-arm wrote:

nit: this function sets `HasCalculatedStackSizeSVE` even if only one of the two 
values has been set. Is it worth making this `setStackSizeSVE(uint64_t ZPR, 
uint64_t PPR = 0)` so that `HasCalculatedStackSizeSVE` is set only once?
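
I.e. something like:
```cpp
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR = 0) {
  StackSizeZPR = ZPR;
  StackSizePPR = PPR;
  HasCalculatedStackSizeSVE = true;
}
```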

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -299,14 +297,20 @@ class AArch64FunctionInfo final : public 
MachineFunctionInfo {
 TailCallReservedStack = bytes;
   }
 
-  bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; }
+  void setStackSizeZPR(uint64_t S) {
+HasCalculatedStackSizeSVE = true;
+StackSizeZPR = S;
+  }
 
-  void setStackSizeSVE(uint64_t S) {
+  void setStackSizePPR(uint64_t S) {
 HasCalculatedStackSizeSVE = true;
-StackSizeSVE = S;
+StackSizePPR = S;
   }
 
-  uint64_t getStackSizeSVE() const { return StackSizeSVE; }
+  uint64_t getStackSizeZPR() const { return StackSizeZPR; }

sdesmalen-arm wrote:

not related to your PR, but I think we should add an assert that 
`HasCalculatedStackSizeSVE` is true (same for CalleeSavedStackSize), although 
unfortunately that currently leads to some failures where they're used.
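
For illustration, the getter would then look something like this (similarly 
for the PPR and callee-saved variants):
```cpp
uint64_t getStackSizeZPR() const {
  assert(HasCalculatedStackSizeSVE && "SVE stack sizes not yet calculated");
  return StackSizeZPR;
}
```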

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -451,10 +454,36 @@ static unsigned getFixedObjectSize(const MachineFunction 
&MF,
   }
 }
 
-/// Returns the size of the entire SVE stackframe (calleesaves + spills).
+static unsigned getStackHazardSize(const MachineFunction &MF) {

sdesmalen-arm wrote:

nit: maybe just move the implementations to where they are declared?

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -19,6 +19,11 @@
 
 namespace llvm {
 
+struct SVEStackSizes {

sdesmalen-arm wrote:

Should this be named `SVEStackOffsets` (given that they're used as signed 
offsets)?

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -644,7 +644,8 @@ bool AArch64RegisterInfo::hasBasePointer(const 
MachineFunction &MF) const {
 if (ST.hasSVE() || ST.isStreaming()) {
   // Frames that have variable sized objects and scalable SVE objects,
   // should always use a basepointer.
-  if (!AFI->hasCalculatedStackSizeSVE() || AFI->getStackSizeSVE())
+  if (!AFI->hasCalculatedStackSizeSVE() || AFI->getStackSizeZPR() ||
+  AFI->getStackSizePPR())

sdesmalen-arm wrote:

nit:
```suggestion
  if (!AFI->hasCalculatedStackSizeSVE() || AFI->hasSVEStackSize())
```

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -1605,25 +1634,19 @@ static bool isTargetWindows(const MachineFunction &MF) {
   return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
 }
 
-static unsigned getStackHazardSize(const MachineFunction &MF) {
-  return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
-}
-
 // Convenience function to determine whether I is an SVE callee save.
-static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
+static bool IsZPRCalleeSave(MachineBasicBlock::iterator I) {

sdesmalen-arm wrote:

nit: given that you're renaming these, what about calling them 
`isPartOfZPRCalleeSave` (because a `PTRUE_B` instruction is not a callee-save 
in itself)

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -4294,24 +4396,32 @@ static int64_t 
determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
   report_fatal_error(
   "Alignment of scalable vectors > 16 bytes is not yet supported");
 
+int64_t &Offset = OffsetForObject(FI, ZPROffset, PPROffset);
 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
 if (AssignOffsets)
   Assign(FI, -Offset);
   }
 
-  return Offset;
+  PPROffset = alignTo(PPROffset, Align(16U));
+  ZPROffset = alignTo(ZPROffset, Align(16U));
+
+  if (&ZPROffset != &PPROffset) {
+// SplitSVEObjects (PPRs and ZPRs allocated to separate areas).
+return SVEStackSizes{ZPROffset, PPROffset};
+  }
+  // When SplitSVEObjects is disabled just attribute all the stack to ZPRs.
+  // Determining the split is not necessary.
+  return SVEStackSizes{ZPROffset, 0};

sdesmalen-arm wrote:

When you use an instance of the return type (`SVEStackSizes`) instead of 
`ZPRStack` and `PPRStack`, then you can just return that struct at the end of 
this function.
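
I.e. roughly (field names assumed from the `SVEStackSizes` struct):
```cpp
SVEStackSizes SVEStack{};
int64_t &ZPROffset = SVEStack.ZPRStackSize;
int64_t &PPROffset =
    SplitSVEObjects ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
// ... assign the offsets exactly as before, then simply:
return SVEStack;
```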

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-07 Thread Sander de Smalen via llvm-branch-commits


@@ -4227,10 +4310,20 @@ static bool getSVECalleeSaveSlotRange(const 
MachineFrameInfo &MFI,
 // Fills in the first and last callee-saved frame indices into
 // Min/MaxCSFrameIndex, respectively.
 // Returns the size of the stack.
-static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
-  int &MinCSFrameIndex,
-  int &MaxCSFrameIndex,
-  bool AssignOffsets) {
+static SVEStackSizes
+determineSVEStackObjectOffsets(MachineFunction &MF, bool AssignOffsets,
+   bool SplitSVEObjects = false) {
+  MachineFrameInfo &MFI = MF.getFrameInfo();
+
+  int64_t ZPRStack = 0;
+  int64_t PPRStack = 0;
+
+  auto [ZPROffset, PPROffset] = [&] {
+if (SplitSVEObjects)
+  return std::tie(ZPRStack, PPRStack);
+return std::tie(ZPRStack, ZPRStack);
+  }();

sdesmalen-arm wrote:

This seems a lot more readable:
```suggestion
  int64_t &ZPROffset = ZPRStack; 
  int64_t &PPROffset = SplitSVEObjects ? PPRStack : ZPRStack;
```

Also, can you add a brief comment describing why you create two aliases?

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-08 Thread Sander de Smalen via llvm-branch-commits


@@ -299,14 +297,20 @@ class AArch64FunctionInfo final : public 
MachineFunctionInfo {
 TailCallReservedStack = bytes;
   }
 
-  bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; }
+  void setStackSizeZPR(uint64_t S) {
+HasCalculatedStackSizeSVE = true;
+StackSizeZPR = S;
+  }
 
-  void setStackSizeSVE(uint64_t S) {
+  void setStackSizePPR(uint64_t S) {
 HasCalculatedStackSizeSVE = true;
-StackSizeSVE = S;
+StackSizePPR = S;
   }
 
-  uint64_t getStackSizeSVE() const { return StackSizeSVE; }
+  uint64_t getStackSizeZPR() const { return StackSizeZPR; }

sdesmalen-arm wrote:

Yeah, I wasn't necessarily suggesting that be done as part of this PR. I just 
spotted it and I'm worried it hides some latent bugs.

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-08 Thread Sander de Smalen via llvm-branch-commits


@@ -19,6 +19,11 @@
 
 namespace llvm {
 
+struct SVEStackSizes {

sdesmalen-arm wrote:

Okay, could you then change `int64_t` to an unsigned type?

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)

2025-07-09 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/134408
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (PR #134408)

2025-07-09 Thread Sander de Smalen via llvm-branch-commits

sdesmalen-arm wrote:

My apologies for taking a while to get back to this; I had been lacking focus 
time, and it also took me a while to get the changes right. This time I think 
it's in better shape, partly because I've got a better understanding of how 
things work and partly because I've done more testing:
(a) I've built and run the LLVM test-suite with subreg-liveness tracking 
enabled on AArch64.
(b) I've done stage-2 builds with sanitizers (address + memory) on both AArch64 
and X86, with subreg-liveness tracking enabled for AArch64.


https://github.com/llvm/llvm-project/pull/134408
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-10 Thread Sander de Smalen via llvm-branch-commits


@@ -4363,24 +4458,25 @@ static int64_t 
determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
   report_fatal_error(
   "Alignment of scalable vectors > 16 bytes is not yet supported");
 
+int64_t &Offset = OffsetForObject(FI, ZPROffset, PPROffset);
 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
 if (AssignOffsets)
   Assign(FI, -Offset);
   }
 
-  return Offset;
+  PPROffset = alignTo(PPROffset, Align(16U));
+  ZPROffset = alignTo(ZPROffset, Align(16U));
+  return SVEStack;
 }
 
-int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
-MachineFrameInfo &MFI) const {
-  int MinCSFrameIndex, MaxCSFrameIndex;
-  return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, 
false);
+SVEStackSizes
+AArch64FrameLowering::estimateSVEStackObjectOffsets(MachineFunction &MF) const 
{
+  return determineSVEStackObjectOffsets(MF, false);

sdesmalen-arm wrote:

```suggestion
  return determineSVEStackObjectOffsets(MF, /*AssignOffsets=*/ false);
```
(same for the one below)

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-10 Thread Sander de Smalen via llvm-branch-commits


@@ -4296,10 +4372,20 @@ static bool getSVECalleeSaveSlotRange(const 
MachineFrameInfo &MFI,
 // Fills in the first and last callee-saved frame indices into
 // Min/MaxCSFrameIndex, respectively.
 // Returns the size of the stack.
-static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
-  int &MinCSFrameIndex,
-  int &MaxCSFrameIndex,
-  bool AssignOffsets) {
+static SVEStackSizes
+determineSVEStackObjectOffsets(MachineFunction &MF, bool AssignOffsets,
+   bool SplitSVEObjects = false) {

sdesmalen-arm wrote:

nit: maybe move the parameter to the other PR, because at the moment it is 
never called with `SplitSVEObjects = true`, and so this code cannot be tested.

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-10 Thread Sander de Smalen via llvm-branch-commits


@@ -4694,12 +4790,8 @@ void 
AArch64FrameLowering::processFunctionBeforeFrameFinalized(
   assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
  "Upwards growing stack unsupported");
 
-  int MinCSFrameIndex, MaxCSFrameIndex;
-  int64_t SVEStackSize =
-  assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
-
-  AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
-  AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
+  auto [ZPRStackSize, PPRStackSize] = assignSVEStackObjectOffsets(MF);

sdesmalen-arm wrote:

nit: There's little value in returning the sizes from 
`assignSVEStackObjectOffsets` if their only use is being passed to 
`setStackSizeSVE`, so I think the assignment can now probably be moved into 
`determineSVEStackObjectOffsets` (predicated on `AssignOffsets`), removing the 
need for the aliases that just call `determineSVEStackObjectOffsets`.
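
Something like the following shape, sketched as a standalone toy rather than 
the actual patch (the names mirror the diff, but the bodies and the 
frame-object model are invented for illustration):

```
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct SVEStackSizes { uint64_t ZPRStackSize = 0, PPRStackSize = 0; };

// One entry point: it computes the sizes and, when asked, also records the
// offsets, so no separate estimate/assign wrappers are needed.
SVEStackSizes determineSVEStackObjectOffsets(std::vector<int64_t> &Offsets,
                                             bool AssignOffsets) {
  SVEStackSizes Sizes;
  for (std::size_t FI = 0; FI < Offsets.size(); ++FI) {
    Sizes.ZPRStackSize += 16; // pretend each object is a 16-byte ZPR slot
    if (AssignOffsets)
      Offsets[FI] = -int64_t(Sizes.ZPRStackSize);
  }
  return Sizes;
}

int main() {
  std::vector<int64_t> Offsets(3, 0);
  // Estimation pass: sizes only, offsets untouched.
  auto Est = determineSVEStackObjectOffsets(Offsets, /*AssignOffsets=*/false);
  // Assignment pass: same computation, but offsets are recorded too.
  auto Fin = determineSVEStackObjectOffsets(Offsets, /*AssignOffsets=*/true);
  std::cout << Est.ZPRStackSize << " " << Fin.ZPRStackSize << "\n"; // 48 48
  for (int64_t O : Offsets)
    std::cout << O << " "; // -16 -32 -48
  std::cout << "\n";
}
```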

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-10 Thread Sander de Smalen via llvm-branch-commits


@@ -19,6 +19,11 @@
 
 namespace llvm {
 
+struct SVEStackSizes {

sdesmalen-arm wrote:

The only places where `SVEStackSizes` is used are in the context of unsigned 
values (the function `setStackSizeSVE` and the variables `SVELocals` and 
`SVEStackSize`), except that you chose `int64_t` for `SVELocals` and 
`SVEStackSize` even though those sizes are always non-negative. If a value can 
never be negative and is only ever used in contexts where the derived value is 
never negative, then I think it should have an unsigned type.

The size will at some point need converting to a signed offset, but that's 
already the case today, and the code that does the size -> offset conversion 
will need to guarantee the value does not overflow. Which brings me to the 
data type `uint64_t`: other places currently use `unsigned` for these sizes, 
so if you change this to `unsigned`, the widening conversion to `int64_t` (as 
used in `StackOffset`) can never overflow or change the value.
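
As a standalone illustration of why the unsigned-to-`int64_t` direction is 
always safe (not code from the patch):

```
#include <cassert>
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
  // Stack sizes are inherently non-negative, so store them as unsigned.
  unsigned StackSize = std::numeric_limits<unsigned>::max();
  // Every 32-bit unsigned value is representable as int64_t, so widening to
  // the signed type used by StackOffset can never change the value.
  assert(int64_t(StackSize) >= 0);
  int64_t Offset = -int64_t(StackSize);
  std::cout << Offset << "\n"; // prints -4294967295
}
```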

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-10 Thread Sander de Smalen via llvm-branch-commits


@@ -19,6 +19,11 @@
 
 namespace llvm {
 
+struct SVEStackSizes {

sdesmalen-arm wrote:

Ah yes, those use the same data storage, so a conversion will be required in 
that function. In that case, I'm happy with an explicit cast as in:
```
unsigned &Offset = OffsetForObject(FI, ZPROffset, PPROffset);
Offset += MFI.getObjectSize(FI);
Offset = alignTo(Offset, MFI.getObjectAlign(FI));
if (AssignOffsets) {
  LLVM_DEBUG(dbgs() << "FI: " << FI << ", Offset: " << -int64_t(Offset) << "\n");
  Assign(FI, -int64_t(Offset));
}
```

(Tangentially related: the two LLVM_DEBUG statements, the one above and the 
one in `Assign`, look almost identical; can we remove one?)

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-11 Thread Sander de Smalen via llvm-branch-commits


@@ -4308,26 +4398,33 @@ static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
"reference.");
 #endif
 
-  auto Assign = [&MFI](int FI, int64_t Offset) {
+  auto StackForObject = [&](int FI, uint64_t &ZPRStackTop,
+uint64_t &PPRStackTop) -> uint64_t & {
+return MFI.getStackID(FI) == TargetStackID::ScalableVector ? ZPRStackTop
+   : PPRStackTop;
+  };
+
+  auto Assign = [&MFI, AssignOffsets](int FI, int64_t Offset) {
+if (AssignOffsets == AssignObjectOffsets::No)
+  return;
 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
 MFI.setObjectOffset(FI, Offset);
   };
 
-  int64_t Offset = 0;
-
   // Then process all callee saved slots.
+  int MinCSFrameIndex, MaxCSFrameIndex;
   if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
-// Assign offsets to the callee save slots.
-for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
-  Offset += MFI.getObjectSize(I);
-  Offset = alignTo(Offset, MFI.getObjectAlign(I));
-  if (AssignOffsets)
-Assign(I, -Offset);
+for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI) {
+  uint64_t &StackTop = StackForObject(FI, ZPRStackTop, PPRStackTop);
+  StackTop += MFI.getObjectSize(FI);
+  StackTop = alignTo(StackTop, MFI.getObjectAlign(FI));
+  Assign(FI, -int64_t(StackTop));

sdesmalen-arm wrote:

nit: maybe add an assert that `int64_t(StackTop) == StackTop` in case of 
insanely big stacks? Or move this into the `Assign` function so as not to have 
to add it twice.
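
A minimal standalone sketch of keeping that check in one place (the `StackTop` 
name comes from the quoted diff; the helper and everything else are invented 
for illustration):

```
#include <cassert>
#include <cstdint>
#include <limits>

// The overflow guard lives in one helper instead of at each call site.
int64_t negateStackTop(uint64_t StackTop) {
  // Note: a literal int64_t(StackTop) == StackTop compares both sides as
  // uint64_t and therefore always holds; bounding against INT64_MAX checks
  // the property the assert is actually after.
  assert(StackTop <= uint64_t(std::numeric_limits<int64_t>::max()) &&
         "SVE stack offset would overflow int64_t");
  return -int64_t(StackTop);
}

int main() { return negateStackTop(32) == -32 ? 0 : 1; }
```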

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-11 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm approved this pull request.

LGTM with nits addressed.

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-11 Thread Sander de Smalen via llvm-branch-commits

https://github.com/sdesmalen-arm edited 
https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-07-11 Thread Sander de Smalen via llvm-branch-commits


@@ -4042,8 +4124,11 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
   });
 
   // If any callee-saved registers are used, the frame cannot be eliminated.
+  auto [ZPRLocalStackSize, PPRLocalStackSize] =
+  determineSVEStackSizes(MF, AssignObjectOffsets::No);
+  int64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;

sdesmalen-arm wrote:

```suggestion
  uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
```

(same for `SVEStackSize` below)

https://github.com/llvm/llvm-project/pull/142391
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits