[Lldb-commits] [lldb] [compiler-rt] [clang-tools-extra] [llvm] [flang] [clang] [libcxx] [libc] [lld] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -326,6 +326,50 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
     switch (Kind) {
     default:
       break;
+    case TTI::SK_ExtractSubvector:
+      if (isa<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);
+        unsigned NumElems =
+            divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+        // Whole vector extract - just the vector itself + (possible) vsetvli.
+        // TODO: consider adding the cost for vsetvli.
+        if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&
+                           Index % NumElems == 0)) {
+          std::pair<InstructionCost, MVT> SubLT =
+              getTypeLegalizationCost(SubTp);
+          return Index == 0
+                     ? TTI::TCC_Free
+                     : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,

preames wrote:

For a full VREG case, you never need the VMV_V_V.  You only need the VMV_V_V if 
NumElems < VLMAX.  

Extending this to sub-register extract with exact VLEN known would be 
reasonable, but let's do that in a separate patch.
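
A minimal sketch of the distinction being suggested, with assumed names and
parameters (this is illustrative, not the patch itself):

    // Hedged sketch: a whole-VREG extract is just a register rename, so only
    // charge the vmv.v.v when the extracted piece is narrower than VLMAX.
    // All names here (SubVecElems, VLMaxElems, MoveCost) are assumptions.
    #include <cstdint>

    uint64_t extractSubvectorCost(uint64_t SubVecElems, uint64_t VLMaxElems,
                                  uint64_t MoveCost) {
      if (SubVecElems == VLMaxElems)
        return 0; // Full register: no copy needed, i.e. TCC_Free.
      return MoveCost; // Partial register: pay for one VMV_V_V.
    }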

https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [clang-tools-extra] [clang] [libcxx] [compiler-rt] [lldb] [llvm] [flang] [lld] [libc] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -326,6 +326,50 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
     switch (Kind) {
     default:
       break;
+    case TTI::SK_ExtractSubvector:
+      if (isa<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);
+        unsigned NumElems =
+            divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+        // Whole vector extract - just the vector itself + (possible) vsetvli.
+        // TODO: consider adding the cost for vsetvli.
+        if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&

preames wrote:

I think this check would be more clearly expressed as an AND of the following
clauses:
a) ST->getRealMaxVLen() == ST->getRealMinVLen()
b) NumElems * ElementSizeInBits == VLEN
c) Index % NumElems == 0

Note that this only supports m1 full extracts.  But starting there and
extending it to m2 and m4 later seems entirely reasonable.
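
A compact sketch of that conjunction, with assumed parameter names (not the
actual patch):

    // Hedged sketch of the suggested three-clause check; all names assumed.
    #include <cstdint>

    bool isWholeM1RegExtract(uint64_t MinVLen, uint64_t MaxVLen,
                             uint64_t NumElems, uint64_t ElemSizeInBits,
                             uint64_t Index) {
      return MinVLen == MaxVLen &&                   // (a) exact VLEN known
             NumElems * ElemSizeInBits == MinVLen && // (b) fills one m1 reg
             Index % NumElems == 0;                  // (c) reg-aligned index
    }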

https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [compiler-rt] [llvm] [clang-tools-extra] [lld] [clang] [libc] [libcxx] [lldb] [flang] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -326,6 +326,50 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
     switch (Kind) {
     default:
       break;
+    case TTI::SK_ExtractSubvector:
+      if (isa<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);
+        unsigned NumElems =
+            divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+        // Whole vector extract - just the vector itself + (possible) vsetvli.
+        // TODO: consider adding the cost for vsetvli.
+        if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&
+                           Index % NumElems == 0)) {
+          std::pair<InstructionCost, MVT> SubLT =
+              getTypeLegalizationCost(SubTp);
+          return Index == 0
+                     ? TTI::TCC_Free
+                     : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+                                                             SubLT.second,
+                                                             CostKind);
+        }
+      }
+      break;
+    case TTI::SK_InsertSubvector:
+      if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) {
+        unsigned TpRegs = getRegUsageForType(Tp);

preames wrote:

Same basic style comments as above.

https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [libcxx] [libc] [mlir] [clang-tools-extra] [openmp] [llvm] [lldb] [clang] [flang] [lld] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits

preames wrote:

FYI - https://github.com/llvm/llvm-project/pull/80360 adds testing 
infrastructure to exercise the TTI hooks.

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [libc] [mlir] [clang-tools-extra] [openmp] [llvm] [lldb] [clang] [flang] [lld] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -7,7 +7,7 @@ define i32 @test(ptr noalias %p, ptr noalias %addr) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ADDR:%.*]], i32 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x ptr> [[TMP0]], <8 x ptr> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr i32, <8 x ptr> [[TMP1]], <8 x i32> 
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr i32, <8 x ptr> [[TMP1]], <8 x i32> 

preames wrote:

Same as last.

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [mlir] [clang] [libc] [lldb] [lld] [openmp] [flang] [libcxx] [clang-tools-extra] [llvm] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits

https://github.com/preames edited 
https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [flang] [clang] [openmp] [mlir] [libc] [lldb] [lld] [clang-tools-extra] [llvm] [libcxx] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -3930,30 +4065,68 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
       std::optional<int> Diff =
           getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
       // Check that the sorted loads are consecutive.
-      if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+      if (static_cast<unsigned>(*Diff) == Sz - 1)
         return LoadsState::Vectorize;
       // Simple check if not a strided access - clear order.
-      IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+      bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+      // Try to generate strided load node if:
+      // 1. Target with strided load support is detected.
+      // 2. The number of loads is greater than MinProfitableStridedLoads,
+      // or the potential stride <= MaxProfitableLoadStride and the
+      // potential stride is power-of-2 (to avoid perf regressions for the very
+      // small number of loads) and max distance > number of loads, or potential
+      // stride is -1.
+      // 3. The loads are ordered, or number of unordered loads <=
+      // MaxProfitableUnorderedLoads, or loads are in reversed order.
+      // (this check is to avoid extra costs for very expensive shuffles).
+      if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+                                  (static_cast<unsigned>(std::abs(*Diff)) <=
+                                       MaxProfitableLoadStride * Sz &&
+                                   isPowerOf2_32(std::abs(*Diff)))) &&
+                                 static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+                                *Diff == -(static_cast<int>(Sz) - 1))) {
+        int Stride = *Diff / static_cast<int>(Sz - 1);
+        if (*Diff == Stride * static_cast<int>(Sz - 1)) {
+          if (TTI.isTypeLegal(VecTy) &&

preames wrote:

The isTypeLegal check here should be redundant.  

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [flang] [libc] [mlir] [libcxx] [lldb] [lld] [clang] [openmp] [clang-tools-extra] [llvm] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -397,27 +241,12 @@ define void @test3([48 x float]* %p, float* noalias %s) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr [[P:%.*]], i64 0, i64 0
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[S:%.*]], i64 0
-; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 4
-; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 8
-; CHECK-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 12
-; CHECK-NEXT:    [[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 16
-; CHECK-NEXT:    [[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 20
-; CHECK-NEXT:    [[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 24
-; CHECK-NEXT:    [[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 28
 ; CHECK-NEXT:    [[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 23
-; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ARRAYIDX]], i32 0
-; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <8 x ptr> [[TMP0]], ptr [[ARRAYIDX4]], i32 1
-; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <8 x ptr> [[TMP1]], ptr [[ARRAYIDX11]], i32 2
-; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <8 x ptr> [[TMP2]], ptr [[ARRAYIDX18]], i32 3
-; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <8 x ptr> [[TMP3]], ptr [[ARRAYIDX25]], i32 4
-; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <8 x ptr> [[TMP4]], ptr [[ARRAYIDX32]], i32 5
-; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <8 x ptr> [[TMP5]], ptr [[ARRAYIDX39]], i32 6
-; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <8 x ptr> [[TMP6]], ptr [[ARRAYIDX46]], i32 7
-; CHECK-NEXT:    [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> poison)
-; CHECK-NEXT:    [[TMP9:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
-; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <8 x float> [[TMP9]], <8 x float> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT:    [[TMP11:%.*]] = fsub fast <8 x float> [[TMP10]], [[TMP8]]
-; CHECK-NEXT:    store <8 x float> [[TMP11]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    [[TMP0:%.*]] = call <8 x float> @llvm.experimental.vp.strided.load.v8f32.p0.i64(ptr align 4 [[ARRAYIDX]], i64 16, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i32 8)
+; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

preames wrote:

Can't this reverse become a negative strided load?

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [clang] [libc] [lld] [llvm] [lldb] [libcxx] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -17,7 +17,7 @@ define i16 @test() {
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i16> @llvm.masked.gather.v2i16.v2p0(<2 x ptr> [[TMP3]], i32 2, <2 x i1> <i1 true, i1 true>, <2 x i16> poison)
 ; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i16> [[TMP4]], i32 0
 ; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i16> [[TMP4]], i32 1
-; CHECK-NEXT:    [[CMP_I178:%.*]] = icmp ult i16 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    [[CMP_I178:%.*]] = icmp ult i16 [[TMP5]], [[TMP6]]
 ; CHECK-NEXT:    br label [[WHILE_BODY_I]]
 ;
 entry:

preames wrote:

Unless this is specifically testing something about offsets from null, can you 
update this test to pass in a pointer argument and index off that?  

(Separate change, no review needed.)

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [flang] [libcxx] [lldb] [lld] [mlir] [clang-tools-extra] [libc] [openmp] [llvm] [clang] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -30,7 +30,7 @@ define void @test() {
 ; CHECK-SLP-THRESHOLD:   bb:
 ; CHECK-SLP-THRESHOLD-NEXT:    [[TMP0:%.*]] = insertelement <4 x ptr> poison, ptr [[COND_IN_V]], i32 0
 ; CHECK-SLP-THRESHOLD-NEXT:    [[TMP1:%.*]] = shufflevector <4 x ptr> [[TMP0]], <4 x ptr> poison, <4 x i32> zeroinitializer
-; CHECK-SLP-THRESHOLD-NEXT:    [[TMP2:%.*]] = getelementptr i64, <4 x ptr> [[TMP1]], <4 x i64> 
+; CHECK-SLP-THRESHOLD-NEXT:    [[TMP2:%.*]] = getelementptr i64, <4 x ptr> [[TMP1]], <4 x i64> 

preames wrote:

Shouldn't this be a strided load with a stride of -4*8?

If what you're aiming for is test stability, can you use an index which
doesn't look anything like a strided load?

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [flang] [mlir] [openmp] [llvm] [clang] [clang-tools-extra] [lldb] [lld] [libc] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits

https://github.com/preames commented:

These comments are trying to be helpful in pointing out bits which might be 
simplified or split off, but my track record with SLP reviews is not great.  
Feel free to ignore any or all of these.  

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [openmp] [clang-tools-extra] [libcxx] [lld] [flang] [clang] [llvm] [lldb] [mlir] [libc] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -3930,30 +4065,68 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
       std::optional<int> Diff =
           getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
       // Check that the sorted loads are consecutive.
-      if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+      if (static_cast<unsigned>(*Diff) == Sz - 1)
         return LoadsState::Vectorize;
       // Simple check if not a strided access - clear order.
-      IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+      bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+      // Try to generate strided load node if:
+      // 1. Target with strided load support is detected.
+      // 2. The number of loads is greater than MinProfitableStridedLoads,
+      // or the potential stride <= MaxProfitableLoadStride and the
+      // potential stride is power-of-2 (to avoid perf regressions for the very
+      // small number of loads) and max distance > number of loads, or potential
+      // stride is -1.
+      // 3. The loads are ordered, or number of unordered loads <=
+      // MaxProfitableUnorderedLoads, or loads are in reversed order.
+      // (this check is to avoid extra costs for very expensive shuffles).
+      if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+                                  (static_cast<unsigned>(std::abs(*Diff)) <=
+                                       MaxProfitableLoadStride * Sz &&
+                                   isPowerOf2_32(std::abs(*Diff)))) &&
+                                 static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+                                *Diff == -(static_cast<int>(Sz) - 1))) {
+        int Stride = *Diff / static_cast<int>(Sz - 1);

preames wrote:

How is the diff-in-bytes, divided by the number of elements, the stride?  Did
you maybe mean to use element size here?

It's also possible you have two Sz variables with different meaning.  I did not 
check for this.
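
For concreteness, a worked version of the arithmetic in question (all numbers
assumed):

    // Hedged example: 8 loads whose first and last pointers are 28 *elements*
    // apart give a stride of 28 / (8 - 1) = 4 elements.  If *Diff were in
    // bytes instead (112 for 4-byte floats), the same division would yield a
    // byte stride of 16 -- hence the question about element size.
    #include <cassert>

    int main() {
      int Diff = 28, Sz = 8;
      int Stride = Diff / (Sz - 1);
      assert(Stride == 4);
      return 0;
    }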

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Philip Reames via lldb-commits


@@ -3878,6 +3883,130 @@ static Align computeCommonAlignment(ArrayRef<Value *> VL) {
   return CommonAlignment;
 }
 
+/// Check if \p Order represents reverse order.
+static bool isReverseOrder(ArrayRef<unsigned> Order) {
+  unsigned Sz = Order.size();
+  return !Order.empty() && all_of(enumerate(Order), [&](const auto &Pair) {
+    return Pair.value() == Sz || Sz - Pair.index() - 1 == Pair.value();
+  });
+}
+
+/// Checks if the provided list of pointers \p Pointers represents the strided
+/// pointers for type ElemTy. If they are not, std::nullopt is returned.
+/// Otherwise, if \p Inst is not specified, a just-initialized optional value
+/// is returned to show that the pointers represent strided pointers. If \p
+/// Inst is specified, the runtime stride is materialized before the given
+/// \p Inst.
+/// \returns std::nullopt if the pointers are not pointers with the runtime
+/// stride, nullptr or the actual stride value, otherwise.
+static std::optional<Value *>
+calculateRtStride(ArrayRef<Value *> PointerOps, Type *ElemTy,
+                  const DataLayout &DL, ScalarEvolution &SE,
+                  SmallVectorImpl<unsigned> &SortedIndices,
+                  Instruction *Inst = nullptr) {
+  SmallVector<const SCEV *> SCEVs;

preames wrote:

An alternate approach which might be simpler, and yet cover many of the
interesting test cases, would be:
* Loop over the pointers, checking that getPointerBase matches.
* Loop again, doing removePointerBase.
* This gives a list of offsets from base; bail if any is non-constant.
* Sort the list of constant offsets.
* Check if strided, with a shuffle afterwards?

If you don't want a shuffle afterwards, you can check the delta without sorting.

This won't cover non-constant strides, but I'm not sure we really care about 
those in practice.
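
A rough sketch of that constant-offset approach, assuming ScalarEvolution's
getPointerBase/removePointerBase helpers (the function name and structure here
are illustrative, not the patch):

    // Hedged sketch: returns the constant offset stride if all pointers share
    // one SCEV base and their sorted offsets form an arithmetic sequence;
    // std::nullopt otherwise.  Sorting implies a shuffle afterwards; checking
    // the deltas without sorting avoids it.
    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/STLExtras.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/Analysis/ScalarEvolutionExpressions.h"
    #include <optional>
    using namespace llvm;

    static std::optional<int64_t>
    getConstantStride(ArrayRef<Value *> PointerOps, ScalarEvolution &SE) {
      const SCEV *Base = SE.getPointerBase(SE.getSCEV(PointerOps.front()));
      SmallVector<int64_t> Offsets;
      for (Value *Ptr : PointerOps) {
        const SCEV *P = SE.getSCEV(Ptr);
        if (SE.getPointerBase(P) != Base)
          return std::nullopt; // Different bases: not a strided pattern.
        auto *Off = dyn_cast<SCEVConstant>(SE.removePointerBase(P));
        if (!Off)
          return std::nullopt; // Bail on any non-constant offset.
        Offsets.push_back(Off->getAPInt().getSExtValue());
      }
      llvm::sort(Offsets);
      int64_t Stride = Offsets.size() > 1 ? Offsets[1] - Offsets[0] : 0;
      for (unsigned I = 2, E = Offsets.size(); I < E; ++I)
        if (Offsets[I] - Offsets[I - 1] != Stride)
          return std::nullopt;
      return Stride;
    }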

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [clang] [libc] [libcxx] [lldb] [llvm] [doc] Add Discord invite link alongside channel links (PR #126352)

2025-02-08 Thread Philip Reames via lldb-commits


@@ -149,7 +149,7 @@ Open Clang Projects
 If you hit a bug with Clang, it is very useful for us if you reduce the code
 that demonstrates the problem down to something small. There are many ways to
 do this; ask on <a href="https://discourse.llvm.org/c/clang">Discourse</a>,
-<a href="https://discord.com/channels/636084430946959380/636725486533345280">Discord</a>
+<a href="https://discord.gg/xS7Z362">Discord</a>

preames wrote:

Can you keep the channel link for this one, and add the invite?  

https://github.com/llvm/llvm-project/pull/126352


[Lldb-commits] [clang] [libc] [libcxx] [lldb] [llvm] [doc] Add Discord invite link alongside channel links (PR #126352)

2025-02-08 Thread Philip Reames via lldb-commits

https://github.com/preames edited 
https://github.com/llvm/llvm-project/pull/126352


[Lldb-commits] [clang] [libc] [libcxx] [lldb] [llvm] [doc] Add Discord invite link alongside channel links (PR #126352)

2025-02-08 Thread Philip Reames via lldb-commits

https://github.com/preames approved this pull request.

LGTM w/requested change.

Another option would be to expand a section on joining Discord somewhere, and
then scatter links to that in the docs instead of the invite link itself.

https://github.com/llvm/llvm-project/pull/126352


[Lldb-commits] [lldb] [lldb][test] Disable flaky test_qThreadInfo_matches_qC_attach test on AArch64 Linux (PR #138940)

2025-05-07 Thread Philip Reames via lldb-commits

https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/138940