[Lldb-commits] [lldb] [compiler-rt] [clang-tools-extra] [llvm] [flang] [clang] [libcxx] [libc] [lld] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)
@@ -326,6 +326,50 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
   switch (Kind) {
   default:
     break;
+  case TTI::SK_ExtractSubvector:
+    if (isa<FixedVectorType>(SubTp)) {
+      unsigned TpRegs = getRegUsageForType(Tp);
+      unsigned NumElems =
+          divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+      // Whole vector extract - just the vector itself + (possible) vsetvli.
+      // TODO: consider adding the cost for vsetvli.
+      if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&
+                         Index % NumElems == 0)) {
+        std::pair<InstructionCost, MVT> SubLT =
+            getTypeLegalizationCost(SubTp);
+        return Index == 0
+                   ? TTI::TCC_Free
+                   : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,

preames wrote:

For a full VREG case, you never need the VMV_V_V. You only need the VMV_V_V if NumElems < VLMAX.

Extending this to sub-register extract with exact VLEN known would be reasonable, but let's do that in a separate patch.

https://github.com/llvm/llvm-project/pull/80164
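To make the rule in that comment concrete, here is a standalone sketch (not the patch's code) that mirrors the names from the quoted hunk; cost 0 stands in for TTI::TCC_Free and cost 1 for a single vmv.v.v, and VLMax is passed as a plain integer in place of the subtarget query.

#include <cassert>

unsigned extractCopyCost(unsigned Index, unsigned NumElems, unsigned VLMax) {
  if (Index == 0)
    return 0; // Extracting the low subvector is free.
  if (NumElems == VLMax)
    return 0; // Full-VREG extract: the subvector already occupies its own
              // register, so no vmv.v.v is needed.
  return 1;   // NumElems < VLMAX: one vmv.v.v (and possibly a vsetvli) to
              // move the live data, per the comment above.
}

int main() {
  // Whole-register extract (NumElems == VLMax): no copy under the rule above.
  assert(extractCopyCost(/*Index=*/4, /*NumElems=*/4, /*VLMax=*/4) == 0);
  // Sub-register extract with exact VLEN known: one whole-register move.
  assert(extractCopyCost(/*Index=*/8, /*NumElems=*/4, /*VLMax=*/8) == 1);
  return 0;
}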
[Lldb-commits] [clang-tools-extra] [clang] [libcxx] [compiler-rt] [lldb] [llvm] [flang] [lld] [libc] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)
@@ -326,6 +326,50 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
   switch (Kind) {
   default:
     break;
+  case TTI::SK_ExtractSubvector:
+    if (isa<FixedVectorType>(SubTp)) {
+      unsigned TpRegs = getRegUsageForType(Tp);
+      unsigned NumElems =
+          divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+      // Whole vector extract - just the vector itself + (possible) vsetvli.
+      // TODO: consider adding the cost for vsetvli.
+      if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&

preames wrote:

I think this check would be more clearly expressed as an AND of the following clauses:

a) ST->getRealMaxVLen() == ST->getRealMinVLen()
b) NumElems * ElementSizeInBits == VLEN
c) Index % NumElems == 0

Note that this only supports m1 full extracts, but starting there and extending it to m2 and m4 later seems entirely reasonable.

https://github.com/llvm/llvm-project/pull/80164
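The suggested three-clause form, written out as a standalone predicate for illustration. RealMinVLen, RealMaxVLen, NumElems, and ElementSizeInBits are passed as plain integers in place of the subtarget and type queries; this is a sketch of the comment, not the patch's code.

bool isWholeSingleRegExtract(unsigned RealMinVLen, unsigned RealMaxVLen,
                             unsigned Index, unsigned NumElems,
                             unsigned ElementSizeInBits) {
  // a) VLEN must be exactly known (min == max) for the element-to-register
  //    mapping to be a compile-time fact.
  if (RealMinVLen != RealMaxVLen)
    return false;
  unsigned VLEN = RealMinVLen;
  // b) The subvector must fill exactly one vector register (an m1 extract).
  if (NumElems * ElementSizeInBits != VLEN)
    return false;
  // c) The extract must start on a register boundary.
  return Index % NumElems == 0;
}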
[Lldb-commits] [compiler-rt] [llvm] [clang-tools-extra] [lld] [clang] [libc] [libcxx] [lldb] [flang] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)
@@ -326,6 +326,50 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
   switch (Kind) {
   default:
     break;
+  case TTI::SK_ExtractSubvector:
+    if (isa<FixedVectorType>(SubTp)) {
+      unsigned TpRegs = getRegUsageForType(Tp);
+      unsigned NumElems =
+          divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+      // Whole vector extract - just the vector itself + (possible) vsetvli.
+      // TODO: consider adding the cost for vsetvli.
+      if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&
+                         Index % NumElems == 0)) {
+        std::pair<InstructionCost, MVT> SubLT =
+            getTypeLegalizationCost(SubTp);
+        return Index == 0
+                   ? TTI::TCC_Free
+                   : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+                                                           SubLT.second,
+                                                           CostKind);
+      }
+    }
+    break;
+  case TTI::SK_InsertSubvector:
+    if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) {
+      unsigned TpRegs = getRegUsageForType(Tp);

preames wrote:

Same basic style comments as above.

https://github.com/llvm/llvm-project/pull/80164
[Lldb-commits] [libcxx] [libc] [mlir] [clang-tools-extra] [openmp] [llvm] [lldb] [clang] [flang] [lld] [SLP]Add support for strided loads. (PR #80310)
preames wrote:

FYI - https://github.com/llvm/llvm-project/pull/80360 adds testing infrastructure to exercise the TTI hooks.

https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [libcxx] [libc] [mlir] [clang-tools-extra] [openmp] [llvm] [lldb] [clang] [flang] [lld] [SLP]Add support for strided loads. (PR #80310)
@@ -7,7 +7,7 @@ define i32 @test(ptr noalias %p, ptr noalias %addr) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ADDR:%.*]], i32 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x ptr> [[TMP0]], <8 x ptr> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr i32, <8 x ptr> [[TMP1]], <8 x i32>
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr i32, <8 x ptr> [[TMP1]], <8 x i32>

preames wrote:

Same as last.

https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [mlir] [clang] [libc] [lldb] [lld] [openmp] [flang] [libcxx] [clang-tools-extra] [llvm] [SLP]Add support for strided loads. (PR #80310)
https://github.com/preames edited https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [flang] [clang] [openmp] [mlir] [libc] [lldb] [lld] [clang-tools-extra] [llvm] [libcxx] [SLP]Add support for strided loads. (PR #80310)
@@ -3930,30 +4065,68 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
       std::optional<int> Diff =
           getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
       // Check that the sorted loads are consecutive.
-      if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+      if (static_cast<unsigned>(*Diff) == Sz - 1)
         return LoadsState::Vectorize;
       // Simple check if not a strided access - clear order.
-      IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+      bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+      // Try to generate strided load node if:
+      // 1. Target with strided load support is detected.
+      // 2. The number of loads is greater than MinProfitableStridedLoads,
+      // or the potential stride <= MaxProfitableLoadStride and the
+      // potential stride is power-of-2 (to avoid perf regressions for the very
+      // small number of loads) and max distance > number of loads, or potential
+      // stride is -1.
+      // 3. The loads are ordered, or number of unordered loads <=
+      // MaxProfitableUnorderedLoads, or loads are in reversed order.
+      // (this check is to avoid extra costs for very expensive shuffles).
+      if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+                                  (static_cast<unsigned>(std::abs(*Diff)) <=
+                                       MaxProfitableLoadStride * Sz &&
+                                   isPowerOf2_32(std::abs(*Diff)))) &&
+                                 static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+                                *Diff == -(static_cast<int>(Sz) - 1))) {
+        int Stride = *Diff / static_cast<int>(Sz - 1);
+        if (*Diff == Stride * static_cast<int>(Sz - 1)) {
+          if (TTI.isTypeLegal(VecTy) &&

preames wrote:

The isTypeLegal check here should be redundant.

https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [flang] [libc] [mlir] [libcxx] [lldb] [lld] [clang] [openmp] [clang-tools-extra] [llvm] [SLP]Add support for strided loads. (PR #80310)
@@ -397,27 +241,12 @@ define void @test3([48 x float]* %p, float* noalias %s) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr [[P:%.*]], i64 0, i64 0
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[S:%.*]], i64 0
-; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 4
-; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 8
-; CHECK-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 12
-; CHECK-NEXT:    [[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 16
-; CHECK-NEXT:    [[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 20
-; CHECK-NEXT:    [[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 24
-; CHECK-NEXT:    [[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 28
 ; CHECK-NEXT:    [[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 23
-; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ARRAYIDX]], i32 0
-; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <8 x ptr> [[TMP0]], ptr [[ARRAYIDX4]], i32 1
-; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <8 x ptr> [[TMP1]], ptr [[ARRAYIDX11]], i32 2
-; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <8 x ptr> [[TMP2]], ptr [[ARRAYIDX18]], i32 3
-; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <8 x ptr> [[TMP3]], ptr [[ARRAYIDX25]], i32 4
-; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <8 x ptr> [[TMP4]], ptr [[ARRAYIDX32]], i32 5
-; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <8 x ptr> [[TMP5]], ptr [[ARRAYIDX39]], i32 6
-; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <8 x ptr> [[TMP6]], ptr [[ARRAYIDX46]], i32 7
-; CHECK-NEXT:    [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> [[TMP7]], i32 4, <8 x i1> , <8 x float> poison)
-; CHECK-NEXT:    [[TMP9:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
-; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <8 x float> [[TMP9]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT:    [[TMP11:%.*]] = fsub fast <8 x float> [[TMP10]], [[TMP8]]
-; CHECK-NEXT:    store <8 x float> [[TMP11]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    [[TMP0:%.*]] = call <8 x float> @llvm.experimental.vp.strided.load.v8f32.p0.i64(ptr align 4 [[ARRAYIDX]], i64 16, <8 x i1> , i32 8)
+; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <8 x i32>

preames wrote:

Can't this reverse become a negative strided load?

https://github.com/llvm/llvm-project/pull/80310
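To make the question concrete: when a group of loads is consecutive but needed in reverse lane order, it could be emitted as a strided load with a negative byte stride instead of a contiguous load plus a reverse shuffle. The sketch below is a hypothetical helper built on the public IRBuilder API, not the patch's code; LaneZeroPtr is assumed to point at the element that should land in lane 0 (the highest address of the reversed group), and alignment handling is omitted.

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

static Value *emitReverseAsNegStridedLoad(IRBuilderBase &Builder,
                                          Value *LaneZeroPtr,
                                          FixedVectorType *VecTy,
                                          const DataLayout &DL) {
  unsigned NumElts = VecTy->getNumElements();
  uint64_t ElemSize = DL.getTypeStoreSize(VecTy->getElementType()).getFixedValue();
  // Step backwards through memory by one element per lane.
  Value *Stride = Builder.getInt64(-(int64_t)ElemSize);
  Value *Mask = Builder.getAllOnesMask(ElementCount::getFixed(NumElts));
  Value *EVL = Builder.getInt32(NumElts);
  // llvm.experimental.vp.strided.load is overloaded on the result, pointer,
  // and stride types (e.g. .v8f32.p0.i64).
  return Builder.CreateIntrinsic(
      Intrinsic::experimental_vp_strided_load,
      {VecTy, LaneZeroPtr->getType(), Builder.getInt64Ty()},
      {LaneZeroPtr, Stride, Mask, EVL});
}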
[Lldb-commits] [clang] [libc] [lld] [llvm] [lldb] [libcxx] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)
@@ -17,7 +17,7 @@ define i16 @test() {
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i16> @llvm.masked.gather.v2i16.v2p0(<2 x ptr> [[TMP3]], i32 2, <2 x i1> , <2 x i16> poison)
 ; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i16> [[TMP4]], i32 0
 ; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i16> [[TMP4]], i32 1
-; CHECK-NEXT:    [[CMP_I178:%.*]] = icmp ult i16 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    [[CMP_I178:%.*]] = icmp ult i16 [[TMP5]], [[TMP6]]
 ; CHECK-NEXT:    br label [[WHILE_BODY_I]]
 ;
 entry:

preames wrote:

Unless this is specifically testing something about offsets from null, can you update this test to pass in a pointer argument and index off that? (Separate change, no review needed.)

https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [flang] [libcxx] [lldb] [lld] [mlir] [clang-tools-extra] [libc] [openmp] [llvm] [clang] [SLP]Add support for strided loads. (PR #80310)
@@ -30,7 +30,7 @@ define void @test() {
 ; CHECK-SLP-THRESHOLD:       bb:
 ; CHECK-SLP-THRESHOLD-NEXT:    [[TMP0:%.*]] = insertelement <4 x ptr> poison, ptr [[COND_IN_V]], i32 0
 ; CHECK-SLP-THRESHOLD-NEXT:    [[TMP1:%.*]] = shufflevector <4 x ptr> [[TMP0]], <4 x ptr> poison, <4 x i32> zeroinitializer
-; CHECK-SLP-THRESHOLD-NEXT:    [[TMP2:%.*]] = getelementptr i64, <4 x ptr> [[TMP1]], <4 x i64>
+; CHECK-SLP-THRESHOLD-NEXT:    [[TMP2:%.*]] = getelementptr i64, <4 x ptr> [[TMP1]], <4 x i64>

preames wrote:

Shouldn't this be a strided load with a stride of -4*8? If what you're aiming for is test stability, can you use an index which doesn't look anything like a strided load?

https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [libcxx] [flang] [mlir] [openmp] [llvm] [clang] [clang-tools-extra] [lldb] [lld] [libc] [SLP]Add support for strided loads. (PR #80310)
https://github.com/preames commented:

These comments are trying to be helpful in pointing out bits which might be simplified or split off, but my track record with SLP reviews is not great. Feel free to ignore any or all of these.

https://github.com/llvm/llvm-project/pull/80310
[Lldb-commits] [openmp] [clang-tools-extra] [libcxx] [lld] [flang] [clang] [llvm] [lldb] [mlir] [libc] [SLP]Add support for strided loads. (PR #80310)
@@ -3930,30 +4065,68 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
       std::optional<int> Diff =
           getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
       // Check that the sorted loads are consecutive.
-      if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+      if (static_cast<unsigned>(*Diff) == Sz - 1)
         return LoadsState::Vectorize;
       // Simple check if not a strided access - clear order.
-      IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+      bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+      // Try to generate strided load node if:
+      // 1. Target with strided load support is detected.
+      // 2. The number of loads is greater than MinProfitableStridedLoads,
+      // or the potential stride <= MaxProfitableLoadStride and the
+      // potential stride is power-of-2 (to avoid perf regressions for the very
+      // small number of loads) and max distance > number of loads, or potential
+      // stride is -1.
+      // 3. The loads are ordered, or number of unordered loads <=
+      // MaxProfitableUnorderedLoads, or loads are in reversed order.
+      // (this check is to avoid extra costs for very expensive shuffles).
+      if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+                                  (static_cast<unsigned>(std::abs(*Diff)) <=
+                                       MaxProfitableLoadStride * Sz &&
+                                   isPowerOf2_32(std::abs(*Diff)))) &&
+                                 static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+                                *Diff == -(static_cast<int>(Sz) - 1))) {
+        int Stride = *Diff / static_cast<int>(Sz - 1);

preames wrote:

How is the diff-in-bytes divided by the number of elements the stride? Did you maybe mean to use element size here? It's also possible you have two Sz variables with different meaning. I did not check for this.

https://github.com/llvm/llvm-project/pull/80310
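If getPointersDiff reports the distance between the first and last pointer in units of ScalarTy elements rather than bytes, the division yields an element stride rather than a byte stride. The standalone worked example below illustrates that reading of the code; the concrete values are made up for illustration and this is not the patch's code.

#include <cassert>

int main() {
  // Eight loads of i32 whose pointers sit at element indices 0, 3, 6, ..., 21
  // relative to the first pointer.
  unsigned Sz = 8;                              // number of loads
  int Diff = 21;                                // getPointersDiff(Ptr0, PtrN)
  int Stride = Diff / static_cast<int>(Sz - 1); // 21 / 7 == 3 (elements)
  // The follow-up check in the hunk rejects distances that don't divide
  // evenly, e.g. Diff == 22 would give Stride == 3 but 3 * 7 != 22.
  assert(Diff == Stride * static_cast<int>(Sz - 1));
  // Only when lowering to llvm.experimental.vp.strided.load does the element
  // stride get scaled by the element size into a byte stride
  // (3 elements * 4 bytes == 12 bytes for i32).
  return Stride == 3 ? 0 : 1;
}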
[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)
@@ -3878,6 +3883,130 @@ static Align computeCommonAlignment(ArrayRef<Value *> VL) {
   return CommonAlignment;
 }
 
+/// Check if \p Order represents reverse order.
+static bool isReverseOrder(ArrayRef<unsigned> Order) {
+  unsigned Sz = Order.size();
+  return !Order.empty() && all_of(enumerate(Order), [&](const auto &Pair) {
+    return Pair.value() == Sz || Sz - Pair.index() - 1 == Pair.value();
+  });
+}
+
+/// Checks if the provided list of pointers \p Pointers represents the strided
+/// pointers for type ElemTy. If they are not, std::nullopt is returned.
+/// Otherwise, if \p Inst is not specified, just initialized optional value is
+/// returned to show that the pointers represent strided pointers. If \p Inst
+/// specified, the runtime stride is materialized before the given \p Inst.
+/// \returns std::nullopt if the pointers are not pointers with the runtime
+/// stride, nullptr or actual stride value, otherwise.
+static std::optional<Value *>
+calculateRtStride(ArrayRef<Value *> PointerOps, Type *ElemTy,
+                  const DataLayout &DL, ScalarEvolution &SE,
+                  SmallVectorImpl<unsigned> &SortedIndices,
+                  Instruction *Inst = nullptr) {
+  SmallVector<const SCEV *> SCEVs;

preames wrote:

An alternate approach which might be simpler and yet cover many of the interesting test cases might be:

* Loop over the pointers, check that getPointerBase matches.
* Loop again doing removePointerBase.
* This gives a list of offsets from base; bail if any non-constant.
* Sort the list of constant offsets.
* Check if strided w/shuffle?

If you don't want a shuffle afterwards, you can check the delta without sorting.

This won't cover non-constant strides, but I'm not sure we really care about those in practice.

https://github.com/llvm/llvm-project/pull/80310
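A rough sketch of the constant-offset approach described in those bullets, using ScalarEvolution's getPointerBase/removePointerBase; it is an illustration of the suggestion, not the patch's code. It only handles constant offsets from a common base, and the permutation bookkeeping needed for the "w/shuffle" case is omitted.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include <optional>
using namespace llvm;

// Returns the common byte stride between adjacent (sorted) pointers, or
// std::nullopt if the pointers don't fit the pattern.
static std::optional<int64_t> getConstantStride(ArrayRef<Value *> PointerOps,
                                                ScalarEvolution &SE) {
  // 1. All pointers must share the same SCEV pointer base.
  const SCEV *Base = SE.getPointerBase(SE.getSCEV(PointerOps.front()));
  SmallVector<int64_t> Offsets;
  for (Value *Ptr : PointerOps) {
    const SCEV *S = SE.getSCEV(Ptr);
    if (SE.getPointerBase(S) != Base)
      return std::nullopt;
    // 2./3. Strip the base; bail out if the remaining offset isn't constant.
    auto *C = dyn_cast<SCEVConstant>(SE.removePointerBase(S));
    if (!C)
      return std::nullopt;
    Offsets.push_back(C->getAPInt().getSExtValue());
  }
  // 4. Sort the constant offsets.
  llvm::sort(Offsets);
  // 5. Check that adjacent offsets differ by one common delta.
  if (Offsets.size() < 2)
    return std::nullopt;
  int64_t Stride = Offsets[1] - Offsets[0];
  for (unsigned I = 2, E = Offsets.size(); I < E; ++I)
    if (Offsets[I] - Offsets[I - 1] != Stride)
      return std::nullopt;
  return Stride;
}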
[Lldb-commits] [clang] [libc] [libcxx] [lldb] [llvm] [doc] Add Discord invite link alongside channel links (PR #126352)
@@ -149,7 +149,7 @@ Open Clang Projects
 If you hit a bug with Clang, it is very useful for us if you reduce the code
 that demonstrates the problem down to something small. There are many ways to
 do this; ask on <a href="https://discourse.llvm.org/c/clang">Discourse</a>,
-<a href="https://discord.com/channels/636084430946959380/636725486533345280">Discord</a>
+<a href="https://discord.gg/xS7Z362">Discord</a>

preames wrote:

Can you keep the channel link for this one, and add the invite?

https://github.com/llvm/llvm-project/pull/126352
[Lldb-commits] [clang] [libc] [libcxx] [lldb] [llvm] [doc] Add Discord invite link alongside channel links (PR #126352)
https://github.com/preames edited https://github.com/llvm/llvm-project/pull/126352
[Lldb-commits] [clang] [libc] [libcxx] [lldb] [llvm] [doc] Add Discord invite link alongside channel links (PR #126352)
https://github.com/preames approved this pull request.

LGTM w/requested change. Another option would be to expand a section on joining Discord somewhere, and then scatter links to that in the docs instead of the invite link itself.

https://github.com/llvm/llvm-project/pull/126352
[Lldb-commits] [lldb] [lldb][test] Disable flaky test_qThreadInfo_matches_qC_attach test on AArch64 Linux (PR #138940)
https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/138940