I committed a fix yesterday, should be fixed. Another one planning to commit later today or tomorrow.
Best regards, Alexey Bataev > 7 дек. 2021 г., в 07:08, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> > написал(а): > > Hi Alexey, > > After your patch Clang crashes while building 453.povray for > aarch64-linux-gnu. Apparently, this happens only with LTO enabled at -O2 and > -O3. > > Did you get any bug reports against this patch already? > > Thanks, > > -- > Maxim Kuvyrkov > https://www.linaro.org > >> On 5 Dec 2021, at 02:55, ci_not...@linaro.org wrote: >> >> After llvm commit ba74bb3a226e1b4660537f274627285b1bf41ee1 >> Author: Alexey Bataev <a.bat...@outlook.com> >> >> [SLP]Fix reused extracts cost. >> >> the following benchmarks slowed down by more than 2%: >> - 453.povray failed to build >> >> Below reproducer instructions can be used to re-build both "first_bad" and >> "last_good" cross-toolchains used in this bisection. Naturally, the scripts >> will fail when triggerring benchmarking jobs if you don't have access to >> Linaro TCWG CI. >> >> For your convenience, we have uploaded tarballs with pre-processed source >> and assembly files at: >> - First_bad save-temps: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/save-temps/ >> - Last_good save-temps: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/save-temps/ >> - Baseline save-temps: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/save-temps/ >> >> Configuration: >> - Benchmark: SPEC CPU2006 >> - Toolchain: Clang + Glibc + LLVM Linker >> - Version: all components were built from their tip of trunk >> - Target: aarch64-linux-gnu >> - Compiler flags: -O3 -flto >> - Hardware: NVidia TX1 4x Cortex-A57 >> >> This benchmarking CI is work-in-progress, and we welcome feedback and >> suggestions at linaro-toolchain@lists.linaro.org . In our improvement plans >> is to add support for SPEC CPU2017 benchmarks and provide "perf >> report/annotate" data behind these reports. >> >> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, >> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. >> >> This commit has regressed these CI configurations: >> - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO >> >> First_bad build: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/ >> Last_good build: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/ >> Baseline build: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/ >> Even more details: >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/ >> >> Reproduce builds: >> <cut> >> mkdir investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1 >> cd investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1 >> >> # Fetch scripts >> git clone https://git.linaro.org/toolchain/jenkins-scripts >> >> # Fetch manifests and test.sh script >> mkdir -p artifacts/manifests >> curl -o artifacts/manifests/build-baseline.sh >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-baseline.sh >> --fail >> curl -o artifacts/manifests/build-parameters.sh >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-parameters.sh >> --fail >> curl -o artifacts/test.sh >> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/test.sh >> --fail >> chmod +x artifacts/test.sh >> >> # Reproduce the baseline build (build all pre-requisites) >> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh >> >> # Save baseline build state (which is then restored in artifacts/test.sh) >> mkdir -p ./bisect >> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ >> --exclude /llvm/ ./ ./bisect/baseline/ >> >> cd llvm >> >> # Reproduce first_bad build >> git checkout --detach ba74bb3a226e1b4660537f274627285b1bf41ee1 >> ../artifacts/test.sh >> >> # Reproduce last_good build >> git checkout --detach 78cc133c63173a4b5b7a43750cc507d4cff683cf >> ../artifacts/test.sh >> >> cd .. >> </cut> >> >> Full commit (up to 1000 lines): >> <cut> >> commit ba74bb3a226e1b4660537f274627285b1bf41ee1 >> Author: Alexey Bataev <a.bat...@outlook.com> >> Date: Thu Dec 2 04:22:55 2021 -0800 >> >> [SLP]Fix reused extracts cost. >> >> If the extractelement instruction is used multiple times in the >> different tree entries (either vectorized, or gathered), need to >> compensate the scalar cost of such instructions. They are completely >> removed if all users are part of the tree but we need to compensate the >> cost only once for each instruction. >> >> Differential Revision: https://reviews.llvm.org/D114958 >> --- >> llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | 29 >> +++++++++++++--------- >> .../X86/extractelement-multiple-uses.ll | 23 +++++++++-------- >> 2 files changed, 29 insertions(+), 23 deletions(-) >> >> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp >> b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp >> index 95061e9053fa..335ad6c85387 100644 >> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp >> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp >> @@ -4287,8 +4287,8 @@ bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, >> Value *OpValue, >> bool BoUpSLP::areAllUsersVectorized(Instruction *I, >> ArrayRef<Value *> VectorizedVals) const { >> return (I->hasOneUse() && is_contained(VectorizedVals, I)) || >> - llvm::all_of(I->users(), [this](User *U) { >> - return ScalarToTreeEntry.count(U) > 0; >> + all_of(I->users(), [this](User *U) { >> + return ScalarToTreeEntry.count(U) > 0 || MustGather.contains(U); >> }); >> } >> >> @@ -4442,9 +4442,9 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry >> *E, >> // FIXME: it tries to fix a problem with MSVC buildbots. >> TargetTransformInfo &TTIRef = *TTI; >> auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy, >> - VectorizedVals](InstructionCost &Cost, >> - bool IsGather) { >> + VectorizedVals, E](InstructionCost &Cost) { >> DenseMap<Value *, int> ExtractVectorsTys; >> + SmallPtrSet<Value *, 4> CheckedExtracts; >> for (auto *V : VL) { >> if (isa<UndefValue>(V)) >> continue; >> @@ -4452,7 +4452,12 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry >> *E, >> // instruction itself is not going to be vectorized, consider this >> // instruction as dead and remove its cost from the final cost of the >> // vectorized tree. >> - if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals)) >> + // Also, avoid adjusting the cost for extractelements with multiple >> uses >> + // in different graph entries. >> + const TreeEntry *VE = getTreeEntry(V); >> + if (!CheckedExtracts.insert(V).second || >> + !areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) || >> + (VE && VE != E)) >> continue; >> auto *EE = cast<ExtractElementInst>(V); >> Optional<unsigned> EEIdx = getExtractIndex(EE); >> @@ -4549,11 +4554,6 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry >> *E, >> } >> return GatherCost; >> } >> - if (isSplat(VL)) { >> - // Found the broadcasting of the single scalar, calculate the cost as >> the >> - // broadcast. >> - return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy); >> - } >> if ((E->getOpcode() == Instruction::ExtractElement || >> all_of(E->Scalars, >> [](Value *V) { >> @@ -4571,13 +4571,18 @@ InstructionCost BoUpSLP::getEntryCost(const >> TreeEntry *E, >> // single input vector or of 2 input vectors. >> InstructionCost Cost = >> computeExtractCost(VL, VecTy, *ShuffleKind, Mask, *TTI); >> - AdjustExtractsCost(Cost, /*IsGather=*/true); >> + AdjustExtractsCost(Cost); >> if (NeedToShuffleReuses) >> Cost += >> TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, >> FinalVecTy, E->ReuseShuffleIndices); >> return Cost; >> } >> } >> + if (isSplat(VL)) { >> + // Found the broadcasting of the single scalar, calculate the cost as >> the >> + // broadcast. >> + return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy); >> + } >> InstructionCost ReuseShuffleCost = 0; >> if (NeedToShuffleReuses) >> ReuseShuffleCost = TTI->getShuffleCost( >> @@ -4755,7 +4760,7 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry >> *E, >> TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I); >> } >> } else { >> - AdjustExtractsCost(CommonCost, /*IsGather=*/false); >> + AdjustExtractsCost(CommonCost); >> } >> return CommonCost; >> } >> diff --git >> a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll >> b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll >> index c47f255f0bfe..31696752bbb3 100644 >> --- a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll >> +++ b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll >> @@ -2,24 +2,25 @@ >> ; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux >> -march=core-avx2 -pass-remarks-output=%t | FileCheck %s >> ; RUN: FileCheck %s --input-file=%t --check-prefix=YAML >> >> -; YAML: --- !Missed >> +; YAML: --- !Passed >> ; YAML: Pass: slp-vectorizer >> -; YAML: Name: NotBeneficial >> +; YAML: Name: VectorizedList >> ; YAML: Function: multi_uses >> ; YAML: Args: >> -; YAML: - String: 'List vectorization was possible but not >> beneficial with cost ' >> -; YAML: - Cost: '0' >> -; YAML: - String: ' >= ' >> -; YAML: - Treshold: '0' >> +; YAML: - String: 'SLP vectorized with cost ' >> +; YAML: - Cost: '-1' >> +; YAML: - String: ' and with tree size ' >> +; YAML: - TreeSize: '3' >> >> define float @multi_uses(<2 x float> %x, <2 x float> %y) { >> ; CHECK-LABEL: @multi_uses( >> -; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0 >> -; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1 >> ; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1 >> -; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[Y1]] >> -; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[Y1]] >> -; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]] >> +; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float >> [[Y1]], i32 0 >> +; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float >> [[Y1]], i32 1 >> +; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP2]] >> +; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0 >> +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1 >> +; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]] >> ; CHECK-NEXT: ret float [[ADD]] >> ; >> %x0 = extractelement <2 x float> %x, i32 0 >> </cut> > _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain