Re: [TCWG CI] 453.povray failed to build after llvm: [SLP]Fix reused extracts cost.

Alexey Bataev Tue, 07 Dec 2021 04:10:48 -0800

I committed a fix yesterday, so this should be fixed now. I am planning to 
commit another one later today or tomorrow.


Best regards,
Alexey Bataev

> On 7 Dec 2021, at 07:08, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
> 
> Hi Alexey,
> 
> After your patch Clang crashes while building 453.povray for 
> aarch64-linux-gnu.  Apparently, this happens only with LTO enabled at -O2 and 
> -O3.
> 
> Did you get any bug reports against this patch already?
> 
> Thanks,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>> On 5 Dec 2021, at 02:55, ci_not...@linaro.org wrote:
>> 
>> After llvm commit ba74bb3a226e1b4660537f274627285b1bf41ee1
>> Author: Alexey Bataev <a.bat...@outlook.com>
>> 
>>   [SLP]Fix reused extracts cost.
>> 
>> the following benchmarks slowed down by more than 2%:
>> - 453.povray failed to build
>> 
>> Below reproducer instructions can be used to re-build both "first_bad" and 
>> "last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
>> will fail when triggering benchmarking jobs if you don't have access to 
>> Linaro TCWG CI.
>> 
>> For your convenience, we have uploaded tarballs with pre-processed source 
>> and assembly files at:
>> - First_bad save-temps: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/save-temps/
>> - Last_good save-temps: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/save-temps/
>> - Baseline save-temps: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/save-temps/
>> 
>> Configuration:
>> - Benchmark: SPEC CPU2006
>> - Toolchain: Clang + Glibc + LLVM Linker
>> - Version: all components were built from their tip of trunk
>> - Target: aarch64-linux-gnu
>> - Compiler flags: -O3 -flto
>> - Hardware: NVidia TX1 4x Cortex-A57
>> 
>> This benchmarking CI is work-in-progress, and we welcome feedback and 
>> suggestions at linaro-toolchain@lists.linaro.org .  Our improvement plans 
>> include adding support for SPEC CPU2017 benchmarks and providing "perf 
>> report/annotate" data behind these reports.
>> 
>> THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, 
>> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>> 
>> This commit has regressed these CI configurations:
>> - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO
>> 
>> First_bad build: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/
>> Last_good build: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/
>> Baseline build: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/
>> Even more details: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/
>> 
>> Reproduce builds:
>> <cut>
>> mkdir investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1
>> cd investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1
>> 
>> # Fetch scripts
>> git clone https://git.linaro.org/toolchain/jenkins-scripts
>> 
>> # Fetch manifests and test.sh script
>> mkdir -p artifacts/manifests
>> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-baseline.sh --fail
>> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-parameters.sh --fail
>> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/test.sh --fail
>> chmod +x artifacts/test.sh
>> 
>> # Reproduce the baseline build (build all pre-requisites)
>> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>> 
>> # Save baseline build state (which is then restored in artifacts/test.sh)
>> mkdir -p ./bisect
>> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>> 
>> cd llvm
>> 
>> # Reproduce first_bad build
>> git checkout --detach ba74bb3a226e1b4660537f274627285b1bf41ee1
>> ../artifacts/test.sh
>> 
>> # Reproduce last_good build
>> git checkout --detach 78cc133c63173a4b5b7a43750cc507d4cff683cf
>> ../artifacts/test.sh
>> 
>> cd ..
>> </cut>
>> 
>> Full commit (up to 1000 lines):
>> <cut>
>> commit ba74bb3a226e1b4660537f274627285b1bf41ee1
>> Author: Alexey Bataev <a.bat...@outlook.com>
>> Date:   Thu Dec 2 04:22:55 2021 -0800
>> 
>>   [SLP]Fix reused extracts cost.
>> 
>>   If the extractelement instruction is used multiple times in the
>>   different tree entries (either vectorized, or gathered), need to
>>   compensate the scalar cost of such instructions. They are completely
>>   removed if all users are part of the tree but we need to compensate the
>>   cost only once for each instruction.
>> 
>>   Differential Revision: https://reviews.llvm.org/D114958
>> ---
>> llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp    | 29 +++++++++++++---------
>> .../X86/extractelement-multiple-uses.ll            | 23 +++++++++--------
>> 2 files changed, 29 insertions(+), 23 deletions(-)
>> 
>> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> index 95061e9053fa..335ad6c85387 100644
>> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> @@ -4287,8 +4287,8 @@ bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
>> bool BoUpSLP::areAllUsersVectorized(Instruction *I,
>>                                    ArrayRef<Value *> VectorizedVals) const {
>>  return (I->hasOneUse() && is_contained(VectorizedVals, I)) ||
>> -         llvm::all_of(I->users(), [this](User *U) {
>> -           return ScalarToTreeEntry.count(U) > 0;
>> +         all_of(I->users(), [this](User *U) {
>> +           return ScalarToTreeEntry.count(U) > 0 || MustGather.contains(U);
>>         });
>> }
>> 
>> @@ -4442,9 +4442,9 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
>>  // FIXME: it tries to fix a problem with MSVC buildbots.
>>  TargetTransformInfo &TTIRef = *TTI;
>>  auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,
>> -                               VectorizedVals](InstructionCost &Cost,
>> -                                               bool IsGather) {
>> +                               VectorizedVals, E](InstructionCost &Cost) {
>>    DenseMap<Value *, int> ExtractVectorsTys;
>> +    SmallPtrSet<Value *, 4> CheckedExtracts;
>>    for (auto *V : VL) {
>>      if (isa<UndefValue>(V))
>>        continue;
>> @@ -4452,7 +4452,12 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
>>      // instruction itself is not going to be vectorized, consider this
>>      // instruction as dead and remove its cost from the final cost of the
>>      // vectorized tree.
>> -      if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals))
>> +      // Also, avoid adjusting the cost for extractelements with multiple uses
>> +      // in different graph entries.
>> +      const TreeEntry *VE = getTreeEntry(V);
>> +      if (!CheckedExtracts.insert(V).second ||
>> +          !areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
>> +          (VE && VE != E))
>>        continue;
>>      auto *EE = cast<ExtractElementInst>(V);
>>      Optional<unsigned> EEIdx = getExtractIndex(EE);
>> @@ -4549,11 +4554,6 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
>>      }
>>      return GatherCost;
>>    }
>> -    if (isSplat(VL)) {
>> -      // Found the broadcasting of the single scalar, calculate the cost as the
>> -      // broadcast.
>> -      return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
>> -    }
>>    if ((E->getOpcode() == Instruction::ExtractElement ||
>>         all_of(E->Scalars,
>>                [](Value *V) {
>> @@ -4571,13 +4571,18 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
>>        // single input vector or of 2 input vectors.
>>        InstructionCost Cost =
>>            computeExtractCost(VL, VecTy, *ShuffleKind, Mask, *TTI);
>> -        AdjustExtractsCost(Cost, /*IsGather=*/true);
>> +        AdjustExtractsCost(Cost);
>>        if (NeedToShuffleReuses)
>>          Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
>>                                      FinalVecTy, E->ReuseShuffleIndices);
>>        return Cost;
>>      }
>>    }
>> +    if (isSplat(VL)) {
>> +      // Found the broadcasting of the single scalar, calculate the cost as the
>> +      // broadcast.
>> +      return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
>> +    }
>>    InstructionCost ReuseShuffleCost = 0;
>>    if (NeedToShuffleReuses)
>>      ReuseShuffleCost = TTI->getShuffleCost(
>> @@ -4755,7 +4760,7 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
>>              TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I);
>>        }
>>      } else {
>> -        AdjustExtractsCost(CommonCost, /*IsGather=*/false);
>> +        AdjustExtractsCost(CommonCost);
>>      }
>>      return CommonCost;
>>    }
>> diff --git a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll
>> index c47f255f0bfe..31696752bbb3 100644
>> --- a/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll
>> +++ b/llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll
>> @@ -2,24 +2,25 @@
>> ; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -march=core-avx2 -pass-remarks-output=%t | FileCheck %s
>> ; RUN: FileCheck %s --input-file=%t --check-prefix=YAML
>> 
>> -; YAML: --- !Missed
>> +; YAML: --- !Passed
>> ; YAML: Pass:            slp-vectorizer
>> -; YAML: Name:            NotBeneficial
>> +; YAML: Name:            VectorizedList
>> ; YAML: Function:        multi_uses
>> ; YAML: Args:
>> -; YAML:  - String:          'List vectorization was possible but not beneficial with cost '
>> -; YAML:  - Cost:            '0'
>> -; YAML:  - String:          ' >= '
>> -; YAML:  - Treshold:        '0'
>> +; YAML:  - String:          'SLP vectorized with cost '
>> +; YAML:  - Cost:            '-1'
>> +; YAML:  - String:          ' and with tree size '
>> +; YAML:  - TreeSize:        '3'
>> 
>> define float @multi_uses(<2 x float> %x, <2 x float> %y) {
>> ; CHECK-LABEL: @multi_uses(
>> -; CHECK-NEXT:    [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
>> -; CHECK-NEXT:    [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1
>> ; CHECK-NEXT:    [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1
>> -; CHECK-NEXT:    [[X0X0:%.*]] = fmul float [[X0]], [[Y1]]
>> -; CHECK-NEXT:    [[X1X1:%.*]] = fmul float [[X1]], [[Y1]]
>> -; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
>> +; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Y1]], i32 0
>> +; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Y1]], i32 1
>> +; CHECK-NEXT:    [[TMP3:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP2]]
>> +; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
>> +; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
>> +; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]]
>> ; CHECK-NEXT:    ret float [[ADD]]
>> ;
>>  %x0 = extractelement <2 x float> %x, i32 0
>> </cut>
> 
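
The "compensate the cost only once for each instruction" idea from the commit
message quoted above can be illustrated with a small standalone sketch. This is
not the SLPVectorizer code: Extract, ScalarCost and compensatedScalarCost are
hypothetical names, and a single shared set stands in for the patch's per-call
CheckedExtracts set combined with its tree-entry check. It only shows the
insert-and-test idiom that prevents double counting.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an extractelement instruction: an identifier
// plus the scalar cost that goes away if the instruction becomes dead.
struct Extract {
  int Id;
  int ScalarCost;
};

// Sums the scalar cost to compensate across all entries, counting each
// distinct extract once even when several entries reuse it.
int compensatedScalarCost(const std::vector<std::vector<Extract>> &Entries) {
  std::unordered_set<int> Checked; // ids that were already compensated
  int Total = 0;
  for (const auto &Entry : Entries) {
    for (const Extract &E : Entry) {
      assert(E.ScalarCost >= 0 && "this sketch assumes non-negative costs");
      // insert().second is false when the id was already present, which is
      // the same idiom as CheckedExtracts.insert(V).second in the patch.
      if (!Checked.insert(E.Id).second)
        continue;
      Total += E.ScalarCost;
    }
  }
  return Total;
}
```

With two entries that both contain extract 0, the shared extract contributes
its scalar cost once rather than twice, which is the double counting the commit
message describes.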
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
