Hi David,

Linaro’s benchmarking CI is churning through a backlog of old commits, and it 
appears that your patch increases code-size of a single function by 13% — from 
5890 bytes to 6642 bytes — that’s on Thumb2 at -Os.  Not a big deal, but may be 
it is exposing a case worth investigating.

The function is BZ2_compressBlock() from SPEC CPU2006’s 401.bzip2.  Let us know 
if you are interested in investigating it, and we’ll provide details to 
reproduce.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 19 Jul 2021, at 00:39, ci_not...@linaro.org wrote:
> 
> Successfully identified regression in *llvm* in CI configuration 
> tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os.  So far, this commit has 
> regressed CI configurations:
> - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
> 
> Culprit:
> <cut>
> commit ab97c9bdb747c873cd35a18229e2694156a7607d
> Author: David Green <david.gr...@arm.com>
> Date:   Sat Dec 12 14:21:40 2020 +0000
> 
>    [LV] Fix scalar cost for tail predicated loops
> 
>    When it comes to the scalar cost of any predicated block, the loop
>    vectorizer by default regards this predication as a sign that it is
>    looking at an if-conversion and divides the scalar cost of the block by
>    2, assuming it would only be executed half the time. This however makes
>    no sense if the predication has been introduced to tail predicate the
>    loop.
> 
>    Original patch by Anna Welker
> 
>    Differential Revision: https://reviews.llvm.org/D86452
> </cut>
> 
> Results regressed to (for first_bad == 
> ab97c9bdb747c873cd35a18229e2694156a7607d)
> # reset_artifacts:
> -10
> # build_abe binutils:
> -9
> # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set 
> gcc_override_configure=--disable-libsanitizer:
> -8
> # build_abe linux:
> -7
> # build_abe glibc:
> -6
> # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set 
> gcc_override_configure=--disable-libsanitizer:
> -5
> # build_llvm true:
> -3
> # true:
> 0
> # benchmark -Os_mthumb -- 
> artifacts/build-ab97c9bdb747c873cd35a18229e2694156a7607d/results_id:
> 1
> # 401.bzip2,bzip2_base.default                                  regressed by 
> 103
> # 401.bzip2,[.] BZ2_compressBlock                               regressed by 
> 113
> # 473.astar,astar_base.default                                  regressed by 
> 103
> 
> from (for last_good == d716eab197abec0b9aab4a76cd1a52b248b8c3b1)
> # reset_artifacts:
> -10
> # build_abe binutils:
> -9
> # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set 
> gcc_override_configure=--disable-libsanitizer:
> -8
> # build_abe linux:
> -7
> # build_abe glibc:
> -6
> # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set 
> gcc_override_configure=--disable-libsanitizer:
> -5
> # build_llvm true:
> -3
> # true:
> 0
> # benchmark -Os_mthumb -- 
> artifacts/build-d716eab197abec0b9aab4a76cd1a52b248b8c3b1/results_id:
> 1
> 
> Artifacts of last_good build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/artifact/artifacts/build-d716eab197abec0b9aab4a76cd1a52b248b8c3b1/
> Results ID of last_good: 
> tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/1878
> Artifacts of first_bad build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/artifact/artifacts/build-ab97c9bdb747c873cd35a18229e2694156a7607d/
> Results ID of first_bad: 
> tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/1876
> Build top page/logs: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/
> 
> Configuration details:
> 
> 
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-ab97c9bdb747c873cd35a18229e2694156a7607d
> cd investigate-llvm-ab97c9bdb747c873cd35a18229e2694156a7607d
> 
> git clone https://git.linaro.org/toolchain/jenkins-scripts
> 
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/artifact/artifacts/manifests/build-baseline.sh
>  --fail
> curl -o artifacts/manifests/build-parameters.sh 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/artifact/artifacts/manifests/build-parameters.sh
>  --fail
> curl -o artifacts/test.sh 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/artifact/artifacts/test.sh
>  --fail
> chmod +x artifacts/test.sh
> 
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
> 
> # Save baseline build state (which is then restored in artifacts/test.sh)
> rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ 
> --exclude llvm/ ./ ./bisect/baseline/
> 
> cd llvm
> 
> # Reproduce first_bad build
> git checkout --detach ab97c9bdb747c873cd35a18229e2694156a7607d
> ../artifacts/test.sh
> 
> # Reproduce last_good build
> git checkout --detach d716eab197abec0b9aab4a76cd1a52b248b8c3b1
> ../artifacts/test.sh
> 
> cd ..
> </cut>
> 
> History of pending regressions and results: 
> https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/ci/tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
> 
> Artifacts: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/artifact/artifacts/
> Build log: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-Os/19/consoleText
> 
> Full commit (up to 1000 lines):
> <cut>
> commit ab97c9bdb747c873cd35a18229e2694156a7607d
> Author: David Green <david.gr...@arm.com>
> Date:   Sat Dec 12 14:21:40 2020 +0000
> 
>    [LV] Fix scalar cost for tail predicated loops
> 
>    When it comes to the scalar cost of any predicated block, the loop
>    vectorizer by default regards this predication as a sign that it is
>    looking at an if-conversion and divides the scalar cost of the block by
>    2, assuming it would only be executed half the time. This however makes
>    no sense if the predication has been introduced to tail predicate the
>    loop.
> 
>    Original patch by Anna Welker
> 
>    Differential Revision: https://reviews.llvm.org/D86452
> ---
> llvm/lib/Transforms/Vectorize/LoopVectorize.cpp             | 7 ++++---
> llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll | 2 +-
> 2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
> b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> index c381377b67c9..663ea50c4c02 100644
> --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> @@ -6483,9 +6483,10 @@ LoopVectorizationCostModel::expectedCost(ElementCount 
> VF) {
>     // if-converted. This means that the block's instructions (aside from
>     // stores and instructions that may divide by zero) will now be
>     // unconditionally executed. For the scalar case, we may not always 
> execute
> -    // the predicated block. Thus, scale the block's cost by the probability 
> of
> -    // executing it.
> -    if (VF.isScalar() && blockNeedsPredication(BB))
> +    // the predicated block, if it is an if-else block. Thus, scale the 
> block's
> +    // cost by the probability of executing it. blockNeedsPredication from
> +    // Legal is used so as to not include all blocks in tail folded loops.
> +    if (VF.isScalar() && Legal->blockNeedsPredication(BB))
>       BlockCost.first /= getReciprocalPredBlockProb();
> 
>     Cost.first += BlockCost.first;
> diff --git a/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll 
> b/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll
> index 959fbe676e67..fc8ea4fc938c 100644
> --- a/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll
> +++ b/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll
> @@ -15,7 +15,7 @@ define void @pred_loop(i32* %off, i32* %data, i32* %dst, 
> i32 %n) #0 {
> ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: 
>   store i32 %add1, i32* %arrayidx2, align 4
> ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: 
>   %exitcond.not = icmp eq i32 %add, %n
> ; CHECK-COST-NEXT: LV: Found an estimated cost of 0 for VF 1 For instruction: 
>   br i1 %exitcond.not, label %exit.loopexit, label %for.body
> -; CHECK-COST-NEXT: LV: Scalar loop costs: 2.
> +; CHECK-COST-NEXT: LV: Scalar loop costs: 5.
> 
> entry:
>   %cmp8 = icmp sgt i32 %n, 0
> </cut>

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to