[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Use full logic when infering flags on add and gep

ci_notify Wed, 06 Oct 2021 18:18:40 -0700

After llvm commit d02db32644b7360bcda54cdf739fa42abe450fcd
Author: Philip Reames <listm...@philipreames.com>


    [SCEV] Use full logic when infering flags on add and gep

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 6% from 10842 to 11545 perf samples

Below reproducer instructions can be used to re-build both "first_bad" and 
"last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
will fail when triggerring benchmarking jobs if you don't have access to Linaro 
TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/build-d02db32644b7360bcda54cdf739fa42abe450fcd/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/build-f39978b84f1d3a1da6c32db48f64c8daae64b3ad/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and 
suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans is 
to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" 
data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/build-d02db32644b7360bcda54cdf739fa42abe450fcd/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/build-f39978b84f1d3a1da6c32db48f64c8daae64b3ad/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd
cd investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/manifests/build-baseline.sh
 --fail
curl -o artifacts/manifests/build-parameters.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/manifests/build-parameters.sh
 --fail
curl -o artifacts/test.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/34/artifact/artifacts/test.sh
 --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ 
--exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach d02db32644b7360bcda54cdf739fa42abe450fcd
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f39978b84f1d3a1da6c32db48f64c8daae64b3ad
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit d02db32644b7360bcda54cdf739fa42abe450fcd
Author: Philip Reames <listm...@philipreames.com>
Date:   Sun Oct 3 15:32:15 2021 -0700

    [SCEV] Use full logic when infering flags on add and gep
    
    This is a followon to D109845. With that landed, we will have fixed all 
known instances of pr51817, and can thus start inferring flags more 
aggressively with greatly reduced risk of miscompiles. This patch simply 
applies the same inference logic used in that patch to our other major flag 
inference path.
    
    We can still do much better here (on both paths), but this is our first 
step.
    
    Differential Revision: https://reviews.llvm.org/D111003
---
 llvm/lib/Analysis/ScalarEvolution.cpp                          | 10 ++--------
 .../Delinearization/multidim_ivs_and_integer_offsets_3d.ll     |  2 +-
 .../Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll |  2 +-
 llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll        |  4 ++--
 llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll        |  8 ++++----
 llvm/test/Analysis/ScalarEvolution/load.ll                     |  2 +-
 llvm/test/Analysis/ScalarEvolution/ptrtoint.ll                 |  2 +-
 polly/test/IstAstInfo/simple-run-time-condition.ll             |  2 +-
 8 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp 
b/llvm/lib/Analysis/ScalarEvolution.cpp
index 75cecbf48c08..70bf9aee6e0a 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -6657,14 +6657,8 @@ bool ScalarEvolution::isSCEVExprNeverPoison(const 
Instruction *I) {
     // TODO: We can do better here in some cases.
     if (!isSCEVable(Op->getType()))
       return false;
-    // TODO: the following two lines should be:
-    // if (auto *DefI = getDefinedScopeRoot(getSCEV(Op)))
-    //   if (isGuaranteedToTransferExecutionTo(DefI, I))
-    // We use the following instead for the purposes of seperating a bugfix
-    // change from an optimization change.  Once pr51817 is fully addressed,
-    // we should unlock this power.
-    if (auto *AddRecS = dyn_cast<SCEVAddRecExpr>(getSCEV(Op)))
-      if (isGuaranteedToExecuteForEveryIteration(I, AddRecS->getLoop()))
+    if (auto *DefI = getDefinedScopeRoot(getSCEV(Op)))
+      if (isGuaranteedToTransferExecutionTo(DefI, I))
         return true;
   }
   return false;
diff --git 
a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll 
b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
index 712a52927dcb..77982c786e6e 100644
--- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
+++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
@@ -11,7 +11,7 @@
 ; AddRec: {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * 
%o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k>
 ; CHECK: Base offset: %A
 ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes.
-; CHECK: 
ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nw><%for.j>][{7,+,1}<nuw><nsw><%for.k>]
+; CHECK: 
ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nsw><%for.j>][{7,+,1}<nuw><nsw><%for.k>]
 
 define void @foo(i64 %n, i64 %m, i64 %o, double* %A) {
 entry:
diff --git 
a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll 
b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll
index e3fdb0642211..8ecd498ea211 100644
--- 
a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll
+++ 
b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll
@@ -11,7 +11,7 @@
 ; AddRec: {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * 
%o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k>
 ; CHECK: Base offset: %A
 ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes.
-; CHECK: 
ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nw><%for.j>][{%r,+,1}<nsw><%for.k>]
+; CHECK: 
ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nsw><%for.j>][{%r,+,1}<nsw><%for.k>]
 
 define void @foo(i64 %n, i64 %m, i64 %o, double* %A, i64 %p, i64 %q, i64 %r) {
 entry:
diff --git a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll 
b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll
index 821513199546..1f1515435e1a 100644
--- a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll
+++ b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll
@@ -11,8 +11,8 @@ target triple = "powerpc64le-unknown-linux-gnu"
 ;     }   
 ; }
 
-; CHECK-DAG: Loop 'for.i' has cost = 20300
-; CHECK-DAG: Loop 'for.j' has cost = 700
+; CHECK-DAG: Loop 'for.i' has cost = 20600
+; CHECK-DAG: Loop 'for.j' has cost = 800
 
 define void @foo(i64 %n, i64 %m, i32* %A, i32* %B, i32* %C) {
 entry:
diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll 
b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll
index c8d3137f8dc9..5ab24159c250 100644
--- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll
+++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll
@@ -273,9 +273,9 @@ define void @test-add-scope-bound-unkn-header(i32* %input, 
i32 %needle) {
 ; CHECK-NEXT:    %offset = load i32, i32* %gep, align 4
 ; CHECK-NEXT:    --> %offset U: full-set S: full-set Exits: <<Unknown>> 
LoopDispositions: { %loop: Variant }
 ; CHECK-NEXT:    %i.next = add nuw i32 %i, %offset
-; CHECK-NEXT:    --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> 
LoopDispositions: { %loop: Variant }
+; CHECK-NEXT:    --> (%offset + %i)<nuw> U: full-set S: full-set Exits: 
<<Unknown>> LoopDispositions: { %loop: Variant }
 ; CHECK-NEXT:    %gep2 = getelementptr i32, i32* %input, i32 %i.next
-; CHECK-NEXT:    --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: 
full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant }
+; CHECK-NEXT:    --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + 
%input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: 
Variant }
 ; CHECK-NEXT:  Determining loop execution counts for: 
@test-add-scope-bound-unkn-header
 ; CHECK-NEXT:  Loop %loop: Unpredictable backedge-taken count.
 ; CHECK-NEXT:  Loop %loop: Unpredictable max backedge-taken count.
@@ -307,9 +307,9 @@ define void @test-add-scope-bound-unkn-header2(i32* %input, 
i32 %needle) {
 ; CHECK-NEXT:    %offset = load i32, i32* %gep, align 4
 ; CHECK-NEXT:    --> %offset U: full-set S: full-set Exits: <<Unknown>> 
LoopDispositions: { %loop: Variant }
 ; CHECK-NEXT:    %i.next = add nuw i32 %i, %offset
-; CHECK-NEXT:    --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> 
LoopDispositions: { %loop: Variant }
+; CHECK-NEXT:    --> (%offset + %i)<nuw> U: full-set S: full-set Exits: 
<<Unknown>> LoopDispositions: { %loop: Variant }
 ; CHECK-NEXT:    %gep2 = getelementptr i32, i32* %input, i32 %i.next
-; CHECK-NEXT:    --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: 
full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant }
+; CHECK-NEXT:    --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + 
%input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: 
Variant }
 ; CHECK-NEXT:  Determining loop execution counts for: 
@test-add-scope-bound-unkn-header2
 ; CHECK-NEXT:  Loop %loop: Unpredictable backedge-taken count.
 ; CHECK-NEXT:  Loop %loop: Unpredictable max backedge-taken count.
diff --git a/llvm/test/Analysis/ScalarEvolution/load.ll 
b/llvm/test/Analysis/ScalarEvolution/load.ll
index c0d671342af7..e95a093b2a8b 100644
--- a/llvm/test/Analysis/ScalarEvolution/load.ll
+++ b/llvm/test/Analysis/ScalarEvolution/load.ll
@@ -73,7 +73,7 @@ define i32 @test2() nounwind uwtable readonly {
 ; CHECK-NEXT:    %n.01 = phi %struct.ListNode* [ bitcast ({ %struct.ListNode*, 
i32, [4 x i8] }* @node5 to %struct.ListNode*), %entry ], [ %1, %for.body ]
 ; CHECK-NEXT:    --> %n.01 U: full-set S: full-set Exits: @node1 
LoopDispositions: { %for.body: Variant }
 ; CHECK-NEXT:    %i = getelementptr inbounds %struct.ListNode, 
%struct.ListNode* %n.01, i64 0, i32 1
-; CHECK-NEXT:    --> (4 + %n.01) U: full-set S: full-set Exits: (4 + 
@node1)<nuw><nsw> LoopDispositions: { %for.body: Variant }
+; CHECK-NEXT:    --> (4 + %n.01)<nuw> U: [4,0) S: [4,0) Exits: (4 + 
@node1)<nuw><nsw> LoopDispositions: { %for.body: Variant }
 ; CHECK-NEXT:    %0 = load i32, i32* %i, align 4
 ; CHECK-NEXT:    --> %0 U: full-set S: full-set Exits: 0 LoopDispositions: { 
%for.body: Variant }
 ; CHECK-NEXT:    %add = add nsw i32 %0, %sum.02
diff --git a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll 
b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
index 93d8782f373e..cb40ddda9369 100644
--- a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
+++ b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
@@ -502,7 +502,7 @@ define void @pr46786_c26_int(i32* %arg, i32* %arg1, i32* 
%arg2) {
 ; X32-NEXT:    %i11 = ashr exact i64 %i10, 2
 ; X32-NEXT:    --> %i11 U: [-2147483648,2147483648) S: 
[-2147483648,2147483648) Exits: <<Unknown>> LoopDispositions: { %bb6: Variant }
 ; X32-NEXT:    %i12 = getelementptr inbounds i32, i32* %arg2, i64 %i11
-; X32-NEXT:    --> ((4 * (trunc i64 %i11 to i32)) + %arg2) U: full-set S: 
full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant }
+; X32-NEXT:    --> ((4 * (trunc i64 %i11 to i32))<nsw> + %arg2) U: full-set S: 
full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant }
 ; X32-NEXT:    %i13 = load i32, i32* %i12, align 4
 ; X32-NEXT:    --> %i13 U: full-set S: full-set Exits: <<Unknown>> 
LoopDispositions: { %bb6: Variant }
 ; X32-NEXT:    %i14 = add nsw i32 %i13, %i8
diff --git a/polly/test/IstAstInfo/simple-run-time-condition.ll 
b/polly/test/IstAstInfo/simple-run-time-condition.ll
index aba5d9e34f50..0d167566291b 100644
--- a/polly/test/IstAstInfo/simple-run-time-condition.ll
+++ b/polly/test/IstAstInfo/simple-run-time-condition.ll
@@ -20,7 +20,7 @@ target datalayout = 
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3
 ; for the delinearization is simplified such that conditions that would not
 ; cause any code to be executed are not generated.
 
-; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q 
<= 100)) && 0 == ((m >= 1 && n + p >= 9223372036854775809) || (o <= 0 && n >= 1 
&& m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= 
-9223372036854775709)))
+; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q 
<= 100)) && 0 == ((o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 
&& m >= 1 && n >= 1 && q <= -9223372036854775709)))
 
 ; CHECK:     if (o <= 0) {
 ; CHECK:       for (int c0 = 0; c0 < n; c0 += 1)
</cut>
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Use full logic when infering flags on add and gep

Reply via email to