[TCWG CI] 464.h264ref slowed down by 7% after llvm: [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes

ci_notify Fri, 05 Nov 2021 23:09:36 -0700

After llvm commit 9c2469c1ddb34517de8dafd83d1940deada3fc22
Author: Roman Lebedev <[email protected]>


    [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10836 to 11596 perf samples
  - 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 46% from 1525 to 2231 perf samples
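
As a sanity check, the percentages above can be recomputed from the raw sample
counts. The following is a minimal sketch; the function and variable names are
illustrative and are not part of the TCWG CI scripts.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, expressed in percent."""
    return (after - before) / before * 100.0


# Sample counts quoted in this report.
benchmark_pct = slowdown_percent(10836, 11596)   # whole 464.h264ref run
function_pct = slowdown_percent(1525, 2231)      # FastFullPelBlockMotionSearch

# Share of the benchmark-wide increase that falls inside the hot function.
function_share_pct = (2231 - 1525) / (11596 - 10836) * 100.0

print(f"464.h264ref:                  {benchmark_pct:.1f}%")
print(f"FastFullPelBlockMotionSearch: {function_pct:.1f}%")
print(f"share of extra samples:       {function_share_pct:.1f}%")
```

By this arithmetic, 706 of the 760 additional samples land in
FastFullPelBlockMotionSearch, which suggests (assuming the sample counts of the
two runs are directly comparable) that this one function is where to look first.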

The reproducer instructions below can be used to rebuild both the "first_bad"
and "last_good" cross-toolchains used in this bisection.  Naturally, the
scripts will fail when triggering benchmarking jobs if you don't have access to
Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/build-9c2469c1ddb34517de8dafd83d1940deada3fc22/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/build-4bef0304e153c757c9f42c2001d4c56e8f99929e/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is a work in progress, and we welcome feedback and 
suggestions at [email protected] .  We plan to add support for 
SPEC CPU2017 benchmarks and to provide the "perf report/annotate" data behind 
these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/build-9c2469c1ddb34517de8dafd83d1940deada3fc22/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/build-4bef0304e153c757c9f42c2001d4c56e8f99929e/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-llvm-9c2469c1ddb34517de8dafd83d1940deada3fc22
cd investigate-llvm-9c2469c1ddb34517de8dafd83d1940deada3fc22

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/32/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 9c2469c1ddb34517de8dafd83d1940deada3fc22
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 4bef0304e153c757c9f42c2001d4c56e8f99929e
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 9c2469c1ddb34517de8dafd83d1940deada3fc22
Author: Roman Lebedev <[email protected]>
Date:   Wed Nov 3 19:23:25 2021 +0300

    [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes
    
    Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo
    originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/
    
    We manage to deduce that the answer does not require looping,
    but we do that after the last `LoopDeletion` pass run,
    so we end up being stuck with a dead loop.
    
    Now, as with all things SCEV, this has
    a very expected ~`+0.12%` compile time performance regression:
    https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions
    (for comparison, doing that in function simplification pipeline
    would have been ~`+0.5` compile time performance regression, D112840)
    
    Looking at the transformation stats over vanilla test-suite, i think it's rather expected:
    ```
    | statistic name                                   |  baseline |  proposed |     Δ |      % |    |%| |
    |--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
    | scalar-evolution.NumBruteForceTripCountsComputed |       789 |       888 |    99 | 12.55% | 12.55% |
    | scalar-evolution.NumTripCountsNotComputed        |    105592 |    117900 | 12308 | 11.66% | 11.66% |
    | loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
    | regalloc.numExtends                              |        81 |        79 |    -2 | -2.47% |  2.47% |
    | indvars.NumFoldedUser                            |       408 |       400 |    -8 | -1.96% |  1.96% |
    | indvars.NumElimCmp                               |      3831 |      3758 |   -73 | -1.91% |  1.91% |
    | scalar-evolution.NumTripCountsComputed           |    299759 |    304278 |  4519 |  1.51% |  1.51% |
    | loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |
    | machine-cse.NumCommutes                          |       111 |       110 |    -1 | -0.90% |  0.90% |
    | globaldce.NumFunctions                           |      1187 |      1192 |     5 |  0.42% |  0.42% |
    | codegenprepare.NumSelectsExpanded                |       277 |       278 |     1 |  0.36% |  0.36% |
    | loop-unroll.NumRuntimeUnrolled                   |     13841 |     13791 |   -50 | -0.36% |  0.36% |
    | machinelicm.NumPostRAHoisted                     |      1168 |      1172 |     4 |  0.34% |  0.34% |
    | phi-node-elimination.NumCriticalEdgesSplit       |     83054 |     82879 |  -175 | -0.21% |  0.21% |
    | machine-cse.NumPREs                              |      3085 |      3079 |    -6 | -0.19% |  0.19% |
    | branch-folder.NumBranchOpts                      |    108122 |    107942 |  -180 | -0.17% |  0.17% |
    | loop-unroll.NumUnrolled                          |     40136 |     40067 |   -69 | -0.17% |  0.17% |
    | branch-folder.NumDeadBlocks                      |    130818 |    130607 |  -211 | -0.16% |  0.16% |
    | codegenprepare.NumBlocksElim                     |     92856 |     92714 |  -142 | -0.15% |  0.15% |
    | instsimplify.NumSimplified                       |    103263 |    103129 |  -134 | -0.13% |  0.13% |
    | instcombine.NumConstProp                         |     26070 |     26102 |    32 |  0.12% |  0.12% |
    | instsimplify.NumExpand                           |      1716 |      1718 |     2 |  0.12% |  0.12% |
    | loop-unroll.NumCompletelyUnrolled                |      9236 |      9225 |   -11 | -0.12% |  0.12% |
    | branch-folder.NumHoist                           |      2773 |      2770 |    -3 | -0.11% |  0.11% |
    | regalloc.NumReloadsRemoved                       |     10822 |     10834 |    12 |  0.11% |  0.11% |
    | regalloc.NumSnippets                             |     11394 |     11406 |    12 |  0.11% |  0.11% |
    | machine-cse.NumCrossBBCSEs                       |      1052 |      1053 |     1 |  0.10% |  0.10% |
    | machinelicm.NumCSEed                             |     99887 |     99784 |  -103 | -0.10% |  0.10% |
    | branch-folder.NumTailMerge                       |     72501 |     72435 |   -66 | -0.09% |  0.09% |
    | codegenprepare.NumExtUses                        |     22007 |     21987 |   -20 | -0.09% |  0.09% |
    | local.NumRemoved                                 |     68232 |     68294 |    62 |  0.09% |  0.09% |
    | loop-vectorize.LoopsAnalyzed                     |     75483 |     75413 |   -70 | -0.09% |  0.09% |
    ```
    
    Note that i'm only changing current PM, and not touching obsolete PM.
    
    This is an alternative to the function simplification pipeline variant
    of the same change, D112840. It has both less compile time impact
    (since the additional number of SCEV trip count calculations
    is way less than with D112840), and it is
    much more powerful/impactful (almost 2x more loops deleted).
    
    I have checked, and doing this after loop rotation
    is favorable (more loops deleted).
    
    Reviewed By: mkazantsev
    
    Differential Revision: https://reviews.llvm.org/D112851
---
 llvm/lib/Passes/PassBuilderPipelines.cpp           |  9 +++-
 llvm/test/Other/new-pm-defaults.ll                 |  1 +
 llvm/test/Other/new-pm-thinlto-defaults.ll         |  1 +
 .../Other/new-pm-thinlto-postlink-pgo-defaults.ll  |  1 +
 .../new-pm-thinlto-postlink-samplepgo-defaults.ll  |  1 +
 ...letion-of-loops-that-became-side-effect-free.ll | 49 ++++------------------
 6 files changed, 18 insertions(+), 44 deletions(-)

diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 2009a687ae7d..f0f7803ed3ae 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1093,11 +1093,16 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   for (auto &C : VectorizerStartEPCallbacks)
     C(OptimizePM, Level);
 
+  LoopPassManager LPM;
   // First rotate loops that may have been un-rotated by prior passes.
   // Disable header duplication at -Oz.
+  LPM.addPass(LoopRotatePass(Level != OptimizationLevel::Oz, LTOPreLink));
+  // Some loops may have become dead by now. Try to delete them.
+  // FIXME: see disscussion in https://reviews.llvm.org/D112851
+  //        this may need to be revisited once GVN is more powerful.
+  LPM.addPass(LoopDeletionPass());
   OptimizePM.addPass(createFunctionToLoopPassAdaptor(
-      LoopRotatePass(Level != OptimizationLevel::Oz, LTOPreLink),
-      /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
 
   // Distribute loops to allow partial vectorization.  I.e. isolate dependences
   // into separate loop that would otherwise inhibit vectorization.  This is
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index 5067b6fbdd18..b9f90dad8224 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -216,6 +216,7 @@
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
diff --git a/llvm/test/Other/new-pm-thinlto-defaults.ll b/llvm/test/Other/new-pm-thinlto-defaults.ll
index 1f52fe47ae73..7836de5c6cce 100644
--- a/llvm/test/Other/new-pm-thinlto-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-defaults.ll
@@ -196,6 +196,7 @@
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-POSTLINK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 3a80efba3c56..e66e8672358c 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -167,6 +167,7 @@
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass on foo
 ; CHECK-O-NEXT: Running pass: LCSSAPass on foo
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 2e822b21f8a1..410841124c8e 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -179,6 +179,7 @@
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
diff --git a/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll b/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
index ec8db3cceeb1..99a52acd3b2b 100644
--- a/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
+++ b/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
@@ -11,17 +11,8 @@
 define dso_local zeroext i1 @is_not_empty_variant1(%struct.node* %p) {
 ; ALL-LABEL: @is_not_empty_variant1(
 ; ALL-NEXT:  entry:
-; ALL-NEXT:    [[TOBOOL_NOT3_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
-; ALL-NEXT:    br i1 [[TOBOOL_NOT3_I]], label [[COUNT_NODES_VARIANT1_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; ALL:       while.body.i:
-; ALL-NEXT:    [[P_ADDR_04_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY:%.*]] ]
-; ALL-NEXT:    [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_04_I]], i64 0, i32 0
-; ALL-NEXT:    [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
-; ALL-NEXT:    [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; ALL-NEXT:    br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT1_EXIT]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP0:![0-9]+]]
-; ALL:       count_nodes_variant1.exit:
-; ALL-NEXT:    [[TMP1:%.*]] = xor i1 [[TOBOOL_NOT3_I]], true
-; ALL-NEXT:    ret i1 [[TMP1]]
+; ALL-NEXT:    [[TOBOOL_NOT3_I:%.*]] = icmp ne %struct.node* [[P:%.*]], null
+; ALL-NEXT:    ret i1 [[TOBOOL_NOT3_I]]
 ;
 entry:
   %p.addr = alloca %struct.node*, align 8
@@ -113,39 +104,13 @@ while.end:
 define dso_local zeroext i1 @is_not_empty_variant3(%struct.node* %p) {
 ; O3-LABEL: @is_not_empty_variant3(
 ; O3-NEXT:  entry:
-; O3-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
-; O3-NEXT:    br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; O3:       while.body.i:
-; O3-NEXT:    [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
-; O3-NEXT:    [[P_ADDR_05_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
-; O3-NEXT:    [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
-; O3-NEXT:    tail call void @llvm.assume(i1 [[CMP_I]]) #[[ATTR3:[0-9]+]]
-; O3-NEXT:    [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_05_I]], i64 0, i32 0
-; O3-NEXT:    [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
-; O3-NEXT:    [[INC_I]] = add nuw i64 [[SIZE_06_I]], 1
-; O3-NEXT:    [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; O3-NEXT:    br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP2:![0-9]+]]
-; O3:       count_nodes_variant3.exit:
-; O3-NEXT:    [[TMP1:%.*]] = xor i1 [[TOBOOL_NOT4_I]], true
-; O3-NEXT:    ret i1 [[TMP1]]
+; O3-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne %struct.node* [[P:%.*]], null
+; O3-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
 ;
 ; O2-LABEL: @is_not_empty_variant3(
 ; O2-NEXT:  entry:
-; O2-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
-; O2-NEXT:    br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; O2:       while.body.i:
-; O2-NEXT:    [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
-; O2-NEXT:    [[P_ADDR_05_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
-; O2-NEXT:    [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
-; O2-NEXT:    tail call void @llvm.assume(i1 [[CMP_I]]) #[[ATTR3:[0-9]+]]
-; O2-NEXT:    [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_05_I]], i64 0, i32 0
-; O2-NEXT:    [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
-; O2-NEXT:    [[INC_I]] = add nuw i64 [[SIZE_06_I]], 1
-; O2-NEXT:    [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; O2-NEXT:    br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP2:![0-9]+]]
-; O2:       count_nodes_variant3.exit:
-; O2-NEXT:    [[TMP1:%.*]] = xor i1 [[TOBOOL_NOT4_I]], true
-; O2-NEXT:    ret i1 [[TMP1]]
+; O2-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne %struct.node* [[P:%.*]], null
+; O2-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
 ;
 ; O1-LABEL: @is_not_empty_variant3(
 ; O1-NEXT:  entry:
@@ -160,7 +125,7 @@ define dso_local zeroext i1 @is_not_empty_variant3(%struct.node* %p) {
 ; O1-NEXT:    [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
 ; O1-NEXT:    [[INC_I]] = add i64 [[SIZE_06_I]], 1
 ; O1-NEXT:    [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; O1-NEXT:    br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT:%.*]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP2:![0-9]+]]
+; O1-NEXT:    br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT:%.*]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP0:![0-9]+]]
 ; O1:       count_nodes_variant3.exit.loopexit:
 ; O1-NEXT:    [[PHI_CMP:%.*]] = icmp ne i64 [[INC_I]], 0
 ; O1-NEXT:    br label [[COUNT_NODES_VARIANT3_EXIT]]
</cut>
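
For readers who want the shape of the pattern without reading the IR: the
updated test reduces `is_not_empty_variant1` to a single null check on the list
head, where it previously walked the whole list. The sketch below restates that
equivalence in Python. It is only an illustration; the real test is LLVM IR
(derived from C-like source), and the names `Node`, `make_list`,
`is_not_empty_loop` and `is_not_empty_direct` are hypothetical stand-ins
modelled on the function names in the test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Singly linked list node, mirroring %struct.node with one 'next' field."""
    next: Optional["Node"] = None


def count_nodes(p: Optional[Node]) -> int:
    """Walk the list and count its nodes (the loop that became dead)."""
    size = 0
    while p is not None:
        p = p.next
        size += 1
    return size


def is_not_empty_loop(p: Optional[Node]) -> bool:
    """Form before the commit: the answer is computed via a full traversal."""
    return count_nodes(p) != 0


def is_not_empty_direct(p: Optional[Node]) -> bool:
    """Form after the commit: only the head pointer is inspected."""
    return p is not None


def make_list(length: int) -> Optional[Node]:
    """Build a list with the given number of nodes (None when length is 0)."""
    head: Optional[Node] = None
    for _ in range(length):
        head = Node(next=head)
    return head


for n in range(4):
    lst = make_list(n)
    print(n, is_not_empty_loop(lst), is_not_empty_direct(lst))
```

Both forms agree for every list length, which is why the traversal loop can be
deleted once the compiler has proved that its result is not needed.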
_______________________________________________
linaro-toolchain mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
