[TCWG CI] 464.h264ref slowed down by 7% after llvm: [BasicAA] Handle known bits as ranges

ci_notify Thu, 28 Oct 2021 14:54:18 -0700

After llvm commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita....@gmail.com>


    [BasicAA] Handle known bits as ranges

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10899 to 11610 perf samples
  - 464.h264ref:libc.so.6 slowed down by 11% from 3538 to 3922 perf samples

Below reproducer instructions can be used to re-build both "first_bad" and 
"last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
will fail when triggerring benchmarking jobs if you don't have access to Linaro 
TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/build-fbc0c308d599fe3300ab6516650b65b41979446d/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/build-30a3652b6ade43504087f6e3acd8dc879055f501/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and 
suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans is 
to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" 
data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/build-fbc0c308d599fe3300ab6516650b65b41979446d/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/build-30a3652b6ade43504087f6e3acd8dc879055f501/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
cd investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/manifests/build-baseline.sh
 --fail
curl -o artifacts/manifests/build-parameters.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/manifests/build-parameters.sh
 --fail
curl -o artifacts/test.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/27/artifact/artifacts/test.sh
 --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ 
--exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach fbc0c308d599fe3300ab6516650b65b41979446d
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 30a3652b6ade43504087f6e3acd8dc879055f501
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita....@gmail.com>
Date:   Mon Oct 25 15:47:21 2021 +0200

    [BasicAA] Handle known bits as ranges
    
    BasicAA currently tries to determine that the offset is positive by
    checking whether all variable indices are positive based on known
    bits, multiplied by a positive scale. However, this is incorrect
    if the scale multiplication might overflow. In the modified test
    case the original value is positive, but may be negative after a
    left shift.
    
    Fix this by converting known bits into a constant range and reusing
    the range-based logic, which handles overflow correctly.
    
    Differential Revision: https://reviews.llvm.org/D112611
---
 llvm/lib/Analysis/BasicAliasAnalysis.cpp           | 51 +++++-----------------
 .../test/Analysis/BasicAA/assume-index-positive.ll |  4 +-
 2 files changed, 12 insertions(+), 43 deletions(-)

diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp 
b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 0305732ca5d5..8cf947c43bf4 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -318,15 +318,6 @@ struct CastedValue {
     return N;
   }
 
-  KnownBits evaluateWith(KnownBits N) const {
-    assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
-           "Incompatible bit width");
-    if (TruncBits) N = N.trunc(N.getBitWidth() - TruncBits);
-    if (SExtBits) N = N.sext(N.getBitWidth() + SExtBits);
-    if (ZExtBits) N = N.zext(N.getBitWidth() + ZExtBits);
-    return N;
-  }
-
   ConstantRange evaluateWith(ConstantRange N) const {
     assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
            "Incompatible bit width");
@@ -1250,8 +1241,6 @@ AliasResult BasicAAResult::aliasGEP(
 
   if (!DecompGEP1.VarIndices.empty()) {
     APInt GCD;
-    bool AllNonNegative = DecompGEP1.Offset.isNonNegative();
-    bool AllNonPositive = DecompGEP1.Offset.isNonPositive();
     ConstantRange OffsetRange = ConstantRange(DecompGEP1.Offset);
     for (unsigned i = 0, e = DecompGEP1.VarIndices.size(); i != e; ++i) {
       const VariableGEPIndex &Index = DecompGEP1.VarIndices[i];
@@ -1266,24 +1255,19 @@ AliasResult BasicAAResult::aliasGEP(
       else
         GCD = APIntOps::GreatestCommonDivisor(GCD, ScaleForGCD.abs());
 
-      if (AllNonNegative || AllNonPositive) {
-        KnownBits Known = Index.Val.evaluateWith(
-            computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT));
-        bool SignKnownZero = Known.isNonNegative();
-        bool SignKnownOne = Known.isNegative();
-        AllNonNegative &= (SignKnownZero && Scale.isNonNegative()) ||
-                          (SignKnownOne && Scale.isNonPositive());
-        AllNonPositive &= (SignKnownZero && Scale.isNonPositive()) ||
-                          (SignKnownOne && Scale.isNonNegative());
-      }
+      ConstantRange CR =
+          computeConstantRange(Index.Val.V, true, &AC, Index.CxtI);
+      KnownBits Known =
+          computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT);
+      CR = CR.intersectWith(
+          ConstantRange::fromKnownBits(Known, /* Signed */ true),
+          ConstantRange::Signed);
 
       assert(OffsetRange.getBitWidth() == Scale.getBitWidth() &&
              "Bit widths are normalized to MaxPointerSize");
-      OffsetRange = OffsetRange.add(Index.Val
-                            .evaluateWith(computeConstantRange(
-                                Index.Val.V, true, &AC, Index.CxtI))
-                            .sextOrTrunc(OffsetRange.getBitWidth())
-                            .smul_fast(ConstantRange(Scale)));
+      OffsetRange = OffsetRange.add(
+          Index.Val.evaluateWith(CR).sextOrTrunc(OffsetRange.getBitWidth())
+                                    .smul_fast(ConstantRange(Scale)));
     }
 
     // We now have accesses at two offsets from the same base:
@@ -1300,21 +1284,6 @@ AliasResult BasicAAResult::aliasGEP(
         (GCD - ModOffset).uge(V1Size.getValue()))
       return AliasResult::NoAlias;
 
-    // If we know all the variables are non-negative, then the total offset is
-    // also non-negative and >= DecompGEP1.Offset. We have the following 
layout:
-    // [0, V2Size) ... [TotalOffset, TotalOffer+V1Size]
-    // If DecompGEP1.Offset >= V2Size, the accesses don't alias.
-    if (AllNonNegative && V2Size.hasValue() &&
-        DecompGEP1.Offset.uge(V2Size.getValue()))
-      return AliasResult::NoAlias;
-    // Similarly, if the variables are non-positive, then the total offset is
-    // also non-positive and <= DecompGEP1.Offset. We have the following 
layout:
-    // [TotalOffset, TotalOffset+V1Size) ... [0, V2Size)
-    // If -DecompGEP1.Offset >= V1Size, the accesses don't alias.
-    if (AllNonPositive && V1Size.hasValue() &&
-        (-DecompGEP1.Offset).uge(V1Size.getValue()))
-      return AliasResult::NoAlias;
-
     if (V1Size.hasValue() && V2Size.hasValue()) {
       // Compute ranges of potentially accessed bytes for both accesses. If the
       // interseciton is empty, there can be no overlap.
diff --git a/llvm/test/Analysis/BasicAA/assume-index-positive.ll 
b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
index 451592067f4b..a53fff2c6009 100644
--- a/llvm/test/Analysis/BasicAA/assume-index-positive.ll
+++ b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
@@ -130,12 +130,12 @@ define void @symmetry([0 x i8]* %ptr, i32 %a, i32 %b, i32 
%c) {
   ret void
 }
 
-; TODO: %ptr.neg and %ptr.shl may alias, as the shl renders the previously
+; %ptr.neg and %ptr.shl may alias, as the shl renders the previously
 ; non-negative value potentially negative.
 define void @shl_of_non_negative(i8* %ptr, i64 %a) {
 ; CHECK-LABEL: Function: shl_of_non_negative
 ; CHECK: NoAlias: i8* %ptr.a, i8* %ptr.neg
-; CHECK: NoAlias: i8* %ptr.neg, i8* %ptr.shl
+; CHECK: MayAlias: i8* %ptr.neg, i8* %ptr.shl
   %a.cmp = icmp sge i64 %a, 0
   call void @llvm.assume(i1 %a.cmp)
   %ptr.neg = getelementptr i8, i8* %ptr, i64 -2
</cut>
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

[TCWG CI] 464.h264ref slowed down by 7% after llvm: [BasicAA] Handle known bits as ranges

Reply via email to