Hi Nikita,

FYI, your patch seems to increase the code size of 462.libquantum by 3%. Would you please check whether this can be easily fixed?
Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 01:34, ci_not...@linaro.org wrote:
>
> After llvm commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
> Author: Nikita Popov <nikita....@gmail.com>
>
>     [JumpThreading] Ignore free instructions
>
> the following benchmarks grew in size by more than 1%:
> - 462.libquantum grew in size by 3% from 14035 to 14398 bytes
>
> Below reproducer instructions can be used to re-build both "first_bad" and
> "last_good" cross-toolchains used in this bisection. Naturally, the scripts
> will fail when triggering benchmarking jobs if you don't have access to
> Linaro TCWG CI.
>
> For your convenience, we have uploaded tarballs with pre-processed source
> and assembly files at:
> - First_bad save-temps:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff/save-temps/
> - Last_good save-temps:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb/save-temps/
> - Baseline save-temps:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-baseline/save-temps/
>
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: Clang + Glibc + LLVM Linker
> - Version: all components were built from their tip of trunk
> - Target: arm-linux-gnueabihf
> - Compiler flags: -Os -flto -mthumb
> - Hardware: APM Mustang 8x X-Gene1
>
> This benchmarking CI is work-in-progress, and we welcome feedback and
> suggestions at linaro-toolchain@lists.linaro.org. Our improvement plans
> include adding support for SPEC CPU2017 benchmarks and providing "perf
> report/annotate" data behind these reports.
>
> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS,
> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
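For anyone without access to the TCWG CI, the save-temps links above are probably the quickest way to see where the extra 3% comes from: pull the last_good and first_bad assembly and compare per-function sizes. A rough sketch follows; the directory layout and file names inside the save-temps artifacts are assumptions, so adjust after inspecting the download.

#!/bin/sh
# Sketch: compare 462.libquantum code generation between the last_good and
# first_bad save-temps artifacts linked above.
set -e

job=https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts

# Jenkins serves artifact directories with an HTML index, so a recursive
# wget can usually mirror them; the file names inside are not guaranteed.
wget -q -r -np -P last_good "$job/build-1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb/save-temps/"
wget -q -r -np -P first_bad "$job/build-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff/save-temps/"

# Crude per-function instruction counts from the generated assembly
# ('*libquantum*' in the path is an assumed naming pattern).
for d in last_good first_bad; do
  find "$d" -name '*.s' -path '*libquantum*' -exec cat {} + |
    awk '/^[A-Za-z_][A-Za-z0-9_]*:/ { fn = $1 }     # non-local labels
         /^\t[a-z]/                { n[fn]++ }      # instruction lines
         END { for (f in n) printf "%6d %s\n", n[f], f }' |
    sort -k2 > "$d.sizes"
done

# Functions whose instruction count changed between the two builds.
join -j 2 last_good.sizes first_bad.sizes | awk '$2 != $3'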
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTO
>
> First_bad build:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff/
> Last_good build:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb/
> Baseline build:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-baseline/
> Even more details:
>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
> cd investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/manifests/build-baseline.sh --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/manifests/build-parameters.sh --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/test.sh --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach 1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
> Author: Nikita Popov <nikita....@gmail.com>
> Date:   Wed Sep 22 21:34:24 2021 +0200
>
>     [JumpThreading] Ignore free instructions
>
>     This is basically D108837 but for jump threading. Free instructions
>     should be ignored for the threading decision. JumpThreading already
>     skips some free instructions (like pointer bitcasts), but does not
>     skip various free intrinsics -- in fact, it currently gives them a
>     fairly large cost of 2.
>
>     Differential Revision: https://reviews.llvm.org/D110290
> ---
>  .../include/llvm/Transforms/Scalar/JumpThreading.h |  8 +--
>  llvm/lib/Transforms/Scalar/JumpThreading.cpp       | 61 ++++++++++------------
>  .../Transforms/JumpThreading/free_instructions.ll  | 24 +++++----
>  .../inlining-alignment-assumptions.ll              | 12 ++---
>  4 files changed, 52 insertions(+), 53 deletions(-)
>
> diff --git a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
> index 816ea1071e52..0ac7d7c62b7a 100644
> --- a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
> +++ b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
> @@ -44,6 +44,7 @@ class PHINode;
>  class SelectInst;
>  class SwitchInst;
>  class TargetLibraryInfo;
> +class TargetTransformInfo;
>  class Value;
>
>  /// A private "module" namespace for types and utilities used by
> @@ -78,6 +79,7 @@ enum ConstantPreference { WantInteger, WantBlockAddress };
>  /// revectored to the false side of the second if.
>  class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> {
>    TargetLibraryInfo *TLI;
> +  TargetTransformInfo *TTI;
>    LazyValueInfo *LVI;
>    AAResults *AA;
>    DomTreeUpdater *DTU;
> @@ -99,9 +101,9 @@ public:
>    JumpThreadingPass(bool InsertFreezeWhenUnfoldingSelect = false, int T = -1);
>
>    // Glue for old PM.
> -  bool runImpl(Function &F, TargetLibraryInfo *TLI, LazyValueInfo *LVI,
> -               AAResults *AA, DomTreeUpdater *DTU, bool HasProfileData,
> -               std::unique_ptr<BlockFrequencyInfo> BFI,
> +  bool runImpl(Function &F, TargetLibraryInfo *TLI, TargetTransformInfo *TTI,
> +               LazyValueInfo *LVI, AAResults *AA, DomTreeUpdater *DTU,
> +               bool HasProfileData, std::unique_ptr<BlockFrequencyInfo> BFI,
>                 std::unique_ptr<BranchProbabilityInfo> BPI);
>
>    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
> diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
> index 688902ecb9ff..fe9a7211967c 100644
> --- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
> +++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
> @@ -331,7 +331,7 @@ bool JumpThreading::runOnFunction(Function &F) {
>      BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
>    }
>
> -  bool Changed = Impl.runImpl(F, TLI, LVI, AA, &DTU, F.hasProfileData(),
> +  bool Changed = Impl.runImpl(F, TLI, TTI, LVI, AA, &DTU, F.hasProfileData(),
>                                std::move(BFI), std::move(BPI));
>    if (PrintLVIAfterJumpThreading) {
>      dbgs() << "LVI for function '" << F.getName() << "':\n";
> @@ -360,7 +360,7 @@ PreservedAnalyses JumpThreadingPass::run(Function &F,
>      BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
>    }
>
> -  bool Changed = runImpl(F, &TLI, &LVI, &AA, &DTU, F.hasProfileData(),
> +  bool Changed = runImpl(F, &TLI, &TTI, &LVI, &AA, &DTU, F.hasProfileData(),
>                           std::move(BFI), std::move(BPI));
>
>    if (PrintLVIAfterJumpThreading) {
> @@ -377,12 +377,14 @@ PreservedAnalyses JumpThreadingPass::run(Function &F,
>  }
>
>  bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_,
> -                                LazyValueInfo *LVI_, AliasAnalysis *AA_,
> -                                DomTreeUpdater *DTU_, bool HasProfileData_,
> +                                TargetTransformInfo *TTI_, LazyValueInfo *LVI_,
> +                                AliasAnalysis *AA_, DomTreeUpdater *DTU_,
> +                                bool HasProfileData_,
>                                  std::unique_ptr<BlockFrequencyInfo> BFI_,
>                                  std::unique_ptr<BranchProbabilityInfo> BPI_) {
>    LLVM_DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n");
>    TLI = TLI_;
> +  TTI = TTI_;
>    LVI = LVI_;
>    AA = AA_;
>    DTU = DTU_;
> @@ -514,7 +516,8 @@ static void replaceFoldableUses(Instruction *Cond, Value *ToVal) {
>  /// Return the cost of duplicating a piece of this block from first non-phi
>  /// and before StopAt instruction to thread across it. Stop scanning the block
>  /// when exceeding the threshold. If duplication is impossible, returns ~0U.
> -static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
> +static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
> +                                             BasicBlock *BB,
>                                               Instruction *StopAt,
>                                               unsigned Threshold) {
>    assert(StopAt->getParent() == BB && "Not an instruction from proper BB?");
> @@ -550,26 +553,21 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
>      if (Size > Threshold)
>        return Size;
>
> -    // Debugger intrinsics don't incur code size.
> -    if (isa<DbgInfoIntrinsic>(I)) continue;
> -
> -    // Pseudo-probes don't incur code size.
> -    if (isa<PseudoProbeInst>(I))
> -      continue;
> -
> -    // If this is a pointer->pointer bitcast, it is free.
> -    if (isa<BitCastInst>(I) && I->getType()->isPointerTy())
> -      continue;
> -
> -    // Freeze instruction is free, too.
> -    if (isa<FreezeInst>(I))
> -      continue;
> -
>      // Bail out if this instruction gives back a token type, it is not possible
>      // to duplicate it if it is used outside this BB.
>      if (I->getType()->isTokenTy() && I->isUsedOutsideOfBlock(BB))
>        return ~0U;
>
> +    // Blocks with NoDuplicate are modelled as having infinite cost, so they
> +    // are never duplicated.
> +    if (const CallInst *CI = dyn_cast<CallInst>(I))
> +      if (CI->cannotDuplicate() || CI->isConvergent())
> +        return ~0U;
> +
> +    if (TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency)
> +            == TargetTransformInfo::TCC_Free)
> +      continue;
> +
>      // All other instructions count for at least one unit.
>      ++Size;
>
> @@ -578,11 +576,7 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
>      // as having cost of 2 total, and if they are a vector intrinsic, we model
>      // them as having cost 1.
>      if (const CallInst *CI = dyn_cast<CallInst>(I)) {
> -      if (CI->cannotDuplicate() || CI->isConvergent())
> -        // Blocks with NoDuplicate are modelled as having infinite cost, so they
> -        // are never duplicated.
> -        return ~0U;
> -      else if (!isa<IntrinsicInst>(CI))
> +      if (!isa<IntrinsicInst>(CI))
>          Size += 3;
>        else if (!CI->getType()->isVectorTy())
>          Size += 1;
> @@ -2234,10 +2228,10 @@ bool JumpThreadingPass::maybethreadThroughTwoBasicBlocks(BasicBlock *BB,
>    }
>
>    // Compute the cost of duplicating BB and PredBB.
> -  unsigned BBCost =
> -      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
> +  unsigned BBCost = getJumpThreadDuplicationCost(
> +      TTI, BB, BB->getTerminator(), BBDupThreshold);
>    unsigned PredBBCost = getJumpThreadDuplicationCost(
> -      PredBB, PredBB->getTerminator(), BBDupThreshold);
> +      TTI, PredBB, PredBB->getTerminator(), BBDupThreshold);
>
>    // Give up if costs are too high.  We need to check BBCost and PredBBCost
>    // individually before checking their sum because getJumpThreadDuplicationCost
> @@ -2345,8 +2339,8 @@ bool JumpThreadingPass::tryThreadEdge(
>      return false;
>    }
>
> -  unsigned JumpThreadCost =
> -      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
> +  unsigned JumpThreadCost = getJumpThreadDuplicationCost(
> +      TTI, BB, BB->getTerminator(), BBDupThreshold);
>    if (JumpThreadCost > BBDupThreshold) {
>      LLVM_DEBUG(dbgs() << "  Not threading BB '" << BB->getName()
>                        << "' - Cost is too high: " << JumpThreadCost << "\n");
> @@ -2614,8 +2608,8 @@ bool JumpThreadingPass::duplicateCondBranchOnPHIIntoPred(
>      return false;
>    }
>
> -  unsigned DuplicationCost =
> -      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
> +  unsigned DuplicationCost = getJumpThreadDuplicationCost(
> +      TTI, BB, BB->getTerminator(), BBDupThreshold);
>    if (DuplicationCost > BBDupThreshold) {
>      LLVM_DEBUG(dbgs() << "  Not duplicating BB '" << BB->getName()
>                        << "' - Cost is too high: " << DuplicationCost << "\n");
> @@ -3031,7 +3025,8 @@ bool JumpThreadingPass::threadGuard(BasicBlock *BB, IntrinsicInst *Guard,
>
>    ValueToValueMapTy UnguardedMapping, GuardedMapping;
>    Instruction *AfterGuard = Guard->getNextNode();
> -  unsigned Cost = getJumpThreadDuplicationCost(BB, AfterGuard, BBDupThreshold);
> +  unsigned Cost =
> +      getJumpThreadDuplicationCost(TTI, BB, AfterGuard, BBDupThreshold);
>    if (Cost > BBDupThreshold)
>      return false;
>    // Duplicate all instructions before the guard and the guard itself to the
> diff --git a/llvm/test/Transforms/JumpThreading/free_instructions.ll b/llvm/test/Transforms/JumpThreading/free_instructions.ll
> index f768ec996779..76392af77d33 100644
> --- a/llvm/test/Transforms/JumpThreading/free_instructions.ll
> +++ b/llvm/test/Transforms/JumpThreading/free_instructions.ll
> @@ -5,26 +5,28 @@
>  ; the jump threading threshold, as everything else are free instructions.
>  define i32 @free_instructions(i1 %c, i32* %p) {
>  ; CHECK-LABEL: @free_instructions(
> -; CHECK-NEXT:    br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
> -; CHECK:       if:
> +; CHECK-NEXT:    br i1 [[C:%.*]], label [[IF2:%.*]], label [[ELSE2:%.*]]
> +; CHECK:       if2:
>  ; CHECK-NEXT:    store i32 -1, i32* [[P:%.*]], align 4
> -; CHECK-NEXT:    br label [[JOIN:%.*]]
> -; CHECK:       else:
> -; CHECK-NEXT:    store i32 -2, i32* [[P]], align 4
> -; CHECK-NEXT:    br label [[JOIN]]
> -; CHECK:       join:
>  ; CHECK-NEXT:    call void @llvm.experimental.noalias.scope.decl(metadata [[META0:![0-9]+]])
>  ; CHECK-NEXT:    store i32 1, i32* [[P]], align 4, !noalias !0
>  ; CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ]
>  ; CHECK-NEXT:    store i32 2, i32* [[P]], align 4
> +; CHECK-NEXT:    [[P21:%.*]] = bitcast i32* [[P]] to i8*
> +; CHECK-NEXT:    [[P32:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P21]])
> +; CHECK-NEXT:    [[P43:%.*]] = bitcast i8* [[P32]] to i32*
> +; CHECK-NEXT:    store i32 3, i32* [[P43]], align 4, !invariant.group !3
> +; CHECK-NEXT:    ret i32 0
> +; CHECK:       else2:
> +; CHECK-NEXT:    store i32 -2, i32* [[P]], align 4
> +; CHECK-NEXT:    call void @llvm.experimental.noalias.scope.decl(metadata [[META4:![0-9]+]])
> +; CHECK-NEXT:    store i32 1, i32* [[P]], align 4, !noalias !4
> +; CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ]
> +; CHECK-NEXT:    store i32 2, i32* [[P]], align 4
>  ; CHECK-NEXT:    [[P2:%.*]] = bitcast i32* [[P]] to i8*
>  ; CHECK-NEXT:    [[P3:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P2]])
>  ; CHECK-NEXT:    [[P4:%.*]] = bitcast i8* [[P3]] to i32*
>  ; CHECK-NEXT:    store i32 3, i32* [[P4]], align 4, !invariant.group !3
> -; CHECK-NEXT:    br i1 [[C]], label [[IF2:%.*]], label [[ELSE2:%.*]]
> -; CHECK:       if2:
> -; CHECK-NEXT:    ret i32 0
> -; CHECK:       else2:
>  ; CHECK-NEXT:    ret i32 1
>  ;
>    br i1 %c, label %if, label %else
> diff --git a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
> index 57014e856a09..f764a59dd8a2 100644
> --- a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
> +++ b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
> @@ -32,13 +32,10 @@ define void @caller1(i1 %c, i64* align 1 %ptr) {
>  ; ASSUMPTIONS-OFF-NEXT:    br label [[COMMON_RET]]
>  ;
>  ; ASSUMPTIONS-ON-LABEL: @caller1(
> -; ASSUMPTIONS-ON-NEXT:    br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE1:%.*]]
> -; ASSUMPTIONS-ON:       false1:
> -; ASSUMPTIONS-ON-NEXT:    store volatile i64 1, i64* [[PTR:%.*]], align 4
> -; ASSUMPTIONS-ON-NEXT:    br label [[COMMON_RET]]
> +; ASSUMPTIONS-ON-NEXT:    br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE2:%.*]]
>  ; ASSUMPTIONS-ON:       common.ret:
> -; ASSUMPTIONS-ON-NEXT:    [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE1]] ], [ 2, [[TMP0:%.*]] ]
> -; ASSUMPTIONS-ON-NEXT:    call void @llvm.assume(i1 true) [ "align"(i64* [[PTR]], i64 8) ]
> +; ASSUMPTIONS-ON-NEXT:    [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE2]] ], [ 2, [[TMP0:%.*]] ]
> +; ASSUMPTIONS-ON-NEXT:    call void @llvm.assume(i1 true) [ "align"(i64* [[PTR:%.*]], i64 8) ]
>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 0, i64* [[PTR]], align 8
>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
> @@ -47,6 +44,9 @@ define void @caller1(i1 %c, i64* align 1 %ptr) {
>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 [[DOTSINK]], i64* [[PTR]], align 8
>  ; ASSUMPTIONS-ON-NEXT:    ret void
> +; ASSUMPTIONS-ON:       false2:
> +; ASSUMPTIONS-ON-NEXT:    store volatile i64 1, i64* [[PTR]], align 4
> +; ASSUMPTIONS-ON-NEXT:    br label [[COMMON_RET]]
>  ;
>    br i1 %c, label %true1, label %false1
>
> </cut>

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
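The pass-level effect of the commit can also be seen without rebuilding the benchmarks, by running the regression test it touches through opt at the parent commit and at the commit itself. A minimal sketch, assuming an llvm-project checkout with a ninja build directory named build/ (the directory name and build setup are assumptions):

#!/bin/sh
# Sketch: show how JumpThreading's duplication decision changes on the
# free_instructions.ll test modified by the commit above.
set -e
cd llvm-project

# Parent of the offending commit: assume/noalias.scope.decl intrinsics still
# count towards the duplication cost, so the common block is not duplicated.
git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff^
ninja -C build opt
build/bin/opt -S -passes=jump-threading \
    llvm/test/Transforms/JumpThreading/free_instructions.ll > before.ll

# The offending commit itself: those intrinsics are now TCC_Free under
# TCK_SizeAndLatency, so the block is duplicated into both predecessors.
git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
ninja -C build opt
build/bin/opt -S -passes=jump-threading \
    llvm/test/Transforms/JumpThreading/free_instructions.ll > after.ll

# The duplicated blocks in after.ll are the kind of growth the -Os report
# above is measuring on 462.libquantum.
diff -u before.ll after.ll || true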