Also, the patch slows down 464.h264ref by 7% at -O3 -flto:
https://lists.linaro.org/pipermail/linaro-toolchain/2021-September/007882.html
Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 15:59, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
>
> Hi Nikita,
>
> FYI, your patch seems to increase the code size of 462.libquantum by 3%.
> Would you please check whether this could be easily fixed?
>
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 27 Sep 2021, at 01:34, ci_not...@linaro.org wrote:
>>
>> After llvm commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
>> Author: Nikita Popov <nikita....@gmail.com>
>>
>>     [JumpThreading] Ignore free instructions
>>
>> the following benchmarks grew in size by more than 1%:
>> - 462.libquantum grew in size by 3% from 14035 to 14398 bytes
>>
>> The reproducer instructions below can be used to re-build both the
>> "first_bad" and "last_good" cross-toolchains used in this bisection.
>> Naturally, the scripts will fail when triggering benchmarking jobs if you
>> don't have access to Linaro TCWG CI.
>>
>> For your convenience, we have uploaded tarballs with pre-processed source
>> and assembly files at:
>> - First_bad save-temps:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff/save-temps/
>> - Last_good save-temps:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb/save-temps/
>> - Baseline save-temps:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-baseline/save-temps/
>>
>> Configuration:
>> - Benchmark: SPEC CPU2006
>> - Toolchain: Clang + Glibc + LLVM Linker
>> - Version: all components were built from their tip of trunk
>> - Target: arm-linux-gnueabihf
>> - Compiler flags: -Os -flto -mthumb
>> - Hardware: APM Mustang 8x X-Gene1
>>
>> This benchmarking CI is a work in progress, and we welcome feedback and
>> suggestions at linaro-toolchain@lists.linaro.org. We plan to add support
>> for SPEC CPU2017 benchmarks and to provide "perf report/annotate" data
>> behind these reports.
>>
>> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS,
>> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
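
To make the size regression easier to reason about: the patch (quoted in full at the bottom of this
message) replaces JumpThreading's hand-written list of free instructions with a TargetTransformInfo
query, so free intrinsics such as llvm.assume and llvm.experimental.noalias.scope.decl no longer
count against the block-duplication threshold, more blocks become cheap enough to thread, and the
extra duplication is presumably where the -Os code-size growth comes from. The snippet below is a
minimal standalone sketch of that query, not the pass code itself: it uses a default-constructed,
generic TargetTransformInfo rather than the target-specific one the pass obtains from
TargetIRAnalysis, and the toy IR and build command are only illustrative assumptions.

<cut>
// free_instr_sketch.cpp -- illustrative only; assumes LLVM 13-era development
// headers are installed. Assumed build command:
//   clang++ free_instr_sketch.cpp $(llvm-config --cxxflags --ldflags \
//       --libs core analysis asmparser support) -o free_instr_sketch
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Toy input: one real store surrounded by instructions that are typically free.
static const char *IR = R"(
define i32 @f(i32* %p) {
  call void @llvm.assume(i1 true)
  %q = bitcast i32* %p to i8*
  store i32 1, i32* %p
  ret i32 0
}
declare void @llvm.assume(i1)
)";

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;
  std::unique_ptr<Module> M = parseAssemblyString(IR, Err, Ctx);
  if (!M) {
    Err.print("free_instr_sketch", errs());
    return 1;
  }

  // Generic, target-independent cost model. JumpThreading itself uses the
  // target-specific TTI from TargetIRAnalysis, so real answers may differ.
  TargetTransformInfo TTI(M->getDataLayout());

  for (Instruction &I : M->getFunction("f")->getEntryBlock()) {
    // The same query the patch adds to getJumpThreadDuplicationCost().
    bool IsFree =
        TTI.getUserCost(&I, TargetTransformInfo::TCK_SizeAndLatency) ==
        TargetTransformInfo::TCC_Free;
    errs() << (IsFree ? "free    : " : "counted : ") << I << "\n";
  }
  return 0;
}
</cut>

Running it just prints a free/counted classification for each instruction of the toy function; the
real threading decision is made by getJumpThreadDuplicationCost in the diff quoted below.
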
>>
>> This commit has regressed these CI configurations:
>> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTO
>>
>> First_bad build:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff/
>> Last_good build:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb/
>> Baseline build:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/build-baseline/
>> Even more details:
>>   https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/
>>
>> Reproduce builds:
>> <cut>
>> mkdir investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
>> cd investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
>>
>> # Fetch scripts
>> git clone https://git.linaro.org/toolchain/jenkins-scripts
>>
>> # Fetch manifests and test.sh script
>> mkdir -p artifacts/manifests
>> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/manifests/build-baseline.sh --fail
>> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/manifests/build-parameters.sh --fail
>> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-arm-spec2k6-Os_LTO/4/artifact/artifacts/test.sh --fail
>> chmod +x artifacts/test.sh
>>
>> # Reproduce the baseline build (build all pre-requisites)
>> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>>
>> # Save baseline build state (which is then restored in artifacts/test.sh)
>> mkdir -p ./bisect
>> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>>
>> cd llvm
>>
>> # Reproduce first_bad build
>> git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
>> ../artifacts/test.sh
>>
>> # Reproduce last_good build
>> git checkout --detach 1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb
>> ../artifacts/test.sh
>>
>> cd ..
>> </cut>
>>
>> Full commit (up to 1000 lines):
>> <cut>
>> commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff
>> Author: Nikita Popov <nikita....@gmail.com>
>> Date:   Wed Sep 22 21:34:24 2021 +0200
>>
>>     [JumpThreading] Ignore free instructions
>>
>>     This is basically D108837 but for jump threading. Free instructions
>>     should be ignored for the threading decision. JumpThreading already
>>     skips some free instructions (like pointer bitcasts), but does not
>>     skip various free intrinsics -- in fact, it currently gives them a
>>     fairly large cost of 2.
>>
>>     Differential Revision: https://reviews.llvm.org/D110290
>> ---
>>  .../include/llvm/Transforms/Scalar/JumpThreading.h |  8 +--
>>  llvm/lib/Transforms/Scalar/JumpThreading.cpp       | 61 ++++++++++------------
>>  .../Transforms/JumpThreading/free_instructions.ll  | 24 +++++----
>>  .../inlining-alignment-assumptions.ll              | 12 ++---
>>  4 files changed, 52 insertions(+), 53 deletions(-)
>>
>> diff --git a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
>> index 816ea1071e52..0ac7d7c62b7a 100644
>> --- a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
>> +++ b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h
>> @@ -44,6 +44,7 @@ class PHINode;
>>  class SelectInst;
>>  class SwitchInst;
>>  class TargetLibraryInfo;
>> +class TargetTransformInfo;
>>  class Value;
>>
>>  /// A private "module" namespace for types and utilities used by
>> @@ -78,6 +79,7 @@ enum ConstantPreference { WantInteger, WantBlockAddress };
>>  /// revectored to the false side of the second if.
>>  class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> {
>>    TargetLibraryInfo *TLI;
>> +  TargetTransformInfo *TTI;
>>    LazyValueInfo *LVI;
>>    AAResults *AA;
>>    DomTreeUpdater *DTU;
>> @@ -99,9 +101,9 @@ public:
>>    JumpThreadingPass(bool InsertFreezeWhenUnfoldingSelect = false, int T = -1);
>>
>>    // Glue for old PM.
>> -  bool runImpl(Function &F, TargetLibraryInfo *TLI, LazyValueInfo *LVI,
>> -               AAResults *AA, DomTreeUpdater *DTU, bool HasProfileData,
>> -               std::unique_ptr<BlockFrequencyInfo> BFI,
>> +  bool runImpl(Function &F, TargetLibraryInfo *TLI, TargetTransformInfo *TTI,
>> +               LazyValueInfo *LVI, AAResults *AA, DomTreeUpdater *DTU,
>> +               bool HasProfileData, std::unique_ptr<BlockFrequencyInfo> BFI,
>>                 std::unique_ptr<BranchProbabilityInfo> BPI);
>>
>>    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
>> diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
>> index 688902ecb9ff..fe9a7211967c 100644
>> --- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
>> +++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
>> @@ -331,7 +331,7 @@ bool JumpThreading::runOnFunction(Function &F) {
>>      BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
>>    }
>>
>> -  bool Changed = Impl.runImpl(F, TLI, LVI, AA, &DTU, F.hasProfileData(),
>> +  bool Changed = Impl.runImpl(F, TLI, TTI, LVI, AA, &DTU, F.hasProfileData(),
>>                                std::move(BFI), std::move(BPI));
>>    if (PrintLVIAfterJumpThreading) {
>>      dbgs() << "LVI for function '" << F.getName() << "':\n";
>> @@ -360,7 +360,7 @@ PreservedAnalyses JumpThreadingPass::run(Function &F,
>>      BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
>>    }
>>
>> -  bool Changed = runImpl(F, &TLI, &LVI, &AA, &DTU, F.hasProfileData(),
>> +  bool Changed = runImpl(F, &TLI, &TTI, &LVI, &AA, &DTU, F.hasProfileData(),
>>                           std::move(BFI), std::move(BPI));
>>
>>    if (PrintLVIAfterJumpThreading) {
>> @@ -377,12 +377,14 @@ PreservedAnalyses JumpThreadingPass::run(Function &F,
>>  }
>>
>>  bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_,
>> -                                LazyValueInfo *LVI_, AliasAnalysis *AA_,
>> -                                DomTreeUpdater *DTU_, bool HasProfileData_,
>> +                                TargetTransformInfo *TTI_, LazyValueInfo *LVI_,
>> +                                AliasAnalysis *AA_, DomTreeUpdater *DTU_,
>> +                                bool HasProfileData_,
>>                                  std::unique_ptr<BlockFrequencyInfo> BFI_,
>>                                  std::unique_ptr<BranchProbabilityInfo> BPI_) {
>>    LLVM_DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n");
>>    TLI = TLI_;
>> +  TTI = TTI_;
>>    LVI = LVI_;
>>    AA = AA_;
>>    DTU = DTU_;
>> @@ -514,7 +516,8 @@ static void replaceFoldableUses(Instruction *Cond, Value *ToVal) {
>>  /// Return the cost of duplicating a piece of this block from first non-phi
>>  /// and before StopAt instruction to thread across it. Stop scanning the block
>>  /// when exceeding the threshold. If duplication is impossible, returns ~0U.
>> -static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
>> +static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
>> +                                             BasicBlock *BB,
>>                                               Instruction *StopAt,
>>                                               unsigned Threshold) {
>>    assert(StopAt->getParent() == BB && "Not an instruction from proper BB?");
>> @@ -550,26 +553,21 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
>>      if (Size > Threshold)
>>        return Size;
>>
>> -    // Debugger intrinsics don't incur code size.
>> -    if (isa<DbgInfoIntrinsic>(I)) continue;
>> -
>> -    // Pseudo-probes don't incur code size.
>> -    if (isa<PseudoProbeInst>(I))
>> -      continue;
>> -
>> -    // If this is a pointer->pointer bitcast, it is free.
>> -    if (isa<BitCastInst>(I) && I->getType()->isPointerTy())
>> -      continue;
>> -
>> -    // Freeze instruction is free, too.
>> -    if (isa<FreezeInst>(I))
>> -      continue;
>> -
>>      // Bail out if this instruction gives back a token type, it is not possible
>>      // to duplicate it if it is used outside this BB.
>>      if (I->getType()->isTokenTy() && I->isUsedOutsideOfBlock(BB))
>>        return ~0U;
>>
>> +    // Blocks with NoDuplicate are modelled as having infinite cost, so they
>> +    // are never duplicated.
>> +    if (const CallInst *CI = dyn_cast<CallInst>(I))
>> +      if (CI->cannotDuplicate() || CI->isConvergent())
>> +        return ~0U;
>> +
>> +    if (TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency)
>> +            == TargetTransformInfo::TCC_Free)
>> +      continue;
>> +
>>      // All other instructions count for at least one unit.
>>      ++Size;
>>
>> @@ -578,11 +576,7 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
>>      // as having cost of 2 total, and if they are a vector intrinsic, we model
>>      // them as having cost 1.
>>      if (const CallInst *CI = dyn_cast<CallInst>(I)) {
>> -      if (CI->cannotDuplicate() || CI->isConvergent())
>> -        // Blocks with NoDuplicate are modelled as having infinite cost, so they
>> -        // are never duplicated.
>> -        return ~0U;
>> -      else if (!isa<IntrinsicInst>(CI))
>> +      if (!isa<IntrinsicInst>(CI))
>>          Size += 3;
>>        else if (!CI->getType()->isVectorTy())
>>          Size += 1;
>> @@ -2234,10 +2228,10 @@ bool JumpThreadingPass::maybethreadThroughTwoBasicBlocks(BasicBlock *BB,
>>    }
>>
>>    // Compute the cost of duplicating BB and PredBB.
>> -  unsigned BBCost =
>> -      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
>> +  unsigned BBCost = getJumpThreadDuplicationCost(
>> +      TTI, BB, BB->getTerminator(), BBDupThreshold);
>>    unsigned PredBBCost = getJumpThreadDuplicationCost(
>> -      PredBB, PredBB->getTerminator(), BBDupThreshold);
>> +      TTI, PredBB, PredBB->getTerminator(), BBDupThreshold);
>>
>>    // Give up if costs are too high.  We need to check BBCost and PredBBCost
>>    // individually before checking their sum because getJumpThreadDuplicationCost
>> @@ -2345,8 +2339,8 @@ bool JumpThreadingPass::tryThreadEdge(
>>      return false;
>>    }
>>
>> -  unsigned JumpThreadCost =
>> -      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
>> +  unsigned JumpThreadCost = getJumpThreadDuplicationCost(
>> +      TTI, BB, BB->getTerminator(), BBDupThreshold);
>>    if (JumpThreadCost > BBDupThreshold) {
>>      LLVM_DEBUG(dbgs() << "  Not threading BB '" << BB->getName()
>>                        << "' - Cost is too high: " << JumpThreadCost << "\n");
>> @@ -2614,8 +2608,8 @@ bool JumpThreadingPass::duplicateCondBranchOnPHIIntoPred(
>>      return false;
>>    }
>>
>> -  unsigned DuplicationCost =
>> -      getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold);
>> +  unsigned DuplicationCost = getJumpThreadDuplicationCost(
>> +      TTI, BB, BB->getTerminator(), BBDupThreshold);
>>    if (DuplicationCost > BBDupThreshold) {
>>      LLVM_DEBUG(dbgs() << "  Not duplicating BB '" << BB->getName()
>>                        << "' - Cost is too high: " << DuplicationCost << "\n");
>> @@ -3031,7 +3025,8 @@ bool JumpThreadingPass::threadGuard(BasicBlock *BB, IntrinsicInst *Guard,
>>
>>    ValueToValueMapTy UnguardedMapping, GuardedMapping;
>>    Instruction *AfterGuard = Guard->getNextNode();
>> -  unsigned Cost = getJumpThreadDuplicationCost(BB, AfterGuard, BBDupThreshold);
>> +  unsigned Cost =
>> +      getJumpThreadDuplicationCost(TTI, BB, AfterGuard, BBDupThreshold);
>>    if (Cost > BBDupThreshold)
>>      return false;
>>    // Duplicate all instructions before the guard and the guard itself to the
>> diff --git a/llvm/test/Transforms/JumpThreading/free_instructions.ll b/llvm/test/Transforms/JumpThreading/free_instructions.ll
>> index f768ec996779..76392af77d33 100644
>> --- a/llvm/test/Transforms/JumpThreading/free_instructions.ll
>> +++ b/llvm/test/Transforms/JumpThreading/free_instructions.ll
>> @@ -5,26 +5,28 @@
>>  ; the jump threading threshold, as everything else are free instructions.
>>
>>  define i32 @free_instructions(i1 %c, i32* %p) {
>>  ; CHECK-LABEL: @free_instructions(
>> -; CHECK-NEXT:    br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
>> -; CHECK:       if:
>> +; CHECK-NEXT:    br i1 [[C:%.*]], label [[IF2:%.*]], label [[ELSE2:%.*]]
>> +; CHECK:       if2:
>>  ; CHECK-NEXT:    store i32 -1, i32* [[P:%.*]], align 4
>> -; CHECK-NEXT:    br label [[JOIN:%.*]]
>> -; CHECK:       else:
>> -; CHECK-NEXT:    store i32 -2, i32* [[P]], align 4
>> -; CHECK-NEXT:    br label [[JOIN]]
>> -; CHECK:       join:
>>  ; CHECK-NEXT:    call void @llvm.experimental.noalias.scope.decl(metadata [[META0:![0-9]+]])
>>  ; CHECK-NEXT:    store i32 1, i32* [[P]], align 4, !noalias !0
>>  ; CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ]
>>  ; CHECK-NEXT:    store i32 2, i32* [[P]], align 4
>> +; CHECK-NEXT:    [[P21:%.*]] = bitcast i32* [[P]] to i8*
>> +; CHECK-NEXT:    [[P32:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P21]])
>> +; CHECK-NEXT:    [[P43:%.*]] = bitcast i8* [[P32]] to i32*
>> +; CHECK-NEXT:    store i32 3, i32* [[P43]], align 4, !invariant.group !3
>> +; CHECK-NEXT:    ret i32 0
>> +; CHECK:       else2:
>> +; CHECK-NEXT:    store i32 -2, i32* [[P]], align 4
>> +; CHECK-NEXT:    call void @llvm.experimental.noalias.scope.decl(metadata [[META4:![0-9]+]])
>> +; CHECK-NEXT:    store i32 1, i32* [[P]], align 4, !noalias !4
>> +; CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ]
>> +; CHECK-NEXT:    store i32 2, i32* [[P]], align 4
>>  ; CHECK-NEXT:    [[P2:%.*]] = bitcast i32* [[P]] to i8*
>>  ; CHECK-NEXT:    [[P3:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P2]])
>>  ; CHECK-NEXT:    [[P4:%.*]] = bitcast i8* [[P3]] to i32*
>>  ; CHECK-NEXT:    store i32 3, i32* [[P4]], align 4, !invariant.group !3
>> -; CHECK-NEXT:    br i1 [[C]], label [[IF2:%.*]], label [[ELSE2:%.*]]
>> -; CHECK:       if2:
>> -; CHECK-NEXT:    ret i32 0
>> -; CHECK:       else2:
>>  ; CHECK-NEXT:    ret i32 1
>>  ;
>>    br i1 %c, label %if, label %else
>> diff --git a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
>> index 57014e856a09..f764a59dd8a2 100644
>> --- a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
>> +++ b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
>> @@ -32,13 +32,10 @@ define void @caller1(i1 %c, i64* align 1 %ptr) {
>>  ; ASSUMPTIONS-OFF-NEXT:    br label [[COMMON_RET]]
>>  ;
>>  ; ASSUMPTIONS-ON-LABEL: @caller1(
>> -; ASSUMPTIONS-ON-NEXT:    br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE1:%.*]]
>> -; ASSUMPTIONS-ON:       false1:
>> -; ASSUMPTIONS-ON-NEXT:    store volatile i64 1, i64* [[PTR:%.*]], align 4
>> -; ASSUMPTIONS-ON-NEXT:    br label [[COMMON_RET]]
>> +; ASSUMPTIONS-ON-NEXT:    br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE2:%.*]]
>>  ; ASSUMPTIONS-ON:       common.ret:
>> -; ASSUMPTIONS-ON-NEXT:    [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE1]] ], [ 2, [[TMP0:%.*]] ]
>> -; ASSUMPTIONS-ON-NEXT:    call void @llvm.assume(i1 true) [ "align"(i64* [[PTR]], i64 8) ]
>> +; ASSUMPTIONS-ON-NEXT:    [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE2]] ], [ 2, [[TMP0:%.*]] ]
>> +; ASSUMPTIONS-ON-NEXT:    call void @llvm.assume(i1 true) [ "align"(i64* [[PTR:%.*]], i64 8) ]
>>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 0, i64* [[PTR]], align 8
>>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
>>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
>> @@ -47,6 +44,9 @@ define void @caller1(i1 %c, i64* align 1 %ptr) {
>>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 -1, i64* [[PTR]], align 8
>>  ; ASSUMPTIONS-ON-NEXT:    store volatile i64 [[DOTSINK]], i64* [[PTR]], align 8
>>  ; ASSUMPTIONS-ON-NEXT:    ret void
>> +; ASSUMPTIONS-ON:       false2:
>> +; ASSUMPTIONS-ON-NEXT:    store volatile i64 1, i64* [[PTR]], align 4
>> +; ASSUMPTIONS-ON-NEXT:    br label [[COMMON_RET]]
>>  ;
>>    br i1 %c, label %true1, label %false1
>>
>> </cut>
>
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain