On Sun, Nov 17, 2024 at 1:25 AM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > > On 16.11.2024 at 14:08, Jan Hubicka <hubi...@ucw.cz> wrote:
> > > >
> > > > Ignore conditions guarding __builtin_unreachable in inliner metrics
> > > >
> > > > This patch extends my attempt from last year to make the inliner
> > > > metrics ignore conditionals guarding __builtin_unreachable. Compared
> > > > to the previous patch, this one implements a "mini-DCE" in
> > > > ipa-fnsummary to avoid accounting all statements that are only used
> > > > to determine conditionals guarding __builtin_unreachable. These will
> > > > be removed later, once value ranges are determined.
> > > >
> > > > While working on this, I noticed that we have a lot of dead code
> > > > while computing the fnsummary for early inlining. Those summaries are
> > > > only used to apply large-function growth, but there seems to be enough
> > > > dead code to make this value somewhat irrelevant. There also seem to
> > > > be quite a lot of const/pure calls that can be cheaply removed before
> > > > we inline them. So I wonder if we want to run one DCE pass before
> > > > early inlining.
> > >
> > > I would not have expected a 'lot' of dead const function calls. By the
> > > same argument, should we rather run CCP before inlining, as that tends
> > > to prune most dead code early?
>
> Just to quantify "a lot": on tramp3d there are 2263 calls declared dead
> and 34206 declared live (by my mini-DCE) in fnsummary1, so about 6%.
> Overall there are 3797 unnecessary statements and 97263 necessary ones,
> so most of the dead statements are actually calls.
>
> In IPA fnsummary there are 4 dead calls, all of them to .part clones.
> This looks like a missed optimization before ipa-fnsplit.
>
> These are the most frequent ones:
>
>  12 skipping unnecesary stmt Evaluator<RemoteMultiPatchEvaluatorTag>::Evaluator (&evaluator);
>  12 skipping unnecesary stmt Pooma::DummyMutex::lock (_1);
>  16 skipping unnecesary stmt ForEach<Scalar<double>, DomainFunctorTag, DomainFunctorTag>::apply (_2, f_7(D), c_8(D));
>  23 skipping unnecesary stmt operator delete (_5, _2);
>  24 skipping unnecesary stmt Evaluator<RemoteMultiPatchEvaluatorTag>::~Evaluator (&evaluator);
>  28 skipping unnecesary stmt ForEach<Scalar<double>, DomainFunctorTag, DomainFunctorTag>::apply (_1, f_6(D), c_7(D));
>  37 skipping unnecesary stmt _37 = Field<UniformRectilinearMesh<MeshTraits<3, double, UniformRectilinearTag, CartesianTag, 3> >, double, BrickView>::engine (_9);
>  37 skipping unnecesary stmt _39 = engineFunctor<Engine<3, double, BrickView>, DataObjectRequest<BlockAffinity> > (_10, &getAffinity);
>  43 skipping unnecesary stmt DataObjectRequest<BlockAffinity>::DataObjectRequest (&getAffinity);
>  43 skipping unnecesary stmt MultiArgEvaluator<SinglePatchEvaluatorTag>::MultiArgEvaluator (&speval);
>  43 skipping unnecesary stmt Smarts::Iterate<Smarts::Stub>::hintAffinity (_8, _11);
>  86 skipping unnecesary stmt MultiArgEvaluator<SinglePatchEvaluatorTag>::~MultiArgEvaluator (&speval);
> 227 skipping unnecesary stmt PoomaCTAssert<true>::test ();
>
> Most of those are functions that are empty at release_ssa time. Perhaps
> we could special-case pure/const calls with no LHS and get rid of them
> during early inlining. This is cheaper than actually inlining them, which
> triggers a lot of logic in tree-inline...

That's an interesting idea. Note the original thinking was that CFG cleanup
gets rid of most dead code (from C++ templates). forwprop does a very simple
"CCP" and "DCE" as part of its RPO walk with lattice and folding. CCP can be
somewhat expensive due to its use of the SSA propagator, and DCE can be
expensive due to its weak DSE support.

Richard.

>
> Honza
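A minimal sketch of the mark-and-sweep idea behind the "mini-DCE" discussed above, written against a hypothetical toy IR: the Stmt/Kind types and the mark_necessary/count_unnecessary names are illustrative assumptions, not GCC's GIMPLE or the actual ipa-fnsummary code. Statements with side effects seed a worklist and necessity propagates backwards through SSA uses. Neither conditions guarding __builtin_unreachable () nor pure/const calls are roots, so a pure/const call with no LHS is always unnecessary, as is anything feeding only the unreachable guards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR for illustration only; this is not GCC's GIMPLE.
// Statement i defines at most one SSA value, identified by the index i.
enum class Kind {
  Assign,          // side-effect-free computation
  PureConstCall,   // call to a pure/const function
  Call,            // call that may have side effects
  Store,           // store to memory
  Return,          // function return
  CondUnreachable  // conditional that only guards __builtin_unreachable ()
};

struct Stmt {
  Kind kind;
  std::vector<std::size_t> uses;  // indices of statements whose values are used
};

// Statements that must be kept regardless of whether their value is used.
// Unreachable guards and pure/const calls are deliberately not roots.
static bool is_root(const Stmt &s) {
  return s.kind == Kind::Call || s.kind == Kind::Store || s.kind == Kind::Return;
}

// Mark phase: seed the worklist with roots, then propagate necessity
// backwards through SSA uses until nothing new is marked.
std::vector<bool> mark_necessary(const std::vector<Stmt> &body) {
  std::vector<bool> necessary(body.size(), false);
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < body.size(); ++i)
    if (is_root(body[i])) {
      necessary[i] = true;
      worklist.push_back(i);
    }
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    for (std::size_t def : body[i].uses) {
      assert(def < body.size());  // a use must name an existing statement
      if (!necessary[def]) {
        necessary[def] = true;
        worklist.push_back(def);
      }
    }
  }
  return necessary;
}

// Unmarked statements are the ones that would be skipped when accounting
// size and time in the summary.
std::size_t count_unnecessary(const std::vector<bool> &necessary) {
  std::size_t n = 0;
  for (bool b : necessary)
    if (!b)
      ++n;
  return n;
}
```

For example, for the five statements "_1 = a + 1", "if (_1 > 10) __builtin_unreachable ()", a pure/const call with no LHS, "_2 = b * 2" and "return _2", only the last two are marked necessary, so count_unnecessary reports 3. A real pass would also have to deal with PHI nodes, control dependence and memory side effects; the sketch only shows why such statements drop out of the metrics.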