> >
> >
> > > On 16.11.2024 at 14:08, Jan Hubicka <[email protected]> wrote:
> > >
> > > Ignore conditions guarding __builtin_unreachable in inliner metrics
> > >
> > > This patch extends my attempt from last year to make the inliner metrics
> > > ignore conditionals guarding __builtin_unreachable. Compared to the
> > > previous patch, this one implements a "mini-dce" in ipa-fnsummary to avoid
> > > accounting for statements that are only used to determine conditionals
> > > guarding __builtin_unreachable. These will be removed later once value
> > > ranges are determined.
> > >
> > > While working on this, I noticed that we have a lot of dead code while
> > > computing fnsummary for early inlining. That summary is only used to apply
> > > large-function growth, but there seems to be enough dead code to make this
> > > value kind of irrelevant. There also seem to be quite a lot of const/pure
> > > calls that can be cheaply removed before we inline them. So I wonder if we
> > > want to run one DCE before early inlining.
> >
> > I would not have expected a 'lot' of dead const function calls. By the same
> > argument we should rather run CCP before inlining, as that tends to prune
> > most dead code early?
Just to quantify "a lot": on tramp3d there are 2263 calls declared dead
and 34206 declared live (by my mini-dce) in fnsummary1, so about 6% of
calls are dead.
Overall there are 3797 unnecessary statements and 97263 necessary ones,
so most of the dead statements (2263 of 3797) are actually calls.
In IPA fnsummary there are 4 dead calls, all of them .part clones.
This looks like a missed optimization before ipa-fnsplit.
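
For readers who want a picture of how such dead/live counts come about, here
is a minimal sketch of a mark-and-sweep style mini-dce. It is a toy over a
made-up statement list and not the code from the patch: Kind, Stmt, is_root,
mark_necessary and count_unnecessary are hypothetical names, and real GIMPLE
and ipa-fnsummary look different. The one assumption it encodes is that a
conditional guarding __builtin_unreachable is not a root, so everything that
only feeds such a conditional stays unmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement kinds (hypothetical, not GCC's gimple codes).
enum class Kind { Assign, Call, Cond, GuardsUnreachable, Return };

struct Stmt {
    Kind kind;
    std::vector<std::size_t> uses;  // indices of statements whose results are read
    bool has_side_effects;          // only meaningful for calls in this toy model
};

// Roots are statements that must be kept regardless of who uses them.
// A conditional that only guards __builtin_unreachable is deliberately
// not a root.
bool is_root(const Stmt& s) {
    switch (s.kind) {
    case Kind::Return:
    case Kind::Cond:
        return true;
    case Kind::Call:
        return s.has_side_effects;
    case Kind::Assign:
    case Kind::GuardsUnreachable:
        return false;
    }
    return false;
}

// Mark roots, then propagate necessity backwards along the use edges.
std::vector<bool> mark_necessary(const std::vector<Stmt>& stmts) {
    std::vector<bool> necessary(stmts.size(), false);
    std::vector<std::size_t> worklist;
    for (std::size_t i = 0; i < stmts.size(); ++i) {
        if (is_root(stmts[i])) {
            necessary[i] = true;
            worklist.push_back(i);
        }
    }
    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        for (std::size_t u : stmts[i].uses) {
            assert(u < stmts.size());
            if (!necessary[u]) {
                necessary[u] = true;
                worklist.push_back(u);
            }
        }
    }
    return necessary;
}

// Number of statements that were never marked.
std::size_t count_unnecessary(const std::vector<bool>& necessary) {
    std::size_t n = 0;
    for (bool b : necessary)
        if (!b) ++n;
    return n;
}
```

In this model, a side-effect-free call whose result is only compared in a
condition guarding __builtin_unreachable is counted as unnecessary together
with the comparison and the conditional itself, while a value that reaches a
return stays necessary.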
These are the most frequent ones:
12 skipping unnecesary stmt Evaluator<RemoteMultiPatchEvaluatorTag>::Evaluator (&evaluator);
12 skipping unnecesary stmt Pooma::DummyMutex::lock (_1);
16 skipping unnecesary stmt ForEach<Scalar<double>, DomainFunctorTag, DomainFunctorTag>::apply (_2, f_7(D), c_8(D));
23 skipping unnecesary stmt operator delete (_5, _2);
24 skipping unnecesary stmt Evaluator<RemoteMultiPatchEvaluatorTag>::~Evaluator (&evaluator);
28 skipping unnecesary stmt ForEach<Scalar<double>, DomainFunctorTag, DomainFunctorTag>::apply (_1, f_6(D), c_7(D));
37 skipping unnecesary stmt _37 = Field<UniformRectilinearMesh<MeshTraits<3, double, UniformRectilinearTag, CartesianTag, 3> >, double, BrickView>::engine (_9);
37 skipping unnecesary stmt _39 = engineFunctor<Engine<3, double, BrickView>, DataObjectRequest<BlockAffinity> > (_10, &getAffinity);
43 skipping unnecesary stmt DataObjectRequest<BlockAffinity>::DataObjectRequest (&getAffinity);
43 skipping unnecesary stmt MultiArgEvaluator<SinglePatchEvaluatorTag>::MultiArgEvaluator (&speval);
43 skipping unnecesary stmt Smarts::Iterate<Smarts::Stub>::hintAffinity (_8, _11);
86 skipping unnecesary stmt MultiArgEvaluator<SinglePatchEvaluatorTag>::~MultiArgEvaluator (&speval);
227 skipping unnecesary stmt PoomaCTAssert<true>::test ();
Mostly those are functions which are empty at release_ssa time. Perhaps
we could special-case pure/const calls with no LHS and get rid of them
during early inlining. This is cheaper than doing the actual inlining, which
triggers a lot of logic in tree-inline...
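
As a rough illustration of the special case suggested above, the following
sketch drops calls that are const/pure and have no LHS from a toy call list.
It is not GCC code: Call, its fields and drop_dead_calls are hypothetical
names, and the callee names in any usage are made up. A real pass would
presumably have to consider more than these two flags (for example calls
that may not return or may throw), which the sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a call statement (hypothetical, not a GCC data structure).
struct Call {
    std::string callee;
    bool is_const_or_pure;  // callee has no side effects visible to the caller
    bool has_lhs;           // the call's result is assigned to something
};

// Erase calls that are const/pure and whose result is unused.
// Returns the number of calls removed; the order of the rest is preserved.
std::size_t drop_dead_calls(std::vector<Call>& calls) {
    const std::size_t before = calls.size();
    auto dead = [](const Call& c) { return c.is_const_or_pure && !c.has_lhs; };
    calls.erase(std::remove_if(calls.begin(), calls.end(), dead), calls.end());
    assert(calls.size() <= before);
    return before - calls.size();
}
```

The point of the sketch is only that the check is a cheap per-statement
filter, with none of the body copying and remapping that actual inlining
involves.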
Honza