On Thu, 18 Feb 2016, Jan Hubicka wrote:

> > This fixes the IPA inline analysis quadraticness that arises when we
> > inline functions into the same function many times via
> > inline_to_all_callers as we do in the testcase for PR37448.  In that
> > case updating the overall summary of the caller is done for each
> > inlined call instead of just once.
> >
> > The solution is to keep track of which nodes we inlined to and
> > delay the summary update.
> >
> > Honza, I believe this is safe as the callers summary should only
> > matter for further inline decisions and not this one (inlining
> > into all callers).  Double-checking is appreciated - there should
> > be no code changes by this patch (I verified this on the PR37448
> > testcase).
>
> I think it is not the case if one function contains multiple calls
> and eventually hits large function growth.  In that case we inline
> just some callers and not the others.
Are you sure?  I thought we compute that up-front ... at least we make
sure we can_inline_edge_p all calls before even trying to inline all
calls - that looks somewhat redundant then if it can happen that we
give up anyway.  Ah, so can_inline_edge_p has

  /* Check if caller growth allows the inlining.  */
  else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
	   && !disregard_limits
	   && !lookup_attribute ("flatten",
				 DECL_ATTRIBUTES (caller->decl))
	   && !caller_growth_limits (e))
    inlinable = false;

and we check that both when we do the inlining and upfront from
want_inline_function_to_all_callers_p ... we also give up if we inline
a recursive function it seems.

> For next stage1 I plan to finish the overhaul to sreals that will let
> me make update_overall_summary incremental (i.e.
> O(size_of_inlined_function) and not
> O(size_of_inlined_function+size_of_function_inlined_to)) that will
> also solve these issues.  Hopefully.
> One can probably do a pessimistic estimate on code size of the function
> (i.e. add the size of the inlined function) and only if that hits the
> large function growth do the exact calculation.  Or we can just decide
> that it is OK to ignore large function growth in this particular case.

Ignoring it is probably not a good idea and will just lead to a
different degenerate case.  As we update the summary afterwards it's
probably ok to do incremental (pessimistic) updates on the size anyway?
In inline_merge_summary I mean?  Or should I open-code that into
inline_call if ! update_overall_summary?

Richard.
> Honza

> > This brings down compile-time of ipa inlining heuristics from
> >
> > ipa inlining heuristics : 260.14 (84%) usr 0.50 (23%) sys 260.91 (84%) wall 102217 kB (10%) ggc
> >
> > (that's an optimized non-checking compiler) to
> >
> > ipa inlining heuristics :   3.52 ( 3%) usr 0.55 (12%) sys   4.21 ( 3%) wall 102216 kB (12%) ggc
> >
> > (that's an unoptimized, checking enabled compiler, "before" on that
> > unoptimized compiler is ipa inlining heuristics : 935.06 (89%)).
> >
> > We still do a lot of inlining here so we see
> >
> > integration : 21.22 (17%) usr 0.45 (10%) sys 21.68 (17%) wall 142979 kB (17%) ggc
> >
> > which is supposedly expected (same unoptimized, checking enabled
> > compiler).
> >
> > Bootstrap & regtest is running on x86_64-unknown-linux-gnu, ok for trunk
> > and branch(es)?
> >
> > Thanks,
> > Richard.
> >
> > 2016-02-18  Richard Biener  <rguent...@suse.de>
> >
> > 	PR ipa/37448
> > 	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
> > 	hash-set, do not update overall summaries here.  Renamed from ...
> > 	(inline_to_all_callers): ... this which is now wrapping the
> > 	above and performing delayed overall summary update.
> >
> > Index: gcc/ipa-inline.c
> > ===================================================================
> > *** gcc/ipa-inline.c	(revision 233498)
> > --- gcc/ipa-inline.c	(working copy)
> > *************** flatten_function (struct cgraph_node *no
> > *** 2163,2169 ****
> >      recursion.  */
> >
> >   static bool
> > ! inline_to_all_callers (struct cgraph_node *node, void *data)
> >   {
> >     int *num_calls = (int *)data;
> >     bool callee_removed = false;
> > --- 2163,2170 ----
> >      recursion.  */
> >
> >   static bool
> > ! inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> > ! 			 hash_set<cgraph_node *> *callers)
> >   {
> >     int *num_calls = (int *)data;
> >     bool callee_removed = false;
> > *************** inline_to_all_callers (struct cgraph_nod
> > *** 2193,2199 ****
> >   	     inline_summaries->get (node->callers->caller)->size);
> >   	}
> >
> > !       inline_call (node->callers, true, NULL, NULL, true,
> > 		     &callee_removed);
> >         if (dump_file)
> >   	fprintf (dump_file,
> >   		 " Inlined into %s which now has %i size\n",
> > --- 2194,2203 ----
> >   	     inline_summaries->get (node->callers->caller)->size);
> >   	}
> >
> > !       /* Remember which callers we inlined to, delaying updating the
> > ! 	 overall summary.  */
> > !       callers->add (node->callers->caller);
> > !       inline_call (node->callers, true, NULL, NULL, false,
> > 		     &callee_removed);
> >         if (dump_file)
> >   	fprintf (dump_file,
> >   		 " Inlined into %s which now has %i size\n",
> > *************** inline_to_all_callers (struct cgraph_nod
> > *** 2211,2216 ****
> > --- 2215,2237 ----
> >       return false;
> >   }
> >
> > + /* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> > +    update.  */
> > +
> > + static bool
> > + inline_to_all_callers (struct cgraph_node *node, void *data)
> > + {
> > +   hash_set<cgraph_node *> callers;
> > +   bool res = inline_to_all_callers_1 (node, data, &callers);
> > +   /* Perform the delayed update of the overall summary of all callers
> > +      processed.  This avoids quadratic behavior in the cases where
> > +      we have a lot of calls to the same function.  */
> > +   for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> > +        i != callers.end (); ++i)
> > +     inline_update_overall_summary (*i);
> > +   return res;
> > + }
> > +
> >   /* Output overall time estimate.  */
> >   static void
> >   dump_overall_stats (void)

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)