On Thu, 18 Feb 2016, Jan Hubicka wrote:

> > This fixes the IPA inline analysis quadraticness that arises when we
> > inline functions into the same function many times via
> > inline_to_all_callers as we do in the testcase for PR37448.  In that
> > case updating the overall summary of the caller is done for each
> > inlined call instead of just once.
> >
> > The solution is to keep track of which nodes we inlined to and
> > delay the summary update.
> >
> > Honza, I believe this is safe as the callers summary should only
> > matter for further inline decisions and not this one (inlining
> > into all callers).  Double-checking is appreciated - there should
> > be no code changes by this patch (I verified this on the PR37448
> > testcase).
>
> I think it is not the case if one function contains multiple calls
> and eventually hits large function growth.  In that case we inline
> just some callers and not the others.
Are you sure?  I thought we compute that up-front ... at least we make
sure we can_inline_edge_p all calls before even trying to inline all
calls - that looks somewhat redundant then if it can happen that we
give up anyway.  Ah, so can_inline_edge_p has

  /* Check if caller growth allows the inlining.  */
  else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
	   && !disregard_limits
	   && !lookup_attribute ("flatten",
				 DECL_ATTRIBUTES (caller->decl))
	   && !caller_growth_limits (e))
    inlinable = false;

and we check that both when we do the inlining and upfront from
want_inline_function_to_all_callers_p ... we also give up if we inline
a recursive function it seems.

> For next stage1 I plan to finish the overhaul to sreals that will let
> me make update_overall_summary incremental (i.e.
> O(size_of_inlined_function) and not
> O(size_of_inlined_function+size_of_function_inlined_to)) that will
> also solve these issues.  Hopefully.
> One can probably do a pessimistic estimate on code size of the function
> (i.e. add the size of the inlined function) and only if that hits the
> large function growth do the exact calculation.  Or we can just decide
> that it is OK to ignore large function growth in this particular case.

Ignoring it is probably not a good idea and will just lead to a
different degenerate case.  As we update the summary afterwards it's
probably ok to do incremental (pessimistic) updates on the size anyway?
In inline_merge_summary I mean?  Or should I open-code that into
inline_call if ! update_overall_summary?

Richard.
> Honza

> > This brings down compile-time of ipa inlining heuristics from
> >
> > ipa inlining heuristics : 260.14 (84%) usr 0.50 (23%) sys 260.91 (84%) wall 102217 kB (10%) ggc
> >
> > (that's an optimized non-checking compiler) to
> >
> > ipa inlining heuristics :   3.52 ( 3%) usr 0.55 (12%) sys   4.21 ( 3%) wall 102216 kB (12%) ggc
> >
> > (that's an unoptimized, checking enabled compiler, "before" on that
> > unoptimized compiler is ipa inlining heuristics : 935.06 (89%)).
> >
> > We still do a lot of inlining here so we see
> >
> > integration : 21.22 (17%) usr 0.45 (10%) sys 21.68 (17%) wall 142979 kB (17%) ggc
> >
> > which is supposedly expected (same unoptimized, checking enabled
> > compiler).
> >
> > Bootstrap & regtest is running on x86_64-unknown-linux-gnu, ok for trunk
> > and branch(es)?
> >
> > Thanks,
> > Richard.
> >
> > 2016-02-18  Richard Biener  <rguent...@suse.de>
> >
> > 	PR ipa/37448
> > 	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
> > 	hash-set, do not update overall summaries here.  Renamed from ...
> > 	(inline_to_all_callers): ... this which is now wrapping the
> > 	above and performing delayed overall summary update.
> >
> > Index: gcc/ipa-inline.c
> > ===================================================================
> > *** gcc/ipa-inline.c	(revision 233498)
> > --- gcc/ipa-inline.c	(working copy)
> > *************** flatten_function (struct cgraph_node *no
> > *** 2163,2169 ****
> >      recursion.  */
> >
> >   static bool
> > ! inline_to_all_callers (struct cgraph_node *node, void *data)
> >   {
> >     int *num_calls = (int *)data;
> >     bool callee_removed = false;
> > --- 2163,2170 ----
> >      recursion.  */
> >
> >   static bool
> > ! inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> > ! 			 hash_set<cgraph_node *> *callers)
> >   {
> >     int *num_calls = (int *)data;
> >     bool callee_removed = false;
> > *************** inline_to_all_callers (struct cgraph_nod
> > *** 2193,2199 ****
> >   	     inline_summaries->get (node->callers->caller)->size);
> >   	}
> >
> > !       inline_call (node->callers, true, NULL, NULL, true,
> > 		     &callee_removed);
> >         if (dump_file)
> >   	fprintf (dump_file,
> >   		 " Inlined into %s which now has %i size\n",
> > --- 2194,2203 ----
> >   	     inline_summaries->get (node->callers->caller)->size);
> >   	}
> >
> > !       /* Remember which callers we inlined to, delaying updating the
> > ! 	 overall summary.  */
> > !       callers->add (node->callers->caller);
> > !       inline_call (node->callers, true, NULL, NULL, false,
> > 		     &callee_removed);
> >         if (dump_file)
> >   	fprintf (dump_file,
> >   		 " Inlined into %s which now has %i size\n",
> > *************** inline_to_all_callers (struct cgraph_nod
> > *** 2211,2216 ****
> > --- 2215,2237 ----
> >       return false;
> >   }
> >
> > + /* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> > +    update.  */
> > +
> > + static bool
> > + inline_to_all_callers (struct cgraph_node *node, void *data)
> > + {
> > +   hash_set<cgraph_node *> callers;
> > +   bool res = inline_to_all_callers_1 (node, data, &callers);
> > +   /* Perform the delayed update of the overall summary of all callers
> > +      processed.  This avoids quadratic behavior in the cases where
> > +      we have a lot of calls to the same function.  */
> > +   for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> > +        i != callers.end (); ++i)
> > +     inline_update_overall_summary (*i);
> > +   return res;
> > + }
> > +
> >   /* Output overall time estimate.  */
> >   static void
> >   dump_overall_stats (void)

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)