> 
> This fixes the quadratic behavior in the IPA inline analysis that
> arises when we inline functions into the same function many times via
> inline_to_all_callers, as we do in the testcase for PR37448.  In that
> case the overall summary of the caller is updated for each inlined
> call instead of just once.
> 
> The solution is to keep track of which nodes we inlined to and
> delay the summary update.
> 
> Honza, I believe this is safe as the caller's summary should only
> matter for further inline decisions and not this one (inlining
> into all callers).  Double-checking is appreciated - there should
> be no change in generated code from this patch (I verified this on
> the PR37448 testcase).

I think that is not the case if one function contains multiple calls
and eventually hits the large-function-growth limit.  In that case we
inline just some of the calls and not the others.
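
To make the concern concrete, here is a rough sketch with entirely
hypothetical names (none of this is the real ipa-inline.c interface): if
the caller's recorded size is only refreshed after the whole walk over
its calls, the growth check for the later calls is made against a stale,
too-small size and may accept an inline that the per-call update would
have rejected.

/* Hypothetical illustration of the concern above; not GCC code.  */

struct caller_state
{
  int recorded_size;   /* Size as seen by the growth check.            */
  int real_size;       /* Size including the not-yet-recorded inlines. */
};

/* Stand-in for a large-function-growth style limit.  */
static bool
growth_ok (int size_after_inline, int limit)
{
  return size_after_inline <= limit;
}

/* With the summary update delayed, recorded_size lags behind real_size,
   so a later call in the same inline-to-all-callers walk may pass a
   check it would have failed against the up-to-date size.  */
static bool
decide_inline (const caller_state &caller, int callee_size, int limit)
{
  return growth_ok (caller.recorded_size + callee_size, limit);
}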

For next stage1 I plan to finish the overhaul to sreals, which will let
me make update_overall_summary incremental (i.e. O(size_of_inlined_function)
and not O(size_of_inlined_function + size_of_function_inlined_to)); that
will also solve these issues.
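
As a rough illustration of the incremental idea (hypothetical types and
names only, with double standing in for the sreal-based times mentioned
above): instead of recomputing the caller's overall size/time by walking
everything now in its body, one would only add the entries contributed
by the newly inlined copy.

#include <vector>

/* Hypothetical per-statement size/time entry of a function summary.  */
struct size_time_entry
{
  int size;
  double time;   /* double stands in for the sreal-based time above.  */
};

struct overall_summary
{
  std::vector<size_time_entry> entries;
  int size = 0;
  double time = 0;
};

/* From-scratch update: walk everything that is now part of the caller,
   i.e. O(size_of_inlined_function + size_of_function_inlined_to).  */
static void
update_overall_summary_from_scratch (overall_summary &caller)
{
  caller.size = 0;
  caller.time = 0;
  for (const size_time_entry &e : caller.entries)
    {
      caller.size += e.size;
      caller.time += e.time;
    }
}

/* Incremental update: only account for the entries contributed by the
   newly inlined copy, i.e. O(size_of_inlined_function).  Scaling by
   call frequency is omitted in this sketch.  */
static void
update_overall_summary_incrementally (overall_summary &caller,
                                      const overall_summary &inlined_copy)
{
  for (const size_time_entry &e : inlined_copy.entries)
    {
      caller.entries.push_back (e);
      caller.size += e.size;
      caller.time += e.time;
    }
}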

One can probably make a pessimistic estimate of the function's code size
(i.e. add the size of the inlined function) and do the exact calculation
only if that hits the large-function-growth limit.  Or we can just decide
that it is OK to ignore large function growth in this particular case.
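
A sketch of that pessimistic-estimate idea, again with made-up names
that do not correspond to the actual ipa-inline.c API: keep a cheap
upper bound on the caller's size while inlining and fall back to the
exact recomputation only when the bound trips the large-function-growth
limit.

/* Sketch of the pessimistic-estimate idea; all names are hypothetical
   and do not correspond to the actual ipa-inline.c functions.  */

struct node_sketch
{
  int exact_size;      /* Size from the last full summary update.       */
  int pending_growth;  /* Sizes of copies inlined in since that update. */
};

/* Hypothetical large-function-growth style limit check.  */
static bool
hits_large_function_growth (int estimated_size, int initial_size,
                            int growth_percent)
{
  return estimated_size > initial_size * (100 + growth_percent) / 100;
}

/* Do the expensive exact recomputation only when the pessimistic
   estimate (last exact size plus everything inlined since) trips
   the limit; otherwise keep the cheap upper bound.  */
static void
maybe_update_summary (node_sketch &caller, int inlined_size,
                      int initial_size, int growth_percent,
                      void (*exact_update) (node_sketch &))
{
  caller.pending_growth += inlined_size;
  int estimate = caller.exact_size + caller.pending_growth;
  if (hits_large_function_growth (estimate, initial_size, growth_percent))
    {
      exact_update (caller);
      caller.pending_growth = 0;
    }
}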

Honza
> 
> This brings down compile-time of ipa inlining heuristics from
> 
>  ipa inlining heuristics : 260.14 (84%) usr   0.50 (23%) sys 260.91 (84%) wall  102217 kB (10%) ggc
> 
> (that's an optimized non-checking compiler) to
> 
>  ipa inlining heuristics :   3.52 ( 3%) usr   0.55 (12%) sys   4.21 ( 3%) wall  102216 kB (12%) ggc
> 
> (that's an unoptimized, checking-enabled compiler; "before" on that
> unoptimized compiler is  ipa inlining heuristics : 935.06 (89%)).
> 
> We still do a lot of inlining here so we see
> 
>  integration             :  21.22 (17%) usr   0.45 (10%) sys  21.68 (17%) wall  142979 kB (17%) ggc
> 
> which is presumably expected (same unoptimized, checking-enabled
> compiler).
> 
> Bootstrap & regtest is running on x86_64-unknown-linux-gnu, ok for trunk
> and branch(es)?
> 
> Thanks,
> Richard.
> 
> 2016-02-18  Richard Biener  <rguent...@suse.de>
> 
>       PR ipa/37448
>       * ipa-inline.c (inline_to_all_callers_1): Add to the callers
>       hash-set, do not update overall summaries here.  Renamed from ...
>       (inline_to_all_callers): ... this which is now wrapping the
>       above and performing delayed overall summary update.
> 
> Index: gcc/ipa-inline.c
> ===================================================================
> *** gcc/ipa-inline.c  (revision 233498)
> --- gcc/ipa-inline.c  (working copy)
> *************** flatten_function (struct cgraph_node *no
> *** 2163,2169 ****
>      recursion.  */
>   
>   static bool
> ! inline_to_all_callers (struct cgraph_node *node, void *data)
>   {
>     int *num_calls = (int *)data;
>     bool callee_removed = false;
> --- 2163,2170 ----
>      recursion.  */
>   
>   static bool
> ! inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> !                      hash_set<cgraph_node *> *callers)
>   {
>     int *num_calls = (int *)data;
>     bool callee_removed = false;
> *************** inline_to_all_callers (struct cgraph_nod
> *** 2193,2199 ****
>                  inline_summaries->get (node->callers->caller)->size);
>       }
>   
> !       inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
>         if (dump_file)
>       fprintf (dump_file,
>                " Inlined into %s which now has %i size\n",
> --- 2194,2203 ----
>                  inline_summaries->get (node->callers->caller)->size);
>       }
>   
> !       /* Remember which callers we inlined to, delaying updating the
> !      overall summary.  */
> !       callers->add (node->callers->caller);
> !       inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
>         if (dump_file)
>       fprintf (dump_file,
>                " Inlined into %s which now has %i size\n",
> *************** inline_to_all_callers (struct cgraph_nod
> *** 2211,2216 ****
> --- 2215,2237 ----
>     return false;
>   }
>   
> + /* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> +    update.  */
> + 
> + static bool
> + inline_to_all_callers (struct cgraph_node *node, void *data)
> + {
> +   hash_set<cgraph_node *> callers;
> +   bool res = inline_to_all_callers_1 (node, data, &callers);
> +   /* Perform the delayed update of the overall summary of all callers
> +      processed.  This avoids quadratic behavior in the cases where
> +      we have a lot of calls to the same function.  */
> +   for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> +        i != callers.end (); ++i)
> +     inline_update_overall_summary (*i);
> +   return res;
> + }
> + 
>   /* Output overall time estimate.  */
>   static void
>   dump_overall_stats (void)
