http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60911

--- Comment #10 from Jan Hubicka <hubicka at ucw dot cz> ---
> 
> We also have, in execute_one_pass,
> 
>   /* SIMPLE IPA passes do not handle callgraphs with IPA transforms in it.
>      Apply all transforms first.  */
>   if (pass->type == SIMPLE_IPA_PASS)
>     {
>       bool applied = false;
>       do_per_function (apply_ipa_transforms, (void *)&applied);
>       if (applied)
>         symtab_remove_unreachable_nodes (true, dump_file);
>       /* Restore current_pass.  */
>       current_pass = pass;
> 
> but that doesn't seem to work ... at least with LTO?  Which is
> because for main()
> 
>       FOR_EACH_DEFINED_FUNCTION (node)
>         if (node->analyzed && gimple_has_body_p (node->decl)
>             && (!node->clone_of || node->decl != node->clone_of->decl))
>           {
> 
> this doesn't trigger (for f.constprop it does).
> gimple_has_body_p (node->decl) is false.

Yeah, this is something I missed with the lazy loading. There is a predicate,
cgraph_function_with_gimple_body_p, which does not care about the lazy stuff
(i.e. it will return the same value in both cases). The problem is that we do
not want to run the verifiers that are also called by this for_each.* wrapper.

As written in the previous post, for 4.10 I think we should have cgraph_get_body
lazily load the body and apply transformations/produce the clone. In short, it
should do whatever is necessary to actually get the real body to the optimizer.
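
To illustrate the idea, here is a minimal toy sketch in C. It is not GCC code:
struct toy_node, its fields, and toy_get_body are hypothetical stand-ins for
the callgraph node and the proposed cgraph_get_body. It only assumes the
behaviour described above: load the body on first request, apply pending
transforms, and do nothing expensive on later requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a callgraph node; NOT the real struct cgraph_node.  */
struct toy_node
{
  bool body_on_disk;       /* A body exists in the LTO stream.  */
  bool body_in_memory;     /* The body has already been read in.  */
  int pending_transforms;  /* IPA transforms not yet applied.  */
  int loads;               /* How many times the body was actually read.  */
};

/* Hypothetical analogue of the proposed cgraph_get_body: make sure the
   real, fully transformed body is available.  Returns false when there
   is no body to get.  */
static bool
toy_get_body (struct toy_node *node)
{
  assert (node != NULL);

  if (!node->body_in_memory)
    {
      if (!node->body_on_disk)
        return false;
      /* Lazy load: pay the memory cost only on first request.  */
      node->body_in_memory = true;
      node->loads++;
    }

  /* Apply whatever transforms are still queued for this function.  */
  node->pending_transforms = 0;
  return true;
}
```

A pass that touches only a few functions would call toy_get_body on just those
nodes, so the remaining bodies never have to be read into memory.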

I do not want the default optimization queue to necessarily load everything
into memory. Therefore I think late IPA passes should be responsible for
calling cgraph_get_body themselves, to avoid growth in ltrans peak memory use
for passes that touch just a few functions (like this SIMD stuff).

In the long term, IPA-PTA, which is an example of a pass that touches all
bodies, should go to the real IPA queue IMO and be reimplemented with one of
the more scalable algorithms, perhaps with context sensitivity. In fact I have
a student (Lada Laska) who is looking into the topic.

For local passes I think it should be the pass manager's responsibility to load
the bodies, as it is now.

This would hopefully not be difficult to use, and it won't boost peak ltrans
memory, which now, with the WPA improvements, seems to be a bottleneck
(http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html),
at least for Firefox and -flto=16 (it should actually reduce the peak).

For 4.9 I would suggest simply making the above code apply transformations
even with LTO (i.e. drop the gimple_has_body_p check from the patch and replace
it with cgraph_function_with_gimple_body_p). Will look into it now.
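
A toy C model of why the predicate swap matters. Nothing here is GCC code:
struct toy_fn and the toy_* functions are hypothetical, and the two predicates
only mimic the difference described above (one looks at whether the body is
loaded right now, the other gives the same answer either way).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy function record; NOT a real GCC data structure.  */
struct toy_fn
{
  bool analyzed;       /* Node has been analyzed.  */
  bool body_streamed;  /* Body exists in the LTO stream, not yet read.  */
  bool body_loaded;    /* Body is in memory right now.  */
  bool transformed;    /* IPA transforms have been applied.  */
};

/* Mimics gimple_has_body_p: true only once the body is in memory.  */
static bool
toy_has_body_p (const struct toy_fn *fn)
{
  return fn->body_loaded;
}

/* Mimics cgraph_function_with_gimple_body_p: same answer whether the
   body is loaded or still waiting in the stream.  */
static bool
toy_with_body_p (const struct toy_fn *fn)
{
  return fn->analyzed && (fn->body_loaded || fn->body_streamed);
}

/* Walk all functions and "apply transforms" to those PRED accepts.
   Returns how many functions were transformed.  */
static size_t
toy_apply_transforms (struct toy_fn *fns, size_t n,
                      bool (*pred) (const struct toy_fn *))
{
  size_t applied = 0;
  assert (fns != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (fns[i].analyzed && pred (&fns[i]))
      {
        fns[i].transformed = true;
        applied++;
      }
  return applied;
}
```

With a lazily loaded function (like main above) and an already loaded clone
(like f.constprop), the first predicate skips the lazy one, while the second
accepts both.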
