http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60911
--- Comment #10 from Jan Hubicka <hubicka at ucw dot cz> ---
>
> We also have, in execute_one_pass,
>
>   /* SIPLE IPA passes do not handle callgraphs with IPA transforms in it.
>      Apply all trnasforms first.  */
>   if (pass->type == SIMPLE_IPA_PASS)
>     {
>       bool applied = false;
>       do_per_function (apply_ipa_transforms, (void *)&applied);
>       if (applied)
>         symtab_remove_unreachable_nodes (true, dump_file);
>       /* Restore current_pass.  */
>       current_pass = pass;
>
> but that doesn't seem to work ... at least with LTO?  Which is
> because for main()
>
>   FOR_EACH_DEFINED_FUNCTION (node)
>     if (node->analyzed && gimple_has_body_p (node->decl)
>         && (!node->clone_of || node->decl != node->clone_of->decl))
>       {
>
> this doesn't trigger (for f.constprop it does).
> gimple_has_body_p (node->decl) is false.

Yeah, this is something I missed with the lazy loading.  There is a
predicate, cgraph_function_with_gimple_body_p, which does not care about
the lazy stuff (i.e. it returns the same value whether or not the body
has been loaded yet).  The problem is that we do not want to run the
verifiers that are also called by this for_each.* wrapper.

As written in my previous post, for 4.10 I think we should have
cgraph_get_body to lazily load the body and apply transformations/produce
the clone.  In short, it should do whatever is necessary to actually get
the real body to the optimizer.

I want to keep the default optimization queue from necessarily loading
everything into memory.  Therefore I think late IPA passes should be
responsible for calling cgraph_get_body themselves, to avoid growth in
ltrans peak memory use for passes that touch just a few functions (like
this SIMD stuff).

In the long term, IPA-PTA, which is an example of a pass that touches all
bodies, should go to the real IPA queue IMO and be reimplemented with one
of the more scalable algorithms, perhaps with context sensitivity.  In
fact I have a student (Lada Laska) who is looking into the topic.

For local passes I think it should be the pass manager's responsibility
to load the bodies, as it is now.
This would hopefully not be difficult to use, and it won't boost peak
ltrans memory, which now, with the WPA improvements, seems to be a
bottleneck
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
at least for Firefox and -flto=16 (it should actually reduce the peak).

For 4.9 I would suggest simply making the above code apply
transformations even with LTO (i.e. drop the gimple_has_body_p check from
the patch and replace it with cgraph_function_with_gimple_body_p).
Will look into it now.
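
To illustrate the difference between the two predicates and the on-demand
loading idea, here is a small standalone toy model.  It is only a sketch:
none of it is GCC code, struct toy_node and every toy_* function are
hypothetical stand-ins invented for illustration, and the real
cgraph_get_body would have to do considerably more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a callgraph node; NOT GCC's struct cgraph_node.  */
struct toy_node
{
  const char *name;
  bool has_body_in_stream;  /* A body exists, in memory or still on disk.  */
  bool body_loaded;         /* The body has been read into memory.  */
  bool transforms_pending;  /* IPA transforms are still to be applied.  */
};

/* Rough analogue of gimple_has_body_p: true only once the body is
   actually in memory, so lazily loaded functions are missed.  */
static bool
toy_has_body_in_memory_p (const struct toy_node *node)
{
  return node->body_loaded;
}

/* Rough analogue of cgraph_function_with_gimple_body_p: gives the same
   answer whether or not the body has been loaded yet.  */
static bool
toy_function_with_body_p (const struct toy_node *node)
{
  return node->has_body_in_stream;
}

/* Hypothetical analogue of the proposed cgraph_get_body: load the body
   on demand and apply pending transforms.  Returns true if a body is
   available afterwards.  */
static bool
toy_get_body (struct toy_node *node)
{
  assert (node != NULL);
  if (!node->has_body_in_stream)
    return false;
  node->body_loaded = true;
  node->transforms_pending = false;
  return true;
}

/* Apply pending transforms to every node accepted by PRED and return
   how many nodes were processed.  */
static int
toy_apply_transforms (struct toy_node *nodes, size_t n,
                      bool (*pred) (const struct toy_node *))
{
  int applied = 0;
  for (size_t i = 0; i < n; i++)
    if (pred (&nodes[i]) && nodes[i].transforms_pending
        && toy_get_body (&nodes[i]))
      applied++;
  return applied;
}
```

With a node whose body is not yet loaded (like main() above), filtering by
toy_has_body_in_memory_p skips it and leaves its transforms pending, while
filtering by toy_function_with_body_p picks it up and loads the body only
at that point.  Bodies of functions that no pass asks for are never read
in, which is the reason for leaving the call to the individual late IPA
passes.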