On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov <ma...@codesourcery.com> wrote: > Richard, > > Just as Matt posted his findings about the effect of iterating early > optimizations, I've got the new patch ready. This patch is essentially a > complete rewrite and addresses the comments you made. > > On 18/10/2011, at 9:56 PM, Richard Guenther wrote: > >>>> >>>> If we'd want to iterate early optimizations we'd want to do it by iterating >>>> an IPA pass so that we benefit from more precise size estimates >>>> when trying to inline a function the second time. >>> >>> Could you elaborate on this a bit? Early optimizations are gimple passes, >>> so I'm missing your point here. >> >> pass_early_local_passes is an IPA pass, you want to iterate >> fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better >> inlining. Thus you need to split pass_early_local_passes into pieces >> so you can iterate one of the IPA pieces. > > Early_local_passes are now split into _main, _iter and _late parts. To avoid > changing the default case, _late part is merged into _main when no iterative > optimizations are requested. > >> >>>> Also statically >>>> scheduling the passes will mess up dump files and you have no >>>> chance of say, noticing that nothing changed for function f and its >>>> callees in iteration N and thus you can skip processing them in >>>> iteration N + 1. >>> >>> Yes, these are the shortcomings. The dump files name changes can be fixed, >>> e.g., by adding a suffix to the passes on iterations after the first one. >>> The analysis to avoid unnecessary iterations is more complex problem. > > To avoid changing the dump file names the patch appends "_iter" suffix to the > dumps of iterative passes. > >> >> Sure. I analyzed early passes by manually duplicating them and >> test that they do nothing for tramp3d, which they pretty much all did >> at some point. >> >>>> >>>> So, at least you should split the pass_early_local_passes IPA pass >>>> into three, you'd iterate over the 2nd (definitely not over >>>> pass_split_functions >>>> though), the third would be pass_profile and pass_split_functions only. >>>> And you'd iterate from the place the 2nd IPA pass is executed, not >>>> by scheduling them N times. >>> >>> OK, I will look into this. > > Done. > >>> >>>> >>>> Then you'd have to analyze the compile-time impact of the IPA >>>> splitting on its own when not iterating. > > I decided to avoid this and keep the pass pipeline effectively the same when > not running iterative optimizations. This is achieved by scheduling > pass_early_optimizations_late in different places in the pipeline depending > on whether iterative optimizations are enabled or not. > > The patch bootstraps and passes regtest on i686-pc-linux-gnu {-m32/-m64} with > 3 iterations enabled by default. The only failures are 5 scan-dump tests > that are due to more functions being inlined than expected. With iterative > optimizations disabled there is no change. > > I've kicked off SPEC2000/SPEC2006 benchmark runs to see the performance > effect of the patch, and those will be posted in the same Google Docs > spreadsheet in several days. > > OK for trunk?
diff --git a/gcc/cgraph.c b/gcc/cgraph.c index f056d3d..4738b28 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered) tree_lowering_passes (fndecl); bitmap_obstack_initialize (NULL); if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl))) - execute_pass_list (pass_early_local_passes.pass.sub); + execute_early_local_passes_for_current_function (); bitmap_obstack_release (NULL); pop_cfun (); current_function_decl = NULL; @@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered) gimple_register_cfg_hooks (); bitmap_obstack_initialize (NULL); if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl))) - execute_pass_list (pass_early_local_passes.pass.sub); + execute_early_local_passes_for_current_function (); I think these should only execute the lowering pieces of early local passes, let me see if that's properly split ... @@ -255,7 +255,7 @@ cgraph_process_new_functions (void) /* When not optimizing, be sure we run early local passes anyway to expand OMP. */ || !optimize) - execute_pass_list (pass_early_local_passes.pass.sub); + execute_early_local_passes_for_current_function (); similar. About all this -suffix stuff, I'd like to have the iterations simply re-use the existing dump-files, thus statically sub-divide pass_early_local_passes like NEXT_PASS (pass_early_local_lowering_passes); { NEXT_PASS (pass_fixup_cfg); NEXT_PASS (pass_init_datastructures); NEXT_PASS (pass_expand_omp); NEXT_PASS (pass_referenced_vars); NEXT_PASS (pass_build_ssa); NEXT_PASS (pass_lower_vector); NEXT_PASS (pass_early_warn_uninitialized); } /* The following you maybe iterate. */ NEXT_PASS (pass_early_local_optimizations); { NEXT_PASS (pass_rebuild_cgraph_edges); NEXT_PASS (pass_inline_parameters); NEXT_PASS (pass_early_inline); NEXT_PASS (pass_all_early_optimizations); { struct opt_pass **p = &pass_all_early_optimizations.pass.sub; NEXT_PASS (pass_remove_cgraph_callee_edges); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); /* pass_build_ealias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. Re-building alias information also rewrites no longer addressed locals into SSA form if possible. */ NEXT_PASS (pass_build_ealias); NEXT_PASS (pass_sra_early); NEXT_PASS (pass_fre); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_cd_dce); NEXT_PASS (pass_early_ipa_sra); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_convert_switch); NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_profile); NEXT_PASS (pass_local_pure_const); } } NEXT_PASS (pass_late_early_local_optimizations) { /* Split functions creates parts that are not run through early optimizations again. It is thus good idea to do this late. */ NEXT_PASS (pass_split_functions); NEXT_PASS (pass_release_ssa_names); NEXT_PASS (pass_rebuild_cgraph_edges); NEXT_PASS (pass_inline_parameters); } +++ b/gcc/toplev.c @@ -1228,7 +1228,6 @@ general_init (const char *argv0) /* This must be done after global_init_params but before argument processing. */ init_ggc_heuristics(); - init_optimization_passes (); statistics_early_init (); finish_params (); } @@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv) save_decoded_options, save_decoded_options_count, UNKNOWN_LOCATION, global_dc); + init_optimization_passes (); + handle_common_deferred_options (); init_local_tick (); any reason for this? + /* Don't recurse or wonder on to the next pass when running + execute_ipa_pass_list below. */ + current->execute = NULL; + current->next = NULL; that looks awkward ;) Shouldn't instead + execute_ipa_pass_list (current); not do what it does? Thus, split that into two pieces, one that we can iterate w/o the fiddling with current->? I like this variant a lot better than the last one - still it lacks any analysis-based justification for iteration (see my reply to Matt on what I discussed with Honza). Thus, I don't think we want to merge this in its current form or in this stage1. I'd also like to know _which_ early passes are worth iterating (and if iterating is only worth if the iterated early inlining inlined something). I guess that goes back to the point that iterating should be driven by the early inliner and/or the fact whether in the previous iteration new direct calls were discovered. Richard. > -- > Maxim Kuvyrkov > CodeSourcery / Mentor Graphics > >