(cross-posting because of the request for testing below) This is a first step towards getting rid of some passes and re-ordering passes to be more effective and less compile-time consuming. The following measurements have been done on top of some more statistics instrumentation (which I will post as [2/n] later).
As a perfect example of benefiting from nearly every pass located anywhere I chose tramp3d-v4.ii as the testcase to look at statistics numbers. The first observation is that the cfg_cleanup pass right after the early inlining pass is doing nothing. Investigation shows why, early inlining already does TODO_cleanup_cfg. Thus I have removed that cfg_cleanup pass. The second observation is that the second update_address_taken pass during early optimization triggers a single(!) time in all of tramp3d at -O3. Thus I have removed that pass (but see below). A third observation is that one simple DSE pass during early optimizations is enough (even though simple DSE would need iterating a few times to no longer find dead statements mainly due to dead struct copies). The first DCE pass after alias computation gets rid of all of the simple dead stores. Thus, I have removed the first simple DSE pass. A fourth observation is that for tramp3d I need to iterate DCE in early optimizations a few times before it finally "converges". As we want to be agressive during early optimizations as far as dead code concerns I have replaced the DCE with control-dependent DCE which does not need iterating (that DCEs 20% more statements during early optimization; for tramp3d it also improves compile-time) A long-standing observation of mine is that the propagation state of addresses before we compute aliasing for the first time is non-optimal - we have still lots of memory accesses through pointers that skew aliasing results mainly due to increased partitioning. Thus, with the goal of doing a similar amount of "cleanup" after the final inlining as we do after early inlining, I moved rename_ssa_copies, complete_unrolli and CCP before build_alias. Noting that build_alias does the equivalent thing to update_address_taken (and we have removed one of them during early optimization) and inlining exposes quite some extra registers (due to propagating parameters passed by reference into their dereference sites), I put the update_address_taken pass back right after the final inlining (it catches about the same number of registers than the one after early inlining). These changes improve tramp3d runtime performance by up to 280%(!) (72s to 25s). This is all for the first round, together with some wrong-code fallout from SRA, some testcase adjustments for moved/changed dumps and an adjustment to the missing alias info warnings from the operand scanner (which actually were what the forwprop-[12].c testcases were matching! duh.) In addition to looking at tramp3d statistics numbers I have run our set of C++ "benchmarks", Polyhedron and SPEC CPU 2000 on Itanium with overall slight improvements in compile-time and run-time. (You can look for yourself at http://gcc.opensuse.org/ for the machine "terbium" and the run right at Aug 14th) Bootstrapped and tested on x86_64-unknown-linux-gnu. I am strongly considering to apply this early next week. So if you have doubts or concerns this is the time to raise them (and back them up with data). Thanks, Richard. 2008-08-14 Richard Guenther <[EMAIL PROTECTED]> * passes.c (init_optimization_passes): Remove cleanup_cfg1, sdse1 and addressables2 passes. Replace dce1 with cddce1. Move call_cdce before build_alias. Move copyrename2, cunrolli and ccp2 beafore build_alias. Re-add addressable2 right after final inlining. * tree-sra.c (generate_element_init_1): Deal with NULL constructor element index. (scalarize_init): If we failed to generate some initializers do not generate zeros for not instantiated members. Instead rely on the copy out. * tree-cfg.c (build_gimple_cfg): Do not dump function here. (pass_build_cfg): But instead via TODO_dump_func. * tree-ssa-operands.c (get_addr_dereference_operands): Warn about missing flow-sensitive alias info only if we have aliases computed. * gcc.dg/fold-alloca-1.c: Scan cfg dump instead of cleanup_cfg1. * gcc.dg/fold-compare-3.c: Likewise. * gcc.dg/tree-ssa/20030709-2.c: Scan cddce2 dump. * gcc.dg/tree-ssa/20030808-1.c: Likewise. * gcc.dg/tree-ssa/20040211-1.c: Likewise. * gcc.dg/tree-ssa/20040305-1.c: Likewise. * gcc.dg/tree-ssa/forwprop-1.c: Adjust pattern. * gcc.dg/tree-ssa/forwprop-2.c: Likewise.. * gcc.dg/tree-ssa/ssa-dce-3.c: Scan cddce1 dump. Index: trunk/gcc/passes.c =================================================================== *** trunk.orig/gcc/passes.c 2008-08-14 14:11:40.000000000 +0200 --- trunk/gcc/passes.c 2008-08-14 14:12:47.000000000 +0200 *************** init_optimization_passes (void) *** 550,574 **** struct opt_pass **p = &pass_all_early_optimizations.pass.sub; NEXT_PASS (pass_rebuild_cgraph_edges); NEXT_PASS (pass_early_inline); - NEXT_PASS (pass_cleanup_cfg); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); NEXT_PASS (pass_update_address_taken); - NEXT_PASS (pass_simple_dse); NEXT_PASS (pass_sra_early); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_merge_phi); ! NEXT_PASS (pass_dce); ! /* Ideally the function call conditional ! dead code elimination phase can be delayed ! till later where potentially more opportunities ! can be found. Due to lack of good ways to ! update VDEFs associated with the shrink-wrapped ! calls, it is better to do the transformation ! here where memory SSA is not built yet. */ ! NEXT_PASS (pass_call_cdce); ! NEXT_PASS (pass_update_address_taken); NEXT_PASS (pass_simple_dse); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_convert_switch); --- 550,563 ---- struct opt_pass **p = &pass_all_early_optimizations.pass.sub; NEXT_PASS (pass_rebuild_cgraph_edges); NEXT_PASS (pass_early_inline); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); NEXT_PASS (pass_update_address_taken); NEXT_PASS (pass_sra_early); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_merge_phi); ! NEXT_PASS (pass_cd_dce); NEXT_PASS (pass_simple_dse); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_convert_switch); *************** init_optimization_passes (void) *** 594,607 **** NEXT_PASS (pass_all_optimizations); { struct opt_pass **p = &pass_all_optimizations.pass.sub; ! /* pass_build_alias is a dummy pass that ensures that we ! execute TODO_rebuild_alias at this point. */ ! NEXT_PASS (pass_build_alias); ! NEXT_PASS (pass_return_slot); NEXT_PASS (pass_rename_ssa_copies); - /* Initial scalar cleanups. */ NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); NEXT_PASS (pass_phiprop); NEXT_PASS (pass_fre); NEXT_PASS (pass_dce); --- 583,608 ---- NEXT_PASS (pass_all_optimizations); { struct opt_pass **p = &pass_all_optimizations.pass.sub; ! /* Initial scalar cleanups before alias computation. ! They ensure memory accesses are not indirect wherever possible. */ ! NEXT_PASS (pass_update_address_taken); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); + /* Ideally the function call conditional + dead code elimination phase can be delayed + till later where potentially more opportunities + can be found. Due to lack of good ways to + update VDEFs associated with the shrink-wrapped + calls, it is better to do the transformation + here where memory SSA is not built yet. */ + NEXT_PASS (pass_call_cdce); + /* pass_build_alias is a dummy pass that ensures that we + execute TODO_rebuild_alias at this point. Re-building + alias information also rewrites no longer addressed + locals into SSA form if possible. */ + NEXT_PASS (pass_build_alias); + NEXT_PASS (pass_return_slot); NEXT_PASS (pass_phiprop); NEXT_PASS (pass_fre); NEXT_PASS (pass_dce); Index: trunk/gcc/tree-sra.c =================================================================== *** trunk.orig/gcc/tree-sra.c 2008-08-14 14:11:40.000000000 +0200 --- trunk/gcc/tree-sra.c 2008-08-14 14:12:47.000000000 +0200 *************** generate_element_init_1 (struct sra_elt *** 2779,2784 **** --- 2779,2790 ---- case CONSTRUCTOR: FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init), idx, purpose, value) { + /* Array constructors are routinely created with NULL indices. */ + if (purpose == NULL_TREE) + { + result = false; + break; + } if (TREE_CODE (purpose) == RANGE_EXPR) { tree lower = TREE_OPERAND (purpose, 0); *************** scalarize_init (struct sra_elt *lhs_elt, *** 3404,3414 **** result = generate_element_init (lhs_elt, rhs, &init_seq); } - /* CONSTRUCTOR is defined such that any member not mentioned is assigned - a zero value. Initialize the rest of the instantiated elements. */ - generate_element_zero (lhs_elt, &seq); - gimple_seq_add_seq (&seq, init_seq); - if (!result) { /* If we failed to convert the entire initializer, then we must --- 3410,3415 ---- *************** scalarize_init (struct sra_elt *lhs_elt, *** 3423,3428 **** --- 3424,3436 ---- gimple_seq_add_seq (&seq0, seq); seq = seq0; } + else + { + /* CONSTRUCTOR is defined such that any member not mentioned is assigned + a zero value. Initialize the rest of the instantiated elements. */ + generate_element_zero (lhs_elt, &seq); + gimple_seq_add_seq (&seq, init_seq); + } if (lhs_elt->use_block_copy || !result) { Index: trunk/gcc/testsuite/gcc.dg/fold-alloca-1.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/fold-alloca-1.c 2008-08-14 14:11:40.000000000 +0200 --- trunk/gcc/testsuite/gcc.dg/fold-alloca-1.c 2008-08-14 14:12:47.000000000 +0200 *************** *** 1,5 **** /* { dg-do compile } */ ! /* { dg-options "-fdump-tree-cleanup_cfg1" } */ void *alloca (__SIZE_TYPE__); void link_error (); --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-fdump-tree-cfg" } */ void *alloca (__SIZE_TYPE__); void link_error (); *************** int main (int argc, char *argv[]) { *** 10,14 **** link_error (); return 0; } ! /* { dg-final { scan-tree-dump-times "link_error" 0 "cleanup_cfg1" } } */ ! /* { dg-final { cleanup-tree-dump "cleanup_cfg1" } } */ --- 10,14 ---- link_error (); return 0; } ! /* { dg-final { scan-tree-dump-times "link_error" 0 "cfg" } } */ ! /* { dg-final { cleanup-tree-dump "cfg" } } */ Index: trunk/gcc/testsuite/gcc.dg/fold-compare-3.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/fold-compare-3.c 2008-08-14 14:11:40.000000000 +0200 --- trunk/gcc/testsuite/gcc.dg/fold-compare-3.c 2008-08-14 14:12:47.000000000 +0200 *************** *** 1,5 **** /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cleanup_cfg1" } */ #include <limits.h> --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cfg" } */ #include <limits.h> *************** void bla4ge (int var) *** 151,159 **** this_comparison_is_not_decidable (); } ! /* { dg-final { scan-tree-dump-times "this_comparison_is_false" 0 "cleanup_cfg1" } } */ ! /* { dg-final { scan-tree-dump-times "this_comparison_is_true" 6 "cleanup_cfg1" } } */ ! /* { dg-final { scan-tree-dump-times "this_comparison_is_not_decidable" 12 "cleanup_cfg1" } } */ ! /* { dg-final { scan-tree-dump-times "if " 12 "cleanup_cfg1" } } */ ! /* { dg-final { cleanup-tree-dump "cleanup_cfg1" } } */ --- 151,159 ---- this_comparison_is_not_decidable (); } ! /* { dg-final { scan-tree-dump-times "this_comparison_is_false" 0 "cfg" } } */ ! /* { dg-final { scan-tree-dump-times "this_comparison_is_true" 6 "cfg" } } */ ! /* { dg-final { scan-tree-dump-times "this_comparison_is_not_decidable" 12 "cfg" } } */ ! /* { dg-final { scan-tree-dump-times "if " 12 "cfg" } } */ ! /* { dg-final { cleanup-tree-dump "cfg" } } */ Index: trunk/gcc/tree-cfg.c =================================================================== *** trunk.orig/gcc/tree-cfg.c 2008-08-14 14:11:40.000000000 +0200 --- trunk/gcc/tree-cfg.c 2008-08-14 14:12:47.000000000 +0200 *************** build_gimple_cfg (gimple_seq seq) *** 212,221 **** #ifdef ENABLE_CHECKING verify_stmts (); #endif - - /* Dump a textual representation of the flowgraph. */ - if (dump_file) - gimple_dump_cfg (dump_file, dump_flags); } static unsigned int --- 212,217 ---- *************** struct gimple_opt_pass pass_build_cfg = *** 240,246 **** PROP_cfg, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ ! TODO_verify_stmts | TODO_cleanup_cfg /* todo_flags_finish */ } }; --- 236,243 ---- PROP_cfg, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ ! TODO_verify_stmts | TODO_cleanup_cfg ! | TODO_dump_func /* todo_flags_finish */ } }; Index: trunk/gcc/tree-ssa-operands.c =================================================================== *** trunk.orig/gcc/tree-ssa-operands.c 2008-08-14 14:11:40.000000000 +0200 --- trunk/gcc/tree-ssa-operands.c 2008-08-14 14:12:47.000000000 +0200 *************** get_addr_dereference_operands (gimple st *** 1504,1510 **** && TREE_CODE (ptr) == SSA_NAME && (pi == NULL || (pi->name_mem_tag == NULL_TREE ! && !pi->pt_anything))) { fprintf (dump_file, "NOTE: no flow-sensitive alias info for "); --- 1504,1511 ---- && TREE_CODE (ptr) == SSA_NAME && (pi == NULL || (pi->name_mem_tag == NULL_TREE ! && !pi->pt_anything)) ! && gimple_aliases_computed_p (cfun)) { fprintf (dump_file, "NOTE: no flow-sensitive alias info for "); Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c 2006-02-07 11:14:01.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c 2008-08-14 14:17:31.000000000 +0200 *************** *** 1,5 **** /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce" } */ struct rtx_def; typedef struct rtx_def *rtx; --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce2" } */ struct rtx_def; typedef struct rtx_def *rtx; *************** get_alias_set (t) *** 41,54 **** /* There should be precisely one load of ->decl.rtl. If there is more than, then the dominator optimizations failed. */ ! /* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "cddce"} } */ /* There should be no loads of .rtmem since the complex return statement is just "return 0". */ ! /* { dg-final { scan-tree-dump-times ".rtmem" 0 "cddce"} } */ /* There should be one IF statement (the complex return statement should collapse down to a simple return 0 without any conditionals). */ ! /* { dg-final { scan-tree-dump-times "if " 1 "cddce"} } */ ! /* { dg-final { cleanup-tree-dump "cddce" } } */ --- 41,54 ---- /* There should be precisely one load of ->decl.rtl. If there is more than, then the dominator optimizations failed. */ ! /* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "cddce2"} } */ /* There should be no loads of .rtmem since the complex return statement is just "return 0". */ ! /* { dg-final { scan-tree-dump-times ".rtmem" 0 "cddce2"} } */ /* There should be one IF statement (the complex return statement should collapse down to a simple return 0 without any conditionals). */ ! /* { dg-final { scan-tree-dump-times "if " 1 "cddce2"} } */ ! /* { dg-final { cleanup-tree-dump "cddce2" } } */ Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c 2006-02-07 11:14:01.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c 2008-08-14 14:17:51.000000000 +0200 *************** *** 1,5 **** /* { dg-do compile } */ ! /* { dg-options "-O1 -fdump-tree-cddce" } */ extern void abort (void); --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-O1 -fdump-tree-cddce2" } */ extern void abort (void); *************** delete_dead_jumptables () *** 33,41 **** /* There should be no loads of ->code. If any exist, then we failed to optimize away all the IF statements and the statements feeding their conditions. */ ! /* { dg-final { scan-tree-dump-times "->code" 0 "cddce"} } */ /* There should be no IF statements. */ ! /* { dg-final { scan-tree-dump-times "if " 0 "cddce"} } */ ! /* { dg-final { cleanup-tree-dump "cddce" } } */ --- 33,41 ---- /* There should be no loads of ->code. If any exist, then we failed to optimize away all the IF statements and the statements feeding their conditions. */ ! /* { dg-final { scan-tree-dump-times "->code" 0 "cddce2"} } */ /* There should be no IF statements. */ ! /* { dg-final { scan-tree-dump-times "if " 0 "cddce2"} } */ ! /* { dg-final { cleanup-tree-dump "cddce2" } } */ Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/20040211-1.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/20040211-1.c 2006-02-07 11:14:01.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/20040211-1.c 2008-08-14 14:18:07.000000000 +0200 *************** *** 1,8 **** /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce" } */ ! ! ! struct rtx_def; typedef struct rtx_def *rtx; --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce2" } */ struct rtx_def; typedef struct rtx_def *rtx; *************** com (rtx insn, int blah) *** 37,41 **** /* Cddce cannot remove possibly infinite loops and there is no way how to determine whether the loop in can_move_up ends. */ ! /* { dg-final { scan-tree-dump "if " "cddce"} } */ ! /* { dg-final { cleanup-tree-dump "cddce" } } */ --- 34,38 ---- /* Cddce cannot remove possibly infinite loops and there is no way how to determine whether the loop in can_move_up ends. */ ! /* { dg-final { scan-tree-dump "if " "cddce2"} } */ ! /* { dg-final { cleanup-tree-dump "cddce2" } } */ Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c 2006-02-07 11:14:01.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c 2008-08-14 14:23:06.000000000 +0200 *************** *** 1,5 **** /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce -fdump-tree-forwprop1-details" } */ int abarney[2]; int afred[1]; --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce2 -fdump-tree-forwprop1-details" } */ int abarney[2]; int afred[1]; *************** void foo(int edx, int eax) *** 28,32 **** /* After cddce we should have two IF statements remaining as the other two tests can be threaded. */ ! /* { dg-final { scan-tree-dump-times "if " 2 "cddce"} } */ ! /* { dg-final { cleanup-tree-dump "cddce" } } */ --- 28,32 ---- /* After cddce we should have two IF statements remaining as the other two tests can be threaded. */ ! /* { dg-final { scan-tree-dump-times "if " 2 "cddce2"} } */ ! /* { dg-final { cleanup-tree-dump "cddce2" } } */ Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-1.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/forwprop-1.c 2007-11-30 13:59:28.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-1.c 2008-08-14 14:14:15.000000000 +0200 *************** void f(struct a * b, __SIZE_TYPE__ i) *** 15,19 **** c[i] = 1; } ! /* { dg-final { scan-tree-dump "t\\\[i.*\\\] = 1;" "forwprop1" } } */ /* { dg-final { cleanup-tree-dump "forwprop1" } } */ --- 15,19 ---- c[i] = 1; } ! /* { dg-final { scan-tree-dump-times "t\\\[i.*\\\] =.* 1;" 1 "forwprop1" } } */ /* { dg-final { cleanup-tree-dump "forwprop1" } } */ Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-2.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/forwprop-2.c 2007-11-30 13:59:28.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-2.c 2008-08-14 14:14:47.000000000 +0200 *************** void f(__SIZE_TYPE__ i) *** 17,21 **** c[i] = 1; } ! /* { dg-final { scan-tree-dump "t\\\[i.*\\\] = 1;" "forwprop1" } } */ /* { dg-final { cleanup-tree-dump "forwprop?" } } */ --- 17,21 ---- c[i] = 1; } ! /* { dg-final { scan-tree-dump-times "t\\\[i.*\\\] =.* 1;" 1 "forwprop1" } } */ /* { dg-final { cleanup-tree-dump "forwprop?" } } */ Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c =================================================================== *** trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c 2006-02-07 11:14:01.000000000 +0100 --- trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c 2008-08-14 14:23:45.000000000 +0200 *************** *** 1,5 **** /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce" } */ int main(void) { --- 1,5 ---- /* { dg-do compile } */ ! /* { dg-options "-O2 -fdump-tree-cddce1" } */ int main(void) { *************** int main(void) *** 23,31 **** /* We should eliminate the inner condition, but the loop must be preserved as it is infinite. Therefore there should be just one phi node (for i): */ ! /* { dg-final { scan-tree-dump-times "PHI " 1 "cddce"} } */ /* And one if (for the exit condition of the loop): */ ! /* { dg-final { scan-tree-dump-times "if " 1 "cddce"} } */ ! /* { dg-final { cleanup-tree-dump "cddce" } } */ --- 23,31 ---- /* We should eliminate the inner condition, but the loop must be preserved as it is infinite. Therefore there should be just one phi node (for i): */ ! /* { dg-final { scan-tree-dump-times "PHI " 1 "cddce1"} } */ /* And one if (for the exit condition of the loop): */ ! /* { dg-final { scan-tree-dump-times "if " 1 "cddce1"} } */ ! /* { dg-final { cleanup-tree-dump "cddce1" } } */