Hi,
this patch enables -fpeel-loops by default at -O3 and makes it use likely upper bound estimates. The patch also adds a -fpeel-all-loops flag that is symmetric to -funroll-all-loops. A long time ago we used to interpret -fpeel-loops this way and blindly peel every loop, but this behaviour got lost and now we only peel loops we have some evidence for.
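To make the transformation concrete, here is a hand-written sketch of what peeling a loop with a small likely bound amounts to. It is not compiler output: add_rolled and add_peeled are hypothetical names, and the three peeled copies are an assumption chosen to match a likely bound of 3 (as for the int a[3] field in the peel1.c testcase). How many copies GCC actually emits is decided by its own heuristics.

```c
#include <assert.h>

/* Original loop: the trip count l is not known at compile time.  */
void add_rolled(int *a, int l)
{
  int i;
  assert(l <= 0 || a != 0);
  for (i = 0; i < l; i++)
    a[i]++;
}

/* Hand-peeled variant, assuming the loop likely iterates at most
   3 times.  The first three iterations become straight-line code with
   early exits; the residual loop keeps the function correct if the
   estimate turns out to be wrong.  */
void add_peeled(int *a, int l)
{
  int i = 0;
  assert(l <= 0 || a != 0);

  if (i >= l)
    return;
  a[i]++;                 /* peeled iteration 1 */
  i++;

  if (i >= l)
    return;
  a[i]++;                 /* peeled iteration 2 */
  i++;

  if (i >= l)
    return;
  a[i]++;                 /* peeled iteration 3 */
  i++;

  for (; i < l; i++)      /* residual loop, rarely reached */
    a[i]++;
}
```

Both functions compute the same result for every l; the peeled one only trades code size for fewer loop-back branches on short trip counts.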
Bootstrapped/regtested x86_64-linux, I am retesting after a last-minute change (adding the testcase). OK?

Honza

	* common.opt (flag_peel_all_loops): New option.
	* doc/invoke.texi (-fpeel-loops): Update documentation.
	(-fpeel-all-loops): Document.
	* opts.c (default_options): Add OPT_fpeel_loops to -O3+.
	* toplev.c (process_options): flag_peel_all_loops implies
	flag_peel_loops.
	* tree-ssa-loop-ivcanon.c (try_peel_loop): Update comment; handle
	-fpeel-all-loops, use likely estimates.

	* gcc.dg/tree-ssa/peel1.c: New testcase.
	* gcc.dg/tree-ssa/peel2.c: New testcase.

Index: common.opt
===================================================================
--- common.opt	(revision 236815)
+++ common.opt	(working copy)
@@ -1840,6 +1840,10 @@ fpeel-loops
 Common Report Var(flag_peel_loops) Optimization
 Perform loop peeling.
 
+fpeel-all-loops
+Common Report Var(flag_peel_all_loops) Optimization
+Perform loop peeling of all loops.
+
 fpeephole
 Common Report Var(flag_no_peephole,0) Optimization
 Enable machine specific peephole optimizations.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 236815)
+++ doc/invoke.texi	(working copy)
@@ -8661,10 +8661,17 @@ the loop is entered. This usually makes
 @item -fpeel-loops
 @opindex fpeel-loops
 Peels loops for which there is enough information that they do not
-roll much (from profile feedback). It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis). It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
 
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and @option{-fprofile-use}.
+
+@item -fpeel-all-loops
+@opindex fpeel-all-loops
+Peel all loops, even if their number of iterations is uncertain when
+the loop is entered. For loops with large number of iterations this leads
+to wasted code size.
 
 @item -fmove-loop-invariants
 @opindex fmove-loop-invariants
Index: opts.c
===================================================================
--- opts.c	(revision 236815)
+++ opts.c	(working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
 
     /* -Ofast adds optimizations to -O3. */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-loop-ivcanon" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a->a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c	(working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fpeel-all-loops -fdump-tree-loop-ivcanon" } */
+void add(int *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
Index: toplev.c
===================================================================
--- toplev.c	(revision 236815)
+++ toplev.c	(working copy)
@@ -1294,6 +1294,9 @@ process_options (void)
   if (flag_unroll_all_loops)
     flag_unroll_loops = 1;
 
+  if (flag_peel_all_loops)
+    flag_peel_loops = 1;
+
   /* web and rename-registers help when run after loop unrolling. */
   if (flag_web == AUTODETECT_VALUE)
     flag_web = flag_unroll_loops || flag_peel_loops;
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 236816)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -951,7 +951,9 @@ try_peel_loop (struct loop *loop,
   if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
     return false;
 
-  /* Peel only innermost loops. */
+  /* Peel only innermost loops.
+     While the code is perfectly capable of peeling non-innermost loops,
+     the heuristics would probably need some improvements. */
   if (loop->inner)
     {
       if (dump_file)
@@ -969,12 +971,16 @@ try_peel_loop (struct loop *loop,
   /* Check if there is an estimate on the number of iterations. */
   npeel = estimated_loop_iterations_int (loop);
   if (npeel < 0)
+    npeel = likely_max_loop_iterations_int (loop);
+  if (npeel < 0 && flag_peel_all_loops)
+    npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1;
+  if (npeel < 0)
     {
       if (dump_file)
         fprintf (dump_file, "Not peeling: number of iterations is not "
                  "estimated\n");
       return false;
     }
   if (maxiter >= 0 && maxiter <= npeel)
     {
       if (dump_file)
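The fallback order that the last hunk gives try_peel_loop can be read in isolation as the stand-alone sketch below. It is not GCC code: choose_npeel and its parameters are hypothetical stand-ins for the results of estimated_loop_iterations_int, likely_max_loop_iterations_int, flag_peel_all_loops and PARAM_VALUE (PARAM_MAX_PEEL_TIMES), with a negative value meaning "no estimate", as in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the order of checks in the patch.
   estimated       stand-in for estimated_loop_iterations_int (loop)
   likely_max      stand-in for likely_max_loop_iterations_int (loop)
   peel_all_loops  stand-in for flag_peel_all_loops
   max_peel_times  stand-in for PARAM_VALUE (PARAM_MAX_PEEL_TIMES)
   Returns the iteration estimate to peel by, or a negative value when
   no estimate is available and the loop should not be peeled.  */
int choose_npeel(int estimated, int likely_max,
                 bool peel_all_loops, int max_peel_times)
{
  int npeel;

  assert(max_peel_times > 0);   /* the caller bails out earlier otherwise */

  npeel = estimated;            /* 1. realistic estimate, e.g. from profile */
  if (npeel < 0)
    npeel = likely_max;         /* 2. likely upper bound */
  if (npeel < 0 && peel_all_loops)
    npeel = max_peel_times - 1; /* 3. -fpeel-all-loops: peel blindly */
  return npeel;
}
```

With an assumed max_peel_times of 16, choose_npeel (-1, 3, false, 16) yields 3 (the likely bound is used), choose_npeel (-1, -1, true, 16) yields 15, and choose_npeel (-1, -1, false, 16) stays negative, which corresponds to the "Not peeling: number of iterations is not estimated" path.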