On Tue, Jan 21, 2014 at 2:49 PM, Xinliang David Li <davi...@google.com> wrote:
> I think it might be better to introduce a new parameter for max peel
> insn at O2 (e.g., call it MAX_O2_COMPLETELY_PEEL_INSN or
> MAX_DEFAULT_...), and use the same logic in your patch to override the
> MAX_COMPLETELY_PEELED_INSN parameter at O2.
>
> By so doing, we don't need to have a hard coded factor of 2.

Patch attached with that change.

Sri

>
> In the longer run, we really need better cost/benefit analysis, but
> that is independent.
>
> David
>
> On Tue, Jan 21, 2014 at 1:49 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Hi,
>>
>> Currently, the tree unrolling pass (cunroll) does not allow any code
>> size growth in -O2 mode. Code size growth is permitted only if -O3 or
>> -funroll-loops/-fpeel-loops is used. I have created a patch to allow
>> partial code size increase in -O2 mode. With -funroll-loops the maximum
>> allowed code growth is 400 unrolled insns. I have set it to 200
>> unrolled insns in -O2 mode. This patch improves an image processing
>> benchmark by 20%. It improves most benchmarks by 1-2%. The code size
>> increase is <1% for all the benchmarks except the image processing
>> benchmark, which increases by 6% (perf improves by 20%).
>>
>> I am working on getting this patch reviewed for trunk. Here is
>> the discussion on this:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02643.html
>> I have incorporated the comments on making the patch simpler. I will
>> follow up on that patch to trunk by also getting data on limiting
>> complete peeling with -O2.
>>
>> Is this ok for the google branch?
>>
>> Thanks
>> Sri

Index: params.def
===================================================================
--- params.def (revision 206638)
+++ params.def (working copy)
@@ -339,6 +339,11 @@ DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
         "max-completely-peeled-insns",
         "The maximum number of insns of a completely peeled loop",
         400, 0, 0)
+/* The default maximum number of insns of a peeled loop, with -O2. */
+DEFPARAM(PARAM_MAX_DEFAULT_COMPLETELY_PEELED_INSNS,
+        "max-default-completely-peeled-insns",
+        "The maximum number of insns of a completely peeled loop",
+        200, 0, 0)
 /* The maximum number of peelings of a single loop that is peeled completely. */
 DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
         "max-completely-peel-times",
Index: opts.c
===================================================================
--- opts.c (revision 206638)
+++ opts.c (working copy)
@@ -855,6 +855,18 @@ finish_options (struct gcc_options *opts, struct g
          0, opts->x_param_values, opts_set->x_param_values);
     }
 
+  /* Set PARAM_MAX_COMPLETELY_PEELED_INSNS to the default original value during
+     -O2 when -funroll-loops and -fpeel-loops are not set. */
+  if (optimize == 2 && !opts->x_flag_unroll_loops && !opts->x_flag_peel_loops
+      && !opts->x_flag_unroll_all_loops)
+
+    {
+      maybe_set_param_value
+        (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+         PARAM_VALUE (PARAM_MAX_DEFAULT_COMPLETELY_PEELED_INSNS),
+         opts->x_param_values, opts_set->x_param_values);
+    }
+
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or
      if-conversion is disabled. */
   if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c (revision 206638)
+++ tree-ssa-loop.c (working copy)
@@ -467,7 +467,7 @@ tree_complete_unroll (void)
 
   return tree_unroll_loops_completely (flag_unroll_loops
                                        || flag_peel_loops
-                                       || optimize >= 3, true);
+                                       || optimize >= 2, true);
 }
 
 static bool
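
For readers following the thread, the combined effect of the three hunks can be modelled outside GCC. What follows is a simplified standalone sketch, not GCC code. The function names effective_peel_limit and size_growth_allowed, and the convention that a negative user_limit means "no explicit --param given", are hypothetical and introduced only for illustration. The 400 and 200 defaults and the flag conditions are taken from the patch above.

```c
#include <stdbool.h>

/* Defaults taken from the patch: 400 for max-completely-peeled-insns,
   200 for the new max-default-completely-peeled-insns.  */
enum { PEEL_LIMIT_FULL = 400, PEEL_LIMIT_O2_DEFAULT = 200 };

/* Hypothetical model of the opts.c hunk.  A negative user_limit stands for
   "the user did not pass --param max-completely-peeled-insns"; this mimics
   the way maybe_set_param_value leaves explicitly set values alone.  */
int
effective_peel_limit (int optimize, bool unroll_loops, bool peel_loops,
                      bool unroll_all_loops, int user_limit)
{
  /* An explicit user setting always wins.  */
  if (user_limit >= 0)
    return user_limit;

  /* Plain -O2 with no unrolling/peeling flags gets the smaller default.  */
  if (optimize == 2 && !unroll_loops && !peel_loops && !unroll_all_loops)
    return PEEL_LIMIT_O2_DEFAULT;

  return PEEL_LIMIT_FULL;
}

/* Hypothetical model of the tree-ssa-loop.c hunk: whether complete
   unrolling may grow code size at all.  The patch lowers the threshold
   from optimize >= 3 to optimize >= 2.  */
bool
size_growth_allowed (int optimize, bool unroll_loops, bool peel_loops)
{
  return unroll_loops || peel_loops || optimize >= 2;
}
```

Under this model, plain -O2 allows growth up to 200 insns, while -O2 -funroll-loops and -O3 keep the 400 limit, and an explicit --param value is never overridden.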