[Bug tree-optimization/67213] When compiling for size with -Os loops can get bigger after peeling

fredrik.hederstie...@securitas-direct.com Mon, 21 Mar 2016 08:35:10 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67213


--- Comment #4 from Fredrik Hederstierna 
<fredrik.hederstie...@securitas-direct.com> ---
I've investigated this issue somewhat further, and I believe the problem might
be that we allow too many iterations when doing complete peeling of loops on
ARM.

The heuristic in "tree-ssa-loop-ivcanon.c" for estimating the unrolled
cost/size, "estimated_unrolled_size()", is quite rough: as far as I can tell it
just assumes that later passes will reduce the unrolled code to 2/3 of its
size. This is not always true and, I think, can lead to larger code after
complete peeling of loops (as in the example in this issue).
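
To make the arithmetic concrete, here is a minimal sketch of an estimate of
that shape. It is a simplified model, not the GCC source: the function name
and parameters are hypothetical, and it only assumes what is described above,
namely that the peeled copies are summed and then scaled by 2/3.

```c
#include <assert.h>

/* Simplified, hypothetical model of a "scale by 2/3" size estimate.
   body_insns:       instructions in one copy of the loop body
   removed_per_copy: instructions assumed to vanish in each peeled copy
   ncopies:          number of copies made by complete peeling  */
static long
sketch_unrolled_size (long body_insns, long removed_per_copy, long ncopies)
{
  assert (body_insns >= removed_per_copy && ncopies >= 0);

  long insns = ncopies * (body_insns - removed_per_copy);

  /* Assume later passes shrink the peeled code to 2/3 of its size.  */
  insns = insns * 2 / 3;

  /* Never report an estimate smaller than one instruction.  */
  return insns > 0 ? insns : 1;
}
```

With a 10-instruction body of which 2 are assumed to disappear per copy, 16
copies are estimated at 85 instructions and 4 copies at 21, so the estimate
grows linearly with the number of peeled iterations whether or not the later
passes really deliver the assumed reduction.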

It seems very difficult to estimate the final size after complete peeling,
especially across all architectures. I've experimented with 3/4 instead of 2/3
when optimizing for size, but the results became worse.

One solution that works for me is to lower the limit on the number of
iterations that complete peeling is allowed to peel.

I made the patch below and it worked.
(The same thing is done in "spu.c" for the SPU architecture when small code
size is wanted.)

In function "arm_option_override (void)":

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c868490..2ba8244 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
+
+  /* Small loops might be completely peeled even at -Os.
+     Try to keep code small.  */
+  if (optimize_function_for_size_p (cfun)
+      && !flag_unroll_loops && !flag_peel_loops)
+    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEEL_TIMES, 4,
+                          global_options.x_param_values,
+                          global_options_set.x_param_values);


I simply override max-completely-peel-times to be 4 instead of the default 16,
and this seems to work well.
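
As a rough illustration of what the lower cap changes, the sketch below models
only the iteration-count check. The function name is hypothetical, and it
assumes a loop is a peeling candidate when its iteration count does not exceed
the cap; the real pass applies further size and profitability checks that are
ignored here.

```c
#include <stdbool.h>

/* Hypothetical model of the iteration-count cap only: a loop with a
   known iteration count is considered for complete peeling when that
   count does not exceed max-completely-peel-times.  */
static bool
sketch_may_peel_completely (unsigned niter, unsigned max_peel_times)
{
  return niter <= max_peel_times;
}
```

Under this model a loop with 8 iterations is a candidate with the default cap
of 16 but not with a cap of 4. The same cap can be tried without patching the
compiler by passing "--param max-completely-peel-times=4" together with -Os.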

I tested it with the CSiBE benchmark on arm/thumb1/thumb2 and got smaller code
on all tests, with no negative results on any function.

What do you think, is this an okay solution to this issue? The best long-term
solution would be to estimate the cost/size of unrolling better, but that seems
like a much more difficult problem to solve.

/Fredrik
