[Bug target/29256] [4.9/5/6 regression] loop performance regression

wschmidt at gcc dot gnu.org Wed, 12 Aug 2015 06:25:21 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256


--- Comment #59 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #57)
> 
> It's been a long time since I've done SPEC measuring with/without
> -funroll-loops (or/and -fpeel-loops).  Note that these flags have
> secondary effects as well:
> 
> toplev.c:    flag_web = flag_unroll_loops || flag_peel_loops;
> toplev.c:    flag_rename_registers = flag_unroll_loops || flag_peel_loops;

We don't have a lot of data yet, but we have seen several examples in SPEC and
other benchmarks where turning on -funroll-loops helps, yet could help much
more: in many cases performance improves with a much higher unroll factor.
However, the effectiveness of unrolling is closely tied to these issues in
IVOPTS, where we currently end up with too many separate base registers for
IVs.  As we increase the unroll factor, this eventually becomes the limiting
factor, so fixing this IVOPTS issue would be very helpful for POWER.
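
To make the base-register point concrete, here is a hand-written C sketch (not
compiler output) of a kernel unrolled by 2.  The function names and the kernel
itself are hypothetical.  The first shape keeps one pointer IV per memory
reference, so the number of live base registers grows with the unroll factor;
the second addresses every unrolled copy from one base per array plus a
constant displacement, which is roughly the shape one would hope IVOPTS to pick.

```c
#include <assert.h>
#include <stddef.h>

/* Shape A (illustrative): one pointer IV per reference.  With 3 arrays and
   an unroll factor of 2 this needs 6 base registers; a factor of 8 would
   need 24.  n must be even. */
void add_separate_bases(double *a, const double *b, const double *c, size_t n)
{
    assert(n % 2 == 0);
    if (n == 0)
        return;

    double *a0 = a, *a1 = a + 1;
    const double *b0 = b, *b1 = b + 1;
    const double *c0 = c, *c1 = c + 1;

    for (;;) {
        *a0 = *b0 + *c0;
        *a1 = *b1 + *c1;
        n -= 2;
        if (n == 0)
            break;              /* stop before stepping past the arrays */
        a0 += 2; a1 += 2;
        b0 += 2; b1 += 2;
        c0 += 2; c1 += 2;
    }
}

/* Shape B (illustrative): one base per array and a shared index; the
   unrolled copies differ only by a constant displacement, so the base
   register count stays at 3 plus the index whatever the unroll factor.
   n must be even. */
void add_shared_bases(double *a, const double *b, const double *c, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
    }
}
```

Both functions compute the same result; only the addressing differs.  Whether
a given GCC version produces either shape for a particular loop is an
assumption to check against the generated assembly.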

As a side note, with -fprofile-use a GIMPLE unroller could peel and unroll hot
loop traces in loops that would otherwise be too complex to unroll.  I.e., if
there is a single hot trace through a loop, you can do tail duplication on the
trace to force it into superblock form, and then peel and unroll that
superblock, falling back into the original loop whenever execution leaves the
trace.  Complete unrolling and unrolling by a factor are both possible.  I
don't know of
specific benchmarks that would be helped by this, though.

(An RTL unroller could do this as well, but it seems much more natural and
implementable in GIMPLE.)
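
A source-level sketch of the transformation described above, written by hand
in C rather than produced by any existing GCC pass.  The kernel, the function
names, and the assumption that the non-negative path is the hot one (as a
profile might show) are all hypothetical.  The join block after the branch is
duplicated into the hot trace, the trace is unrolled by 2, and any side exit
falls into the untouched original loop.

```c
#include <stddef.h>

/* Original loop: a branch with an (assumed) hot and cold arm, followed by a
   join block that both arms reach. */
long sum_original(const int *x, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < 0)
            s -= 2L * x[i];     /* assumed cold per profile */
        else
            s += x[i];          /* assumed hot per profile */
        s ^= (long)i;           /* join block (the "tail") */
    }
    return s;
}

/* Hot trace in superblock form, unrolled by 2.  Each copy carries its own
   duplicate of the join block, so the trace has a single entry and only
   side exits. */
long sum_superblock(const int *x, size_t n)
{
    long s = 0;
    size_t i = 0;

    while (i + 1 < n) {
        if (x[i] < 0)
            break;              /* side exit: trace left at iteration i */
        s += x[i];
        s ^= (long)i;           /* duplicated tail, copy 1 */

        if (x[i + 1] < 0) {
            i++;                /* iteration i is done; resume at i + 1 */
            break;
        }
        s += x[i + 1];
        s ^= (long)(i + 1);     /* duplicated tail, copy 2 */

        i += 2;
    }

    /* Original loop: handles the cold path and any remaining iterations. */
    for (; i < n; i++) {
        if (x[i] < 0)
            s -= 2L * x[i];
        else
            s += x[i];
        s ^= (long)i;
    }
    return s;
}
```

In this sketch control never re-enters the hot trace once it has left; a real
implementation could choose otherwise.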

[Bug target/29256] [4.9/5/6 regression] loop performance regression
