[Bug tree-optimization/88760] GCC unrolling is suboptimal

rguenther at suse dot de Wed, 09 Jan 2019 05:38:47 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760


--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> 
> --- Comment #5 from Wilco <wilco at gcc dot gnu.org> ---
> (In reply to Wilco from comment #4)
> > (In reply to ktkachov from comment #2)
> > > Created attachment 45386 [details]
> > > aarch64-llvm output with -Ofast -mcpu=cortex-a57
> > > 
> > > I'm attaching the full LLVM aarch64 output.
> > > 
> > > The output you quoted is with -funroll-loops. If that's not given, GCC
> > > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> > > testing).
> > > 
> > > Is there anything we can do to make the default unrolling a bit more
> > > aggressive?
> > 
> > I don't think the RTL unroller works at all. It doesn't have the right
> > settings, and doesn't understand how to unroll, so we always get inefficient
> > and bloated code.
> > 
> > To do unrolling correctly it has to be integrated at tree level - for
> > example when vectorization isn't possible/beneficial, unrolling might still
> > be a good idea.
> 
> To add some numbers to the conversation, the gain LLVM gets from default
> unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017.
> 
> This clearly shows there is huge potential from unrolling, *if* we can teach
> GCC to unroll properly like LLVM. That means early unrolling, using good
> default settings and using a trailing loop rather than inefficient peeling.

I don't see why this cannot be done on RTL, where we have vastly more
information about whether there are execution resources that unrolling
can make use of.  Note we also want unrolling to interleave
instructions so as not to rely on pre-reload scheduling, which in turn
means keeping a good eye on register pressure (again something not
handled very well on GIMPLE).
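
To make the shape under discussion concrete, here is a hand-written
sketch (not GCC or LLVM output) of a reduction unrolled by four with
interleaved, independent accumulators and a trailing loop for the
remainder instead of peeling.  The function name sum_unrolled and the
unroll factor of 4 are assumptions chosen purely for illustration; a
real unroller would pick the factor from the target's execution
resources and from register pressure.

```c
#include <stddef.h>

/* Hypothetical example: sum an int array, unrolled by 4.
   Each accumulator forms its own dependency chain, so the four adds
   in the main loop body can issue independently of each other. */
long sum_unrolled(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main unrolled loop: handles 4 elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long s = (s0 + s1) + (s2 + s3);

    /* Trailing loop: handles the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s += a[i];

    return s;
}
```

Splitting the sum like this is valid for integers; for floating point
the same reassociation is only allowed under something like
-ffast-math.  Four live accumulators also cost four registers, which
is the register pressure concern above.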
