https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #5 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Wilco from comment #4)
> (In reply to ktkachov from comment #2)
> > Created attachment 45386 [details]
> > aarch64-llvm output with -Ofast -mcpu=cortex-a57
> > 
> > I'm attaching the full LLVM aarch64 output.
> > 
> > The output you quoted is with -funroll-loops. If that's not given, GCC
> > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> > testing).
> > 
> > Is there anything we can do to make the default unrolling a bit more
> > aggressive?
> 
> I don't think the RTL unroller works at all. It doesn't have the right
> settings, and doesn't understand how to unroll, so we always get inefficient
> and bloated code.
> 
> To do unrolling correctly it has to be integrated at tree level - for
> example when vectorization isn't possible/beneficial, unrolling might still
> be a good idea.

To add some numbers to the conversation, the gain LLVM gets from default
unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017.

This clearly shows there is huge potential from unrolling, *if* we can teach
GCC to unroll properly like LLVM. That means early unrolling, using good
default settings and using a trailing loop rather than inefficient peeling.
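
For illustration, here is a hand-written C sketch of what "a trailing loop rather
than peeling" could look like at source level. It is not output from GCC or
LLVM; the function names, the unroll factor of 4 and the use of separate
accumulators are assumptions made for the example only.

```c
#include <stddef.h>

/* Reference version: plain rolled loop. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical hand-unrolled version (factor 4, chosen arbitrarily).
   The main loop handles full groups of 4 elements; the leftover
   0..3 elements are handled by a single trailing loop placed after
   it, instead of peeling iterations off the front. */
long sum_unrolled_trailing(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main unrolled body: runs while at least 4 elements remain. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Trailing loop: at most 3 remaining iterations. */
    for (; i < n; i++)
        s0 += a[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for any n, including n < 4, in which
case only the trailing loop runs.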