[Bug tree-optimization/88760] GCC unrolling is suboptimal

ktkachov at gcc dot gnu.org Wed, 16 Jan 2019 10:29:41 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760


--- Comment #12 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #11)
> 
> As an experiment I hacked the AArch64 assembly of the function generated
> with -funroll-loops to replace the peeled prologue version with a simple
> non-unrolled loop. That gave a sizeable speedup on two AArch64 platforms:
> >7%.
> 
> So beyond the vectorisation point Richard S. made above, maybe it's worth
> considering replacing the peeled prologue with a simple loop instead?
> Or at least add that as a distinct unrolling strategy and work to come up
> with an analysis that would allow us to choose one over the other?

On reflection, I think I may have bungled the assembly hacking: the changes
I made may not be equivalent to the source. I'll redo that experiment soon, so
please disregard the >7% speedup figure for now. The iteration count
distribution numbers are still valid, though.
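
For readers following along, the two prologue strategies discussed in the
quoted text can be sketched at the source level roughly as below. This is only
an illustration under assumptions: the function names, the summation kernel
and the unroll factor of 4 are made up for the example, and it is not the code
GCC actually emits for the function in this bug.

```c
#include <stddef.h>

/* Hypothetical kernel: sum an int array with the main loop unrolled 4x.
   Strategy A: the n % 4 leftover iterations are peeled in front of the
   main loop as straight-line copies of the loop body. */
long sum_peeled_prologue(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t rem = n % 4;

    /* Peeled prologue: up to three separate copies of the body. */
    if (rem >= 1) s += a[i++];
    if (rem >= 2) s += a[i++];
    if (rem >= 3) s += a[i++];

    /* Main loop: n - rem is a multiple of 4, so no bounds checks inside. */
    for (; i < n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    return s;
}

/* Strategy B: the same leftover iterations are handled by a simple
   non-unrolled loop instead of peeled copies. */
long sum_rolled_prologue(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t rem = n % 4;

    /* Rolled prologue: one copy of the body, executed rem times. */
    for (; i < rem; i++)
        s += a[i];

    for (; i < n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    return s;
}
```

Both variants compute the same result; they differ only in code size and in
the branch pattern before the unrolled loop, which is why the iteration count
distribution matters when choosing between them.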
