https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

--- Comment #22 from Wilco <wdijkstr at arm dot com> ---
(In reply to Evandro from comment #21)
> (In reply to ramana.radhakrish...@arm.com from comment #20)
> > What's the kind of performance delta you see if you managed to unroll 
> > the loop just a wee bit ? Probably not much looking at the code produced 
> > here.
> 
> Comparing the cycle counts on Juno when running the program from the matrix
> multiplication test above built with -Ofast and unrolling:
> 
> -fno-unroll-loops: 592000
> -funroll-loops --param max-unroll-times=2: 594000
> -funroll-loops --param max-unroll-times=4: 592000
> -funroll-loops: 590000 (implies --param max-unroll-times=8)
> -funroll-loops --param max-unroll-times=16: 581000
> 
> It seems to me that without effective iv-opt in place, loops have to be
> unrolled too aggressively to make any difference in this case, greatly
> sacrificing code size.

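Unrolling alone isn't enough for sum reductions: every unrolled copy still adds
into the same accumulator, so the additions remain one serial dependency chain.
As I mentioned before, GCC doesn't enable any of the useful loop optimizations
by default. So add -fvariable-expansion-in-unroller on top of -funroll-loops to
get a good speedup with unrolling. Again, these are all generic GCC issues.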
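
To illustrate the point, here is a rough hand-written C sketch of the kind of
transformation variable expansion is meant to do during unrolling: splitting the
single accumulator of a reduction into several independent ones. The function
names (dot_single, dot_expanded) and the factor of four are made up for this
example and are not taken from the test case in this PR. Note that splitting a
floating-point sum changes the order of the additions, so the compiler can
generally only do this itself when reassociation is allowed (as with -Ofast).

```c
#include <stddef.h>

/* Single accumulator: each add depends on the previous one, so even an
   unrolled loop body is still one long serial chain of additions. */
double dot_single(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical hand-expanded version: four independent accumulators,
   so four multiply-adds can be in flight at the same time. */
double dot_expanded(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop, unrolled by four with one accumulator per copy. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

The two functions can give slightly different floating-point results for
arbitrary inputs, since the summation order differs.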
