https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435

--- Comment #8 from Yann Collet <yann.collet.73 at gmail dot com> ---
Thanks for the link.
It's a very good read, and indeed, completely in line with my recent
experience.
The recommended solution seems to be the same: "-falign-loops=32"


The article also mentions that the issue applies to Sandy Bridge CPUs.
This broadens the scope: it's not just about Broadwell, but also Haswell, Ivy
Bridge and Sandy Bridge, that is, all new CPUs from Intel since 2011. It looks
like a large enough installed base to care about.

However, for some reason, in the table provided, both Sandy Bridge and Haswell
get a default loop alignment value of 16, not 32.

Is there a reason for that choice?


> Optimizing for just one specific model will negatively affect performance on 
> an other.

Well, this issue is apparently important for more than one architecture.
Moreover, being aligned on 32 implies being aligned on 16 too, so it doesn't
introduce a drawback for older siblings.
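
To make the alignment argument concrete, here is a minimal sketch in C. The
helper names (is_aligned, align_up, padding_for) are made up for this
illustration and are not GCC internals; the addresses are arbitrary examples.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only.
 * 'align' must be a power of two (16 or 32 here). */

/* Returns 1 if addr is a multiple of align, 0 otherwise. */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Rounds addr up to the next multiple of align. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Number of padding bytes needed to bring addr up to that boundary. */
uintptr_t padding_for(uintptr_t addr, uintptr_t align)
{
    return align_up(addr, align) - addr;
}
```

Any address for which is_aligned(addr, 32) holds also satisfies
is_aligned(addr, 16), since 32 is a multiple of 16. The converse does not
hold: 0x1010 is 16-byte aligned but not 32-byte aligned. The visible
difference is padding, which can reach 31 bytes per loop instead of 15.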


Since then, I have found a few other complaints about the same issue. One
example here: https://software.intel.com/en-us/forums/topic/479392

and a close cousin here:
http://stackoverflow.com/questions/9881002/is-this-a-gcc-bug-when-using-falign-loops-option


This last one raises a good question: while it's possible to use
"-falign-loops=32" to set the preference for the whole program, it does not
seem possible to set it precisely for a single loop.

It looks like a good feature request, as this loop-alignment issue can have a
pretty large impact on performance (~20%), but only matters for a few selected
critical loops. The programmer is typically in a good position to know which
loops matter the most. Hence, we don't necessarily need *all* loops to be
32-byte aligned, just a handful of them.
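
For illustration only, one workaround sometimes seen for a single hot loop is
to emit an assembler alignment directive by hand just before it. This is not
something proposed in this thread, and it is only a sketch: the macro name
ALIGN_NEXT_CODE_32 and the function sum_bytes_hot are hypothetical, the
directive is specific to GNU-style assemblers, and the compiler remains free
to place the actual loop head elsewhere, so the result has to be checked in
the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: asks the assembler to pad the code that follows
 * to a 32-byte (2^5) boundary. Assumes GCC-style inline asm on x86;
 * expands to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  define ALIGN_NEXT_CODE_32() __asm__ volatile (".p2align 5")
#else
#  define ALIGN_NEXT_CODE_32() ((void)0)
#endif

/* Example hot loop: sums n bytes starting at p. */
uint32_t sum_bytes_hot(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    /* Best effort only: nothing guarantees that the loop head
     * immediately follows this directive in the generated code. */
    ALIGN_NEXT_CODE_32();
    for (i = 0; i < n; i++)
        sum += p[i];

    return sum;
}
```

Whether the loop really ends up on a 32-byte boundary depends on the compiler
version and optimization level, which is why a supported per-loop setting
would be preferable.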

Less precise but still useful would be the ability to set this optimization
parameter for a function or a section of code. But my experiments
seem to show that using #pragma or __attribute__ with align-loops does not
work, as if the optimization setting were simply ignored.
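
For reference, a sketch of what such an attempt can look like, using GCC's
documented "optimize" function attribute and "#pragma GCC optimize". The
function names and the HOT_LOOP_ATTR macro are hypothetical, and the exact
spelling used in my experiments may have differed. The code compiles and
computes the right result; the open question is only whether the requested
loop alignment is honored.

```c
#include <stddef.h>
#include <stdint.h>

/* Only GCC understands these; other compilers get plain functions. */
#if defined(__GNUC__) && !defined(__clang__)
#  define HOT_LOOP_ATTR __attribute__((optimize("align-loops=32")))
#else
#  define HOT_LOOP_ATTR
#endif

/* Attempt 1: per-function attribute. */
HOT_LOOP_ATTR
uint32_t sum_bytes_attr(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Attempt 2: pragma applied to a region of the file. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("align-loops=32")
#endif

uint32_t sum_bytes_pragma(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Comparing the loop addresses in the disassembly against a build with
"-falign-loops=32" on the command line shows whether the per-function
setting had any effect.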
