https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435
--- Comment #8 from Yann Collet <yann.collet.73 at gmail dot com> ---
Thanks for the link. It's a very good read and completely in line with my recent experience. The recommended solution seems to be the same: "-falign-loops=32".

The article also mentions that the issue applies to Sandy Bridge CPUs. This broadens the scope: it's not just about Broadwell, but also Haswell, Ivy Bridge and Sandy Bridge, i.e. all new Intel CPUs since 2011. That looks like a large enough installed base to care about.

However, for some reason, in the table provided, both Sandy Bridge and Haswell get a default loop alignment value of 16, not 32. Is there a reason for that choice?

> Optimizing for just one specific model will negatively affect performance on
> an other.

Well, this issue apparently matters for more than one architecture. Moreover, being aligned on 32 bytes implies being aligned on 16 bytes too, so it doesn't introduce a drawback for older siblings.

Since then, I have found a few other complaints about the same issue. One example: https://software.intel.com/en-us/forums/topic/479392 and a close cousin: http://stackoverflow.com/questions/9881002/is-this-a-gcc-bug-when-using-falign-loops-option

This last one raises a good question: while it's possible to use "-falign-loops=32" to set the preference for the whole program, it does not seem possible to set it for a single loop. That looks like a good feature request: this loop-alignment issue can have a fairly large impact on performance (~20%), but it only matters for a few selected critical loops. The programmer is typically in a good position to know which loops matter most. Hence, we don't necessarily need *all* loops to be 32-byte aligned, just a handful of them. Less precise, but still very useful, would be the ability to set this optimization parameter for a function or a section of code.
However, my experiments seem to show that using #pragma or __attribute__ with align-loops does not work, as if the optimization setting were simply ignored.
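For reference, a per-function and per-region attempt could look roughly like the sketch below. It is a minimal illustration, not the exact code from these experiments; the functions `sum_bytes` and `xor_bytes` and the macro `ALIGN_LOOPS_32` are hypothetical. GCC documents that string arguments to the `optimize` attribute (and `#pragma GCC optimize`) that don't start with "O" are treated as `-f` options, so "align-loops=32" is meant to act as `-falign-loops=32` for that function or region only. Whether the loop actually ends up 32-byte aligned can only be confirmed in the generated assembly (e.g. `gcc -O2 -S`), by inspecting the `.p2align` directive emitted before the loop label.

```c
#include <stddef.h>
#include <stdint.h>

/* Option 1: per-function optimize attribute (GCC only).
 * "align-loops=32" is intended to mean -falign-loops=32 for this
 * function alone. Whether GCC honours it is what is in question here. */
#if defined(__GNUC__) && !defined(__clang__)
#define ALIGN_LOOPS_32 __attribute__((optimize("align-loops=32")))
#else
#define ALIGN_LOOPS_32 /* other compilers: no equivalent, expands to nothing */
#endif

ALIGN_LOOPS_32
uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t acc = 0;
    /* The hot loop we would like to start on a 32-byte boundary. */
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return acc;
}

/* Option 2: #pragma GCC optimize applied to a region of code,
 * restored afterwards with push_options/pop_options. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("align-loops=32")
#endif

uint32_t xor_bytes(const uint8_t *p, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= p[i];
    return acc;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

The functional behaviour is unaffected either way (for example, `sum_bytes` over {1, 2, 3, 4} returns 10); only the code layout, and therefore possibly performance, should differ.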