http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com,
                   |                            |ubizjak at gmail dot com

--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> 2013-02-05 09:46:13 UTC ---
The need for the first alignment is clear: it aligns the loop to a 16-byte
boundary, and gcc does set that alignment at -O2.

Uros, H.J., any idea why separating the first conditional jump from the rest by
additional alignment is crucial for performance in this case?  Is there anything
that can be improved in GCC here?
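
For readers less familiar with what "aligns the loop to a 16-byte boundary" means in terms of addresses, here is a minimal sketch of the arithmetic. It only models how many padding bytes would be needed to move a code address up to the next multiple of a power-of-two boundary. The function names and the sample address are hypothetical and are not taken from this bug report or from GCC's sources.

```python
def padding_to_boundary(address: int, boundary: int = 16) -> int:
    """Return the number of padding bytes needed to move `address`
    up to the next multiple of `boundary` (0 if already aligned)."""
    # The bit trick below only works for power-of-two boundaries.
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return -address & (boundary - 1)


def aligned_address(address: int, boundary: int = 16) -> int:
    """Return the first address >= `address` that is a multiple of `boundary`."""
    return address + padding_to_boundary(address, boundary)


if __name__ == "__main__":
    loop_start = 0x400533  # hypothetical address of a loop's first instruction
    print(padding_to_boundary(loop_start))   # padding bytes before the loop
    print(hex(aligned_address(loop_start)))  # where the loop would start
```

This sketch does not explain the performance effect asked about above; it only illustrates the address arithmetic behind the first alignment.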