https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118356
--- Comment #2 from Javier Mora <cousteaulecommandant at gmail dot com> ---

It has been brought to my attention that there are -falign-labels, -falign-loops, and -falign-jumps options, and that perhaps -labels should be left as is, and only -loops and -jumps should be touched.

I (possibly incorrectly) interpreted from the documentation that labels were classified as either "can only be reached by jumping" (-jumps) or "can not only be reached by jumping" (-loops), and that -labels was the union of the two sets. But perhaps the correct interpretation is that there are actually three categories (always jumped to, often jumped to, rarely/never jumped to), and that -loops refers strictly to the second and not the third, whereas -labels refers to all three, with the compiler following some magical algorithm to determine whether a target is "often" or "rarely" jumped to.

If that's the case, then it might make no sense to force the alignment of the latter, and doing so might be detrimental to performance depending on how it is done: if it's done by inserting NOPs then it would be, but if it's done by expanding compressed instructions to force alignment then it wouldn't have an impact on the clock cycle count.

So, depending on that, it might make more sense to change only the defaults of -falign-loops=0 and -falign-jumps=0 so that they default to 4, and leave -falign-labels=0 alone, which should be kept as 1. (Or 2; I don't know how it's internally implemented.)

FWIW, here's some code for experimentation. I haven't tested this specific example, but if I'm not mistaken it should take about 26*limit clock cycles to run on a CVE4 platform with `-O2 -falign-loops=4`, but 28*limit clock cycles with `-O2 -falign-loops=1` (two of the four loops take 1 clock cycle longer due to misalignment), resulting in a 7% performance loss.
```c
void test_align(int limit)
{
    for (int i=0; i<limit; i++) asm volatile ("nop");
    for (int i=0; i<limit; i++) asm volatile ("nop\n\tnop");
    for (int i=0; i<limit; i++) asm volatile ("nop");
    for (int i=0; i<limit; i++) asm volatile ("nop\n\tnop");
}
```
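For reference, the arithmetic behind the "7%" figure can be sketched as below. This is only a back-of-the-envelope helper, not a measurement: the 26 and 28 cycles-per-`limit` constants are the untested estimates quoted above, and the function names (`estimated_cycles`, `slowdown_percent`, `throughput_loss_percent`, `print_estimate`) are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Estimated cycles per unit of `limit`, summed over the four loops.
   ASSUMPTION: these are the untested estimates from the comment above,
   not measured values. */
enum { CYCLES_ALIGNED = 26, CYCLES_MISALIGNED = 28 };

/* Total estimated cycles for a given iteration count. */
unsigned long estimated_cycles(unsigned long per_limit, unsigned long limit)
{
    return per_limit * limit;
}

/* Extra run time of the slow variant relative to the fast one, in percent. */
double slowdown_percent(unsigned long fast, unsigned long slow)
{
    assert(fast > 0 && slow >= fast);
    return 100.0 * (double)(slow - fast) / (double)fast;
}

/* Throughput lost by the slow variant relative to the fast one, in percent. */
double throughput_loss_percent(unsigned long fast, unsigned long slow)
{
    assert(fast > 0 && slow >= fast);
    return 100.0 * (double)(slow - fast) / (double)slow;
}

/* Print both ways of expressing the estimated penalty for one `limit`. */
void print_estimate(unsigned long limit)
{
    unsigned long fast = estimated_cycles(CYCLES_ALIGNED, limit);
    unsigned long slow = estimated_cycles(CYCLES_MISALIGNED, limit);

    printf("aligned:    %lu cycles\n", fast);
    printf("misaligned: %lu cycles\n", slow);
    printf("extra run time:  %.2f%%\n", slowdown_percent(fast, slow));
    printf("throughput loss: %.2f%%\n", throughput_loss_percent(fast, slow));
}
```

Calling `print_estimate(1000)` from a `main` prints both percentages. Depending on whether the penalty is expressed as extra run time (2/26) or as lost throughput (2/28), it comes out between 7% and 8%, which is where the rounded "7%" above comes from.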