https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96535
--- Comment #3 from Daniel Han-Chen <danielhanchen at gmail dot com> ---
Oh lol, I was just about to add a comment about further experimentation. Seems like Jakub and Hongtao have found the root cause of the issues?

Anyway, here is what I was going to write [probably not necessary anymore, so no need to read]:

"""
From more experimentation, it seems like O1, O2 and O3 are not ignored, but unrolling only gets turned on via O3. So if one passes O1 or O2 in __attribute__ but the command line is O3, the function still unrolls.

For example, when the command line is O3, in GCC 9, __attribute__((optimize("O1"))) or __attribute__((optimize("O2"))) causes the code to use VMULPS and VADDPS with an unroll factor of 1.

However, in GCC 10.x, when the command line is O3, VMULPS and VADDPS are still used with optimize("O1") or optimize("O2"), but unrolling is still done. Passing "no-unroll-loops" in the attribute does not work either.

It seems like the command-line O3 overrides the unrolling setting or something? The resulting assembly does use VMULPS/VADDPS and not VFMADDPS for O1/O2, but the command-line O3 causes an unrolling factor of 6 or so [it should be 1].

See https://gcc.godbolt.org/z/qb3d5M for a new example.
"""
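For anyone who wants to try this locally, here is a minimal sketch of the kind of test described above. It is not the code behind the Godbolt link: the function names, the macros and the multiply-add loop body are all assumptions made for illustration. Only the optimize("O1") and optimize("no-unroll-loops") attribute strings come from the comment.

```c
#include <stddef.h>

/* Hypothetical reproduction sketch; names and loop body are assumptions,
 * not the code from the Godbolt link. The optimize attribute is GCC-specific,
 * so it is compiled out for other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_O1        __attribute__((optimize("O1")))
#define OPT_NO_UNROLL __attribute__((optimize("no-unroll-loops")))
#else
#define OPT_O1
#define OPT_NO_UNROLL
#endif

/* Requests O1 for this function only, regardless of the command line. */
OPT_O1
void muladd_o1(const float *a, const float *b, const float *c,
               float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}

/* Requests that loop unrolling be disabled for this function only. */
OPT_NO_UNROLL
void muladd_no_unroll(const float *a, const float *b, const float *c,
                      float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}

/* No attribute: uses whatever optimization level the command line sets. */
void muladd_default(const float *a, const float *b, const float *c,
                    float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

Compiling this with something like `gcc -O3 -mavx2 -mfma -S` (the -m flags are an assumption; the comment does not say which target flags were used) under GCC 9 and GCC 10, and comparing the loop bodies of the three functions, should show whether the attribute changes the unroll factor or only the choice between separate multiply/add and fused instructions.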