https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #10 from Allan Jensen <linux at carewolf dot com> ---
Just to make things more complicated: I tried the test on a Haswell, and surprisingly, disabling if-convert or tree-vectorize at -O3 has no effect on performance, but enabling tree-vectorize at -O2 does.

In conclusion, this test is slower at -O3 than at -O2 on all tested CPUs (Phenom, SandyBridge and Haswell), but for different reasons:

On Phenom, it is slower due to if-convert, but not due to unroll (the unrolled version might even be slightly faster, but only by a small amount).
On SandyBridge, it is slower due to both if-convert and unroll, and even slower when both are active.
On Haswell, it is slower due to both if-convert and unroll, but if-convert on top of unroll is no slower than unroll on its own.

In general it is probably safe to try to avoid or undo the if-convert. There appear to be special if-conversions that are only performed when vectorization is active. Presumably they are only used in that case because they are known to likely be slower when the loop is not vectorized. In this case the if-conversion is done, but the loop is not vectorized in the end, which just slows it down (on non-Haswell CPUs).

The unroll issue could perhaps be handled by controlling some optimization params with tuning profiles. Where is trivial unrolling like this even performed?
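
For anyone following along who is unsure what "if-convert" refers to here, below is a minimal sketch of the transformation at source level. It is not the test case from this bug: the function names (clamp_branchy, clamp_converted, check_equivalent) are made up for illustration, and whether GCC actually if-converts or vectorizes this particular loop depends on the version, target and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: the store to a[i] only happens when the condition holds.
 * A well-predicted branch can make this cheap on a scalar CPU. */
void clamp_branchy(int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            a[i] = limit;
    }
}

/* Hand-written equivalent of an if-converted loop: the branch becomes a
 * select and the store becomes unconditional. This shape is what a
 * vectorizer wants, but if the loop ends up not being vectorized, the
 * extra unconditional work may cost more than the branch did. */
void clamp_converted(int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        int v = a[i];
        a[i] = (v > limit) ? limit : v;
    }
}

/* Hypothetical helper: checks that both forms produce the same result
 * on copies of the same input (n must be at most 64). */
void check_equivalent(const int *src, size_t n, int limit)
{
    int x[64], y[64];
    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {
        x[i] = src[i];
        y[i] = src[i];
    }
    clamp_branchy(x, n, limit);
    clamp_converted(y, n, limit);
    for (size_t i = 0; i < n; i++)
        assert(x[i] == y[i]);
}
```

To compare the two effects separately, one could build a benchmark around such a loop at -O3 and then again with -fno-tree-vectorize, -fno-tree-loop-if-convert or -fno-unroll-loops added. These are existing GCC options, but the exact flag combinations used for the measurements above are not stated in this comment, so treat that as an assumption.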