https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492

--- Comment #10 from Allan Jensen <linux at carewolf dot com> ---
Just to make things more complicated: I tried the test on a Haswell, and
surprisingly, disabling if-convert or tree-vectorize at -O3 has no effect on
performance, but enabling tree-vectorize at -O2 does.

In conclusion, this test is slower at -O3 than at -O2 on all tested CPUs
(Phenom, SandyBridge and Haswell), but for different reasons.

On Phenom, it is slower due to if-convert, but not unroll (unrolled might even
be slightly faster, but only by a small amount).
On SandyBridge, it is slower due to both if-convert and unroll, and even slower
when both are active.
On Haswell, it is slower due to both if-convert and unroll, but if-convert on
top of unroll is no slower than unroll on its own.

In general it is probably safe to try to avoid or undo the if-convert. There
appear to be special if-conversions that are only performed when vectorization
is active. Presumably they are restricted to that case because they are known
to be likely slower when the loop is not vectorized. In this case the
if-conversion is done but the loop is not vectorized in the end, which just
slows it down (on non-Haswell).
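
To illustrate what if-conversion does to a loop body, here is a minimal sketch
in C. It is not the test case from this bug; the function names and the data
are made up for illustration. The first function keeps the branch, the second
is a hand-written equivalent of the if-converted form, where the add happens
on every iteration and the condition only selects the operand. That form is
convenient for a vectorizer, but if the loop ends up scalar it can mean extra
work per iteration compared to a well-predicted branch.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: the add is skipped when the condition is false. */
long sum_branchy(const int *a, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > threshold)
            sum += a[i];
    }
    return sum;
}

/* Hand-written if-converted form: the add runs on every iteration and
   the condition only selects which value gets added (a[i] or 0). */
long sum_ifconverted(const int *a, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int v = (a[i] > threshold) ? a[i] : 0;
        sum += v;
    }
    return sum;
}

/* Small self-check that both forms compute the same result. */
void self_check(void)
{
    const int data[] = { 1, 5, -3, 8, 2 };
    const size_t n = sizeof data / sizeof data[0];

    assert(sum_branchy(data, n, 2) == sum_ifconverted(data, n, 2));
}
```

Calling self_check() from a main() is enough to exercise both variants. The
compiler is of course free to turn either form into the other, so comparing
the generated assembly is the only way to see what actually happened.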

The unroll issue could perhaps be handled by controlling some optimization
params with tuning profiles. Where is trivial unrolling like this even
performed?
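
Separately from tuning profiles, unrolling can also be limited per loop in the
source. The sketch below assumes a compiler that supports "#pragma GCC unroll"
(GCC 8 or later); the function name is hypothetical, and I have not checked
whether the pragma suppresses the particular unrolling seen in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel; an unroll factor of 1 asks GCC not to unroll. */
long sum_no_unroll(const int *a, size_t n)
{
    long sum = 0;

    assert(a != NULL || n == 0);

    /* The pragma must come immediately before the loop it applies to. */
#pragma GCC unroll 1
    for (size_t i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

This only helps when the source can be changed, so it is a workaround for
measuring the effect rather than a fix for the default behaviour.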
