[Bug target/87077] missed optimization for horizontal add for x86 SSE

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 02 Aug 2021 23:56:11 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87077


--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are now vectorizing the outer loop with the inner loop being unrolled.

If you add #pragma GCC unroll 0 to the inner loop we get comparatively good
code, but we reduce to scalar 4 times.
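
For illustration only, the pragma placement described above might look like
the sketch below. The kernel (a 4x4 matrix-vector product named matvec4) is
an assumption made for this example; it is not the testcase attached to the
report. Note that GCC typically vectorizes a floating-point reduction like
this only when reassociation is permitted, for example with -ffast-math.

```c
/* Hypothetical kernel, not the testcase from the bug report:
   out[i] = sum over j of m[i][j] * v[j] for a 4x4 matrix. */
void matvec4(const float m[4][4], const float v[4], float out[4])
{
    for (int i = 0; i < 4; ++i) {
        float sum = 0.0f;
        /* An unroll factor of 0 (or 1) asks GCC not to unroll this loop. */
#pragma GCC unroll 0
        for (int j = 0; j < 4; ++j)
            sum += m[i][j] * v[j];
        out[i] = sum;   /* one scalar reduction result per row */
    }
}
```

The pragma only changes unrolling; the computed result is the same with or
without it.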

If you add #pragma GCC unroll 4 to both loops, we apply BB vectorization,
which expands the reductions in a suboptimal way. It now also detects the
reductions, but they are covered by the BB vectorization we recognize
for the store of the reduction results.
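
As a sketch of the second variant, the same kind of kernel with both loops
annotated could be written as follows. The function name matvec4_unroll4 and
the kernel itself are assumptions for illustration, not code from the report.
Once both loops are fully unrolled, the body becomes straight-line code, which
is what BB (basic-block) vectorization operates on.

```c
/* Hypothetical kernel, not the testcase from the bug report. */
void matvec4_unroll4(const float m[4][4], const float v[4], float out[4])
{
    /* Request full unrolling of the 4-iteration outer loop. */
#pragma GCC unroll 4
    for (int i = 0; i < 4; ++i) {
        float sum = 0.0f;
        /* Request full unrolling of the 4-iteration inner loop as well. */
#pragma GCC unroll 4
        for (int j = 0; j < 4; ++j)
            sum += m[i][j] * v[j];
        out[i] = sum;   /* the four stores seed the BB vectorization */
    }
}
```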

Note haddp[sd] is slow.
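
For context, a commonly used alternative to haddps for summing the four lanes
of one vector is a pair of shuffles and adds. The helper below is a sketch
written for this note (the name hsum_ps is hypothetical); it is not the code
GCC emits and not something proposed in the report. It assumes an x86 target
with SSE enabled.

```c
#include <xmmintrin.h>

/* Hypothetical helper: horizontal sum of v = [v0, v1, v2, v3]
   using shuffles and adds rather than haddps. */
float hsum_ps(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);               /* [v2, v3, v2, v3] */
    __m128 sum2 = _mm_add_ps(v, hi);                 /* [v0+v2, v1+v3, ...] */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, 0x55);  /* broadcast lane 1 */
    __m128 sum1 = _mm_add_ss(sum2, odd);             /* lane 0 = total */
    return _mm_cvtss_f32(sum1);                      /* extract lane 0 */
}
```

This reassociates the additions as (v0+v2) + (v1+v3), so for general
floating-point inputs the rounding can differ from a left-to-right sum.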
