[Bug target/109690] bad SLP vectorization on zen

amonakov at gcc dot gnu.org via Gcc-bugs Sat, 06 May 2023 01:45:29 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109690


Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Note that the vectorized variant is latency-bound: the vector load in loop()
waits for the vector store into the same location done by the previous
invocation of loop(). This makes the microbenchmark take 10 cycles per iteration
(9 cycles of vector store-forwarding latency plus 1 cycle for the ALU op).

In contrast, the fully scalar variant benefits from "memory renaming" in Zen 2
and Zen 4 (absent in Zen 3), where store forwarding happens earlier in the
pipeline with zero-cycle latency. I think the scalar variant bottlenecks on
taken branches instead.
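
For illustration, here is a minimal C sketch of the kind of loop-carried
store-to-load dependency described above. It is not the testcase from this bug
report: the names `pair`, `p` and `run` are hypothetical, and whether GCC
actually SLP-vectorizes loop() here depends on the version, target and flags.

```c
#include <stdint.h>

/* Hypothetical stand-in data: two adjacent 64-bit counters that together
   fit in a single 16-byte vector. */
struct pair { uint64_t a, b; };

struct pair p;

/* noinline keeps every call a separate load/modify/store of p, so each
   invocation has to read what the previous invocation stored.
   If SLP vectorization kicks in, this can become one vector load, one
   vector add and one vector store; otherwise it stays as two scalar
   read-modify-write operations. */
__attribute__((noinline))
void loop(void)
{
    p.a += 1;
    p.b += 1;
}

/* Assumed driver: resets p, calls loop() n times and returns the sum of
   both counters so the result can be checked. */
uint64_t run(uint64_t n)
{
    p.a = 0;
    p.b = 0;
    for (uint64_t i = 0; i < n; i++)
        loop();
    return p.a + p.b;
}
```

One way to compare the two variants, assuming a GCC build, would be to compile
this once with -ftree-slp-vectorize and once with -fno-tree-slp-vectorize, call
run() with a large count from a small main(), and compare cycles per iteration
(for example with perf stat).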
