https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |riscv

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I would expect this to be always slower when vectorized unless the core is
seriously bottle-necked on the frontend.  The loads/stores need to be
decomposed to separate uops, there's no actual vector operation.  The vector
op introduces an artificial dependence between otherwise independent lanes
which could execute OOO in scalar.

I think GCC behaves better here.
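
For illustration only, here is a minimal sketch of the kind of loop the comment's reasoning would apply to. It is an assumption, not the testcase attached to this PR: the function name `strided_copy` and its parameters are hypothetical. The loop does nothing but strided loads and stores, so a vectorized version has no arithmetic to gain from and, on a core that splits strided accesses into one uop per element, only packs independent elements into a single register.

```c
#include <stddef.h>

/* Hypothetical example, NOT the testcase from this PR.
   Copies n elements that are `stride` ints apart.  There is no
   arithmetic in the body: each iteration is one load and one store,
   and iterations are independent (restrict rules out aliasing), so a
   scalar out-of-order core can overlap them freely.  */
void strided_copy(int *restrict dst, const int *restrict src,
                  size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = src[i * stride];
}
```

Whether vectorizing such a loop is a win depends on the target's cost model and on how the hardware implements strided vector memory accesses, so the sketch should be read as a way to picture the argument rather than as a benchmark.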