https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101178
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another case:

double a[2], b[2], c[2];
void foo ()
{
  double tem0 = a[1] + b[1];
  double tem1 = a[0] - b[0];
  c[0] = tem0;
  c[1] = tem1;
}

Here the addsub VEC_PERM merge node has the wrong lane order (+, - instead of
-, +) for x86 addsub at the place we currently match for SLP patterns.  But if
we moved the load permutations across this node we could not only save one
permute but also match x86 addsub.  Currently optimize_slp materializes the
permutes at the addsub VEC_PERM merge node, which does the trick (but pattern
matching ran too early here).  It will be a cost decision whether to
materialize here or to hope for eliding another permute further up the chain.

double a[2], b[2], c[2];
void foo ()
{
  double tem0 = a[1] - b[1];
  double tem1 = a[0] + b[0];
  c[0] = tem0;
  c[1] = tem1;
}

is currently miscompiled when we first match .ADDSUB since optimize_slp then
happily treats it as a lane-agnostic operation.  That's probably a latent
wrong-code issue on the branch as well.
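
For reference, a minimal sketch of the lane order x86 addsub provides (SSE3
_mm_addsub_pd subtracts in lane 0 and adds in lane 1), which is why (-, +) is
the order the pattern needs to see; the function name addsub_ref is just for
illustration:

#include <pmmintrin.h>

/* c = { a[0] - b[0], a[1] + b[1] }, the lane order x86 addsub implements.  */
void addsub_ref (double *restrict c, const double *a, const double *b)
{
  __m128d va = _mm_loadu_pd (a);   /* { a[0], a[1] } */
  __m128d vb = _mm_loadu_pd (b);   /* { b[0], b[1] } */
  _mm_storeu_pd (c, _mm_addsub_pd (va, vb));
}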