https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114057
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so I think the change is that we get to "correctly" notice

-vec.h:380:9: note: node (external) 0x6a2e9d8 (max_nunits=2, refcnt=1) vector(2) float
-vec.h:380:9: note: stmt 0 _164 = MEM[(const real *)_27 + 8B];
-vec.h:380:9: note: stmt 1 _158 = MEM[(const real *)_27];
+vec.h:380:9: note: node (external) 0x5a823a8 (max_nunits=2, refcnt=1) vector(2) float
+vec.h:380:9: note: [l] stmt 0 _164 = MEM[(const real *)_27 + 8B];
+vec.h:380:9: note: [l] stmt 1 _158 = MEM[(const real *)_27];

for the loads we do not handle because of gaps and promoted external. That
leads to extra costs. But also

+vec.h:380:9: note: node 0x5a81770 (max_nunits=2, refcnt=2) vector(2) float
 vec.h:380:9: note: op template: x_160 = _158 - _159;
 vec.h:380:9: note: stmt 0 x_160 = _158 - _159;
-vec.h:380:9: note: [l] stmt 1 y_163 = _161 - _162;
+vec.h:380:9: note: stmt 1 y_163 = _161 - _162;

so y_163 isn't considered live for some reason. We find

  _123 = _117 * y_163;

is vectorized as part of a reduction.

On the costing side we then see

-_161 - _162 1 times scalar_stmt costs 12 in body
-MEM[(const real *)_27 + 4B] 1 times scalar_load costs 12 in body
-MEM[(const real *)_24 + 4B] 1 times scalar_load costs 12 in body

which is the live (and dependent) stmts no longer costed on the scalar side
but also

+MEM[(const real *)_27 + 8B] 1 times vec_to_scalar costs 4 in epilogue
+MEM[(const real *)_24 + 8B] 1 times vec_to_scalar costs 4 in epilogue

costed in the vector epilog. This is because we're conservative as we don't
really know whether we'll be able to code-generate the live operation. The
costing side here is also not in sync as can be seen from the _161 - _162
op removed.

I should also note that the setting of PURE_SLP is done a bit too early,
before we analyze operations and eventually throw away instances or prune
it by promoting ops external.

For reductions we also falsely claim all root stmts are vectorized - we
do have remain ops.
Fixing this restores the LIVE on them and in some way restores vectorization. I'm going to test this as a fix for now.
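
For readers following the costing argument, here is a rough tally of only the cost lines quoted above. It is a sketch, not vectorizer output: the variable names are made up for illustration, the extra cost of the external node is not included because the comment does not give a number for it, and it assumes that profitability is decided by comparing the total vector cost against the total scalar cost.

```python
# Cost lines quoted in the comment: (statement, kind, cost, where).
# The "-" lines disappeared from the scalar cost; the "+" lines appeared
# in the vector epilogue cost.
removed_from_scalar = [
    ("_161 - _162", "scalar_stmt", 12, "body"),
    ("MEM[(const real *)_27 + 4B]", "scalar_load", 12, "body"),
    ("MEM[(const real *)_24 + 4B]", "scalar_load", 12, "body"),
]
added_to_vector = [
    ("MEM[(const real *)_27 + 8B]", "vec_to_scalar", 4, "epilogue"),
    ("MEM[(const real *)_24 + 8B]", "vec_to_scalar", 4, "epilogue"),
]


def total(entries):
    """Sum the cost column of a list of quoted cost lines."""
    return sum(cost for _stmt, _kind, cost, _where in entries)


scalar_drop = total(removed_from_scalar)  # scalar side got cheaper by this much
vector_rise = total(added_to_vector)      # vector side got dearer by this much

# Assumption: vectorization is kept only if vector cost < scalar cost, so a
# lower scalar cost and a higher vector cost both count against vectorizing.
swing_against_vectorizing = scalar_drop + vector_rise

print("scalar cost lowered by", scalar_drop)
print("vector cost raised by", vector_rise)
print("net swing against vectorizing", swing_against_vectorizing)
```

Under those assumptions the quoted lines alone move the comparison by 44 units against vectorization (36 dropped from the scalar side plus 8 added to the vector epilogue), before counting whatever the newly costed external loads add.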