https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118852
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we're failing to vectorize

  hi$slot_78 = PHI <_75(24), hi$slot_71(15)>

which we lack SLP discovery for.  That's because the stmt isn't live but only
forced-live by early break, and that's because we start SLP discovery from the
latch def, but here we have

  <bb 16> [local count: 1014686024]:
  # _75 = PHI <_18(24), _62(15)>
  # hi$slot_78 = PHI <_75(24), hi$slot_71(15)>
...
  <bb 18> [local count: 958878294]:
  PROF_edge_counter_35 = __gcov0.set_hashtable_value_ids_1_I_lsm.24_5 + 1;
  _18 = _75 + 8;
...
  <bb 24> [local count: 906139988]:
  goto <bb 16>; [100.00%]

so discovery for _18 will get us the _75 SSA cycle, not including the
hi$slot_78 PHI.

I'm testing a fix.
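A toy model may help show why starting from the latch def misses hi$slot_78. This is a hypothetical sketch, not GCC code: the DEFS table and the reachable_from helper are invented for illustration, and real SLP discovery does far more than a plain operand walk. Each SSA name maps to the SSA operands of its defining statement in the dump above; walking from _18 through operands only visits the _75/_18 cycle plus the loop-entry value _62. hi$slot_78 merely *uses* _75, so no operand edge ever leads to it.

```python
# Hypothetical toy model, NOT GCC's SLP discovery code. Each SSA name maps to
# the SSA operands of its defining statement; "discovery" is modeled as a walk
# from uses to defs (operands only, never from a def to its users).

DEFS = {
    "_75": ["_18", "_62"],                # # _75 = PHI <_18(24), _62(15)>
    "hi$slot_78": ["_75", "hi$slot_71"],  # # hi$slot_78 = PHI <_75(24), hi$slot_71(15)>
    "_18": ["_75"],                       # _18 = _75 + 8;  (constant 8 omitted)
}


def reachable_from(defs, start):
    """Return every SSA name reached by following operands from start."""
    seen = set()
    work = [start]
    while work:
        name = work.pop()
        if name in seen:
            continue  # already visited, e.g. going around the _75/_18 cycle
        seen.add(name)
        # Names without an entry (like _62) are defined outside the model.
        work.extend(defs.get(name, []))
    return seen


found = reachable_from(DEFS, "_18")
print(sorted(found))           # ['_18', '_62', '_75']
print("hi$slot_78" in found)   # False
```

In this model, reaching hi$slot_78 would require either walking from _75 to its users or seeding the walk at hi$slot_78 itself; an operand-only walk seeded at the latch def never gets there.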