https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |rsandifo at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, the same happens on x86-64. With -O2 and vectorization we end up with

  <bb 3> [local count: 858993457]:
  # ivtmp.14_11 = PHI <ivtmp.14_12(5), ivtmp.14_13(2)>
  _14 = (void *) ivtmp.14_11;
  _1 = MEM <int> [(vector(4) int *)_14];
  if (_1 != -3)
    goto <bb 4>; [0.00%]
  else
    goto <bb 5>; [100.00%]

  <bb 4> [count: 0]:
  __builtin_abort ();

  <bb 5> [local count: 858993457]:
  ivtmp.14_12 = ivtmp.14_11 + 4;
  if (ivtmp.14_12 != _16)
    goto <bb 3>; [80.00%]
  else
    goto <bb 6>; [20.00%]

  <bb 6> [local count: 214748368]:
  r ={v} {CLOBBER};

while everything is optimized away with -O2 -fno-tree-vectorize.

Let's keep this open as a regression, since -O2 now enables vectorization. In
principle we could preserve the previous behavior for the very-cheap
vectorizer cost model, or adjust the heuristic for that case to only cover
loops with a single basic block. The real issue here is of course that the
unroller does not consider the true size of the loop after simplification.