https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110991
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|NEW                          |ASSIGNED
             Target|                             |x86_64-*-*
   Target Milestone|---                          |14.0
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the difference is that GCC 14 vectorizes the loop and that vectorized loop
is not completely unrolled because

Loop 1 likely iterates at most 2 times.
Estimating sizes for loop 1
 BB: 3, after_exit: 0
  size:   1 _34 = vect_vec_iv_.15_33 + { 252, 252, 252, 252 };
  size:   0 vect_a.16_35 = VIEW_CONVERT_EXPR<vector(4) signed char>(vect_vec_iv_.15_33);
  size:   1 vect_iftmp.17_36 = vect_a.16_35 << 3;
  size:   1 mask__23.18_38 = vect_a.16_35 < { 0, 0, 0, 0 };
  size:   1 vect_iftmp.19_40 = VEC_COND_EXPR <mask__23.18_38, { 1, 1, 1, 1 }, vect_iftmp.17_36>;
  size:   1 ivtmp_44 = ivtmp_43 + 1;
   Induction variable computation will be folded away.
  size:   2 if (ivtmp_44 < 3)
   Exit condition will be eliminated in peeled copies.
   Exit condition will be eliminated in last copy.
   Constant conditional.
 BB: 9, after_exit: 1
size: 7-3, last_iteration: 7-3
  Loop size: 7
  Estimated size after unrolling: 8
Not unrolling loop 1: size would grow.

when we still have a loop there's nothing that can fully elide things.
Without vectorization we have

Loop 2 likely iterates at most 11 times.
Estimating sizes for loop 2
 BB: 10, after_exit: 0
  size:   0 a.2_13 = (signed char) a.6_22;
   Induction variable computation will be folded away.
  size:   2 if (a.2_13 < 0)
   Constant conditional.
 BB: 13, after_exit: 1
 BB: 12, after_exit: 0
  size:   1 _26 = a.6_22 + 255;
   Induction variable computation will be folded away.
  size:   1 ivtmp_27 = ivtmp_4 - 1;
   Induction variable computation will be folded away.
  size:   2 if (ivtmp_27 != 0)
   Exit condition will be eliminated in peeled copies.
   Exit condition will be eliminated in last copy.
   Constant conditional.
 BB: 11, after_exit: 0
  size:   1 iftmp.0_12 = a.2_13 << 3;
   Induction variable computation will be folded away.
size: 7-7, last_iteration: 7-7
  Loop size: 7
  Estimated size after unrolling: 1

unrolling relies on constant_after_peeling which relies on SCEV which
doesn't handle vector IVs.  I have a patch improving it to

size: 7-4, last_iteration: 7-4
  Loop size: 7
  Estimated size after unrolling: 6

IIRC I also had a patch more appropriately "propagating" constness at
some point.
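
For readers following the numbers: the three "Estimated size after unrolling" values quoted above (8, 1 and 6) can be reproduced with a small model of the size heuristic. This is only a sketch under stated assumptions. The 2/3 discount and the clamp to a minimum of 1 reflect my reading of the estimate in tree-ssa-loop-ivcanon.cc and are not visible in the dumps themselves. The names `LoopSize` and `estimated_unrolled_size` are illustrative, and the unroll counts 2 and 11 are taken from the "likely iterates at most" lines. It is written in Python for brevity rather than in GCC's own C++.

```python
from dataclasses import dataclass


@dataclass
class LoopSize:
    """Illustrative stand-in for the counters the dump prints as
    'size: A-B, last_iteration: C-D' (A, B, C, D in field order)."""
    overall: int
    eliminated_by_peeling: int
    last_iteration: int
    last_iteration_eliminated_by_peeling: int


def estimated_unrolled_size(size: LoopSize, nunroll: int) -> int:
    # nunroll full copies of the body, each reduced by the statements
    # that are expected to fold away once the copy is peeled.
    insns = nunroll * (size.overall - size.eliminated_by_peeling)
    # Plus the final, partial copy of the body.
    insns += size.last_iteration - size.last_iteration_eliminated_by_peeling
    # ASSUMPTION: a 2/3 discount for later cleanup, then clamp to at least 1.
    insns = insns * 2 // 3
    return max(insns, 1)


# Vectorized loop 1: size 7-3, at most 2 iterations.
vectorized = estimated_unrolled_size(LoopSize(7, 3, 7, 3), 2)
# Scalar loop 2: size 7-7, at most 11 iterations.
scalar = estimated_unrolled_size(LoopSize(7, 7, 7, 7), 11)
# Vectorized loop with the patch mentioned above: size 7-4.
patched = estimated_unrolled_size(LoopSize(7, 4, 7, 4), 2)

print(vectorized, scalar, patched)
```

Under this model the vectorized loop's estimate (8) exceeds its loop size of 7, which matches the "size would grow" rejection, while the scalar estimate (1) and the patched estimate (6) stay below 7.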