https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69710
--- Comment #8 from amker at gcc dot gnu.org ---
Reproduced on arm with saxpy.c.  The dump for slp is as below:

  <bb 13>:
  _82 = prologue_after_cost_adjust.7_43 * 4;
  vectp_dy.13_81 = dy_9(D) + _82;
  _87 = prologue_after_cost_adjust.7_43 * 4;
  vectp_dx.16_86 = dx_13(D) + _87;
  vect_cst__91 = {da_6(D), da_6(D), da_6(D), da_6(D)};
  _95 = prologue_after_cost_adjust.7_43 * 4;
  vectp_dy.21_94 = dy_9(D) + _95;

  <bb 14>:
  # vectp_dy.12_83 = PHI <vectp_dy.13_81(13), vectp_dy.12_84(21)>
  # vectp_dx.15_88 = PHI <vectp_dx.16_86(13), vectp_dx.15_89(21)>
  # vectp_dy.20_96 = PHI <vectp_dy.21_94(13), vectp_dy.20_97(21)>
  # ivtmp_99 = PHI <0(13), ivtmp_100(21)>
  vect__12.14_85 = MEM[(float *)vectp_dy.12_83];
  vect__15.17_90 = MEM[(float *)vectp_dx.15_88];
  vect__16.18_92 = vect_cst__91 * vect__15.17_90;
  vect__17.19_93 = vect__12.14_85 + vect__16.18_92;
  MEM[(float *)vectp_dy.20_96] = vect__17.19_93;
  vectp_dy.12_84 = vectp_dy.12_83 + 16;
  vectp_dx.15_89 = vectp_dx.15_88 + 16;
  vectp_dy.20_97 = vectp_dy.20_96 + 16;
  ivtmp_100 = ivtmp_99 + 1;
  if (ivtmp_100 < bnd.9_53)
    goto <bb 21>;
  else
    goto <bb 16>;

  <bb 21>:
  goto <bb 14>;

IVO recognized the uses below:

use 0
  address
  in statement vect__12.14_85 = MEM[(float *)vectp_dy.12_83];

  at position MEM[(float *)vectp_dy.12_83]
  type vector(4) float *
  base vectp_dy.13_81
  step 16
  base object (void *) vectp_dy.13_81
  related candidates

use 1
  generic
  in statement vectp_dx.15_88 = PHI <vectp_dx.16_86(13), vectp_dx.15_89(21)>

  at position
  type vector(4) float *
  base vectp_dx.16_86
  step 16
  base object (void *) vectp_dx.16_86
  is a biv
  related candidates

use 2
  address
  in statement MEM[(float *)vectp_dy.20_96] = vect__17.19_93;

  at position MEM[(float *)vectp_dy.20_96]
  type vector(4) float *
  base vectp_dy.21_94
  step 16
  base object (void *) vectp_dy.21_94
  related candidates

use 3
  compare
  in statement if (ivtmp_100 < bnd.9_53)

  at position
  type unsigned int
  base 1
  step 1
  is a biv
  related candidates

There are two problems:

1) We fail to recognize that uses 0 and 2 are identical to each other.  Both
bases, vectp_dy.13_81 and vectp_dy.21_94, are dy_9(D) plus
prologue_after_cost_adjust.7_43 * 4, and both steps are 16.  The failure is
because the vectorizer generates redundant setup code in the loop pre-header
(_82, _87 and _95 all compute the same product).  There are two possible fixes
here.  One is to make expand_simple_operations in tree-ssa-loop-niter.c (used
by ivopts) expand more aggressively.  But I don't think this is a good idea in
all cases, because a complicated expanded expression makes the ivo transform
and niter analysis harder.  The other is to fix the vectorizer to generate
clean code.  Richard's suggestion is to use gimple_build for that.

2) Use 1 is not recognized as an address iv because of the alignment of that
memory reference.
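
To make problem 1 concrete, here is a rough C sketch of what the vectorized loop is doing with its pointers.  It is only an illustration: saxpy.c itself is not quoted in this comment, so the function names, signatures and the scalar (rather than vector(4), step 16) accesses are all assumptions.  saxpy_two_ptrs mirrors the dump, where the load from dy and the store to dy use separate pointers set up from the same expression; saxpy_one_ptr is roughly what one would hope to get once uses 0 and 2 are seen as the same address.

```c
#include <stddef.h>

/* Scalar reference: dy[i] += da * dx[i].  Hypothetical stand-in for the
   kernel in saxpy.c, which is not shown in the bug comment.  */
void saxpy_ref(size_t n, float da, const float *dx, float *dy)
{
    for (size_t i = 0; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}

/* Mirrors the dump: the load from dy and the store to dy go through two
   separate pointers that start at the same address and advance by the
   same step, so the loop carries three pointer ivs plus a counter.  */
void saxpy_two_ptrs(size_t n, float da, const float *dx, float *dy)
{
    const float *load_dy = dy;  /* cf. vectp_dy.13_81 */
    const float *load_dx = dx;  /* cf. vectp_dx.16_86 */
    float *store_dy = dy;       /* cf. vectp_dy.21_94, same value as load_dy */

    for (size_t i = 0; i < n; i++) {
        *store_dy = *load_dy + da * *load_dx;
        load_dy++;
        load_dx++;
        store_dy++;
    }
}

/* Illustration of the desired shape: one pointer serves both the load
   from dy and the store to dy.  */
void saxpy_one_ptr(size_t n, float da, const float *dx, float *dy)
{
    float *p_dy = dy;
    const float *p_dx = dx;

    for (size_t i = 0; i < n; i++) {
        *p_dy = *p_dy + da * *p_dx;
        p_dy++;
        p_dx++;
    }
}
```

All three functions compute the same result; the only difference is how many induction variables the loop has to keep live, which is what the redundant pre-header code hides from ivopts.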