https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69710
--- Comment #8 from amker at gcc dot gnu.org ---
Reproduced on arm with saxpy.c. The slp dump is below:
<bb 13>:
_82 = prologue_after_cost_adjust.7_43 * 4;
vectp_dy.13_81 = dy_9(D) + _82;
_87 = prologue_after_cost_adjust.7_43 * 4;
vectp_dx.16_86 = dx_13(D) + _87;
vect_cst__91 = {da_6(D), da_6(D), da_6(D), da_6(D)};
_95 = prologue_after_cost_adjust.7_43 * 4;
vectp_dy.21_94 = dy_9(D) + _95;
<bb 14>:
# vectp_dy.12_83 = PHI <vectp_dy.13_81(13), vectp_dy.12_84(21)>
# vectp_dx.15_88 = PHI <vectp_dx.16_86(13), vectp_dx.15_89(21)>
# vectp_dy.20_96 = PHI <vectp_dy.21_94(13), vectp_dy.20_97(21)>
# ivtmp_99 = PHI <0(13), ivtmp_100(21)>
vect__12.14_85 = MEM[(float *)vectp_dy.12_83];
vect__15.17_90 = MEM[(float *)vectp_dx.15_88];
vect__16.18_92 = vect_cst__91 * vect__15.17_90;
vect__17.19_93 = vect__12.14_85 + vect__16.18_92;
MEM[(float *)vectp_dy.20_96] = vect__17.19_93;
vectp_dy.12_84 = vectp_dy.12_83 + 16;
vectp_dx.15_89 = vectp_dx.15_88 + 16;
vectp_dy.20_97 = vectp_dy.20_96 + 16;
ivtmp_100 = ivtmp_99 + 1;
if (ivtmp_100 < bnd.9_53)
goto <bb 21>;
else
goto <bb 16>;
<bb 21>:
goto <bb 14>;
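For orientation, the vector loop in bb 14 corresponds to a scalar kernel of
roughly the following shape. This is only a sketch reconstructed from the names
in the dump (da, dx, dy); the actual saxpy.c may differ (it may take stride
arguments, for instance), and the function name saxpy_sketch is hypothetical.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the kernel: dy[i] = dy[i] + da * dx[i].
   The names da, dx and dy come from the dump; the signature is assumed. */
void saxpy_sketch(size_t n, float da, const float *dx, float *dy)
{
    for (size_t i = 0; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}
```

Each iteration reads dy[i] and dx[i] and writes dy[i], which matches the two
loads and one store per vector iteration seen above.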
IVO recognizes the following uses:
use 0
address
in statement vect__12.14_85 = MEM[(float *)vectp_dy.12_83];
at position MEM[(float *)vectp_dy.12_83]
type vector(4) float *
base vectp_dy.13_81
step 16
base object (void *) vectp_dy.13_81
related candidates
use 1
generic
in statement vectp_dx.15_88 = PHI <vectp_dx.16_86(13), vectp_dx.15_89(21)>
at position
type vector(4) float *
base vectp_dx.16_86
step 16
base object (void *) vectp_dx.16_86
is a biv
related candidates
use 2
address
in statement MEM[(float *)vectp_dy.20_96] = vect__17.19_93;
at position MEM[(float *)vectp_dy.20_96]
type vector(4) float *
base vectp_dy.21_94
step 16
base object (void *) vectp_dy.21_94
related candidates
use 3
compare
in statement if (ivtmp_100 < bnd.9_53)
at position
type unsigned int
base 1
step 1
is a biv
related candidates
There are two problems:
1) We fail to recognize that uses 0 and 2 are identical to each other. This is
because the vectorizer generates redundant setup code in the loop pre-header:
_82 and _95 in bb 13 compute the same value, so vectp_dy.13_81 and
vectp_dy.21_94 are equal but appear as two different bases. There are two
possible fixes here. One is to make expand_simple_operations in
tree-ssa-loop-niter.c (used by ivopts) more aggressive in expanding. But I
don't think this is a good idea in all cases, because complicated expanded
expressions make the ivo transform and niter analysis harder. The other is to
fix the vectorizer to generate clean code. Richard's suggestion is to use
gimple_build for that.
2) Use 1 is not recognized as an address iv because of the alignment of that
memory reference.
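
To make problem 1 concrete, here is a rough C rendering of bb 13 and bb 14 next
to the form we would like to end up with. It is a simplified sketch, not
compiler output: the function and variable names are hypothetical, the inner
k loop stands in for one vector(4) float operation, and the loop is written in
while form even though the dump has a do-while shape.

```c
#include <stddef.h>

/* Shape of the loop as emitted: the load and the store of dy each get their
   own pointer, set up by separate but identical address computations. */
void loop_as_emitted(float *dy, const float *dx, float da,
                     size_t prologue, size_t bnd)
{
    float *dy_load = dy + prologue;     /* like vectp_dy.13_81 */
    const float *px = dx + prologue;    /* like vectp_dx.16_86 */
    float *dy_store = dy + prologue;    /* like vectp_dy.21_94, same value */

    for (size_t iv = 0; iv < bnd; iv++) {
        for (int k = 0; k < 4; k++)     /* one vector(4) float operation */
            dy_store[k] = dy_load[k] + da * px[k];
        dy_load += 4;                   /* step 16 bytes */
        px += 4;
        dy_store += 4;
    }
}

/* Desired shape: a single pointer serves both dy accesses, which is what
   ivopts could choose if it knew that uses 0 and 2 share a base. */
void loop_with_shared_iv(float *dy, const float *dx, float da,
                         size_t prologue, size_t bnd)
{
    float *py = dy + prologue;
    const float *px = dx + prologue;

    for (size_t iv = 0; iv < bnd; iv++) {
        for (int k = 0; k < 4; k++)
            py[k] = py[k] + da * px[k];
        py += 4;
        px += 4;
    }
}
```

Both functions compute the same result; the second simply keeps one fewer
pointer live across the loop.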