https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84935
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It seems this actually is vectorized, probably just using DImode vectors
for 2xSImode, and dom doesn't handle vector stores followed by scalar
loads. Before store-merging the dump is:
MEM[(int *)&a] = { 0, 1 };
MEM[(int *)&a + 8B] = { 4, 9 };
MEM[(int *)&a + 16B] = { 16, 25 };
MEM[(int *)&a + 24B] = { 36, 49 };
MEM[(int *)&a + 32B] = { 64, 81 };
_6 = a[0];
_28 = a[1];
res_29 = _6 + _28;
_35 = a[2];
res_36 = res_29 + _35;
_42 = a[3];
res_43 = res_36 + _42;
_49 = a[4];
res_50 = res_43 + _49;
_56 = a[5];
res_57 = res_50 + _56;
_63 = a[6];
res_64 = res_57 + _63;
_70 = a[7];
res_71 = res_64 + _70;
_77 = a[8];
res_78 = res_71 + _77;
_2 = a[9];
res_11 = _2 + res_78;
a ={v} {CLOBBER};
return res_11;
and nothing really changes in it up to the *.optimized dump.
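
For reference, a minimal sketch of source that would plausibly lead to a dump of this shape. This is an assumption reconstructed from the constants in the dump (the stored values are the squares 0..81 and the loads sum a[0]..a[9]); it is not necessarily the testcase attached to the PR, and the function name sum_of_squares is hypothetical.

```c
/* Hypothetical reproducer, reconstructed from the dump above;
   not necessarily the PR's actual testcase.  */
int
sum_of_squares (void)
{
  int a[10];
  int res = 0;

  /* After unrolling, these stores can be combined pairwise into
     two-element vector stores such as MEM[(int *)&a] = { 0, 1 }.  */
  for (int i = 0; i < 10; i++)
    a[i] = i * i;

  /* The scalar loads a[0] .. a[9] that are not forwarded from the
     vector stores above.  */
  for (int i = 0; i < 10; i++)
    res += a[i];

  return res;
}
```

Ideally the whole function would fold to a constant return value (285, the sum of the squares of 0 through 9). Compiling with optimization enabled and -fdump-tree-all should let one compare the dumps around store-merging with the one quoted above, though the exact result will depend on the target and GCC version.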