https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I can see we don't recognize a gather IFN during pattern recog here.
t.c:15:1: note: Final SLP tree for instance 0x502e9a0:
t.c:15:1: note: node 0x4f84700 (max_nunits=128, refcnt=2) vector(32) float
t.c:15:1: note: op template: *_10 = _11;
t.c:15:1: note: stmt 0 *_10 = _11;
t.c:15:1: note: stmt 1 *_20 = _21;
t.c:15:1: note: children 0x4f84790
t.c:15:1: note: node 0x4f84790 (max_nunits=128, refcnt=2) vector(32) float
t.c:15:1: note: op template: _11 = _8 + 1.0e+0;
t.c:15:1: note: stmt 0 _11 = _8 + 1.0e+0;
t.c:15:1: note: stmt 1 _21 = _18 + 2.0e+0;
t.c:15:1: note: children 0x4f84820 0x4f84940
t.c:15:1: note: node 0x4f84820 (max_nunits=128, refcnt=2) vector(32) float
t.c:15:1: note: op template: _8 = *_7;
t.c:15:1: note: stmt 0 _8 = *_7;
t.c:15:1: note: stmt 1 _18 = *_17;
t.c:15:1: note: children 0x4f848b0
t.c:15:1: note: node 0x4f848b0 (max_nunits=128, refcnt=2) vector(128) unsigned char
t.c:15:1: note: op template: _4 = *_3;
t.c:15:1: note: stmt 0 _4 = *_3;
t.c:15:1: note: stmt 1 _14 = *_13;
t.c:15:1: note: load permutation { 0 1 }
t.c:15:1: note: node (constant) 0x4f84940 (max_nunits=1, refcnt=1)
t.c:15:1: note: { 1.0e+0, 2.0e+0 }
t.c:15:1: note: === vect_match_slp_patterns ===
t.c:15:1: note: Analyzing SLP tree 0x4f84700 for patterns
t.c:15:1: note: === vect_make_slp_decision ===
t.c:15:1: note: Decided to SLP 1 instances. Unrolling factor 64
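The testcase itself is not quoted here, so the following is only a hypothetical reconstruction of the loop shape implied by the SLP tree above. It is not the t.c from the report. Each group has two lanes (stmt 0 and stmt 1): a uint8_t index load, a float load through that index (the gather candidate), an add of 1.0 or 2.0, and a contiguous store. The names `y` and `index` match the IR further down. The source array `x` and the function name `slp_pair_loop` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL reconstruction of the loop shape suggested by the SLP dump;
   not the actual testcase.  Per scalar iteration:
     _4/_14    : uint8_t index loads      index[2*i], index[2*i+1]
     _8/_18    : float loads via index    x[...]  (gather candidate)
     _11/_21   : add constants            { 1.0, 2.0 }
     *_10/*_20 : contiguous float stores  y[2*i], y[2*i+1]  */
void
slp_pair_loop (float *restrict y, const float *restrict x,
               const uint8_t *restrict index, size_t n)
{
  assert (n == 0 || (y != NULL && x != NULL && index != NULL));
  for (size_t i = 0; i < n; i++)
    {
      y[2 * i]     = x[index[2 * i]]     + 1.0f;  /* SLP stmt 0 */
      y[2 * i + 1] = x[index[2 * i + 1]] + 2.0f;  /* SLP stmt 1 */
    }
}
```

Because a uint8_t index can address up to x[255], a caller would need to size `x` accordingly.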
It tries a few other modes, one of which even uses .MASK_LEN_GATHER_LOAD, but
that one fails to build the SLP tree. In the end we choose
t.c:15:1: note: ***** Choosing vector mode RVVM4QI
t.c:15:1: note: ***** Choosing epilogue vector mode RVVMF4QI
The main loop instance is:
t.c:15:1: note: Vectorizing SLP tree:
t.c:15:1: note: node 0x4f849d0 (max_nunits=64, refcnt=1) vector(32) float
t.c:15:1: note: op template: *_10 = _11;
t.c:15:1: note: stmt 0 *_10 = _11;
t.c:15:1: note: stmt 1 *_20 = _21;
t.c:15:1: note: children 0x4f84a60
t.c:15:1: note: node 0x4f84a60 (max_nunits=64, refcnt=1) vector(32) float
t.c:15:1: note: op template: _11 = _8 + 1.0e+0;
t.c:15:1: note: stmt 0 _11 = _8 + 1.0e+0;
t.c:15:1: note: stmt 1 _21 = _18 + 2.0e+0;
t.c:15:1: note: children 0x4f84af0 0x4f84c10
t.c:15:1: note: node 0x4f84af0 (max_nunits=64, refcnt=1) vector(32) float
t.c:15:1: note: op template: _8 = *_7;
t.c:15:1: note: stmt 0 _8 = *_7;
t.c:15:1: note: stmt 1 _18 = *_17;
t.c:15:1: note: children 0x4f84b80
t.c:15:1: note: node 0x4f84b80 (max_nunits=64, refcnt=1) vector(64) unsigned char
t.c:15:1: note: op template: _4 = *_3;
t.c:15:1: note: stmt 0 _4 = *_3;
t.c:15:1: note: stmt 1 _14 = *_13;
t.c:15:1: note: node (constant) 0x4f84c10 (max_nunits=1, refcnt=1) vector(32) float
t.c:15:1: note: { 1.0e+0, 2.0e+0 }
So here the main loop uses emulated gathers, while the epilogue does not use
SLP but does use gathers.
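To make the distinction concrete, here is an illustrative model of what "emulated gather" means. Plain fixed-size arrays stand in for vector registers, and the width `NLANES` and all names are hypothetical. The index vector is loaded contiguously, then each lane does its own scalar load and the results are assembled into a vector. A native gather such as .MASK_LEN_GATHER_LOAD instead expresses all the lane loads as one vector operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only: arrays stand in for vector registers and
   NLANES is an arbitrary, hypothetical vector width. */
enum { NLANES = 8 };

typedef struct { uint8_t v[NLANES]; } vec_u8;
typedef struct { float v[NLANES]; } vec_f32;

/* Emulated gather: one contiguous load of the index vector, then one
   scalar load per lane, assembled into the result vector. */
static vec_f32
emulated_gather (const float *base, const uint8_t *index_ptr)
{
  assert (base != NULL && index_ptr != NULL);

  vec_u8 idx;
  for (size_t l = 0; l < NLANES; l++)   /* contiguous index load */
    idx.v[l] = index_ptr[l];

  vec_f32 r;
  for (size_t l = 0; l < NLANES; l++)   /* per-lane scalar loads */
    r.v[l] = base[idx.v[l]];
  return r;
}
```

Both forms should compute the same lane values. Only the instruction sequence differs, which is why the main loop and the epilogue are expected to agree.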
# vectp_index.6_209 = PHI <vectp_index.6_210(5), index_25(D)(2)>
# vectp_y.12_601 = PHI <vectp_y.12_602(5), y_27(D)(2)>
vect__4.8_211 = MEM <vector(64) unsigned char> [(uint8_t *)vectp_index.6_209];
...
MEM <vector(32) float> [(float *)vectp_y.12_601] = vect__11.11_599;
vectp_y.12_604 = vectp_y.12_601 + 128;
MEM <vector(32) float> [(float *)vectp_y.12_604] = vect__11.11_599;
...
vectp_index.6_210 = vectp_index.6_209 + 64;
vectp_y.12_602 = vectp_y.12_604 + 128;
ivtmp_607 = ivtmp_606 + 1;
if (ivtmp_607 < 3)
The IV updates look OK to me.
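The IV check can be spelled out as arithmetic. The sketch below uses only the byte steps visible in the IR above: index advances by 64, and y is bumped by 128 twice. It then confirms that both pointers advance by the same number of scalar elements. This relies on the SLP tree, where each group of two float stores consumes two index bytes, so one float is stored per index byte. All constant and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte strides per vector iteration, taken from the IV updates in the dump.
   The names are hypothetical. */
enum
{
  INDEX_STEP_BYTES = 64,     /* vectp_index.6_210 = vectp_index.6_209 + 64 */
  Y_STEP_BYTES = 128 + 128,  /* vectp_y.12 is bumped by 128 twice */
  Y_STORES = 2,              /* two MEM <vector(32) float> stores */
  FLOATS_PER_STORE = 32
};

/* Scalar elements read from index[] per vector iteration. */
static size_t
index_elems_per_iter (void)
{
  return INDEX_STEP_BYTES / sizeof (uint8_t);
}

/* Scalar elements written to y[] per vector iteration. */
static size_t
y_elems_per_iter (void)
{
  return Y_STEP_BYTES / sizeof (float);
}

/* The y step must cover exactly the two vector(32) float stores... */
static_assert (Y_STEP_BYTES == Y_STORES * FLOATS_PER_STORE * sizeof (float),
               "y IV step should equal the bytes stored per iteration");
/* ...and, with one stored float per index byte, both IVs must advance by
   the same number of scalar elements. */
static_assert (INDEX_STEP_BYTES / sizeof (uint8_t)
               == Y_STEP_BYTES / sizeof (float),
               "index and y IVs should advance in lockstep");
```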
So I'm not sure what to do. Does the testcase execute correctly with
--param vect-epilogues-nomask=0?