https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68861
Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law <law at redhat dot com> ---
This is starting to look like a latent bug in the SLP vectorizer. Sadly, I cherry-picked the latest SLP changes from Richi, but they don't help.

The first hint that makes this easier to understand is that x_33 will always be zero. It's hidden from the compiler, but knowing it makes the dumps easier to follow.

In .cunroll (for my strangely reduced testcase) we have the following key statements:

  _13 = x_33 * 25;
  _10 = _13 + 5;
  _65 = _13 + 6;
  _149 = _13 + 7;
  _164 = 2;
  _162 = _10 + _164;
  *f_26[_169] = _162;    // *f_26[_169] = 7
  _132 = _1 + 12;
  *f_26[_132] = _162;    // *f_26[_132] = 7
  _135 = _1 + 13;
  _165 = _65 + _164;
  *f_26[_135] = _165;    // *f_26[_135] = 8
  _141 = _1 + 14;
  *f_26[_141] = _165;    // *f_26[_141] = 8
  _139 = _1 + 15;
  _146 = _164 + _149;
  *f_26[_139] = _146;    // *f_26[_139] = 9
  _179 = _1 + 16;
  *f_26[_179] = _146;    // *f_26[_179] = 9

Then another batch (skipping some of the array index calculations):

  r.49_188 = 2;
  _136 = r.49_188 * 2;
  _137 = _10 + _136;
  *f_26[_133] = _137;    // *f_26[_133] = 9
  _145 = _129 + 12;
  *f_26[_145] = _137;    // *f_26[_145] = 9
  _163 = _129 + 13;
  _167 = _65 + _136;
  *f_26[_163] = _167;    // *f_26[_163] = 10
  _175 = _129 + 14;
  *f_26[_175] = _167;    // *f_26[_175] = 10
  _193 = _129 + 15;
  _197 = _136 + _149;
  *f_26[_193] = _197;    // *f_26[_193] = 11
  _205 = _129 + 16;
  *f_26[_205] = _197;    // *f_26[_205] = 11

Which is correct.
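The annotated values in the .cunroll dump can be checked by replaying the scalar arithmetic. The following is only a minimal sketch in Python, assuming x_33 == 0 as stated in the comment. The helper name batch and the variables first and second are hypothetical, and the array index calculations are left out.

```python
# Scalar replay of the .cunroll statements.
# Assumption taken from the comment: x_33 is always zero at run time.
x_33 = 0

_13 = x_33 * 25
_10 = _13 + 5
_65 = _13 + 6
_149 = _13 + 7


def batch(addend):
    """Return the three distinct values stored by one batch of stores.

    Hypothetical helper: each value is stored twice in the dump, so only
    the distinct values are listed here.
    """
    return [_10 + addend, _65 + addend, _149 + addend]


_164 = 2
first = batch(_164)      # _162, _165, _146

r_49_188 = 2             # r.49_188 in the dump
_136 = r_49_188 * 2
second = batch(_136)     # _137, _167, _197

print("first batch: ", first)
print("second batch:", second)
```

The last element of each list is the value that the third pair of stores in that batch is expected to write.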
The SLP code is hairy, but the key bits from slp1 are (remember that x_33 is always zero):

  vect_cst__201 = { 5, 5 };
  vect_cst__200 = { 6, 6 };
  vect_cst__199 = { 7, 7 };
  vect_cst__174 = { 25, 25 };
  vect_cst__171 = { 25, 25 };
  vect_cst__5 = {x_33, x_33};
  vect_cst__176 = {x_33, x_33};
  vect_cst__170 = {x_33, x_33};
  vect__13.88_192 = vect_cst__170 * vect_cst__174;      // { 0, 0 }
  vect__13.88_194 = vect_cst__176 * vect_cst__171;      // { 0, 0 }
  vect__13.88_196 = vect_cst__5 * vect_cst__52;         // { 0, 0 }
  vect__10.89_204 = vect__13.88_192 + vect_cst__201;    // { 5, 5 }
  _164 = 2;
  vect__10.89_206 = vect__13.88_194 + vect_cst__200;    // { 6, 6 }
  vect__10.89_207 = vect__13.88_196 + vect_cst__199;    // { 7, 7 }
  _13 = x_33 * 25;
  _149 = _13 + 7;
  vect_cst__208 = {_149, _149};                         // { 7, 7 }
  vect_cst__209 = {_164, _164};                         // { 2, 2 }
  vect_cst__11 = {_164, _164};                          // { 2, 2 }
  vect__162.90_186 = vect__10.89_204 + vect_cst__11;    // { 7, 7 }
  vect__162.90_185 = vect__10.89_206 + vect_cst__209;   // { 8, 8 }
  vect__162.90_156 = vect__10.89_207 + vect_cst__208;   // { 14, 14 } WTF!
  vectp.92_155 = &*f_26[_169];
  MEM[(integer(kind=4) *)vectp.92_155] = vect__162.90_186;
  vectp.92_125 = vectp.92_155 + 8;
  MEM[(integer(kind=4) *)vectp.92_125] = vect__162.90_185;
  vectp.92_90 = vectp.92_125 + 8;
  MEM[(integer(kind=4) *)vectp.92_90] = vect__162.90_156;

Note how the last assignment stores vect__162.90_156, which holds the wrong value. It should have been { 9, 9 }.

The botch gets repeated in the next block of stores, where we end up storing { 14, 14 } instead of { 11, 11 } in the last store.

The bogus values obviously cause grief later.

Anyway, it's late.
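To make the bad lane concrete, the last vector addition can be replayed with two-element lists. This is a rough sketch only, again assuming x_33 == 0. The helper vadd and the flattened variable names are hypothetical stand-ins for the SSA names in the dump. The "expected" operand {_164, _164} is an inference from the scalar statement _146 = _164 + _149 and from the two preceding vector additions, not something the dump states directly.

```python
# Replay of the third vector addition from the slp1 dump.
# Assumption taken from the comment: x_33 is always zero at run time.
x_33 = 0
_164 = 2
_13 = x_33 * 25
_149 = _13 + 7


def vadd(a, b):
    """Element-wise addition of two equal-length vectors (hypothetical helper)."""
    return [p + q for p, q in zip(a, b)]


vect_10_207 = [7, 7]            # vect__10.89_207, already holds the _149 lanes
vect_cst_208 = [_149, _149]     # { 7, 7 }: the operand slp1 actually uses
vect_cst_209 = [_164, _164]     # { 2, 2 }: the operand the scalar code implies

stored = vadd(vect_10_207, vect_cst_208)     # what gets written to memory
expected = vadd(vect_10_207, vect_cst_209)   # inferred intended result

print("stored:  ", stored)
print("expected:", expected)
```

Under this reading the third addition adds the _149 lanes twice instead of adding _164, which would account for the 14 appearing where the scalar code stores 9.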