https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112331

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|middle-end: Fail            |Fail vectorization after
                   |vectorization               |loop interchange
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, the "issue" is that we are performing loop interchange on this
benchmark loop and the vectorizer doesn't like the zero-step in the then
innermost loop.  It's not a practical example, nobody would write such an
outer loop in practice.  There's a missed optimization in that we fail to
elide the then inner loop.

The solution is to insert a use of 'a' after the inner loop, like TSVC
benchmarks usually have:

real_t s111(struct args_t * func_args)
{
//    linear dependence testing
//    no dependence - vectorizable

    initialise_arrays(__func__);

    for (int nl = 0; nl < 2*iterations; nl++) {
        for (int i = 1; i < LEN_1D; i += 2) {
            a[i] = a[i - 1] + b[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    return calc_checksum(__func__);
}

then it just works(TM).

WONTFIX (in the vectorizer).  In "theory" the interchanged loop could be
vectorized by outer loop vectorization.  But as said, IMHO a waste of time
to cheat badly written benchmarks.
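
To make the "zero-step" remark in the comment above concrete, here is a minimal sketch of what the loop nest looks like before and after interchange, and with the redundant loop elided. It assumes the problematic kernel has the shape of s111 without the dummy() call; the function names, the small LEN_1D and ITERATIONS values, and the init helper are hypothetical and chosen only for illustration. This is hand-written source, not actual GCC output.

```c
#include <string.h>

#define LEN_1D     64   /* assumed small array length, illustration only */
#define ITERATIONS 4    /* assumed repeat count, illustration only */

typedef float real_t;

/* Source order: repeat loop outside, array loop inside.
   Nothing uses 'a' between repeats, so interchange is legal. */
static void original_order(real_t *a, const real_t *b)
{
    for (int nl = 0; nl < 2 * ITERATIONS; nl++)
        for (int i = 1; i < LEN_1D; i += 2)
            a[i] = a[i - 1] + b[i];
}

/* After interchange: 'nl' is now innermost. The addresses a[i],
   a[i - 1] and b[i] do not depend on 'nl', i.e. they have a step
   of zero in the innermost loop. */
static void interchanged_order(real_t *a, const real_t *b)
{
    for (int i = 1; i < LEN_1D; i += 2)
        for (int nl = 0; nl < 2 * ITERATIONS; nl++)
            a[i] = a[i - 1] + b[i];
}

/* The missed optimization: every 'nl' iteration stores the same
   value, so the inner loop could be elided entirely. */
static void inner_loop_elided(real_t *a, const real_t *b)
{
    for (int i = 1; i < LEN_1D; i += 2)
        a[i] = a[i - 1] + b[i];
}

/* Fill the arrays with small integers that float represents exactly. */
static void init(real_t *a, real_t *b)
{
    for (int i = 0; i < LEN_1D; i++) {
        a[i] = (real_t)i;
        b[i] = (real_t)(2 * i + 1);
    }
}

/* Returns 1 when all three variants leave 'a' bit-for-bit identical. */
int variants_agree(void)
{
    real_t a1[LEN_1D], a2[LEN_1D], a3[LEN_1D], b[LEN_1D];

    init(a1, b);
    init(a2, b);
    init(a3, b);

    original_order(a1, b);
    interchanged_order(a2, b);
    inner_loop_elided(a3, b);

    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(a1, a3, sizeof a1) == 0;
}
```

Calling variants_agree() from a small main() is enough to check that the three forms compute the same array. Only odd elements are written and only even elements are read, which is why repeating the store changes nothing.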