https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99415
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |tree-optimization
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
             Blocks|                            |53947
   Last reconfirmed|                            |2021-03-08
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It seems the benchmark is written in a way that confuses our loop header
copying.  Writing

  for (int j = 0; j < LEN_2D-1; j++) {
      for (int i = j+1; i < LEN_2D; i++) {
          a[i] -= aa[j][i] * a[j];
      }
  }

fixes the vectorization.  This is possibly a mistake users make, so it is
probably worth investigating further.

Not sure how to address this most easily; we'd like to peel the last
iteration of the outer loop, noting that it does nothing.  Maybe loop
splitting can figure this out?  Alternatively, loop header copying should
just do its job...

Hmm, actually loop header copying does do its job, but then jump threading
messes this up again (the loop header check is redundant for all but the
last iteration of the outer loop).  So -fno-tree-dominator-opts fixes this
as well.

And for some reason ch_vect thinks the loops are all do-while loops.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations