https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #4)
> s275:
> typedef float real_t;
>
> #define iterations 100000
> #define LEN_1D 32000
> #define LEN_2D 256
> // array definitions
>
> real_t
> a[LEN_2D],d[LEN_2D],aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],
> tt[LEN_2D][LEN_2D];
>
> int main(struct args_t * func_args)
> {
>   // control flow
>   // if around inner loop, interchanging needed
>
>   for (int i = 0; i < LEN_2D; i++)
>     aa[0][i]=1;
>
>   for (int nl = 0; nl < 10*(iterations/LEN_2D); nl++) {
>     for (int i = 0; i < LEN_2D; i++) {
>       if (aa[0][i] > (real_t)0.) {
>         for (int j = 1; j < LEN_2D; j++) {
>           aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
>         }
>       }
>     }
>     dummy();
>   }
>   return aa[0][0];
> }

This would need to be supported by interchange itself - it's basically
un-unswitching the condition, moving it inside the innermost loop and then
performing interchange. Not sure how difficult it would be to handle this,
but it looks like a candidate for diagnostic aka 'note: interchanging loops
will improve performance' aka "fix your code".
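
For illustration, here is a rough source-level sketch of the rewrite described above, done by hand rather than by GCC's interchange pass. It assumes the guard aa[0][i] is invariant in the j loop, which holds here because j starts at 1 and row 0 is never written. The function names (s275_original, s275_interchanged, count_mismatches, verify) and the test data are made up for this example; the benchmark's outer nl loop and the dummy() call are left out.

```c
#include <assert.h>

#define LEN_2D 256
typedef float real_t;

/* Loop nest as written in s275: guard hoisted around the inner j loop,
   so the inner loop walks down a column (stride LEN_2D). */
static void s275_original(real_t aa[LEN_2D][LEN_2D],
                          real_t bb[LEN_2D][LEN_2D],
                          real_t cc[LEN_2D][LEN_2D])
{
    for (int i = 0; i < LEN_2D; i++) {
        if (aa[0][i] > (real_t)0.) {
            for (int j = 1; j < LEN_2D; j++)
                aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
        }
    }
}

/* Hand-rewritten form: the guard is moved into the innermost loop
   ("un-unswitched") and the i and j loops are interchanged, so the
   inner loop walks along a row (stride 1).  The only dependence is
   aa[j][i] on aa[j-1][i], which is still carried by the j loop. */
static void s275_interchanged(real_t aa[LEN_2D][LEN_2D],
                              real_t bb[LEN_2D][LEN_2D],
                              real_t cc[LEN_2D][LEN_2D])
{
    for (int j = 1; j < LEN_2D; j++) {
        for (int i = 0; i < LEN_2D; i++) {
            if (aa[0][i] > (real_t)0.)
                aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
        }
    }
}

static real_t x[LEN_2D][LEN_2D], y[LEN_2D][LEN_2D];
static real_t b[LEN_2D][LEN_2D], c[LEN_2D][LEN_2D];

/* Run both variants on identical, made-up input and return the number
   of elements that differ.  Values are small multiples of 0.5 so every
   intermediate result is exactly representable in float. */
int count_mismatches(void)
{
    for (int j = 0; j < LEN_2D; j++) {
        for (int i = 0; i < LEN_2D; i++) {
            /* Row 0 mixes zero and positive entries so the guard matters. */
            real_t v = (j == 0) ? (real_t)(i % 3 != 0) : (real_t)2;
            x[j][i] = v;
            y[j][i] = v;
            b[j][i] = (real_t)(j % 4);
            c[j][i] = (real_t)0.5;
        }
    }

    s275_original(x, b, c);
    s275_interchanged(y, b, c);

    int mismatches = 0;
    for (int j = 0; j < LEN_2D; j++)
        for (int i = 0; i < LEN_2D; i++)
            if (x[j][i] != y[j][i])
                mismatches++;
    return mismatches;
}

/* Aborts if the two loop orders disagree on this input. */
void verify(void)
{
    assert(count_mismatches() == 0);
}
```

Calling verify() from a small driver checks that both loop orders give the same array contents on this input. Whether the interchanged form is actually faster depends on the target and on whether the guarded inner loop gets vectorized, so that would need to be measured.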