https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #4)
> s275:
> typedef float real_t;
>
> #define iterations 100000
> #define LEN_1D 32000
> #define LEN_2D 256
> // array definitions
>
> real_t a[LEN_2D], d[LEN_2D], aa[LEN_2D][LEN_2D], bb[LEN_2D][LEN_2D],
>        cc[LEN_2D][LEN_2D], tt[LEN_2D][LEN_2D];
>
> int main(struct args_t * func_args)
> {
>     // control flow
>     // if around inner loop, interchanging needed
>
>     for (int i = 0; i < LEN_2D; i++)
>         aa[0][i]=1;
>
>     for (int nl = 0; nl < 10*(iterations/LEN_2D); nl++) {
>         for (int i = 0; i < LEN_2D; i++) {
>             if (aa[0][i] > (real_t)0.) {
>                 for (int j = 1; j < LEN_2D; j++) {
>                     aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
>                 }
>             }
>         }
>         dummy();
>     }
>     return aa[0][0];
> }
This would need to be supported by interchange itself: it's basically
un-unswitching the condition (sinking it into the innermost loop, where it
stays invariant) and then performing the interchange. Not sure how difficult
that would be to handle, but it looks like a candidate for a diagnostic, aka
"note: interchanging loops will improve performance", aka "fix your code".
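A minimal C sketch of the two steps above, applied by hand to a standalone copy of the s275 nest. The function names, the fixed integer-valued input data, and the single pass standing in for the `nl` loop and `dummy()` call are illustrative assumptions, not GCC or TSVC code. Step 1 is legal because the j loop only writes rows 1..LEN_2D-1, so `aa[0][i]` never changes inside it. Step 2 is legal because the only dependence, `aa[j-1][i] -> aa[j][i]`, stays within one column.

```c
#include <string.h>

#define LEN_2D 256
typedef float real_t;

/* Hypothetical harness: TSVC-sized arrays, one pass of the nl loop. */
static real_t bb[LEN_2D][LEN_2D], cc[LEN_2D][LEN_2D];
static real_t aa_orig[LEN_2D][LEN_2D], aa_sunk[LEN_2D][LEN_2D],
              aa_xchg[LEN_2D][LEN_2D];

/* s275 as written: the guard sits outside the j loop (unswitched form). */
static void s275_orig(real_t aa[LEN_2D][LEN_2D])
{
  for (int i = 0; i < LEN_2D; i++)
    if (aa[0][i] > (real_t)0.)
      for (int j = 1; j < LEN_2D; j++)
        aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
}

/* Step 1, "un-unswitching": sink the guard into the j loop.  It is still
   invariant there because the j loop never writes row 0. */
static void s275_sunk(real_t aa[LEN_2D][LEN_2D])
{
  for (int i = 0; i < LEN_2D; i++)
    for (int j = 1; j < LEN_2D; j++)
      if (aa[0][i] > (real_t)0.)
        aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
}

/* Step 2, interchange: i becomes the inner loop.  It now walks contiguous
   memory and carries no dependence; the guard becomes a per-element mask. */
static void s275_interchanged(real_t aa[LEN_2D][LEN_2D])
{
  for (int j = 1; j < LEN_2D; j++)
    for (int i = 0; i < LEN_2D; i++)
      if (aa[0][i] > (real_t)0.)
        aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
}

/* Returns 1 if all three nests leave bit-identical results.  Inputs are
   small integers, so every intermediate value is exact in float. */
int s275_variants_agree(void)
{
  for (int j = 0; j < LEN_2D; j++)
    for (int i = 0; i < LEN_2D; i++) {
      bb[j][i] = (real_t)((i + j) % 4);
      cc[j][i] = (real_t)((i * j) % 3);
      /* Row 0 holds the guard: every third column is disabled. */
      aa_orig[j][i] = j == 0 ? (i % 3 == 0 ? -1.0f : 1.0f)
                             : (real_t)(j % 5);
    }
  memcpy(aa_sunk, aa_orig, sizeof aa_orig);
  memcpy(aa_xchg, aa_orig, sizeof aa_orig);

  s275_orig(aa_orig);
  s275_sunk(aa_sunk);
  s275_interchanged(aa_xchg);

  return memcmp(aa_orig, aa_sunk, sizeof aa_orig) == 0
         && memcmp(aa_orig, aa_xchg, sizeof aa_orig) == 0;
}
```

Calling `s275_variants_agree()` should return 1. The interchanged nest is roughly what "fix your code" would ask the user to write: its inner loop is a masked stride-1 update, whereas the original inner loop strides by LEN_2D floats per iteration.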