https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114269

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The following is a C testcase for a case where ranges will not help:

void foo (int *a, long js, long je, long is, long ie,
          long ks, long ke, long xi, long xj)
{
  for (long j = js; j < je; ++j)
    for (long i = is; i < ie; ++i)
      for (long k = ks; k < ke; ++k)
        a[i + j*xi + k*xi*xj] = 5;
}

Comparing the SCEV analysis result before and after the change shows the
problem.  Note that when you re-order the loops so that the fast (unit-stride)
increment is innermost, this makes no difference for vectorization.  In the
order above, though, we now require an (emulated) gather, which with SSE
didn't work out, whereas previously we used strided stores.

The reason seems to be that when analyzing k*xi*xj the first multiply
yields

(long int) {(unsigned long) ks_21(D) * (unsigned long) xi_24(D), +, (unsigned long) xi_24(D)}_3

but when we then ask to fold the multiply by xj, we fail as we run into

tree
chrec_fold_multiply (tree type,
                     tree op0,
                     tree op1)
{         
...
    CASE_CONVERT:
      if (tree_contains_chrecs (op0, NULL))
        return chrec_dont_know;
      /* FALLTHRU */ 

but this case is somewhat odd, as all other unhandled cases simply fall
through to fold_build2.  That possibly means we'd never build other ops with
CHREC operands.  The early-out was added for PR42326.

I think we can handle sign-conversions from unsigned just fine;
chrec_fold_plus already does this (but it misses one case).

Doing this restores things to some extent.

I'm testing this as an intermediate step before considering reversion of the
change.
