https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81082
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Jakub Jelinek from comment #2) > So is there anything we can do here? > Isn't the bigger problem that we no longer notice the multiplications can't > overflow? Yes, that's the issue with all arithmetic we drop to unsigned because of association. It's always a trade-off ... > With the int multiplications gone, we need to assume [1, INT_MAX] > ranges on b1, b2 and b3 in the loop (otherwise it wouldn't loop at all), and > [0, INT_MAX - 1] ranges for i1, i2 and i3. But, from the i1 * b2 * b3 + i2 > * b3 + (i3 - 1) we could have determined that b1 * b2 * b3 is [0, INT_MAX]. > > We do vectorize: > int > f (int *x, int b1, int b2, int b3) > { > int foo = 0; > for (int i1 = 0; i1 < b1; ++i1) > for (int i2 = 0; i2 < b2; ++i2) > for (int i3 = 0; i3 < b3; ++i3) > foo += x[(i1 * b2 + i2) * b3 + (i3 - 1)]; > return foo; > } > without gather loads, while the #c0 only with them (if available). > > Regarding the PR66313 optimization, first of all, what happened to the move > of it to match.pd? Given we have the reassoc pass doing this "properly" (well, but not for signed integer types...) I have not pursused fully moving it. We might want to move some of the more "obvious" cases (but IIRC some cases involving constants are already handled). So, the ultimate plan is to make reassoc stronger, rewriting assoicated signed arithmetic to unsigned where required. > As we vectorize: > int > f (int *x, int b1, int b2, int b3) > { > int foo = 0; > for (int i1 = 0; i1 < b1; ++i1) > for (int i2 = 0; i2 < b2; ++i2) > for (int i3 = 0; i3 < b3; ++i3) > { > int t1 = i1 * b2 * b3; > int t2 = i2 * b3; > int t3 = i3 - 1; > foo += x[t1 + t2 + t3]; > } > return foo; > } > > it seems that hasn't been done. > > Second thing, in fold_plusminus_mult_expr I don't understand the: > /* We are neither factoring zero nor minus one. */ > || TREE_CODE (same) == INTEGER_CST) > part, shouldn't that be || (TREE_CODE (same) == INTEGER_CST && > !integer_zerop (same) && !integer_minus_onep (same)) ? The comment says that same == 0 or same == -1 don't happen by construction. > Another thing is, for signed a, b, c we can't optimize a * b + a * c as > a * (b + c) if a can be -1 or 0, exhaustive proof for signed char: > int > main () > { > int a, b, c; > for (a = -128; a < 128; a++) > for (b = -128; b < 128; b++) > for (c = -128; c < 128; c++) > { > int ab = a * b; > int ac = a * c; > int ab_ac = ab + ac; > if (ab < -128 || ab >= 128 || ac < -128 || ac >= 128 || ab_ac < -128 > || > ab_ac >= 128) > continue; > /* Ok, in this case a * b + a * c is without UB. */ > int b_c = b + c; > #ifdef NEW > b_c = (signed char) b_c; > #endif > int r = a * b_c; > if (b_c < -128 || b_c >= 128 || r != ab_ac) > __builtin_printf ("%d * %d + %d * %d = %d wrong opt (%d)\n", a, b, > a, > c, ab_ac, r); > } > return 0; > } > > But, if we know for sure that a is not -1, we could optimize it as > a * (signed)((unsigned) b + c) > (which doesn't work for a == -1, but works for others) instead of > (signed)(a * ((unsigned) b + c)) > (which works for all, but is a too big hammer). > > Perhaps if this optimization is moved over to match.pd and we'd defer it if > a isn't constant until at least vrp1 and tune based on range info? > The range for b3 at least during vrp pass itself is [1, INT_MAX], so neither > -1 nor 0 are possible and we could safely reassociate it as a * (b + c) > rather than the variants with unsigned. > > Or do we want to somehow mark the MULT_EXPR we've created, propagate it > through the IL and if we in vrp notice this way marked MULT_EXPR (or IFN) we > fold it as integer multiplication? That would be ugly. I don't really see a good solutuon to this PR either :/ Delaying the folding to GIMPLE would be ok with me but as said I'd like to see reassoc enhanced rather than stuffing more and more "complex" patterns to match.pd. Ultimatively we'd want to compute SCEV for all relevant IVs early and cache them. That should be easy when only constants are involved but will be tricky (impossible) when any symbols are. In the case of this testcase it's all constants of course. I once did experiment with "lowering" GIMPLE to -fwrapv (as RTL is) before late reassoc and that "worked" (well, it of course regressed stuff, and it was merely to prove reassoc of signed arithmetic is useful).