On Tue, Oct 4, 2016 at 2:53 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote: > GCC currently doesn't canonicalize address expressions. As a result > inefficient code is generated even for trivial index address expressions, > blocking CSE and other optimizations: > > int f(int *p, int i) { return p[i+2] + p[i+1]; } > > sxtw x1, w1 > add x1, x1, 2 > add x2, x0, x1, lsl 2 > ldr w0, [x0, x1, lsl 2] > ldr w1, [x2, -4] > add w0, w1, w0 > ret > > After this patch: > > add x1, x0, x1, sxtw 2 > ldp w0, w2, [x1, 4] > add w0, w2, w0 > ret > > The reason for this is that array index expressions are preferably kept in > the *(p + (i + C0) * C1) form eventhough it is best on most targets to make > use of an offset in memory accesses - ie. *(p + i * C1 + (C0*C1)). > > This patch disables the folding in fold_plusminus_mult_expr that changes > the latter form into the former. Unfortunately it isn't possible to know > it is an address expression, and neither is there a way to decide when > C0*C1 is too complex. > > So is there a better way/place to do this, or do we need an address > canonicalization phase in the tree that ensures we expand addresses in an > efficient manner, taking into account target offsets?
Note there is also the case where the address expression is not exposed in GIMPLE until after RTL expansion which is if the array reference is not based on a pointer but on an actual array. Then we keep the ARRAY_REF form in GIMPLE. The more general issue that want's to be addressed here is expression re-association to maximize CSE opportunities (or minimize addressing cost) across all expressions that are "close enough". We have the re-association pass which does decisions based on a single expression only (not considering CSE opportunities with others). Then we have the IVOPTs pass which should do what you want but it only runs on memory references in loops. Then we have the SLSR pass which was supposed to cover the IVOPTs in scalar code case but it doesn't consider addressing mode costs. The folding patch disables a (maybe premature) canonicalization (it's really supposed to be a canonicalization, not an optimization) which will end up helping a testcase like yours but regress another case where the CSE opportunity arises with the other canonicalization. If you disable this canonicalization, saying A * C0 +- C1 is better than (A +- (C1/C0)) * C0 then you should at least implement the reverse canonicalization. These days the appropriate place to do that is match.pd (ISTR having a patch moving fold_plusminus_mult_expr to match.pd, I don't remember what happened to it). Richard. > > ChangeLog: > 2016-10-04 Wilco Dijkstra <wdijk...@arm.com> > > gcc/ > * fold-const.c (fold_plusminus_mult_expr): Block folding of immediates > into multiply. > -- > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c > index > e71ce5e0f23adbb1d4a73506769f7243900cfd2d..bc9fb1e8ff3e33c94e66a2d1282235b71fac2730 > 100644 > --- a/gcc/fold-const.c > +++ b/gcc/fold-const.c > @@ -6912,7 +6912,9 @@ fold_plusminus_mult_expr (location_t loc, enum > tree_code code, tree type, > (A * C) +- A -> A * (C+-1). > We are most concerned about the case where C is a constant, > but other combinations show up during loop reduction. Since > - it is not difficult, try all four possibilities. */ > + it is not difficult, try all four possibilities. > + However avoid moving integer constants into the multiply: > + (A * C0) +- C1 is better than (A +- (C1/C0)) * C0. */ > > if (TREE_CODE (arg0) == MULT_EXPR) > { > @@ -6920,10 +6922,7 @@ fold_plusminus_mult_expr (location_t loc, enum > tree_code code, tree type, > arg01 = TREE_OPERAND (arg0, 1); > } > else if (TREE_CODE (arg0) == INTEGER_CST) > - { > - arg00 = build_one_cst (type); > - arg01 = arg0; > - } > + return NULL_TREE; > else > { > /* We cannot generate constant 1 for fract. */ > @@ -6938,20 +6937,7 @@ fold_plusminus_mult_expr (location_t loc, enum > tree_code code, tree type, > arg11 = TREE_OPERAND (arg1, 1); > } > else if (TREE_CODE (arg1) == INTEGER_CST) > - { > - arg10 = build_one_cst (type); > - /* As we canonicalize A - 2 to A + -2 get rid of that sign for > - the purpose of this canonicalization. */ > - if (wi::neg_p (arg1, TYPE_SIGN (TREE_TYPE (arg1))) > - && negate_expr_p (arg1) > - && code == PLUS_EXPR) > - { > - arg11 = negate_expr (arg1); > - code = MINUS_EXPR; > - } > - else > - arg11 = arg1; > - } > + return NULL_TREE; > else > { > /* We cannot generate constant 1 for fract. */ >