On Mon, Oct 8, 2012 at 2:32 PM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou <ebotca...@adacore.com> wrote: >> Hi, >> >> we recently noticed that, even at -O3, the compiler doesn't figure out that >> the following loop is dumb: >> >> #define SIZE 64 >> >> int foo (int v[]) >> { >> int r; >> >> for (i = 0; i < SIZE; i++) >> r = v[i]; >> >> return r; >> } >> >> which was a bit of a surprise. On second thoughts, this isn't entirely >> unexpected, as it probably matters only for (slightly) pathological cases. >> The attached patch nevertheless implements a form of load sinking in loops so >> as to optimize these cases. It's combined with invariant motion to optimize: >> >> int foo (int v[], int a) >> { >> int r, i; >> >> for (i = 0; i < SIZE; i++) >> r = v[i] + a; >> >> return r; >> } >> >> and with store sinking to optimize: >> >> int foo (int v1[], int v2[]) >> { >> int r[SIZE]; >> int i, j; >> >> for (j = 0; j < SIZE; j++) >> for (i = 0; i < SIZE; i++) >> r[j] = v1[j] + v2[i]; >> >> return r[SIZE - 1]; >> } >> >> The optimization is enabled at -O2 in the patch for measurement purposes but, >> given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap, >> compiler-only, all languages except Go), it's probably best suited to -O3. >> Or perhaps we don't care and it should simply be dropped... Thoughts? > > Incidentially we have scev-const-prop to deal with the similar case of > scalar computations. But I realize this doesn't work for expressions that > are dependent on a loop variant load. > > @@ -103,6 +103,7 @@ typedef struct mem_ref_loc > { > tree *ref; /* The reference itself. */ > gimple stmt; /* The statement in that it occurs. */ > + tree lhs; /* The (ultimate) LHS for a load. */ > } *mem_ref_loc_p; > > isn't that the lhs of stmt? > > +static gimple_seq > +copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs) > +{ > + tree mem = *loc->ref; > + tree lhs, tmp_var, ssa_name; > + gimple_seq seq = NULL; > + gimple stmt; > + unsigned n = 0; > + > + /* First copy the load and create the new LHS for it. */ > + lhs = gimple_assign_lhs (loc->stmt); > + tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++)); > > use make_temp_ssa_name or simply copy_ssa_name (not sure you need > fancy names here). > > + if (gimple_assign_rhs1 (use_stmt) == lhs) > + { > + op1 = ssa_name; > + op2 = gimple_assign_rhs2 (use_stmt); > + } > + else > + { > + op1 = gimple_assign_rhs1 (use_stmt); > + op2 = ssa_name; > + } > > this may enlarge lifetime of the other operand? And it looks like it would > break with unary stmts (accessing out-of-bounds op2). Also for > is_gimple_min_invariant other operand which may be for example &a.b > you need to unshare_expr it. > > + lhs = gimple_assign_lhs (use_stmt); > + tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, > n++)); > + stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2); > + ssa_name = make_ssa_name (tmp_var, stmt); > + gimple_assign_set_lhs (stmt, ssa_name); > > see above. This can now be simplified to > > lhs = gimple_assign_lhs (use_stmt); > ssa_name = copy_ssa_name (lhs, NULL); > stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2); > > Btw - isn't this all a bit backward (I mean the analysis in execute_lm?) > What you want is apply this transform to as much of the _DEF_s of > the loop-closed PHI nodes - only values used outside of the loop are > interesting. Thats (sort-of) what SCEV const-prop does (well, it also > uses SCEV to compute the overall effect of the iterations). So what > you want to know is whether when walking the DEF chain of the > loop closed PHI you end up at definitions before the loop or at > definitions that are not otherwise used inside the loop. > > Which means it is really expression sinking. Does tree-ssa-sink manage > to sink anything out of a loop? Even scalar computation parts I mean? For > > for (..) > { > a = x[i]; > y[i] = a; > b = a * 2; > } > ... = b; > > it should be able to sink b = a*2. > > So I think the more natural place to implement this is either SCEV cprop > or tree-ssa-sink.c. And process things from the loop-closed PHI use > walking the DEFs (first process all, marking interesting things to also > catch commonly used exprs for two PHI uses). > > Again you might simply want to open a bugreport for this unless you > want to implement it yourself.
We indeed sink 2*tem but not a[i] here. Because tree-ssa-sink.c doesn't sink loads (IIRC) at all, but I've seen patches to fix that (IIRC). int a[256]; int foo (int x) { int i, k = 0; for (i = 0; i < x; ++i) { int tem = a[i]; k = 2*tem; } return k; } Richard. > Thanks, > Richard. > >> Tested on x86_64-suse-linux. >> >> >> 2012-10-08 Eric Botcazou <ebotca...@adacore.com> >> >> * gimple.h (gsi_insert_seq_on_edge_before): Declare. >> * gimple-iterator.c (gsi_insert_seq_on_edge_before): New function. >> * tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field. >> (mem_ref_in_stmt): Remove gcc_assert. >> (copy_load_and_single_use_chain): New function. >> (execute_lm): Likewise. >> (hoist_memory_references): Hoist the loads after the stores. >> (ref_always_accessed_p): Rename into... >> (ref_always_stored_p): ...this. Remove STORE_P and add ONCE_P. >> (can_lsm_ref_p): New function extracted from... >> (can_sm_ref_p): ...here. Call it. >> (follow_invariant_single_use_chain): New function. >> (can_lm_ref_p): Likewise. >> (find_refs_for_sm): Rename into.. >> (find_refs_for_lsm): ...this. Find load hoisting opportunities. >> (loop_suitable_for_sm): Rename into... >> (loop_suitable_for_lsm): ...this. >> (store_motion_loop): Rename into... >> (load_store_motion_loop): ...this. Adjust calls to above functions. >> (tree_ssa_lim): Likewise. >> >> >> 2012-10-08 Eric Botcazou <ebotca...@adacore.com> >> >> * gcc.dg/tree-ssa/loadmotion-1.c: New test. >> * gcc.dg/tree-ssa/loadmotion-2.c: New test. >> * gcc.dg/tree-ssa/loadmotion-3.c: New test. >> >> >> -- >> Eric Botcazou