On Mon, Oct 8, 2012 at 2:32 PM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou <ebotca...@adacore.com> wrote:
>> Hi,
>>
>> we recently noticed that, even at -O3, the compiler doesn't figure out that
>> the following loop is dumb:
>>
>> #define SIZE 64
>>
>> int foo (int v[])
>> {
>>   int r;
>>
>>   for (i = 0; i < SIZE; i++)
>>     r = v[i];
>>
>>   return r;
>> }
>>
>> which was a bit of a surprise.  On second thoughts, this isn't entirely
>> unexpected, as it probably matters only for (slightly) pathological cases.
>> The attached patch nevertheless implements a form of load sinking in loops so
>> as to optimize these cases.  It's combined with invariant motion to optimize:
>>
>> int foo (int v[], int a)
>> {
>>   int r, i;
>>
>>   for (i = 0; i < SIZE; i++)
>>     r = v[i] + a;
>>
>>   return r;
>> }
>>
>> and with store sinking to optimize:
>>
>> int foo (int v1[], int v2[])
>> {
>>   int r[SIZE];
>>   int i, j;
>>
>>   for (j = 0; j < SIZE; j++)
>>     for (i = 0; i < SIZE; i++)
>>       r[j] = v1[j] + v2[i];
>>
>>   return r[SIZE - 1];
>> }
>>
>> The optimization is enabled at -O2 in the patch for measurement purposes but,
>> given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap,
>> compiler-only, all languages except Go), it's probably best suited to -O3.
>> Or perhaps we don't care and it should simply be dropped...  Thoughts?
>
> Incidentially we have scev-const-prop to deal with the similar case of
> scalar computations.  But I realize this doesn't work for expressions that
> are dependent on a loop variant load.
>
> @@ -103,6 +103,7 @@ typedef struct mem_ref_loc
>  {
>    tree *ref;                   /* The reference itself.  */
>    gimple stmt;                 /* The statement in that it occurs.  */
> +  tree lhs;                    /* The (ultimate) LHS for a load.  */
>  } *mem_ref_loc_p;
>
> isn't that the lhs of stmt?
>
> +static gimple_seq
> +copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
> +{
> +  tree mem = *loc->ref;
> +  tree lhs, tmp_var, ssa_name;
> +  gimple_seq seq = NULL;
> +  gimple stmt;
> +  unsigned n = 0;
> +
> +  /* First copy the load and create the new LHS for it.  */
> +  lhs = gimple_assign_lhs (loc->stmt);
> +  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
>
> use make_temp_ssa_name or simply copy_ssa_name (not sure you need
> fancy names here).
>
> +      if (gimple_assign_rhs1 (use_stmt) == lhs)
> +       {
> +         op1 = ssa_name;
> +         op2 = gimple_assign_rhs2 (use_stmt);
> +       }
> +      else
> +       {
> +         op1 = gimple_assign_rhs1 (use_stmt);
> +         op2 = ssa_name;
> +       }
>
> this may enlarge lifetime of the other operand?  And it looks like it would
> break with unary stmts (accessing out-of-bounds op2).  Also for
> is_gimple_min_invariant other operand which may be for example &a.b
> you need to unshare_expr it.
>
> +      lhs = gimple_assign_lhs (use_stmt);
> +      tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, 
> n++));
> +      stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2);
> +      ssa_name = make_ssa_name (tmp_var, stmt);
> +      gimple_assign_set_lhs (stmt, ssa_name);
>
> see above.  This can now be simplified to
>
>    lhs = gimple_assign_lhs (use_stmt);
>    ssa_name = copy_ssa_name (lhs, NULL);
>    stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2);
>
> Btw - isn't this all a bit backward (I mean the analysis in execute_lm?)
> What you want is apply this transform to as much of the _DEF_s of
> the loop-closed PHI nodes - only values used outside of the loop are
> interesting.  Thats (sort-of) what SCEV const-prop does (well, it also
> uses SCEV to compute the overall effect of the iterations).  So what
> you want to know is whether when walking the DEF chain of the
> loop closed PHI you end up at definitions before the loop or at
> definitions that are not otherwise used inside the loop.
>
> Which means it is really expression sinking.  Does tree-ssa-sink manage
> to sink anything out of a loop?  Even scalar computation parts I mean?  For
>
>  for (..)
>    {
>      a = x[i];
>      y[i] = a;
>      b = a * 2;
>    }
>   ... = b;
>
> it should be able to sink b = a*2.
>
> So I think the more natural place to implement this is either SCEV cprop
> or tree-ssa-sink.c.  And process things from the loop-closed PHI use
> walking the DEFs (first process all, marking interesting things to also
> catch commonly used exprs for two PHI uses).
>
> Again you might simply want to open a bugreport for this unless you
> want to implement it yourself.

We indeed sink 2*tem but not a[i] here.  Because tree-ssa-sink.c doesn't
sink loads (IIRC) at all, but I've seen patches to fix that (IIRC).

int a[256];
int foo (int x)
{
  int i, k = 0;
  for (i = 0; i < x; ++i)
    {
      int tem = a[i];
      k = 2*tem;
    }
  return k;
}

Richard.

> Thanks,
> Richard.
>
>> Tested on x86_64-suse-linux.
>>
>>
>> 2012-10-08  Eric Botcazou  <ebotca...@adacore.com>
>>
>>         * gimple.h (gsi_insert_seq_on_edge_before): Declare.
>>         * gimple-iterator.c (gsi_insert_seq_on_edge_before): New function.
>>         * tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field.
>>         (mem_ref_in_stmt): Remove gcc_assert.
>>         (copy_load_and_single_use_chain): New function.
>>         (execute_lm): Likewise.
>>         (hoist_memory_references): Hoist the loads after the stores.
>>         (ref_always_accessed_p): Rename into...
>>         (ref_always_stored_p): ...this.  Remove STORE_P and add ONCE_P.
>>         (can_lsm_ref_p): New function extracted from...
>>         (can_sm_ref_p): ...here.  Call it.
>>         (follow_invariant_single_use_chain): New function.
>>         (can_lm_ref_p): Likewise.
>>         (find_refs_for_sm): Rename into..
>>         (find_refs_for_lsm): ...this.  Find load hoisting opportunities.
>>         (loop_suitable_for_sm): Rename into...
>>         (loop_suitable_for_lsm): ...this.
>>         (store_motion_loop): Rename into...
>>         (load_store_motion_loop): ...this.  Adjust calls to above functions.
>>         (tree_ssa_lim): Likewise.
>>
>>
>> 2012-10-08  Eric Botcazou  <ebotca...@adacore.com>
>>
>>         * gcc.dg/tree-ssa/loadmotion-1.c: New test.
>>         * gcc.dg/tree-ssa/loadmotion-2.c: New test.
>>         * gcc.dg/tree-ssa/loadmotion-3.c: New test.
>>
>>
>> --
>> Eric Botcazou

Reply via email to