On Wed, 22 Nov 2017, Richard Biener wrote:

> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
> 
> > On Wed, Nov 22, 2017 at 11:41:24AM +0100, Richard Biener wrote:
> > > 
> > > I am testing the following (old) patch to value number call lhs
> > > according to the ERF_RETURNS_ARG annotation.  This allows for
> > > more expression simplification and also changes downstream
> > > users of the argument to use the return value, which should help
> > > register allocation and register lifetimes in general.
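> > > 
> > > As an example (memset returns its first argument; foo here is
> > > just a placeholder use), given
> > > 
> > >   q_2 = __builtin_memset (p_1, 0, 16);
> > >   foo (p_1);
> > > 
> > > we can value number q_2 as a copy of p_1 and elimination can
> > > then rewrite the use to foo (q_2).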
> > 
> > If the argument is an SSA_NAME then that is generally a win, but won't
> > it also replace if the argument is say ADDR_EXPR of a VAR_DECL or similar?
> 
> Looks like it also confuses passes, for gcc.dg/tree-ssa/pr62112-1.c
> 
> char *g;
> void h ()
> {
>   char *p = __builtin_malloc (42);
>   g = __builtin_memset (p, 3, 10);
>   __builtin_free (p);
> }
> 
> we no longer DSE the memset because we end up with
> 
>   <bb 2> :
>   p_4 = __builtin_malloc (42);
>   _1 = __builtin_memset (p_4, 3, 10);
>   g = _1;
>   __builtin_free (_1);
> 
> and we no longer see that the free call clobbers the memset destination.
> 
> So I guess I'll try to do the reverse (always replace with the argument).
> That will of course make DCE remove the LHS of such calls.
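> 
> I.e. for the testcase above the reverse direction would leave us
> with something like
> 
>   p_4 = __builtin_malloc (42);
>   _1 = __builtin_memset (p_4, 3, 10);
>   g = p_4;
>   __builtin_free (p_4);
> 
> where _1 is now dead and the free call is again visible as
> clobbering the memset destination.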
> 
> In theory it's easy to recover somewhere before RTL expansion ...
> 
> > If we only later (say some later forwprop) determine the argument is
> > gimple_min_invariant, shouldn't we have something that will effectively
> > undo this (i.e. replace the uses of the function result with whatever
> > we've simplified the argument to)?
> 
> ... of course not (easily) if we substituted a constant.
> 
> The question is whether, for example, a pointer in TLS is cheaper
> to remat from the return value or via the TLS reg, like in
> 
> __thread int x[4];
> 
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return q;
> }
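> 
> (On x86_64 with the local-exec TLS model, rematerializing &x is
> something like movq %fs:0, %rax; leaq x@tpoff(%rax), %rax, while
> the return value already sits in %rax after the call.)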
> 
> Of course if the user has written it that way we currently do not
> change it.  And we currently have no easy way to replace equal
> constants by a register where one is available.  So
> 
> __thread int x[4];
>   
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return &x;
> } 
> 
> wouldn't be optimized to return q.  Anyway, it seems the register
> allocator can do the trick at least for memset.

So doing the reverse (always replacing the result with the argument)
causes

FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 
...
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 
...
FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 
...
FAIL: gcc.dg/builtin-object-size-4.c execution test

where I looked into the object-size case.  For

  r = &a.a[4];
  r = memset (r, 'a', 2);
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 4)
    abort ();
  r = memset (r + 2, 'b', 2) + 2;
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 8)
    abort ();

FRE now propagates &a.a[4] + CST adjustments through the memsets
and constant folds the last __builtin_object_size check to false.
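
In IL terms the last check effectively becomes something like

  _37 = &MEM[(void *)&a + 6B];
  _38 = memset (_37, 98, 2);
  r_123 = &MEM[(void *)&a + 8B];
  _171 = __builtin_object_size (r_123, 3);

where the a.a subobject is no longer visible to the objsz machinery.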

The testcase looks quite artificial, so I'm not sure this regression
is important.  The way it happens is

Value numbering .MEM_120 stmt = r_121 = memset (&a.a[4], 97, 2);
Setting value number of r_121 to &a.a[4] (changed)
...
Value numbering _37 stmt = _37 = r_121 + 2;
RHS r_121 + 2 simplified to &MEM[(void *)&a + 6B]
Setting value number of _37 to &MEM[(void *)&a + 6B] (changed)
Value numbering .MEM_122 stmt = _38 = memset (_37, 98, 2);
Setting value number of _38 to &MEM[(void *)&a + 6B] (changed)
Value numbering _171 stmt = _171 = __builtin_object_size (r_123, 3);
call __builtin_object_size (r_123, 3) simplified to 20

Oops.  So while we have a cfun->after_inlining guard in place
to inhibit SCCVN itself from aggressively pruning COMPONENT_REFs,
match-and-simplify still happily folds &a.a[4] + 2 to the plain
&MEM[(void *)&a + 6B] form, losing the subobject.

The gcc.c-torture/execute/builtins/memcpy-chk.c case is somewhat odd:
we end up with two chk_calls because we somehow do figure out
the object size for

  /* buf3 points to an unknown object, so __memcpy_chk should not be done.  */
  if (memcpy ((char *) buf3 + 4, buf5, n + 6) != (char *) buf1 + 4
      || memcmp (buf1, "aBcdRSTUVWklmnopq\0", 19))
    abort ();

We figure this out during EVRP, somehow computing

        _19: [&MEM[(void *)&buf1 + 4B], &MEM[(void *)&buf1 + 4B]]

That is, we leaked that buf3 == buf1 from the dominating check below:
with the patch its memcpy result is known to be buf3, and on the
fall-through path that result compares equal to (char *) buf1:

  if (memcpy (buf3, buf5, 17) != (char *) buf1
      || memcmp (buf1, "RSTUVWXYZ01234567\0", 19))
    abort ();

Hmm.  Any idea how to best dumb down / fix things here?  All those
tests were not really supposed to be optimized at compile time but
rather to be verified at runtime, right?

Thus add -fno-tree-vrp / -fno-tree-fre to the ever-growing list of
options we disable for them?
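
If we go that route, that would mean something like

  /* { dg-additional-options "-fno-tree-vrp -fno-tree-fre" } */

in the dg tests, or the equivalent flag addition in builtins.exp
for the torture cases.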

For reference, the updated patch is appended below; it otherwise
bootstrapped and tested OK on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2017-11-22  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/82991
        * tree-ssa-sccvn.c (visit_use): Value number lhs of call to
        argument that we know is returned.
        (eliminate_dom_walker::before_dom_children): Watch out for
        side-effects even if we know the value of the call result.
        Set up availability so the LHS can be eliminated.

        * gcc.dg/tree-ssa/ssa-fre-60.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c        (revision 255051)
+++ gcc/tree-ssa-sccvn.c        (working copy)
@@ -4131,6 +4131,29 @@ visit_use (tree use)
              changed = defs_to_varying (call_stmt);
              goto done;
            }
+
+         /* If the return value is a copy of an argument, record that.  */
+         int rflags = gimple_call_return_flags (call_stmt);
+         if (rflags & ERF_RETURNS_ARG)
+           {
+             unsigned argnum = rflags & ERF_RETURN_ARG_MASK;
+             if (argnum < gimple_call_num_args (call_stmt))
+               {
+                 tree arg = gimple_call_arg (call_stmt, argnum);
+                 if (TREE_CODE (arg) == SSA_NAME
+                     || is_gimple_min_invariant (arg))
+                   {
+                     if (TREE_CODE (arg) == SSA_NAME)
+                       changed = visit_copy (lhs, arg);
+                     else
+                       changed = set_ssa_val_to (lhs, arg);
+                     if (gimple_vdef (call_stmt))
+                       changed |= set_ssa_val_to (gimple_vdef (call_stmt),
+                                                  gimple_vdef (call_stmt));
+                     goto done;
+                   }
+               }
+           }
        }
 
       /* Pick up flags from a devirtualization target.  */
@@ -5390,6 +5413,7 @@ eliminate_dom_walker::before_dom_childre
        gsi_next (&gsi))
     {
       tree sprime = NULL_TREE;
+      bool def_removed = false;
       gimple *stmt = gsi_stmt (gsi);
       tree lhs = gimple_get_lhs (stmt);
       if (lhs && TREE_CODE (lhs) == SSA_NAME
@@ -5421,6 +5445,20 @@ eliminate_dom_walker::before_dom_childre
                eliminate_push_avail (sprime);
            }
 
+         /* While we might have value numbered a call return value to
+            one of its arguments, the call itself might not be solely
+            represented by its return value.  Thus do not ignore
+            side-effects indicated by a VARYING vdef.  */
+         if (sprime
+             && gimple_vdef (stmt)
+             && VN_INFO (gimple_vdef (stmt))->valnum == gimple_vdef (stmt))
+           {
+             sprime = NULL_TREE;
+             /* We want to be able to remove the def so make sure we
+                do not make it available for replacement.  */
+             def_removed = true;
+           }
+
          /* If this now constitutes a copy duplicate points-to
             and range info appropriately.  This is especially
             important for inserted code.  See tree-ssa-copy.c
@@ -5832,10 +5870,14 @@ eliminate_dom_walker::before_dom_childre
        }
 
       /* Make new values available - for fully redundant LHS we
-         continue with the next stmt above and skip this.  */
-      def_operand_p defp;
-      FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
-       eliminate_push_avail (DEF_FROM_PTR (defp));
+         continue with the next stmt above and skip this.  Do not
+         make the LHS available if we want to prune it.  */
+      if (! def_removed)
+       {
+         def_operand_p defp;
+         FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
+           eliminate_push_avail (DEF_FROM_PTR (defp));
+       }
     }
 
   /* Replace destination PHI arguments.  */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c  (working copy)
@@ -0,0 +1,43 @@
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+void link_error (void);
+
+void f1 (char *p)
+{
+  char *q = __builtin_stpcpy (p, "123");
+  unsigned n = q - p;
+
+  if (n != 3)
+    link_error ();
+}
+
+char *f2 (char *p)
+{
+  char *q = __builtin_strcpy (p, "123");
+  unsigned n = q - p;
+
+  if (n)
+    link_error ();
+
+  return p;
+}
+
+char *f3 (char *p, const char *s)
+{
+  char *q = __builtin_memcpy (p, s, 3);
+  unsigned n = q - p;
+
+  if (n)
+    link_error ();
+
+  return q;
+}
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "link_error" 0 "fre1" } } */
+/* { dg-final { scan-tree-dump-times "return p" 2 "fre1" } } */
