On Wed, 22 Nov 2017, Richard Biener wrote:
> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
>
> > On Wed, Nov 22, 2017 at 11:41:24AM +0100, Richard Biener wrote:
> > >
> > > I am testing the following (old) patch to value number call lhs
> > > according to the ERF_RETURNS_ARG annotation. This allows for
> > > more expression simplification and also changes downstream
> > > users of the argument to use the return value which should help
> > > register allocation and register lifetime in general.
> >
> > If the argument is an SSA_NAME then that is generally a win, but won't
> > it also replace it if the argument is, say, an ADDR_EXPR of a VAR_DECL or similar?
>
> Looks like it also confuses passes, for gcc.dg/tree-ssa/pr62112-1.c
>
> char *g;
> void h ()
> {
>   char *p = __builtin_malloc (42);
>   g = __builtin_memset (p, 3, 10);
>   __builtin_free (p);
> }
>
> we no longer DSE the memset because we end up with
>
>   <bb 2> :
>   p_4 = __builtin_malloc (42);
>   _1 = __builtin_memset (p_4, 3, 10);
>   g = _1;
>   __builtin_free (_1);
>
> and we no longer see that the free call clobbers the memset destination.
>
> So I guess I'll try to do the reverse (always replace with the argument).
> That will of course make DCE remove the LHS of such calls.
>
> In theory it's easy to recover somewhere before RTL expansion ...
>
> > If we only later (say some later forwprop) determine the argument is
> > gimple_min_invariant, shouldn't we have something that will effectively
> > undo this (i.e. replace the uses of the function result with whatever
> > we've simplified the argument to)?
>
> ... of course not (easily) if we substituted a constant.
>
> The question is whether for example a pointer in TLS is cheaper
> to remat from the return value or via the TLS reg like in
>
> __thread int x[4];
>
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return q;
> }
>
> of course if the user has written it that way we currently do not
> change it. And we have no easy way (currently) to replace equal
> constants by a register where that is available. So
>
> __thread int x[4];
>
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return &x;
> }
>
> wouldn't be optimized to return q. Anyway, it seems the register
> allocator can do the trick at least for memset.
So doing the reverse (always propagating the result) causes
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -O2
...
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution, -O2
...
FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution, -O2
...
FAIL: gcc.dg/builtin-object-size-4.c execution test
where I looked into the object-size case. For
  r = &a.a[4];
  r = memset (r, 'a', 2);
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 4)
    abort ();
  r = memset (r + 2, 'b', 2) + 2;
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 8)
    abort ();
FRE now propagates &a.a[4] + CST adjustments through the memsets
and constant folds the last __builtin_object_size check to false.
The testcase looks quite artificial so I'm not sure this regression
is important. The way it happens is
Value numbering .MEM_120 stmt = r_121 = memset (&a.a[4], 97, 2);
Setting value number of r_121 to &a.a[4] (changed)
...
Value numbering _37 stmt = _37 = r_121 + 2;
RHS r_121 + 2 simplified to &MEM[(void *)&a + 6B]
Setting value number of _37 to &MEM[(void *)&a + 6B] (changed)
Value numbering .MEM_122 stmt = _38 = memset (_37, 98, 2);
Setting value number of _38 to &MEM[(void *)&a + 6B] (changed)
Value numbering _171 stmt = _171 = __builtin_object_size (r_123, 3);
call __builtin_object_size (r_123, 3) simplified to 20
oops. So while we have some cfun->after_inlining guard in place
to inhibit SCCVN itself from aggressively pruning COMPONENT_REFs,
match-and-simplify still happily optimizes &a.a[4] + 2.
The gcc.c-torture/execute/builtins/memcpy-chk.c case is somewhat odd,
but we end up with two chk_calls because somehow we do figure out
the object size for
  /* buf3 points to an unknown object, so __memcpy_chk should not be
     done.  */
  if (memcpy ((char *) buf3 + 4, buf5, n + 6) != (char *) buf1 + 4
      || memcmp (buf1, "aBcdRSTUVWklmnopq\0", 19))
    abort ();
we do this during EVRP somehow as
_19: [&MEM[(void *)&buf1 + 4B], &MEM[(void *)&buf1 + 4B]]
so we leaked that buf3 == buf1 because of dominating checks:
  if (memcpy (buf3, buf5, 17) != (char *) buf1
      || memcmp (buf1, "RSTUVWXYZ01234567\0", 19))
    abort ();
Hmm. Any idea how to best dumb down / fix things here? All those
tests were never really supposed to be optimized at compile-time;
they are supposed to be verified at runtime. So should we add
-fno-tree-vrp / -fno-tree-fre to the never-ending list of options
we disable for them?
For reference the updated patch is appended below, otherwise
bootstrapped and tested ok on x86_64-unknown-linux-gnu.
Thanks,
Richard.
2017-11-22 Richard Biener <[email protected]>
PR tree-optimization/82991
* tree-ssa-sccvn.c (visit_use): Value number lhs of call to
argument that we know is returned.
(eliminate_dom_walker::before_dom_children): Watch out for
side-effects even if we know the value of the call result.
Set up availability so the LHS can be eliminated.
* gcc.dg/tree-ssa/ssa-fre-60.c: New testcase.
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 255051)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -4131,6 +4131,29 @@ visit_use (tree use)
changed = defs_to_varying (call_stmt);
goto done;
}
+
+ /* If the return value is a copy of an argument, record that. */
+ int rflags = gimple_call_return_flags (call_stmt);
+ if (rflags & ERF_RETURNS_ARG)
+ {
+ unsigned argnum = rflags & ERF_RETURN_ARG_MASK;
+ if (argnum < gimple_call_num_args (call_stmt))
+ {
+ tree arg = gimple_call_arg (call_stmt, argnum);
+ if (TREE_CODE (arg) == SSA_NAME
+ || is_gimple_min_invariant (arg))
+ {
+ if (TREE_CODE (arg) == SSA_NAME)
+ changed = visit_copy (lhs, arg);
+ else
+ changed = set_ssa_val_to (lhs, arg);
+ if (gimple_vdef (call_stmt))
+ changed |= set_ssa_val_to (gimple_vdef (call_stmt),
+ gimple_vdef (call_stmt));
+ goto done;
+ }
+ }
+ }
}
/* Pick up flags from a devirtualization target. */
@@ -5390,6 +5413,7 @@ eliminate_dom_walker::before_dom_childre
gsi_next (&gsi))
{
tree sprime = NULL_TREE;
+ bool def_removed = false;
gimple *stmt = gsi_stmt (gsi);
tree lhs = gimple_get_lhs (stmt);
if (lhs && TREE_CODE (lhs) == SSA_NAME
@@ -5421,6 +5445,20 @@ eliminate_dom_walker::before_dom_childre
eliminate_push_avail (sprime);
}
+ /* While we might have value numbered a call return value to
+ one of its arguments the call itself might not be solely
+ represented by its return value. Thus do not ignore
+ side-effects indicated by a VARYING vdef. */
+ if (sprime
+ && gimple_vdef (stmt)
+ && VN_INFO (gimple_vdef (stmt))->valnum == gimple_vdef (stmt))
+ {
+ sprime = NULL_TREE;
+ /* We want to be able to remove the def so make sure we
+ do not make it available for replacement. */
+ def_removed = true;
+ }
+
/* If this now constitutes a copy duplicate points-to
and range info appropriately. This is especially
important for inserted code. See tree-ssa-copy.c
@@ -5832,10 +5870,14 @@ eliminate_dom_walker::before_dom_childre
}
/* Make new values available - for fully redundant LHS we
- continue with the next stmt above and skip this. */
- def_operand_p defp;
- FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
- eliminate_push_avail (DEF_FROM_PTR (defp));
+ continue with the next stmt above and skip this. Do not
+ make the LHS available if we want to prune it. */
+ if (! def_removed)
+ {
+ def_operand_p defp;
+ FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
+ eliminate_push_avail (DEF_FROM_PTR (defp));
+ }
}
/* Replace destination PHI arguments. */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c (working copy)
@@ -0,0 +1,43 @@
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+void link_error (void);
+
+void f1 (char *p)
+{
+ char *q = __builtin_stpcpy (p, "123");
+ unsigned n = q - p;
+
+ if (n != 3)
+ link_error ();
+}
+
+char *f2 (char *p)
+{
+ char *q = __builtin_strcpy (p, "123");
+ unsigned n = q - p;
+
+ if (n)
+ link_error ();
+
+ return p;
+}
+
+char *f3 (char *p, const char *s)
+{
+ char *q = __builtin_memcpy (p, s, 3);
+ unsigned n = q - p;
+
+ if (n)
+ link_error ();
+
+ return q;
+}
+
+int main()
+{
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "link_error" 0 "fre1" } } */
+/* { dg-final { scan-tree-dump-times "return p" 2 "fre1" } } */