On Wed, 22 Nov 2017, Richard Biener wrote:

> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
> 
> > On Wed, Nov 22, 2017 at 11:41:24AM +0100, Richard Biener wrote:
> > > I am testing the following (old) patch to value number call lhs
> > > according to the ERF_RETURNS_ARG annotation.  This allows for
> > > more expression simplification and also changes downstream
> > > users of the argument to use the return value which should help
> > > register allocation and register lifetime in general.
> > 
> > If the argument is an SSA_NAME then that is generally a win, but won't
> > it also replace if the argument is say ADDR_EXPR of a VAR_DECL or similar?
> 
> Looks like it also confuses passes, for gcc.dg/tree-ssa/pr62112-1.c
> 
> char*g;
> void h(){
>   char*p=__builtin_malloc(42);
>   g=__builtin_memset(p,3,10);
>   __builtin_free(p);
> }
> 
> we no longer DSE the memset because we end up with
> 
>   <bb 2> :
>   p_4 = __builtin_malloc (42);
>   _1 = __builtin_memset (p_4, 3, 10);
>   g = _1;
>   __builtin_free (_1);
> 
> and we no longer see that the free call clobbers the memset destination.
> 
> So I guess I'll try to do the reverse (always replace with the argument).
> That will of course make DCE remove the LHS of such calls.
> 
> In theory it's easy to recover somewhere before RTL expansion ...
> 
> > If we only later (say some later forwprop) determine the argument is
> > gimple_min_invariant, shouldn't we have something that will effectively
> > undo this (i.e. replace the uses of the function result with whatever
> > we've simplified the argument to)?
> 
> ... of course not (easily) if we substituted a constant.
> 
> The question is whether for example a pointer in TLS is cheaper
> to remat from the return value or via the TLS reg like in
> 
> __thread int x[4];
> 
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return q;
> }
> 
> of course if the user has written it that way we currently do not
> change it.
> 
> And we have no easy way (currently) to replace equal
> constants by a register where that is available.  So
> 
> __thread int x[4];
> 
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return &x;
> }
> 
> wouldn't be optimized to return q.  Anyway, it seems the register
> allocator can do the trick at least for memset.
So doing the reverse (always propagating the result) causes

FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 ...
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 ...
FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 ...
FAIL: gcc.dg/builtin-object-size-4.c execution test

where I looked into the object-size case.  For

  r = &a.a[4];
  r = memset (r, 'a', 2);
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 4)
    abort ();
  r = memset (r + 2, 'b', 2) + 2;
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 8)
    abort ();

FRE now propagates &a.a[4] + CST adjustments through the memsets and
constant-folds the last __builtin_object_size check to false.  The
testcase looks quite artificial so I'm not sure this regression is
important.  The way it happens is

Value numbering .MEM_120 stmt = r_121 = memset (&a.a[4], 97, 2);
Setting value number of r_121 to &a.a[4] (changed)
...
Value numbering _37 stmt = _37 = r_121 + 2;
RHS r_121 + 2 simplified to &MEM[(void *)&a + 6B]
Setting value number of _37 to &MEM[(void *)&a + 6B] (changed)
Value numbering .MEM_122 stmt = _38 = memset (_37, 98, 2);
Setting value number of _38 to &MEM[(void *)&a + 6B] (changed)
Value numbering _171 stmt = _171 = __builtin_object_size (r_123, 3);
call __builtin_object_size (r_123, 3) simplified to 20

oops.  So while we have some cfun->after_inlining guard in place to
inhibit SCCVN itself from aggressively pruning COMPONENT_REFs,
match-and-simplify still happily optimizes &a.a[4] + 2.

The gcc.c-torture/execute/builtins/memcpy-chk.c case is somewhat odd,
but we end up with two *_chk calls because somehow we do figure out
the object size for

  /* buf3 points to an unknown object, so __memcpy_chk should not be
     done.  */
  if (memcpy ((char *) buf3 + 4, buf5, n + 6) != (char *) buf1 + 4
      || memcmp (buf1, "aBcdRSTUVWklmnopq\0", 19))
    abort ();

we do this during EVRP somehow as

  _19: [&MEM[(void *)&buf1 + 4B], &MEM[(void *)&buf1 + 4B]]

so we leaked that buf3 == buf1 because of dominating checks:

  if (memcpy (buf3, buf5, 17) != (char *) buf1
      || memcmp (buf1, "RSTUVWXYZ01234567\0", 19))
    abort ();

Hmm.  Any idea how to best dumb down / fix things here?  All those
tests were not really supposed to be optimized at compile time but
verified at runtime?  Thus add -fno-tree-vrp / -fno-tree-fre to the
never-ending list of options we disable for them?

For reference the updated patch is appended below, otherwise
bootstrapped and tested ok on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2017-11-22  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/82991
	* tree-ssa-sccvn.c (visit_use): Value number lhs of call to
	argument that we know is returned.
	(eliminate_dom_walker::before_dom_children): Watch out for
	side-effects even if we know the value of the call result.
	Set up availability so the LHS can be eliminated.

	* gcc.dg/tree-ssa/ssa-fre-60.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	(revision 255051)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -4131,6 +4131,29 @@ visit_use (tree use)
 	      changed = defs_to_varying (call_stmt);
 	      goto done;
 	    }
+
+	  /* If the return value is a copy of an argument record that.  */
+	  int rflags = gimple_call_return_flags (call_stmt);
+	  if (rflags & ERF_RETURNS_ARG)
+	    {
+	      unsigned argnum = rflags & ERF_RETURN_ARG_MASK;
+	      if (argnum < gimple_call_num_args (call_stmt))
+		{
+		  tree arg = gimple_call_arg (call_stmt, argnum);
+		  if (TREE_CODE (arg) == SSA_NAME
+		      || is_gimple_min_invariant (arg))
+		    {
+		      if (TREE_CODE (arg) == SSA_NAME)
+			changed = visit_copy (lhs, arg);
+		      else
+			changed = set_ssa_val_to (lhs, arg);
+		      if (gimple_vdef (call_stmt))
+			changed |= set_ssa_val_to (gimple_vdef (call_stmt),
+						   gimple_vdef (call_stmt));
+		      goto done;
+		    }
+		}
+	    }
 	}

       /* Pick up flags from a devirtualization target.  */
@@ -5390,6 +5413,7 @@ eliminate_dom_walker::before_dom_children
 	   gsi_next (&gsi))
 	{
 	  tree sprime = NULL_TREE;
+	  bool def_removed = false;
 	  gimple *stmt = gsi_stmt (gsi);
 	  tree lhs = gimple_get_lhs (stmt);
 	  if (lhs && TREE_CODE (lhs) == SSA_NAME
@@ -5421,6 +5445,20 @@ eliminate_dom_walker::before_dom_children
 	      eliminate_push_avail (sprime);
 	    }

+	  /* While we might have value numbered a call return value to
+	     one of its arguments the call itself might not be solely
+	     represented by its return value.  Thus do not ignore
+	     side-effects indicated by a VARYING vdef.  */
+	  if (sprime
+	      && gimple_vdef (stmt)
+	      && VN_INFO (gimple_vdef (stmt))->valnum == gimple_vdef (stmt))
+	    {
+	      sprime = NULL_TREE;
+	      /* We want to be able to remove the def so make sure we
+		 do not make it available for replacement.  */
+	      def_removed = true;
+	    }
+
 	  /* If this now constitutes a copy duplicate points-to
 	     and range info appropriately.  This is especially
 	     important for inserted code.  See tree-ssa-copy.c
@@ -5832,10 +5870,14 @@ eliminate_dom_walker::before_dom_children
 	    }

 	  /* Make new values available - for fully redundant LHS we
-	     continue with the next stmt above and skip this.  */
-	  def_operand_p defp;
-	  FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
-	    eliminate_push_avail (DEF_FROM_PTR (defp));
+	     continue with the next stmt above and skip this.  Do not
+	     make the LHS available if we want to prune it.  */
+	  if (! def_removed)
+	    {
+	      def_operand_p defp;
+	      FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
+		eliminate_push_avail (DEF_FROM_PTR (defp));
+	    }
 	}

       /* Replace destination PHI arguments.  */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c	(working copy)
@@ -0,0 +1,43 @@
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+void link_error (void);
+
+void f1 (char *p)
+{
+  char *q = __builtin_stpcpy (p, "123");
+  unsigned n = q - p;
+
+  if (n != 3)
+    link_error ();
+}
+
+char *f2 (char *p)
+{
+  char *q = __builtin_strcpy (p, "123");
+  unsigned n = q - p;
+
+  if (n)
+    link_error ();
+
+  return p;
+}
+
+char *f3 (char *p, const char *s)
+{
+  char *q = __builtin_memcpy (p, s, 3);
+  unsigned n = q - p;
+
+  if (n)
+    link_error ();
+
+  return q;
+}
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "link_error" 0 "fre1" } } */
+/* { dg-final { scan-tree-dump-times "return p" 2 "fre1" } } */