On Thu, 26 Jun 2014, Richard Biener wrote:
> On Thu, 26 Jun 2014, Richard Biener wrote:
>
> > On Wed, 25 Jun 2014, Jeff Law wrote:
> >
> > > On 06/25/14 08:05, Richard Biener wrote:
> > > >
> > > > This removes restrictions in DOM cprop_operand that inhibit
> > > > some optimizations. The volatile pointer thing is really realy
> > > > old and no longer necessary while the loop-depth consideration
> > > > is only valid for loop-closed PHI nodes (but we're not in
> > > > loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
> > > > phase by inserting copies appropriately.
> > > >
> > > > Bootstrapped on x86_64-unknown-linux-gnu, ok?
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > 2014-06-25 Richard Biener <[email protected]>
> > > >
> > > > PR tree-optimization/61607
> > > > * tree-ssa-dom.c (cprop_operand): Remove restriction on
> > > > propagating volatile pointers and on loop depth.
> > > The first hunk is OK.
> > >
> > > I thought we had tests for the do not copy propagate out of a loop nest
> > > in the
> > > suite. Did you check that tests in BZ 19038 still generate good code
> > > after
> > > this change? If we still generate good code for those tests, then this
> > > hunk
> > > is fine too.
> >
> > I have applied the first hunk and will investigate further. Testing
> > didn't show any issue and I know how to retain the check but not
> > cause the missed optimization shown in PR61607.
>
> Let's try to summarize what the restriction is supposed to avoid.
> It tries to avoid introducing uses of SSA names defined inside a
> loop outside of it because if the SSA name is live over the backedge
> we will then have an overlapping life-range which prevents out-of-SSA
> from coalescing it to a single register.
>
> Now, the existing test is not working in that way.
>
> Rather the best way we have to ensure this property (all outside
> uses go through a copy that is placed on exit edges rather than
> possibly on the backedge) is to go into loop-closed SSA form.
> This is also where the PHI nodes that confuse DOM in PR61607
> come from in the first place.
>
> Now as the existing measure is ineffective in some cases out-of-SSA
> has gotten the ability to deal with this (or a subset):
>
> /* If elimination of a PHI requires inserting a copy on a backedge,
> then we will have to split the backedge which has numerous
> undesirable performance effects.
>
> A significant number of such cases can be handled here by inserting
> copies into the loop itself. */
> insert_backedge_copies ();
>
> now, this doesn't seem to deal with outside uses. But eventually
> the coalescing code already assigns proper cost to backedge copies
> so that we choose to place copies on the exit edges rather than
> the backedge ones - seems not so from looking at coalesce_cost_edge.
>
> So I think that we should remove the copy-propagation restrictions
> and instead address this in out-of-SSA.
>
> For now the following patch retains the exact same restriction in
> DOM as it is present in copyprop (but not in FRE - ok my recent fault,
> or in VRP). By avoiding to record the equivalency for PHIs
> (where we know that either all or no uses should be covered by
> the loop depth check) we retain the ability to record the equivalency
> for the two loop exit PHI nodes and thus the threading (if only
> on the false path).
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> I'll try to see what happens to the PR19038 testcases (though
> that PR is a mess ...)
I checked the very original one (thin6d.f from sixtrack) and the
generated assembly for -Ofast is the same without any patch
and with _all_ loop_depth_of_name restrictions removed from
both DOM and copyprop (thus making loop_depth_of_name dead).
The cost of out-of-SSA copies for backedges (or in the case
of the PR, loop latch edges causing an edge split) is dealt
with by
/* Inserting copy on critical edge costs more than inserting it
elsewhere. */
if (EDGE_CRITICAL_P (e))
mult = 2;
in coalesce_cost_edge.
So in the end, without a testcase to investigate, I'd propose
to get rid of those restrictions. I'm still going forward
with the patch below for now.
Richard.
> Richard.
>
> 2014-06-26 Richard Biener <[email protected]>
>
> PR tree-optimization/61607
> * tree-ssa-copy.c (copy_prop_visit_phi_node): Adjust comment
> explaining why we restrict copies on loop depth.
> * tree-ssa-dom.c (cprop_operand): Remove restriction on
> on loop depth.
> (record_equivalences_from_phis): Instead add it here.
>
> * gcc.dg/tree-ssa/ssa-dom-thread-5.c: New testcase.
>
> Index: gcc/tree-ssa-copy.c
> ===================================================================
> --- gcc/tree-ssa-copy.c (revision 212012)
> +++ gcc/tree-ssa-copy.c (working copy)
> @@ -401,11 +401,8 @@ copy_prop_visit_phi_node (gimple phi)
> arg_value = valueize_val (arg);
>
> /* Avoid copy propagation from an inner into an outer loop.
> - Otherwise, this may move loop variant variables outside of
> - their loops and prevent coalescing opportunities. If the
> - value was loop invariant, it will be hoisted by LICM and
> - exposed for copy propagation.
> - ??? The value will be always loop invariant.
> + Otherwise, this may introduce uses of loop variant variables
> + outside of their loops and prevent coalescing opportunities.
> In loop-closed SSA form do not copy-propagate through
> PHI nodes in blocks with a loop exit edge predecessor. */
> if (TREE_CODE (arg_value) == SSA_NAME
> Index: gcc/tree-ssa-dom.c
> ===================================================================
> --- gcc/tree-ssa-dom.c (revision 212013)
> +++ gcc/tree-ssa-dom.c (working copy)
> @@ -1234,7 +1234,13 @@ record_equivalences_from_phis (basic_blo
> this, since this is a true assignment and not an equivalence
> inferred from a comparison. All uses of this ssa name are dominated
> by this assignment, so unwinding just costs time and space. */
> - if (i == gimple_phi_num_args (phi) && may_propagate_copy (lhs, rhs))
> + if (i == gimple_phi_num_args (phi)
> + && may_propagate_copy (lhs, rhs)
> + /* Do not propagate copies if the propagated value is at a deeper loop
> + depth than the propagatee. Otherwise, this may introduce uses
> + of loop variant variables outside of their loops and prevent
> + coalescing opportunities. */
> + && !(loop_depth_of_name (rhs) > loop_depth_of_name (lhs)))
> set_ssa_name_value (lhs, rhs);
> }
> }
> @@ -2247,14 +2253,6 @@ cprop_operand (gimple stmt, use_operand_
> if (!may_propagate_copy (op, val))
> return;
>
> - /* Do not propagate copies if the propagated value is at a deeper loop
> - depth than the propagatee. Otherwise, this may move loop variant
> - variables outside of their loops and prevent coalescing
> - opportunities. If the value was loop invariant, it will be hoisted
> - by LICM and exposed for copy propagation. */
> - if (loop_depth_of_name (val) > loop_depth_of_name (op))
> - return;
> -
> /* Do not propagate copies into simple IV increment statements.
> See PR23821 for how this can disturb IV analysis. */
> if (TREE_CODE (val) != INTEGER_CST
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c (working copy)
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fno-tree-fre -fdump-tree-dom1-details" } */
> +
> +void foo(int *);
> +void f2(int dst[3], int R)
> +{
> + int i, inter[2];
> + _Bool inter0p = 0;
> + _Bool inter1p = 0;
> + for (i = 1; i < R; i++)
> + {
> + inter0p = 1;
> + inter1p = 1;
> + }
> + if (inter0p)
> + inter[0] = 1;
> + if (inter1p)
> + inter[1] = 1;
> + foo(inter);
> +}
> +
> +/* { dg-final { scan-tree-dump "Threaded jump" "dom1" } } */
> +/* { dg-final { cleanup-tree-dump "dom1" } } */
>
--
Richard Biener <[email protected]>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer