On Wed, 6 Nov 2013, Jakub Jelinek wrote:

> On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote:
> > > I'm also curious -- did this code show up in a particular benchmark, if 
> > > so,
> > > which one?
> > 
> > I didn't find this problem from any benchmark, but from another
> > concern about loop upper bound estimation. Look at the following code:
> > 
> > int foo(unsigned int n, int r)
> > {
> >   int i;
> >   if (n > 0)
> >     if (n < 4)
> >     {
> >       do {
> >          --n;
> >          r *= 2;
> >       } while (n > 0);
> >     }
> >   return r+n;
> > }
> > 
> > 
> > In order to get the upper bound of the loop in this function, GCC
> > traverses conditions n<4 and n>0 separately and tries to get any
> > useful information. But as those two conditions cannot be combined
> > into one due to this issue (note that n>0 will be transformed into
> > n!=0), when GCC sees n<4, it will consider the possibility that n may
> > be equal to 0, in which case the upper bound is UINT_MAX. If those two
> > conditions can be combined into one, which is n-1<=2, then we can get
> > the correct upper bound of the loop.
> 
> I've looked at the above testcase to see why we aren't able to determine
> the number of iterations upper bound properly here.
> 
> That doesn't mean your patch is useless, though I must say I'm far from
> being convinced it is safe ATM and also it looks like quite ugly special
> case (trying to undo a VRP optimization but only one single specific case).
> 
> The first problem is VRP, we end up with:
>   <bb 5>:
>   # n_1 = PHI <n_5(D)(4), n_7(6)>
>   # r_3 = PHI <r_6(D)(4), r_8(6)>
>   # RANGE [0, 4294967295]
>   n_7 = n_1 + 4294967295;
>   # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
>   r_8 = r_3 * 2;
>   if (n_7 != 0)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
>   
>   <bb 6>:
>   goto <bb 5>;
> - missing range on n_1, extremely conservative range on n_7.  With the
> attached patch we get instead:
>   <bb 5>:
>   # RANGE [1, 3] NONZERO 0x00000000000000003
>   # n_1 = PHI <n_5(D)(4), n_7(6)>
>   # r_3 = PHI <r_6(D)(4), r_8(6)>
>   # RANGE [0, 2] NONZERO 0x00000000000000003
>   n_7 = n_1 + 4294967295;
>   # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
>   r_8 = r_3 * 2;
>   if (n_7 != 0)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
> 
>   <bb 6>:
>   goto <bb 5>;
> 
> The issue is that we use live_on_edge to determine if it is desirable
> to added ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter
> the loop through the bb with exit edge and thus live of the latch isn't
> computed (and, generally the propagation for live ignores dfs back edges
> anyway), and because in the above live_on_edge ((5)->(6), n_7) is false,
> we don't add ASSERT_EXPR that n_7 is != 0 in the latch bb, so during
> iteration we start with n_1 being assumed [1, 3] (that is range of the
> assertion from the preceeding conditions on n_5(D)), but in the next
> iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the
> PHI and thus end up with uselessly wide range because we think the
> subtraction can wrap around.  This patch improves live_on_edge for
> empty latches, by just looking at the PHIs on loop->header from the
> latch -> header edge and noting which SSA_NAMEs are used there.
> 
> I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they disable
> various tree optimizations already and want to test unrolling of loops
> iterating exactly twice, but with this VRP change VRP is smart enough
> to replace the PHI argument on the i IV from
> -  # i_13 = PHI <i_8(3), 0(2)>
> +  # i_13 = PHI <1(3), 0(2)>
> (the loop iterates exactly twice) and RTL unrolling doesn't do the
> tested thing in that case.  But, this should affect only loops that roll
> exactly twice and those realy should be unrolled already far before, so IMHO
> it isn't something to try to optimize better at the RTL level.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2013-11-06  Jakub Jelinek  <ja...@redhat.com>
> 
>       * tree-vrp.c (live_on_edge): If e->dest is an empty latch
>       of some loop, but live[e->dest->index] is not computed, look
>       for SSA_NAMEs used in loop header PHIs from the latch edge.
> 
>       * gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
>       * gcc.dg/unroll_2.c: Likewise.
>       * gcc.dg/unroll_3.c: Likewise.
>       * gcc.dg/unroll_4.c: Likewise.
> 
> --- gcc/tree-vrp.c.jj 2013-11-06 08:48:01.000000000 +0100
> +++ gcc/tree-vrp.c    2013-11-06 09:32:19.205104029 +0100
> @@ -92,6 +92,42 @@ static sbitmap *live;
>  static bool
>  live_on_edge (edge e, tree name)
>  {
> +  if (live[e->dest->index] == NULL)
> +    {

I'd rather have live[] computed "correctly" than the following
which I think is a hack ... as you say the issue is ordering
(or rather that there isn't an "order" for CFG cycles unless
we want to iterate).  For loop BB order we explicitely handle
the latch, maybe just using a different order than
RPO order, with special-casing the latch, makes more sense?

Richard.

> +      /* Handle edges to empty latch blocks.  If NAME appears
> +      in loop header phis on edges from latch, return true
> +      and remember those SSA_NAMEs.  */
> +      basic_block bb = e->dest;
> +      if (bb->loop_father
> +       && bb->loop_father->latch == bb
> +       && single_succ_p (bb)
> +       && empty_block_p (bb))
> +     {
> +       gimple_stmt_iterator gsi;
> +       edge e2 = single_succ_edge (bb);
> +       for (gsi = gsi_start_phis (e2->dest);
> +            !gsi_end_p (gsi);
> +            gsi_next (&gsi))
> +         {
> +           gimple phi = gsi_stmt (gsi);
> +           tree res = gimple_phi_result (phi), arg;
> +
> +           if (virtual_operand_p (res))
> +             continue;
> +           arg = PHI_ARG_DEF_FROM_EDGE (phi, e2);
> +           if (TREE_CODE (arg) == SSA_NAME)
> +             {
> +               if (live[e->dest->index] == NULL)
> +                 {
> +                   live[e->dest->index] = sbitmap_alloc (num_ssa_names);
> +                   bitmap_clear (live[e->dest->index]);
> +                 }
> +               bitmap_set_bit (live[e->dest->index],
> +                               SSA_NAME_VERSION (arg));
> +             }
> +         }
> +     }
> +    }
>    return (live[e->dest->index]
>         && bitmap_bit_p (live[e->dest->index], SSA_NAME_VERSION (name)));
>  }
> --- gcc/testsuite/gcc.dg/unroll_1.c.jj        2013-09-09 11:32:36.000000000 
> +0200
> +++ gcc/testsuite/gcc.dg/unroll_1.c   2013-11-06 17:10:52.900722932 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops 
> -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli 
> -fenable-rtl-loop2_unroll" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_2.c.jj        2013-08-30 14:38:39.000000000 
> +0200
> +++ gcc/testsuite/gcc.dg/unroll_2.c   2013-11-06 17:10:30.751845504 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
> -fenable-rtl-loop2_unroll" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp 
> -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
> -fenable-rtl-loop2_unroll" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_3.c.jj        2013-08-30 14:38:39.000000000 
> +0200
> +++ gcc/testsuite/gcc.dg/unroll_3.c   2013-11-06 17:10:38.864800338 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" 
> } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" 
> } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_4.c.jj        2013-08-30 14:38:40.000000000 
> +0200
> +++ gcc/testsuite/gcc.dg/unroll_4.c   2013-11-06 17:11:03.816665603 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli 
> -fenable-rtl-loop2_unroll=foo2" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli 
> -fenable-rtl-loop2_unroll=foo2" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> 
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Reply via email to