Re: [patch tree-ssa-forwprop]: Add type raising in shift-operations

Kai Tietz Fri, 22 Nov 2013 17:44:11 -0800

----- Original Message -----
> On 11/21/13 08:15, Kai Tietz wrote:
> > Hi,
> >
> > this is the required forward-propagation part for the type-demotion pass as
> > separate patch.
> > This patch is sharing some adjustments in testsuite due new optimization of
> > type-casts on shift-operations.
> >
> > This patch tries to generate the "normal" form (TYPE) (X shift-op Y) out of
> > the "denormal" form "((TYPE) X) shift-op Y".
> >
> > ChangeLog
> >
> > 2013-11-21  Kai Tietz  <kti...@redhat.com>
> >
> >     * tree-ssa-forwprop.c (simplify_shift):  New function.
> >     (ssa_forward_propagate_and_combine): Use it.
> >
> > ChangeLog testsuite
> >
> >     * gcc.dg/tree-ssa-ts-shift-2.c: New test.
> >
> >
> > Shared testsuite part between type-demotion and forward-propagation
> > patches.
> >
> > Changelog gcc/testsuite:
> >
> >     * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> >     * gcc.dg/vect/vect-over-widen-1.c: Likewise.
> >     * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> >     * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> >     * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> >     * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> >
> > Bootstrapped for x86_64-unknown-linux-gnu.  Ok for apply?
> >
> > Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
> > ===================================================================
> > --- /dev/null
> > +++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +unsigned char f1 (char x)
> > +{
> > +  unsigned char x1 = (unsigned char) x;
> > +  return (x1 << 5);
> > +}
> > +
> > +unsigned short f2 (unsigned int x)
> > +{
> > +  unsigned short x1 = (unsigned short) x;
> > +  return (x1 << 15);
> > +}
> > +
> > +unsigned int f3 (unsigned short x)
> > +{
> > +  unsigned long long x1 = (unsigned long long) x;
> > +  return (unsigned int) (x1 >> 15);
> > +}
> > +
> > +unsigned long long f4 (unsigned short x)
> > +{
> > +  unsigned int x1 = (unsigned int) x;
> > +  return (unsigned long long) (x1 >> 7);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1
> > "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\\(short unsigned int\\\)" 1
> > "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1
> > "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1
> > "optimized" } } */
> So at least for f1 and f2, these regexps match regardless of whether or
> not the code was transformed.
> 
> Without your patch (f1)
>    x1_2 = (unsigned char) x_1(D);
>    _3 = x1_2 << 5;
>    return _3;
> 
> With your patch (f1):
>    _4 = x_1(D) << 5;
>    _2 = (unsigned char) _4;
>    return _2;
> 
> So all that's happened is we changed which object got casted -- we have
> the same number casts of an object to (unsigned char).


That is actual wished.  We shouldn't come to patterns, which have more 
type-casts by this patch.
What we see here is the normalization of shift-left/right operations.  This 
patch takes care that we prefer in general (Type) (X shift-left/right y) over 
((Type) X) shift-left/right y notation.


> ISTM you might be better off scanning the details of the forwprop dump.
>   For example, in f1 we get:
> 
>     Replaced _3 = x1_2 << 5;
>       x1_2 = (unsigned char) x_1(D);
>     by    by _3 = (unsigned char) _5;
>       _5 = x_1(D) << 5;
> 
> 
> We basically see the same thing happen in f2.
> 
> For f1 & f2 I don't see that we've done anything other than change the
> casts.  They don't show any benefit from this patch.
> 
> f3 & f4 on the other hand do show improvements.

I adjusted test so, that f1, and f2 are showing an effect, too.  I had kept 
here simple transformation to see normalized pattern-form.
In new variant the checks won't match without that patch anymore.
 
> 
> WHen I looked over a bunch of .i files, the results were mixed.
> Sometimes better, sometimes worse without any clear bias.

Hmm, this sample here is questionable.
 
> So here's an example that gets worse:
> 
> 
> *************** rshift_double (long unsigned int l1, lon
> *** 651,661 ****
>      _29 = (long int) _28;
>      *hv_13(D) = _29;
>      _31 = l1_9(D) >> _27;
> !   _32 = (unsigned int) count_4(D);
> !   _33 = 63 - _32;
> !   _34 = (int) _33;
> !   _35 = h1.18_26 << _34;
> !   _36 = _35 << 1;
>      _37 = _36 | _31;
>      *lv_12(D) = _37;
>    ;;    succ:       11
> --- 652,663 ----
>      _29 = (long int) _28;
>      *hv_13(D) = _29;
>      _31 = l1_9(D) >> _27;
> !   _33 = (unsigned int) count_4(D);
> !   _34 = 63 - _33;
> !   _35 = (int) _34;
> !   _71 = h1_10(D) << _35;
> !   _32 = _71 << 1;
> !   _36 = (long unsigned int) _32;
>      _37 = _36 | _31;
>      *lv_12(D) = _37;
> 
> 
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > +
> > Index: gcc-org/gcc/tree-ssa-forwprop.c
> > ===================================================================
> > --- gcc-org.orig/gcc/tree-ssa-forwprop.c
> > +++ gcc-org/gcc/tree-ssa-forwprop.c
> > @@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp
> >     return 0;
> >   }
> >
> > +/* Try to simplify shift-statement ((TYPE1) X) CODE Y for
> > +   integral-kind types.
> > +   Returns none-zero if the stmt was changed.  */
> Simplify it to what?  I know from reading the patch explanation, but
> it'd be better to include that in a comment.  Even better would be to
> show a little sequence of psuedo-gimple and how it gets changed.

Agreed. Simplify might be the wrong term here. Due we are doing 
pattern-transformation - if possible - into its normalized form.
I modified comment here, and adjusted the print to display by-string only once.
 
> 
> > +
> > +static int
> > +simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p)
> > +{
> > +  gimple stmt = gsi_stmt (*gsi_p);
> > +  gimple def, newop;
> > +  tree op1, opx, op2, t_opx, tem;
> > +  int ret;
> > +
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
> > +    return 0;
> > +
> > +  op1 = gimple_assign_rhs1 (stmt);
> > +  if (TREE_CODE (op1) != SSA_NAME
> > +      || !(def = SSA_NAME_DEF_STMT (op1))
> > +      || !is_gimple_assign (def)
> > +      || !gimple_assign_cast_p (def)
> > +      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1)
> > +      || !has_single_use (op1))
> > +    return 0;
> > +
> > +  opx = gimple_assign_rhs1 (def);
> > +  if (TREE_CODE (opx) != SSA_NAME)
> > +    return 0;
> > +
> > +  t_opx = TREE_TYPE (opx);
> > +
> > +  if (!INTEGRAL_TYPE_P (t_opx))
> > +    return 0;
> > +
> > +  op2 = gimple_assign_rhs2 (stmt);
> > +
> > +  if (code == LSHIFT_EXPR)
> > +    {
> > +      if (TYPE_PRECISION (TREE_TYPE (op1))
> > +     > TYPE_PRECISION (t_opx))
> > +        return 0;
> > +    }
> > +  else if (code == RSHIFT_EXPR)
> > +    {
> > +      /* If type of OPX and OP1 are compatible, we don't need to check OP2
> > +         for validity as we don't change range of operation.
> > +         Otherwise we need to check that right-hand operand of shift-right
> > +    doesn't exceed type-precision of inner operand.  */
> There's some kind of whitespace problem here.
> 
> > +
> > +  /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N).  */
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "  Replaced ");
> > +      print_gimple_stmt (dump_file, stmt, 0, 0);
> > +      fprintf (dump_file, "    ");
> > +      print_gimple_stmt (dump_file, def, 0, 0);
> > +      fprintf (dump_file, "  by ");
> > +    }
> > +
> > +  tem = make_ssa_name (t_opx, NULL);
> > +  newop = gimple_build_assign_with_ops (code, tem, opx, op2);
> > +  gimple_set_location (newop, gimple_location (stmt));
> > +  gsi_insert_before (gsi_p, newop, GSI_SAME_STMT);
> > +  gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE,
> > NULL_TREE);
> > +
> > +  if (gimple_location (def) != UNKNOWN_LOCATION)
> > +    gimple_set_location (stmt, gimple_location (def));
> > +
> > +  stmt = gsi_stmt (*gsi_p);
> > +  update_stmt (stmt);
> > +
> > +  if (TREE_CODE (op1) == SSA_NAME)
> > +    ret = remove_prop_source_from_use (op1) ? 2 : 1;
> > +  else
> > +    ret = 1;
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "   by ");
> > +      print_gimple_stmt (dump_file, stmt, 0, 0);
> > +      fprintf (dump_file, "    ");
> > +      print_gimple_stmt (dump_file, newop, 0, 0);
> As someone has already noted, the dump files are sub-optimal with the
> "by by" output:

Yeah, adjusted it in attached patch.
 
>    Replaced _3 = x1_2 << 5;
>      x1_2 = (unsigned char) x_1(D);
>    by    by _3 = (unsigned char) _5;
>      _5 = x_1(D) << 5;
> >
> >   /* Propagate from the ssa name definition statements of COND_EXPR
> >      in the rhs of statement STMT into the conditional if that simplifies
> >      it.
> > @@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void)
> >                  || code == NEGATE_EXPR)
> >                 && TREE_CODE (rhs1) == SSA_NAME)
> >               changed = simplify_not_neg_expr (&gsi);
> > +           else if (code == LSHIFT_EXPR
> > +                    || code == RSHIFT_EXPR)
> > +             {
> > +               int did_something;
> > +               did_something = simplify_shift (code, &gsi);
> > +               if (did_something == 2)
> > +                 cfg_changed = true;
> > +               changed = did_something != 0;
> > +             }
> >             else if (code == COND_EXPR
> >                      || code == VEC_COND_EXPR)
> >               {
> Does this apply to rotates as well?

For rotate this transformation doesn't apply in the same way.  Just in case 
that type-cast and X have same precision we could do this transformation.  So 
this case might be some future enhancement for this function.

> 
> 
> >
> > Index: gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> > ===================================================================
> > --- gcc-trunk.orig/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> > +++ gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> > @@ -58,9 +58,8 @@ int main (void)
> >     return 0;
> >   }
> >
> > -/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern:
> > detected" 2 "vect" { target vect_widen_shift } } } */
> > -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern:
> > detected" 2 "vect" { target vect_widen_shift } } } */
> > -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern:
> > detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern:
> > detected" 4 "vect" { target vect_widen_shift } } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern:
> > detected" 0 "vect" } } */
> >   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> >   /* { dg-final { cleanup-tree-dump "vect" } } */
> I'm guessing this was one of the cases you mentioned to me privately via
> email.  Are we still vectorizing as well, just differently without the
> need for as many conversions?

Yes.  The vectorization still happens in the same way as before.  Just the 
shift-widening-patterns getting superfluous by this patch.
 
> Simliarly for the other vect testcases.  At the least you need to
> indicate we're not actually regressing on those tests -- when you change
> an epected output in a way that makes it look like the test might be
> compromised, you need to explain yourself more thoroughly.  My obvious
> concern is we're not vectorizing as much and the testsuite changes are
> merely a way of hiding that fact.

The output-test isn't changed.  All tests still result in vectorized loop and 
if you take a closer look to generated output, you will notice that generated 
vectorization is identical to that one without this patch.
 
> I think there's some questions that need to be answered before this can
> go forward.
> 
> jeff
> 
> 
> 

Regards,
Kai

ChangeLog

2013-11-22  Kai Tietz  <kti...@redhat.com>

        * tree-ssa-forwprop.c (simplify_shift):  New function.
        (ssa_forward_propagate_and_combine): Use it.

ChangeLog testsuite

        * gcc.dg/tree-ssa-ts-shift-2.c: New test.


Bootstrapped for x86_64-unknown-linux-gnu.  Ok for apply?

Regards,
Kai

Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
===================================================================
--- /dev/null
+++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+char f1 (char x)
+{
+  unsigned char x1 = (unsigned char) x;
+  return (x1 << 5);
+}
+
+short f2 (unsigned int x)
+{
+  unsigned short x1 = (unsigned short) x;
+  return (x1 << 15);
+}
+
+unsigned int f3 (unsigned short x)
+{
+  unsigned long long x1 = (unsigned long long) x;
+  return (unsigned int) (x1 >> 15);
+}
+
+unsigned long long f4 (unsigned short x)
+{
+  unsigned int x1 = (unsigned int) x;
+  return (unsigned long long) (x1 >> 7);
+}
+
+/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 
"optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(short int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 "optimized" } 
} */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 "optimized" } 
} */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
Index: gcc-org/gcc/tree-ssa-forwprop.c
===================================================================
--- gcc-org.orig/gcc/tree-ssa-forwprop.c
+++ gcc-org/gcc/tree-ssa-forwprop.c
@@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp
   return 0;
 }
 
+/* Try to normalize shift-left/right statement from form
+   ((TYPE1) X) CODE Y to (TYPE) (X CODE Y).  TYPE, and X have to be
+   integral-kind types.
+   Returns none-zero if the stmt was changed.  */
+
+static int
+simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p)
+{
+  gimple stmt = gsi_stmt (*gsi_p);
+  gimple def, newop;
+  tree op1, opx, op2, t_opx, tem;
+  int ret;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+    return 0;
+
+  op1 = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (op1) != SSA_NAME
+      || !(def = SSA_NAME_DEF_STMT (op1))
+      || !is_gimple_assign (def)
+      || !gimple_assign_cast_p (def)
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1)
+      || !has_single_use (op1))
+    return 0;
+
+  opx = gimple_assign_rhs1 (def);
+  if (TREE_CODE (opx) != SSA_NAME)
+    return 0;
+
+  t_opx = TREE_TYPE (opx);
+
+  if (!INTEGRAL_TYPE_P (t_opx))
+    return 0;
+
+  op2 = gimple_assign_rhs2 (stmt);
+
+  if (code == LSHIFT_EXPR)
+    {
+      if (TYPE_PRECISION (TREE_TYPE (op1))
+         > TYPE_PRECISION (t_opx))
+        return 0;
+    }
+  else if (code == RSHIFT_EXPR)
+    {
+      /* If type of OPX and OP1 are compatible, we don't need to check OP2
+         for validity as we don't change range of operation.
+         Otherwise we need to check that right-hand operand of shift-right
+        doesn't exceed type-precision of inner operand.  */
+      if (!useless_type_conversion_p (TREE_TYPE (op1), t_opx))
+        {
+         if (TREE_CODE (op2) != INTEGER_CST)
+           return 0;
+         if (integer_zerop (fold_binary (LT_EXPR, boolean_type_node, op2,
+                                         build_int_cst (TREE_TYPE (op2),
+                                                        TYPE_PRECISION 
(t_opx)))))
+           return 0;
+
+          /* See if cast can be moved out of the shift-right operation.  */
+         if (TYPE_PRECISION (TREE_TYPE (op1)) <= TYPE_PRECISION (t_opx)
+             || (!TYPE_UNSIGNED (t_opx) && TYPE_UNSIGNED (TREE_TYPE (op1))))
+           return 0;
+        }
+    }
+  else
+    return 0;
+
+  /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N).  */
+  if (dump_file)
+    {
+      fprintf (dump_file, "  Replaced ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, "    ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+    }
+
+  tem = make_ssa_name (t_opx, NULL);
+  newop = gimple_build_assign_with_ops (code, tem, opx, op2);
+  gimple_set_location (newop, gimple_location (stmt));
+  gsi_insert_before (gsi_p, newop, GSI_SAME_STMT);
+  gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE, 
NULL_TREE);
+
+  if (gimple_location (def) != UNKNOWN_LOCATION)
+    gimple_set_location (stmt, gimple_location (def));
+
+  stmt = gsi_stmt (*gsi_p);
+  update_stmt (stmt);
+
+  if (TREE_CODE (op1) == SSA_NAME)
+    ret = remove_prop_source_from_use (op1) ? 2 : 1;
+  else
+    ret = 1;
+  if (dump_file)
+    {
+      fprintf (dump_file, "   by ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, "    ");
+      print_gimple_stmt (dump_file, newop, 0, 0);
+    }
+
+  return ret;
+}
 
 /* Propagate from the ssa name definition statements of COND_EXPR
    in the rhs of statement STMT into the conditional if that simplifies it.
@@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void)
                     || code == NEGATE_EXPR)
                    && TREE_CODE (rhs1) == SSA_NAME)
                  changed = simplify_not_neg_expr (&gsi);
+               else if (code == LSHIFT_EXPR
+                        || code == RSHIFT_EXPR)
+                 {
+                   int did_something;
+                   did_something = simplify_shift (code, &gsi);
+                   if (did_something == 2)
+                     cfg_changed = true;
+                   changed = did_something != 0;
+                 }
                else if (code == COND_EXPR
                         || code == VEC_COND_EXPR)
                  {

Re: [patch tree-ssa-forwprop]: Add type raising in shift-operations

Reply via email to