----- Original Message ----- > On 11/21/13 08:15, Kai Tietz wrote: > > Hi, > > > > this is the required forward-propagation part for the type-demotion pass as > > separate patch. > > This patch is sharing some adjustments in testsuite due new optimization of > > type-casts on shift-operations. > > > > This patch tries to generate the "normal" form (TYPE) (X shift-op Y) out of > > the "denormal" form "((TYPE) X) shift-op Y". > > > > ChangeLog > > > > 2013-11-21 Kai Tietz <kti...@redhat.com> > > > > * tree-ssa-forwprop.c (simplify_shift): New function. > > (ssa_forward_propagate_and_combine): Use it. > > > > ChangeLog testsuite > > > > * gcc.dg/tree-ssa-ts-shift-2.c: New test. > > > > > > Shared testsuite part between type-demotion and forward-propagation > > patches. > > > > Changelog gcc/testsuite: > > > > * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise. > > * gcc.dg/vect/vect-over-widen-1.c: Likewise. > > * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise. > > * gcc.dg/vect/vect-over-widen-3.c: Likewise. > > * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise. > > * gcc.dg/vect/vect-over-widen-4.c: Likewise. > > > > Bootstrapped for x86_64-unknown-linux-gnu. Ok for apply? > > > > Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c > > =================================================================== > > --- /dev/null > > +++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c > > @@ -0,0 +1,33 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > + > > +unsigned char f1 (char x) > > +{ > > + unsigned char x1 = (unsigned char) x; > > + return (x1 << 5); > > +} > > + > > +unsigned short f2 (unsigned int x) > > +{ > > + unsigned short x1 = (unsigned short) x; > > + return (x1 << 15); > > +} > > + > > +unsigned int f3 (unsigned short x) > > +{ > > + unsigned long long x1 = (unsigned long long) x; > > + return (unsigned int) (x1 >> 15); > > +} > > + > > +unsigned long long f4 (unsigned short x) > > +{ > > + unsigned int x1 = (unsigned int) x; > > + return (unsigned long long) (x1 >> 7); > > +} > > + > > +/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 > > "optimized" } } */ > > +/* { dg-final { scan-tree-dump-times "= \\\(short unsigned int\\\)" 1 > > "optimized" } } */ > > +/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 > > "optimized" } } */ > > +/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 > > "optimized" } } */ > So at least for f1 and f2, these regexps match regardless of whether or > not the code was transformed. > > Without your patch (f1) > x1_2 = (unsigned char) x_1(D); > _3 = x1_2 << 5; > return _3; > > With your patch (f1): > _4 = x_1(D) << 5; > _2 = (unsigned char) _4; > return _2; > > So all that's happened is we changed which object got casted -- we have > the same number casts of an object to (unsigned char).
That is actual wished. We shouldn't come to patterns, which have more type-casts by this patch. What we see here is the normalization of shift-left/right operations. This patch takes care that we prefer in general (Type) (X shift-left/right y) over ((Type) X) shift-left/right y notation. > ISTM you might be better off scanning the details of the forwprop dump. > For example, in f1 we get: > > Replaced _3 = x1_2 << 5; > x1_2 = (unsigned char) x_1(D); > by by _3 = (unsigned char) _5; > _5 = x_1(D) << 5; > > > We basically see the same thing happen in f2. > > For f1 & f2 I don't see that we've done anything other than change the > casts. They don't show any benefit from this patch. > > f3 & f4 on the other hand do show improvements. I adjusted test so, that f1, and f2 are showing an effect, too. I had kept here simple transformation to see normalized pattern-form. In new variant the checks won't match without that patch anymore. > > WHen I looked over a bunch of .i files, the results were mixed. > Sometimes better, sometimes worse without any clear bias. Hmm, this sample here is questionable. > So here's an example that gets worse: > > > *************** rshift_double (long unsigned int l1, lon > *** 651,661 **** > _29 = (long int) _28; > *hv_13(D) = _29; > _31 = l1_9(D) >> _27; > ! _32 = (unsigned int) count_4(D); > ! _33 = 63 - _32; > ! _34 = (int) _33; > ! _35 = h1.18_26 << _34; > ! _36 = _35 << 1; > _37 = _36 | _31; > *lv_12(D) = _37; > ;; succ: 11 > --- 652,663 ---- > _29 = (long int) _28; > *hv_13(D) = _29; > _31 = l1_9(D) >> _27; > ! _33 = (unsigned int) count_4(D); > ! _34 = 63 - _33; > ! _35 = (int) _34; > ! _71 = h1_10(D) << _35; > ! _32 = _71 << 1; > ! _36 = (long unsigned int) _32; > _37 = _36 | _31; > *lv_12(D) = _37; > > > > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > > + > > Index: gcc-org/gcc/tree-ssa-forwprop.c > > =================================================================== > > --- gcc-org.orig/gcc/tree-ssa-forwprop.c > > +++ gcc-org/gcc/tree-ssa-forwprop.c > > @@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp > > return 0; > > } > > > > +/* Try to simplify shift-statement ((TYPE1) X) CODE Y for > > + integral-kind types. > > + Returns none-zero if the stmt was changed. */ > Simplify it to what? I know from reading the patch explanation, but > it'd be better to include that in a comment. Even better would be to > show a little sequence of psuedo-gimple and how it gets changed. Agreed. Simplify might be the wrong term here. Due we are doing pattern-transformation - if possible - into its normalized form. I modified comment here, and adjusted the print to display by-string only once. > > > + > > +static int > > +simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p) > > +{ > > + gimple stmt = gsi_stmt (*gsi_p); > > + gimple def, newop; > > + tree op1, opx, op2, t_opx, tem; > > + int ret; > > + > > + if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))) > > + return 0; > > + > > + op1 = gimple_assign_rhs1 (stmt); > > + if (TREE_CODE (op1) != SSA_NAME > > + || !(def = SSA_NAME_DEF_STMT (op1)) > > + || !is_gimple_assign (def) > > + || !gimple_assign_cast_p (def) > > + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1) > > + || !has_single_use (op1)) > > + return 0; > > + > > + opx = gimple_assign_rhs1 (def); > > + if (TREE_CODE (opx) != SSA_NAME) > > + return 0; > > + > > + t_opx = TREE_TYPE (opx); > > + > > + if (!INTEGRAL_TYPE_P (t_opx)) > > + return 0; > > + > > + op2 = gimple_assign_rhs2 (stmt); > > + > > + if (code == LSHIFT_EXPR) > > + { > > + if (TYPE_PRECISION (TREE_TYPE (op1)) > > + > TYPE_PRECISION (t_opx)) > > + return 0; > > + } > > + else if (code == RSHIFT_EXPR) > > + { > > + /* If type of OPX and OP1 are compatible, we don't need to check OP2 > > + for validity as we don't change range of operation. > > + Otherwise we need to check that right-hand operand of shift-right > > + doesn't exceed type-precision of inner operand. */ > There's some kind of whitespace problem here. > > > + > > + /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N). */ > > + if (dump_file) > > + { > > + fprintf (dump_file, " Replaced "); > > + print_gimple_stmt (dump_file, stmt, 0, 0); > > + fprintf (dump_file, " "); > > + print_gimple_stmt (dump_file, def, 0, 0); > > + fprintf (dump_file, " by "); > > + } > > + > > + tem = make_ssa_name (t_opx, NULL); > > + newop = gimple_build_assign_with_ops (code, tem, opx, op2); > > + gimple_set_location (newop, gimple_location (stmt)); > > + gsi_insert_before (gsi_p, newop, GSI_SAME_STMT); > > + gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE, > > NULL_TREE); > > + > > + if (gimple_location (def) != UNKNOWN_LOCATION) > > + gimple_set_location (stmt, gimple_location (def)); > > + > > + stmt = gsi_stmt (*gsi_p); > > + update_stmt (stmt); > > + > > + if (TREE_CODE (op1) == SSA_NAME) > > + ret = remove_prop_source_from_use (op1) ? 2 : 1; > > + else > > + ret = 1; > > + if (dump_file) > > + { > > + fprintf (dump_file, " by "); > > + print_gimple_stmt (dump_file, stmt, 0, 0); > > + fprintf (dump_file, " "); > > + print_gimple_stmt (dump_file, newop, 0, 0); > As someone has already noted, the dump files are sub-optimal with the > "by by" output: Yeah, adjusted it in attached patch. > Replaced _3 = x1_2 << 5; > x1_2 = (unsigned char) x_1(D); > by by _3 = (unsigned char) _5; > _5 = x_1(D) << 5; > > > > /* Propagate from the ssa name definition statements of COND_EXPR > > in the rhs of statement STMT into the conditional if that simplifies > > it. > > @@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void) > > || code == NEGATE_EXPR) > > && TREE_CODE (rhs1) == SSA_NAME) > > changed = simplify_not_neg_expr (&gsi); > > + else if (code == LSHIFT_EXPR > > + || code == RSHIFT_EXPR) > > + { > > + int did_something; > > + did_something = simplify_shift (code, &gsi); > > + if (did_something == 2) > > + cfg_changed = true; > > + changed = did_something != 0; > > + } > > else if (code == COND_EXPR > > || code == VEC_COND_EXPR) > > { > Does this apply to rotates as well? For rotate this transformation doesn't apply in the same way. Just in case that type-cast and X have same precision we could do this transformation. So this case might be some future enhancement for this function. > > > > > > Index: gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c > > =================================================================== > > --- gcc-trunk.orig/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c > > +++ gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c > > @@ -58,9 +58,8 @@ int main (void) > > return 0; > > } > > > > -/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: > > detected" 2 "vect" { target vect_widen_shift } } } */ > > -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: > > detected" 2 "vect" { target vect_widen_shift } } } */ > > -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: > > detected" 4 "vect" { target { ! vect_widen_shift } } } } */ > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: > > detected" 4 "vect" { target vect_widen_shift } } } */ > > +/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: > > detected" 0 "vect" } } */ > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > > /* { dg-final { cleanup-tree-dump "vect" } } */ > I'm guessing this was one of the cases you mentioned to me privately via > email. Are we still vectorizing as well, just differently without the > need for as many conversions? Yes. The vectorization still happens in the same way as before. Just the shift-widening-patterns getting superfluous by this patch. > Simliarly for the other vect testcases. At the least you need to > indicate we're not actually regressing on those tests -- when you change > an epected output in a way that makes it look like the test might be > compromised, you need to explain yourself more thoroughly. My obvious > concern is we're not vectorizing as much and the testsuite changes are > merely a way of hiding that fact. The output-test isn't changed. All tests still result in vectorized loop and if you take a closer look to generated output, you will notice that generated vectorization is identical to that one without this patch. > I think there's some questions that need to be answered before this can > go forward. > > jeff > > > Regards, Kai ChangeLog 2013-11-22 Kai Tietz <kti...@redhat.com> * tree-ssa-forwprop.c (simplify_shift): New function. (ssa_forward_propagate_and_combine): Use it. ChangeLog testsuite * gcc.dg/tree-ssa-ts-shift-2.c: New test. Bootstrapped for x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c =================================================================== --- /dev/null +++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +char f1 (char x) +{ + unsigned char x1 = (unsigned char) x; + return (x1 << 5); +} + +short f2 (unsigned int x) +{ + unsigned short x1 = (unsigned short) x; + return (x1 << 15); +} + +unsigned int f3 (unsigned short x) +{ + unsigned long long x1 = (unsigned long long) x; + return (unsigned int) (x1 >> 15); +} + +unsigned long long f4 (unsigned short x) +{ + unsigned int x1 = (unsigned int) x; + return (unsigned long long) (x1 >> 7); +} + +/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "= \\\(short int\\\)" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ + Index: gcc-org/gcc/tree-ssa-forwprop.c =================================================================== --- gcc-org.orig/gcc/tree-ssa-forwprop.c +++ gcc-org/gcc/tree-ssa-forwprop.c @@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp return 0; } +/* Try to normalize shift-left/right statement from form + ((TYPE1) X) CODE Y to (TYPE) (X CODE Y). TYPE, and X have to be + integral-kind types. + Returns none-zero if the stmt was changed. */ + +static int +simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p) +{ + gimple stmt = gsi_stmt (*gsi_p); + gimple def, newop; + tree op1, opx, op2, t_opx, tem; + int ret; + + if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))) + return 0; + + op1 = gimple_assign_rhs1 (stmt); + if (TREE_CODE (op1) != SSA_NAME + || !(def = SSA_NAME_DEF_STMT (op1)) + || !is_gimple_assign (def) + || !gimple_assign_cast_p (def) + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1) + || !has_single_use (op1)) + return 0; + + opx = gimple_assign_rhs1 (def); + if (TREE_CODE (opx) != SSA_NAME) + return 0; + + t_opx = TREE_TYPE (opx); + + if (!INTEGRAL_TYPE_P (t_opx)) + return 0; + + op2 = gimple_assign_rhs2 (stmt); + + if (code == LSHIFT_EXPR) + { + if (TYPE_PRECISION (TREE_TYPE (op1)) + > TYPE_PRECISION (t_opx)) + return 0; + } + else if (code == RSHIFT_EXPR) + { + /* If type of OPX and OP1 are compatible, we don't need to check OP2 + for validity as we don't change range of operation. + Otherwise we need to check that right-hand operand of shift-right + doesn't exceed type-precision of inner operand. */ + if (!useless_type_conversion_p (TREE_TYPE (op1), t_opx)) + { + if (TREE_CODE (op2) != INTEGER_CST) + return 0; + if (integer_zerop (fold_binary (LT_EXPR, boolean_type_node, op2, + build_int_cst (TREE_TYPE (op2), + TYPE_PRECISION (t_opx))))) + return 0; + + /* See if cast can be moved out of the shift-right operation. */ + if (TYPE_PRECISION (TREE_TYPE (op1)) <= TYPE_PRECISION (t_opx) + || (!TYPE_UNSIGNED (t_opx) && TYPE_UNSIGNED (TREE_TYPE (op1)))) + return 0; + } + } + else + return 0; + + /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N). */ + if (dump_file) + { + fprintf (dump_file, " Replaced "); + print_gimple_stmt (dump_file, stmt, 0, 0); + fprintf (dump_file, " "); + print_gimple_stmt (dump_file, def, 0, 0); + } + + tem = make_ssa_name (t_opx, NULL); + newop = gimple_build_assign_with_ops (code, tem, opx, op2); + gimple_set_location (newop, gimple_location (stmt)); + gsi_insert_before (gsi_p, newop, GSI_SAME_STMT); + gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE, NULL_TREE); + + if (gimple_location (def) != UNKNOWN_LOCATION) + gimple_set_location (stmt, gimple_location (def)); + + stmt = gsi_stmt (*gsi_p); + update_stmt (stmt); + + if (TREE_CODE (op1) == SSA_NAME) + ret = remove_prop_source_from_use (op1) ? 2 : 1; + else + ret = 1; + if (dump_file) + { + fprintf (dump_file, " by "); + print_gimple_stmt (dump_file, stmt, 0, 0); + fprintf (dump_file, " "); + print_gimple_stmt (dump_file, newop, 0, 0); + } + + return ret; +} /* Propagate from the ssa name definition statements of COND_EXPR in the rhs of statement STMT into the conditional if that simplifies it. @@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void) || code == NEGATE_EXPR) && TREE_CODE (rhs1) == SSA_NAME) changed = simplify_not_neg_expr (&gsi); + else if (code == LSHIFT_EXPR + || code == RSHIFT_EXPR) + { + int did_something; + did_something = simplify_shift (code, &gsi); + if (did_something == 2) + cfg_changed = true; + changed = did_something != 0; + } else if (code == COND_EXPR || code == VEC_COND_EXPR) {