On Tue, Nov 11, 2014 at 5:58 PM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote: > Hi, > > The bswap pass iterate over the statements in a basic block and replace them > by a bswap, load or load+bswap when equivalent. However, to keep the code > equivalent the load need to be perform at the same place as where one of the > original load was made so that they have the same VUSE. Consider for example > the following: > > incorrect_read_le32 (char *x, char *y) > { > unsigned char c0, c1, c2, c3; > uint32_t result; > > c0 = x[0]; > c1 = x[1]; > c2 = x[2]; > c3 = x[3]; > *y = 1; > if (c0 %2) > result = c0 | c1 << 8 | c2 << 16 | c3 << 24; > else > result = c1; > return result; > } > > On ARM this would be replaced by a 32bit load but this needs to be done > before the *y = 1 line. This implies to move the result line and can make it > change basic block. This can then lead to problems as the iterator used to > iterate over the statements in a basic block still points to the original > basic block of this statement while the statement itself points to the new > basic block. A second bswap can then copy this wrong basic block information > in a newly created gimple statement while it links to statement in another > basic block, which is the error observed. > > The patch consist in making sure the iterator points to the statement just > before the statement being consider for bswap replacement. The patch also > adds a number of comments to give the rational why some (potentially > surprising things) happen. The testsuite was constructed by using the gimple > representation of the binutils file causing the ICE and then reduced with > creduce (creduce could not reduce the binutils file directly). > > ChangeLog entries are as follows: > > *** gcc/ChangeLog *** > > 2014-11-09 Thomas Preud'homme <thomas.preudho...@arm.com> > > PR tree-optimization/63761 > * tree-ssa-math-opts.c (bswap_replace): Construct gsi from cur_stmt > rather than taking it as a parameter. Add some comments to explain the > gsi_move_before in case of load and why canonicalization of bswap into > a rotation is only done for 16bit values. > (pass_optimize_bswap::execute): Adapt for loop via gsi to make gsi > refer to the statement just before cur_stmt. Ignore 16bit bswap that > are already in canonical form. Adapt bswap_replace to removal of its > gsi parameter. > > > *** gcc/testsuite/ChangeLog *** > > 2014-11-09 Thomas Preud'homme <thomas.preudho...@arm.com> > > PR tree-optimization/63761 > * /gcc.c-torture/compile/pr63761.c: New test. > > Testing done: > * GCC with the patch was boostrapped on x86_64-linux-gnu and the testsuite > ran without any regression. > * Equally GCC with the patch was built for aarch64-none-linux-gnu target and > the tests ran without regression. > > > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr63761.c > b/gcc/testsuite/gcc.c-torture/compile/pr63761.c > new file mode 100644 > index 0000000..5cda3f1 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/compile/pr63761.c > @@ -0,0 +1,17 @@ > +int a, b; > +short c; > + > +void fn1 (); > + > +void > +fn2 (unsigned short p1) > +{ > + int d; > + > + c = p1 >> 8 | p1 << 8; > + d = b; > + if (d) > + fn1 (); > + a = d >> 8 & 0x00FF > + | d << 8 & 0xFF00; > +} > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index 12fa4e8..aab056c 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -2172,23 +2172,28 @@ public: > > }; // class pass_optimize_bswap > > -/* Perform the bswap optimization: replace the statement CUR_STMT at > - GSI with a load of type, VUSE and set-alias as described by N if a > - memory source is involved (N->base_addr is non null), followed by > - the builtin bswap invocation in FNDECL if BSWAP is true. SRC_STMT > - gives where should the replacement be made. It also gives the > - source on which CUR_STMT is operating via its rhs's first tree nad > - N->range gives the size of the expression involved for maintaining > - some statistics. */ > +/* Perform the bswap optimization: replace the expression computed in the rhs > + of CUR_STMT by an equivalent bswap, load or load + bswap expression. > + Which of these alternatives replace the rhs is given by N->base_addr (non > + null if a load is needed) and BSWAP. The type, VUSE and set-alias of the > + load to perform are also given in N while the builtin bswap invoke is > given > + in FNDEL. Finally, if a load is involved, SRC_STMT refers to one of the > + load statements involved to construct the rhs in CUR_STMT and N->range > gives > + the size of the rhs expression for maintaining some statistics. > + > + Note that if the replacement involve a load, CUR_STMT is moved just after > + SRC_STMT to do the load with the same VUSE which can lead to CUR_STMT > + changing of basic block. */ > > static bool > -bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt, > - tree fndecl, tree bswap_type, tree load_type, > - struct symbolic_number *n, bool bswap) > +bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree > bswap_type, > + tree load_type, struct symbolic_number *n, bool bswap) > { > + gimple_stmt_iterator gsi; > tree src, tmp, tgt; > gimple bswap_stmt; > > + gsi = gsi_for_stmt (cur_stmt); > src = gimple_assign_rhs1 (src_stmt); > tgt = gimple_assign_lhs (cur_stmt); > > @@ -2207,6 +2212,9 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator > gsi, gimple src_stmt, > && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align)) > return false; > > + /* Move cur_stmt just before one of the load of the original > + to ensure it has the same VUSE. See PR61517 for what could > + go wrong. */ > gsi_move_before (&gsi, &gsi_ins); > gsi = gsi_for_stmt (cur_stmt); > > @@ -2293,7 +2301,10 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator > gsi, gimple src_stmt, > > tmp = src; > > - /* Canonical form for 16 bit bswap is a rotate expression. */ > + /* Canonical form for 16 bit bswap is a rotate expression. Only 16bit > values > + are considered as rotation of 2N bit values by N bits is generally not > + equivalent to a bswap. Consider for instance 0x01020304 >> 16 which > gives > + 0x03040102 while a bswap for that value is 0x04030201. */ > if (bswap && n->range == 16) > { > tree count = build_int_cst (NULL, BITS_PER_UNIT); > @@ -2393,10 +2404,10 @@ pass_optimize_bswap::execute (function *fun) > gimple_stmt_iterator gsi; > > /* We do a reverse scan for bswap patterns to make sure we get the > - widest match. As bswap pattern matching doesn't handle > - previously inserted smaller bswap replacements as sub- > - patterns, the wider variant wouldn't be detected. */ > - for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi)) > + widest match. As bswap pattern matching doesn't handle previously > + inserted smaller bswap replacements as sub-patterns, the wider > + variant wouldn't be detected. */ > + for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);) > { > gimple src_stmt, cur_stmt = gsi_stmt (gsi); > tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type; > @@ -2404,6 +2415,14 @@ pass_optimize_bswap::execute (function *fun) > struct symbolic_number n; > bool bswap; > > + /* This gsi_prev (&gsi) is not part of the for loop because cur_stmt > + might be moved to a different basic block by bswap_replace and > gsi > + must not points to it if that's the case. Moving the gsi_prev > + there make sure that gsi points to the statement previous to > + cur_stmt while still making sure that all statements are > + considered in this basic block. */ > + gsi_prev (&gsi); > + > if (!is_gimple_assign (cur_stmt)) > continue; > > @@ -2431,6 +2450,9 @@ pass_optimize_bswap::execute (function *fun) > switch (n.range) > { > case 16: > + /* Already in canonical form, nothing to do. */ > + if (code == LROTATE_EXPR || code == RROTATE_EXPR) > + continue; > load_type = uint16_type_node; > if (bswap16_p) > { > @@ -2461,8 +2483,8 @@ pass_optimize_bswap::execute (function *fun) > if (bswap && !fndecl) > continue; > > - if (bswap_replace (cur_stmt, gsi, src_stmt, fndecl, bswap_type, > - load_type, &n, bswap)) > + if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, > load_type, > + &n, bswap)) > changed = true; > } > } > > Is this ok for trunk?
Ok. Thanks, Richard. > Best regards, > > Thomas > > >