On Tue, Nov 11, 2014 at 5:58 PM, Thomas Preud'homme
<thomas.preudho...@arm.com> wrote:
> Hi,
>
> The bswap pass iterate over the statements in a basic block and replace them 
> by a bswap, load or load+bswap when equivalent. However, to keep the code 
> equivalent the load need to be perform at the same place as where one of the 
> original load was made so that they have the same VUSE. Consider for example 
> the following:
>
> incorrect_read_le32 (char *x, char *y)
> {
>   unsigned char c0, c1, c2, c3;
>   uint32_t result;
>
>   c0 = x[0];
>   c1 = x[1];
>   c2 = x[2];
>   c3 = x[3];
>   *y = 1;
>   if (c0 %2)
>     result = c0 | c1 << 8 | c2 << 16 | c3 << 24;
>   else
>     result = c1;
>   return result;
> }
>
> On ARM this would be replaced by a 32bit load but this needs to be done 
> before the *y = 1 line. This implies to move the result line and can make it 
> change basic block. This can then lead to problems as the iterator used to 
> iterate over the statements in a basic block still points to the original 
> basic block of this statement while the statement itself points to the new 
> basic block. A second bswap can then copy this wrong basic block information 
> in a newly created gimple statement while it links to statement in another 
> basic block, which is the error observed.
>
> The patch consist in making sure the iterator points to the statement just 
> before the statement being consider for bswap replacement. The patch also 
> adds a number of comments to give the rational why some (potentially 
> surprising things) happen. The testsuite was constructed by using the gimple 
> representation of the binutils file causing the ICE and then reduced with 
> creduce (creduce could not reduce the binutils file directly).
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2014-11-09  Thomas Preud'homme  <thomas.preudho...@arm.com>
>
>         PR tree-optimization/63761
>         * tree-ssa-math-opts.c (bswap_replace): Construct gsi from cur_stmt
>         rather than taking it as a parameter. Add some comments to explain the
>         gsi_move_before in case of load and why canonicalization of bswap into
>         a rotation is only done for 16bit values.
>         (pass_optimize_bswap::execute): Adapt for loop via gsi to make gsi
>         refer to the statement just before cur_stmt. Ignore 16bit bswap that
>         are already in canonical form. Adapt bswap_replace to removal of its
>         gsi parameter.
>
>
> *** gcc/testsuite/ChangeLog ***
>
> 2014-11-09  Thomas Preud'homme  <thomas.preudho...@arm.com>
>
>         PR tree-optimization/63761
>         * /gcc.c-torture/compile/pr63761.c: New test.
>
> Testing done:
> * GCC with the patch was boostrapped on x86_64-linux-gnu and the testsuite 
> ran without any regression.
> * Equally GCC with the patch was built for aarch64-none-linux-gnu target and 
> the tests ran without regression.
>
>
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr63761.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr63761.c
> new file mode 100644
> index 0000000..5cda3f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr63761.c
> @@ -0,0 +1,17 @@
> +int a, b;
> +short c;
> +
> +void fn1 ();
> +
> +void
> +fn2 (unsigned short p1)
> +{
> +  int d;
> +
> +  c = p1 >> 8 | p1 << 8;
> +  d = b;
> +  if (d)
> +    fn1 ();
> +  a = d >> 8 & 0x00FF
> +    | d << 8 & 0xFF00;
> +}
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 12fa4e8..aab056c 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2172,23 +2172,28 @@ public:
>
>  }; // class pass_optimize_bswap
>
> -/* Perform the bswap optimization: replace the statement CUR_STMT at
> -   GSI with a load of type, VUSE and set-alias as described by N if a
> -   memory source is involved (N->base_addr is non null), followed by
> -   the builtin bswap invocation in FNDECL if BSWAP is true.  SRC_STMT
> -   gives where should the replacement be made.  It also gives the
> -   source on which CUR_STMT is operating via its rhs's first tree nad
> -   N->range gives the size of the expression involved for maintaining
> -   some statistics.  */
> +/* Perform the bswap optimization: replace the expression computed in the rhs
> +   of CUR_STMT by an equivalent bswap, load or load + bswap expression.
> +   Which of these alternatives replace the rhs is given by N->base_addr (non
> +   null if a load is needed) and BSWAP.  The type, VUSE and set-alias of the
> +   load to perform are also given in N while the builtin bswap invoke is 
> given
> +   in FNDEL.  Finally, if a load is involved, SRC_STMT refers to one of the
> +   load statements involved to construct the rhs in CUR_STMT and N->range 
> gives
> +   the size of the rhs expression for maintaining some statistics.
> +
> +   Note that if the replacement involve a load, CUR_STMT is moved just after
> +   SRC_STMT to do the load with the same VUSE which can lead to CUR_STMT
> +   changing of basic block.  */
>
>  static bool
> -bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
> -              tree fndecl, tree bswap_type, tree load_type,
> -              struct symbolic_number *n, bool bswap)
> +bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree 
> bswap_type,
> +              tree load_type, struct symbolic_number *n, bool bswap)
>  {
> +  gimple_stmt_iterator gsi;
>    tree src, tmp, tgt;
>    gimple bswap_stmt;
>
> +  gsi = gsi_for_stmt (cur_stmt);
>    src = gimple_assign_rhs1 (src_stmt);
>    tgt = gimple_assign_lhs (cur_stmt);
>
> @@ -2207,6 +2212,9 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator 
> gsi, gimple src_stmt,
>           && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
>         return false;
>
> +      /* Move cur_stmt just before  one of the load of the original
> +        to ensure it has the same VUSE.  See PR61517 for what could
> +        go wrong.  */
>        gsi_move_before (&gsi, &gsi_ins);
>        gsi = gsi_for_stmt (cur_stmt);
>
> @@ -2293,7 +2301,10 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator 
> gsi, gimple src_stmt,
>
>    tmp = src;
>
> -  /* Canonical form for 16 bit bswap is a rotate expression.  */
> +  /* Canonical form for 16 bit bswap is a rotate expression.  Only 16bit 
> values
> +     are considered as rotation of 2N bit values by N bits is generally not
> +     equivalent to a bswap.  Consider for instance 0x01020304 >> 16 which 
> gives
> +     0x03040102 while a bswap for that value is 0x04030201.  */
>    if (bswap && n->range == 16)
>      {
>        tree count = build_int_cst (NULL, BITS_PER_UNIT);
> @@ -2393,10 +2404,10 @@ pass_optimize_bswap::execute (function *fun)
>        gimple_stmt_iterator gsi;
>
>        /* We do a reverse scan for bswap patterns to make sure we get the
> -        widest match. As bswap pattern matching doesn't handle
> -        previously inserted smaller bswap replacements as sub-
> -        patterns, the wider variant wouldn't be detected.  */
> -      for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> +        widest match. As bswap pattern matching doesn't handle previously
> +        inserted smaller bswap replacements as sub-patterns, the wider
> +        variant wouldn't be detected.  */
> +      for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
>          {
>           gimple src_stmt, cur_stmt = gsi_stmt (gsi);
>           tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
> @@ -2404,6 +2415,14 @@ pass_optimize_bswap::execute (function *fun)
>           struct symbolic_number n;
>           bool bswap;
>
> +         /* This gsi_prev (&gsi) is not part of the for loop because cur_stmt
> +            might be moved to a different basic block by bswap_replace and 
> gsi
> +            must not points to it if that's the case.  Moving the gsi_prev
> +            there make sure that gsi points to the statement previous to
> +            cur_stmt while still making sure that all statements are
> +            considered in this basic block.  */
> +         gsi_prev (&gsi);
> +
>           if (!is_gimple_assign (cur_stmt))
>             continue;
>
> @@ -2431,6 +2450,9 @@ pass_optimize_bswap::execute (function *fun)
>           switch (n.range)
>             {
>             case 16:
> +             /* Already in canonical form, nothing to do.  */
> +             if (code == LROTATE_EXPR || code == RROTATE_EXPR)
> +               continue;
>               load_type = uint16_type_node;
>               if (bswap16_p)
>                 {
> @@ -2461,8 +2483,8 @@ pass_optimize_bswap::execute (function *fun)
>           if (bswap && !fndecl)
>             continue;
>
> -         if (bswap_replace (cur_stmt, gsi, src_stmt, fndecl, bswap_type,
> -                            load_type, &n, bswap))
> +         if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, 
> load_type,
> +                            &n, bswap))
>             changed = true;
>         }
>      }
>
> Is this ok for trunk?

Ok.

Thanks,
Richard.

> Best regards,
>
> Thomas
>
>
>

Reply via email to