On Tue, 5 Jan 2016, Thomas Preud'homme wrote:

> Hi,
> 
> The bswap optimization pass generates wrong code on big-endian targets when
> the result of a bit operation it analyzes is a partial load of the range of
> memory accessed by the original expression, i.e. when one or more bytes at
> the lowest addresses were lost in the computation. This is due to the way
> cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being compared
> to the result of the symbolic expression. Part of the adjustment is endian
> independent: it ignores the bytes that were not accessed by the original
> gimple expression. However, when the result has fewer bytes than that
> original expression, some more bytes need to be ignored, and this is endian
> dependent.
> 
> The current code only supports loss of bytes at the highest addresses
> because there is no code to adjust the address of the load. However, on
> little- and big-endian targets the bytes at the highest addresses translate
> into different byte significance in the result. This patch first separates
> the cmpxchg and cmpnop adjustment into two steps and then deals with
> endianness correctly in the second step.
> 
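> As a concrete illustration (using the values from the testcase added below
> and assuming the usual big-endian layout with no padding between the
> fields):
> 
>   struct { uint32_t a; uint8_t b; } s = { 0x123456, 0x78 };
> 
> places the bytes 00 12 34 56 78 at increasing addresses, so (s.a << 8) | s.b
> amounts to a 5-byte big-endian load of which only the low 4 bytes,
> 0x12345678, fit in the uint32_t result: the byte at the lowest address is
> the one that is lost.
> 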
> ChangeLog entries are as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-12-16  Thomas Preud'homme  <thomas.preudho...@arm.com>
> 
>         PR tree-optimization/67781
>         * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
>         and cmpnop in two steps: first the ones not accessed in the original
>         gimple expression in an endian-independent way and then the ones not
>         accessed in the final result in an endian-specific way.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-12-16  Thomas Preud'homme  <thomas.preudho...@arm.com>
> 
>         PR tree-optimization/67781
>         * gcc.c-torture/execute/pr67781.c: New file.
> 
> 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> new file mode 100644
> index 0000000..bf50aa2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> @@ -0,0 +1,34 @@
> +#ifdef __UINT32_TYPE__
> +typedef __UINT32_TYPE__ uint32_t;
> +#else
> +typedef unsigned uint32_t;
> +#endif
> +
> +#ifdef __UINT8_TYPE__
> +typedef __UINT8_TYPE__ uint8_t;
> +#else
> +typedef unsigned char uint8_t;
> +#endif
> +
> +struct
> +{
> +  uint32_t a;
> +  uint8_t b;
> +} s = { 0x123456, 0x78 };
> +
> +int pr67781()
> +{
> +  uint32_t c = (s.a << 8) | s.b;
> +  return c;
> +}
> +
> +int
> +main ()
> +{
> +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> +    return 0;
> +
> +  if (pr67781 () != 0x12345678)
> +    __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index b00f046..e5a185f 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number *n, int limit)
>  static gimple *
>  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
>  {
> +  unsigned rsize;
> +  uint64_t tmpn, mask;
>  /* The number which the find_bswap_or_nop_1 result should match in order
>     to have a full byte swap.  The number is shifted to the right
>     according to the size of the symbolic number before using it.  */
> @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
>  
>    /* Find real size of result (highest non-zero byte).  */
>    if (n->base_addr)
> -    {
> -      int rsize;
> -      uint64_t tmpn;
> -
> -      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> -      n->range = rsize;
> -    }
> +    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> +  else
> +    rsize = n->range;
>  
> -  /* Zero out the extra bits of N and CMP*.  */
> +  /* Zero out the bits corresponding to untouched bytes in original gimple
> +     expression.  */
>    if (n->range < (int) sizeof (int64_t))
>      {
> -      uint64_t mask;
> -
>        mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
>        cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
>        cmpnop &= mask;
>      }
>  
> +  /* Zero out the bits corresponding to unused bytes in the result of the
> +     gimple expression.  */
> +  if (rsize < n->range)
> +    {
> +      if (BYTES_BIG_ENDIAN)
> +     {
> +       mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> +       cmpxchg &= mask;
> +       cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> +     }
> +      else
> +     {
> +       mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> +       cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> +       cmpnop &= mask;
> +     }
> +      n->range = rsize;
> +    }
> +
>    /* A complete byte swap should make the symbolic number to start with
>       the largest digit in the highest order byte. Unchanged symbolic
>       number indicates a read with same endianness as target architecture.  */
> 
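> For reference, here is a minimal standalone sketch of the two-step
> adjustment above, with illustrative values only (range = 5, rsize = 4, i.e.
> a 5-byte access whose result keeps 4 bytes, as in the testcase on a
> big-endian target; the CMPNOP/CMPXCHG constants and BITS_PER_MARKER mirror
> my reading of tree-ssa-math-opts.c, and big_endian stands in for
> BYTES_BIG_ENDIAN):
> 
> #include <stdint.h>
> #include <stdio.h>
> 
> #define BITS_PER_MARKER 8
> #define CMPNOP  ((uint64_t) 0x08070605 << 32 | 0x04030201)
> #define CMPXCHG ((uint64_t) 0x01020304 << 32 | 0x05060708)
> 
> int
> main (void)
> {
>   uint64_t cmpxchg = CMPXCHG, cmpnop = CMPNOP, mask;
>   unsigned range = 5, rsize = 4;  /* 5 bytes accessed, 4 kept in result.  */
>   int big_endian = 1;             /* stand-in for BYTES_BIG_ENDIAN.  */
> 
>   /* Step 1: zero out markers for bytes the original expression never
>      accessed (endian independent).  */
>   if (range < sizeof (int64_t))
>     {
>       mask = ((uint64_t) 1 << (range * BITS_PER_MARKER)) - 1;
>       cmpxchg >>= (64 / BITS_PER_MARKER - range) * BITS_PER_MARKER;
>       cmpnop &= mask;
>     }
> 
>   /* Step 2: zero out markers for bytes lost in the result (endian
>      dependent).  */
>   if (rsize < range)
>     {
>       mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
>       if (big_endian)
>         {
>           cmpxchg &= mask;
>           cmpnop >>= (range - rsize) * BITS_PER_MARKER;
>         }
>       else
>         {
>           cmpxchg >>= (range - rsize) * BITS_PER_MARKER;
>           cmpnop &= mask;
>         }
>     }
> 
>   /* Prints cmpxchg = 0x2030405, cmpnop = 0x5040302 for these values.  */
>   printf ("cmpxchg = %#llx, cmpnop = %#llx\n",
>           (unsigned long long) cmpxchg, (unsigned long long) cmpnop);
>   return 0;
> }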
> 
> 
> The regression testsuite was run on a bootstrapped native x86_64-linux-gnu
> GCC and on an arm-none-eabi GCC cross-compiler without any regression. I'm
> waiting for a slot on gcc110 to do a big-endian bootstrap but at least the
> testcase works on mips-linux. I'll send an update once bootstrap is
> complete.
> 
> Is this ok for trunk and the 5 branch in a week's time if no regression is reported?

Yes.

Thanks,
Richard.

> Best regards,
> 
> Thomas
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
