[AArch64_be] Don't fold reduction intrinsics.

James Greenhalgh Wed, 30 Jul 2014 03:20:14 -0700

Hi,

Reduction operations are defined in tree.def to:

   return a vector of the same type, with the first element in the vector
   holding the result of the reduction of all elements of the operand.  The
   content of the other elements in the returned vector is undefined.

The reduction intrinsics map to AArch64's reduction instructions (addv and
friends). These return their result in architectural lane 0. On big-endian,
GCC numbers vector elements from the opposite end, so architectural lane 0
is not GCC's element 0.

It is therefore not correct to fold these intrinsics to the tree-level
reduction operations when BYTES_BIG_ENDIAN is set.
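
To make the mismatch concrete, here is a minimal standalone sketch in plain C
(not GCC code). It assumes that on big-endian GCC's element i corresponds to
architectural lane nunits - 1 - i, and that the two numberings agree on
little-endian. All function names are hypothetical and exist only for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: map a GCC element number to an architectural lane.
   Assumption: the numbering is reversed on big-endian and identical on
   little-endian.  The mapping is its own inverse.  */
static size_t
gcc_elt_to_arch_lane (size_t elt, size_t nunits, bool bytes_big_endian)
{
  return bytes_big_endian ? nunits - 1 - elt : elt;
}

/* An addv-style instruction leaves its result in architectural lane 0.
   Return the GCC element number that this lane corresponds to.  */
static size_t
reduction_result_gcc_elt (size_t nunits, bool bytes_big_endian)
{
  return gcc_elt_to_arch_lane (0, nunits, bytes_big_endian);
}

/* The fold is only valid if the instruction's result lands in GCC's
   element 0, which is where the tree-level reduction puts its result.  */
static bool
fold_is_safe (size_t nunits, bool bytes_big_endian)
{
  return reduction_result_gcc_elt (nunits, bytes_big_endian) == 0;
}
```

Under these assumptions, a four-element vector has its reduction result in
GCC element 0 on little-endian but in GCC element 3 on big-endian, which is
why the fold has to be skipped there.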

Tested big/little-endian with no issues on aarch64-none-elf.

OK?

Thanks,
James

---
gcc/

2014-07-28  James Greenhalgh  <[email protected]>

        * config/aarch64/aarch64-builtins.c
        (aarch64_gimple_fold_builtin): Don't fold reduction operations for
        BYTES_BIG_ENDIAN.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index fee17ec..58db77e 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1383,6 +1383,20 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   tree call = gimple_call_fn (stmt);
   tree fndecl;
   gimple new_stmt = NULL;
+
+  /* The operations folded below are reduction operations.  These are
+     defined to leave their result in the 0'th element (from the perspective
+     of GCC).  The architectural instruction we are folding will leave the
+     result in the 0'th element (from the perspective of the architecture).
+     For big-endian systems, these perspectives are not aligned.
+
+     It is therefore wrong to perform this fold on big-endian.  There
+     are some tricks we could play with shuffling, but the mid-end is
+     inconsistent in the way it treats reduction operations, so we will
+     end up in difficulty.  Until we fix the ambiguity - just bail out.  */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
   if (call)
     {
       fndecl = gimple_call_fndecl (stmt);
