The end goal here is to remove this code from tree-vect-loop.c (vect_create_epilog_for_reduction):

      if (BYTES_BIG_ENDIAN)
        bitpos = size_binop (MULT_EXPR,
                             bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
                             TYPE_SIZE (scalar_type));
      else

as this is the root cause of PR/61114 (see testcase there, failing on all bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener, "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is suspicious". The code snippet above is used on two paths:

(Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = reduc_[us](plus|min|max)_optab. The optab is documented as "the scalar result is stored in the least significant bits of operand 0", but the tree code as "the first element in the vector holding the result of the reduction of all elements of the operand". This mismatch means that when the tree code is folded, the code snippet above reads the result from the wrong end of the vector.

The strategy (as per https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new tree codes and optabs that produce scalar results directly; this seems better than tying (the element of the vector into which the result is placed) to (the endianness of the target), and avoids generating extra moves on current bigendian targets. However, the previous optabs are retained for now as a migration strategy so as not to break existing backends; moving individual platforms over will follow.

A complication here is on AArch64, where we directly generate REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily remove this folding in order to decouple the midend and AArch64 backend.

(Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab is defined in an endianness-dependent way, leading to significant complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is never used!). Few platforms appear to handle vec_shr_optab (and fewer bigendian - I see only PowerPC and MIPS), so it seems pertinent to change the existing optab to be endianness-neutral.

Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 updates that implementation to fit the new endianness-neutral specification, serving as a guide for other existing backends. Patches/RFCs 15 and 16 are equivalents for MIPS and PowerPC; I haven't tested these but hope they act as useful pointers for the port maintainers.

Finally patch 14 cleans up the affected part of tree-vect-loop.c (vect_create_epilog_for_reduction).

--Alan

Reply via email to