On Wed, Nov 12, 2014 at 6:53 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via
> shifts, rather than VEC_RSHIFT_EXPR.
>
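
For readers following the thread, "reduction via shifts" can be illustrated with a minimal scalar model in plain C. This is a sketch, not code from the patch: NELT, shift_elts_down and reduce_add_via_shifts are hypothetical names, and little-endian element numbering is assumed (element 0 receives the result).

```c
#include <stddef.h>

#define NELT 8  /* illustrative vector length; must be a power of two */

/* Model of a whole-vector shift: move every element OFFSET places
   towards element 0 and fill the vacated tail with zeros.  */
static void
shift_elts_down (int v[NELT], size_t offset)
{
  for (size_t i = 0; i < NELT; i++)
    v[i] = (i + offset < NELT) ? v[i + offset] : 0;
}

/* Sum all elements of IN in log2(NELT) shift-and-add steps,
   halving the shift distance each time.  */
int
reduce_add_via_shifts (const int in[NELT])
{
  int acc[NELT], shifted[NELT];

  for (size_t i = 0; i < NELT; i++)
    acc[i] = in[i];

  for (size_t offset = NELT / 2; offset > 0; offset /= 2)
    {
      for (size_t i = 0; i < NELT; i++)
        shifted[i] = acc[i];
      shift_elts_down (shifted, offset);
      for (size_t i = 0; i < NELT; i++)
        acc[i] += shifted[i];
    }

  /* After the last step the full sum sits in element 0.  */
  return acc[0];
}
```

Each loop iteration corresponds to one vector shift plus one vector add, so an 8-element reduction takes three such steps rather than seven scalar adds.
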
> VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling
> vec_shr_optab). While the overall destination of this patch series is to
> make these endianness-neutral, this patch already feels quite big enough,
> so here we just switch to using VEC_PERM_EXPRs that have meaning
> equivalent to the old VEC_RSHIFT_EXPRs. Since VEC_PERM_EXPR is
> endianness-neutral, this means the mask we need to represent the meaning of
> the old VEC_RSHIFT_EXPR changes according to endianness. (Patch 4 completes
> this journey by removing the BYTES_BIG_ENDIAN-conditional parts; an
> alternative route to the same endpoint would be to first change
> VEC_RSHIFT_EXPR to be endianness-independent, then replace it by
> VEC_PERM_EXPRs. I posted such a patch to make VEC_RSHIFT_EXPR independent
> at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html and this was
> what led Richi to make his suggestion!)
>
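
As a rough illustration of how the mask depends on endianness, here is a sketch in plain C. It is not taken from the patch: perm_mask_for_shift and its parameters are hypothetical, the second permute input is assumed to be an all-zero vector, and the big-endian branch assumes the old shift moved elements towards higher indices there.

```c
#include <stdbool.h>

/* Fill SEL[0..NELT-1] with indices into the 2*NELT-element
   concatenation {V, zero vector} so that the permutation shifts V by
   OFFSET whole elements.  NELT must be a power of two and at most 128
   so that every index fits in an unsigned char.  */
void
perm_mask_for_shift (unsigned nelt, unsigned offset, bool big_endian,
                     unsigned char *sel)
{
  for (unsigned i = 0; i < nelt; i++)
    {
      /* Unsigned wrap-around followed by the mask gives the index
         modulo 2*NELT, so i - offset lands in the zero vector when
         i < offset.  */
      unsigned idx = big_endian ? i - offset : i + offset;
      sel[i] = (unsigned char) (idx & (2 * nelt - 1));
    }
}
```

With nelt = 4 and offset = 2 this yields {2, 3, 4, 5} for little-endian and {6, 7, 0, 1} for big-endian; indices 4 to 7 select zeros from the second vector.
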
> The "trick" here is then to look for the platform handling vec_shr_optab
> when expanding vec_perm_const *if* the second vector is all constant zeroes
> and the vec_perm mask is appropriate. I felt it was best to keep this case
> separate from can_vec_perm_p, so the latter only indicates when the target
> platform can apply a given permutation to _arbitrary_input_vectors_, as
> can_vec_perm_p's interface is already complicated enough without making it
> also able to handle cases where some of the vectors-to-be-shuffled are
> known.
>
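
The check described above (second vector all zeroes, mask "appropriate") might look roughly like the following when modelled in plain C. This is a sketch under stated assumptions rather than the optabs.c code: shift_elts_for_perm_mask is a hypothetical name, the caller is assumed to have already verified that the second vector is zero, and only masks whose first index is in canonical form are recognised.

```c
#include <stdbool.h>

/* If SEL, applied to {V, zero vector}, is equivalent to shifting V by
   a whole number of elements, return that number (1..NELT-1);
   otherwise return -1.  NELT must be a power of two.  */
int
shift_elts_for_perm_mask (const unsigned char *sel, unsigned nelt,
                          bool big_endian)
{
  unsigned first = sel[0];

  if (first >= 2 * nelt)
    return -1;

  for (unsigned i = 1; i < nelt; i++)
    {
      unsigned expected = (first + i) & (2 * nelt - 1);
      unsigned idx = sel[i];
      /* Every index >= NELT selects a zero from the second vector, so
         all such indices are interchangeable; clamp before comparing.  */
      unsigned a = idx < nelt ? idx : nelt;
      unsigned b = expected < nelt ? expected : nelt;
      if (a != b)
        return -1;
    }

  /* Under the big-endian assumption the mask starts in the zero
     vector, so the shift distance is measured from the other end.  */
  unsigned shift = big_endian ? 2 * nelt - first : first;
  if (shift == 0 || shift >= nelt)
    return -1;
  return (int) shift;
}
```

Keeping this test separate from the general "can the target do this permute" query matches the reasoning in the paragraph above: it only applies when one input is known to be zero.
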
> A nice side effect of this patch is that aarch64 targets suddenly perform
> reductions via shifts even *without* a vec_shr_optab, because
> aarch64_vectorize_vec_perm_const_ok looks for shuffle-masks for the EXT
> instruction, which can indeed be used to perform a shift :).
>
> With patch 1, bootstrapped on x86-none-linux-gnu (more testing with patch
> 3).

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * optabs.c (can_vec_perm_p): Update comment, does not consider
>         vec_shr.
>         (shift_amt_for_vec_perm_mask): New.
>         (expand_vec_perm_1): Use vec_shr_optab if second vector is
>         const0_rtx and mask appropriate.
>
>         * tree-vect-loop.c (calc_vec_perm_mask_for_shift): New.
>         (have_whole_vector_shift): New.
>         (vect_model_reduction_cost): Call have_whole_vector_shift instead of
>         looking for vec_shr_optab.
>         (vect_create_epilog_for_reduction): Likewise; also rename local
>         variable have_whole_vector_shift to reduce_with_shift; output
>         VEC_PERM_EXPRs instead of VEC_RSHIFT_EXPRs.
>
>         * tree-vect-stmts.c (vect_gen_perm_mask_checked): Extend comment.
