On Wed, Nov 12, 2014 at 6:53 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via
> shifts, rather than VEC_RSHIFT_EXPR.
>
> VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling
> vec_shr_optab). While the overall destination of this patch series is to
> make these endianness-neutral, this patch already feels quite big enough,
> hence, here we just switch to using VEC_PERM_EXPRs that have meaning
> equivalent to the old VEC_RSHIFT_EXPRs. Since VEC_PERM_EXPR is
> endianness-neutral, this means the mask we need to represent the meaning of
> the old VEC_RSHIFT_EXPR changes according to endianness. (Patch 4 completes
> this journey by removing the BYTES_BIG_ENDIAN-conditional parts; so an
> alternative route to the same endpoint would be to first change
> VEC_RSHIFT_EXPR to be endianness-independent, then replace it by
> VEC_PERM_EXPRs. I posted such a patch to make VEC_RSHIFT_EXPR independent
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html and this was what
> led Richi to make his suggestion!)
>
> The "trick" here is then to look for the platform handling vec_shr_optab
> when expanding vec_perm_const *if* the second vector is all constant zeroes
> and the vec_perm mask is appropriate. I felt it was best to keep this case
> separate from can_vec_perm_p, so the latter only indicates when the target
> platform can apply a given permutation to _arbitrary_input_vectors_, as
> can_vec_perm_p's interface is already complicated enough without making it
> also able to handle cases where some of the vectors-to-be-shuffled are
> known.
>
> A nice side effect of this patch is that aarch64 targets suddenly perform
> reductions via shifts even *without* a vec_shr_optab, because
> aarch64_vectorize_vec_perm_const_ok looks for shuffle-masks for the EXT
> instruction, which can indeed be used to perform a shift :).
>
> With patch 1, bootstrapped on x86-none-linux-gnu (more testing with patch
> 3).
Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 	* optabs.c (can_vec_perm_p): Update comment, does not consider
> 	vec_shr.
> 	(shift_amt_for_vec_perm_mask): New.
> 	(expand_vec_perm_1): Use vec_shr_optab if second vector is
> 	const0_rtx and mask appropriate.
>
> 	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): New.
> 	(have_whole_vector_shift): New.
> 	(vect_model_reduction_cost): Call have_whole_vector_shift instead of
> 	looking for vec_shr_optab.
> 	(vect_create_epilog_for_reduction): Likewise; also rename local
> 	variable have_whole_vector_shift to reduce_with_shift; output
> 	VEC_PERM_EXPRs instead of VEC_RSHIFT_EXPRs.
>
> 	* tree-vect-stmts.c (vect_gen_perm_mask_checked): Extend comment.