Richard Biener <rguent...@suse.de> writes:
> On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek <ja...@redhat.com> 
> wrote:
>>Hi!
>>
>>When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR,
>>vec_shl_optab
>>has been removed as unused, because we only used vec_shr_optab for the
>>reductions.
>>Without this patch the vect-simd-*.c tests can be vectorized just fine
>>for SSE4 and above, but can't be with SSE2.  As the comment in
>>tree-vect-stmts.c tries to explain, for the inclusive scan operation we
>>want (when using V8SImode vectors):
>>       _30 = MEM <vector(8) int> [(int *)&D.2043];
>>       _31 = MEM <vector(8) int> [(int *)&D.2042];
>>       _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>>       _33 = _31 + _32;
>>       // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
>>       _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>>       _35 = _33 + _34;
>>       // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>>       //         _31[1]+.._31[4], ... _31[4]+.._31[7] };
>>       _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
>>       _37 = _35 + _36;
>>       // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>>       //         _31[0]+.._31[4], ... _31[0]+.._31[7] };
>>       _38 = _30 + _37;
>>       _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>>       MEM <vector(8) int> [(int *)&D.2043] = _39;
>>       MEM <vector(8) int> [(int *)&D.2042] = _38;  */
>>For V4SImode vectors that would be VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>,
>>VEC_PERM_EXPR <x2, init, { 4, 5, 0, 1 }> and
>>VEC_PERM_EXPR <x3, init, { 3, 3, 3, 3 }> etc.
>>Unfortunately, SSE2 can't do the VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>
>>permutation (the other two it can do).  Well, to be precise, it can do
>>it using the vector left shift which has been removed as unused,
>>provided that init is initializer_zerop (shifting in zeros from the
>>left).  init usually is all zeros; that is the neutral element of
>>additive reductions and of a couple of others too.  In the unlikely
>>case that some other reduction is used with scan (multiplication,
>>minimum, maximum, bitwise and), we can use a VEC_COND_EXPR with a
>>constant first argument, i.e. a blend or and/or.
>>
>>So, this patch reintroduces vec_shl_optab (most backends actually have
>>those patterns already) and handles its expansion and generic vector
>>lowering similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where
>>the first operand is initializer_zerop and the third operand starts
>>with a few numbers smaller than the number of elements (it doesn't
>>matter which ones, as all elements of the first operand are the same -
>>zero), followed by nelts, nelts+1, nelts+2, ...
>>Unlike vec_shr_optab, which has zero as the second operand, this one
>>has it as the first operand, because VEC_PERM_EXPR canonicalization
>>wants the first selector element to be smaller than the number of
>>elements.  And unlike vec_shr_optab, where we also have a fallback in
>>have_whole_vector_shift using normal permutations, this one doesn't
>>need one, because that "fallback" is tried first, before vec_shl_optab.
>>
>>The vec_shl_optab checks test only for vectors with a constant number
>>of elements; I'm not really sure whether our VECTOR_CST encoding can
>>express the left shifts in any way, nor whether SVE supports them (I
>>see aarch64 has vec_shl_insert, but that is just a fixed shift by
>>element bits and shifts in a scalar rather than zeros).
>>
>>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Ok. 

I think it would be worth instead telling evpc that the second permute
vector is zero.  Permutes with a second vector of zero are somewhat
special for SVE too, and could be in other cases for other targets.

I thought the direction of travel was not to have optabs for specific
kinds of permute any more.  E.g. zip, unzip and blend are common
permutes too, but we decided not to commonise those.

Thanks,
Richard
