[Bug target/68655] SSE2 cannot vec_perm of low and high part

rguenther at suse dot de Thu, 03 Dec 2015 01:48:07 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 3 Dec 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |NEW
>    Last reconfirmed|                            |2015-12-03
>      Ever confirmed|0                           |1
> 
> --- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Well, doing something like that at the optabs.c level wouldn't be really
> helpful, as i?86 has tons of different permutation instructions and for many
> permutations different sequence lengths.
> 
> So, the question is, does any supported CPU have some extra reinterpretation
> costs if we use a different integral vector mode (I believe there is some cost
> for some CPU when reinterpreting an integral vector as float vector and back,
> vice versa, or perhaps even float vector as double vector and vice versa)?
> If not, then the easiest fix is IMHO to change either
> ix86_expand_vec_perm_const_1
> or both
> ix86_expand_vec_perm_const and ix86_vectorize_vec_perm_const_ok
> to detect the case when V*{QI,HI,SI} permutation is doable in a wider unit
> mode same whole vector size mode and just transform it to that case
> unconditionally.
> If there is some cost, then we'd perhaps should do that at the end of
> expand_vec_perm_1 (if everything else failed for single instruction), but then
> the question is what to do with the 2-5 long sequences, we'd need to repeat
> that at all the other spots.

Older AMD CPUs had "reformatting" costs but only when you apply operations
to vectors that may destroy properties such as whether the value is
a NaN - and the formatting penalty applied only when you then perform
an operation in FP representation on that vector that would care about
this.

So generally I think changing from vector integer modes to
vector integer or float modes of different size and then back
for the purpose of permutation is fine.
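
To make the "doable in a wider unit mode" check from comment #3 concrete,
here is a minimal sketch in plain C.  It is not the i386 backend code:
widen_perm and its parameters are hypothetical names used only for
illustration, and the selector is modelled as a plain byte array rather
than whatever the expander really passes around.  The idea is that a
narrow permutation can be redone in a wider element mode when every
aligned group of `ratio` indices is consecutive and starts on a multiple
of `ratio`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC function.  SEL holds NELT element
   indices of a narrow-element permutation.  If every aligned group of
   RATIO indices selects one whole, aligned group of RATIO source
   elements, store the equivalent wide-element selector in WIDE_SEL
   (NELT / RATIO entries) and return true.  Otherwise return false;
   WIDE_SEL may then be partially written.  */
static bool
widen_perm (const unsigned char *sel, size_t nelt, size_t ratio,
            unsigned char *wide_sel)
{
  assert (sel != NULL && wide_sel != NULL);

  if (ratio == 0 || nelt % ratio != 0)
    return false;

  for (size_t i = 0; i < nelt; i += ratio)
    {
      /* The group must start on a wide-element boundary.  */
      if (sel[i] % ratio != 0)
        return false;

      /* The rest of the group must follow consecutively.  */
      for (size_t j = 1; j < ratio; j++)
        if (sel[i + j] != sel[i] + j)
          return false;

      wide_sel[i / ratio] = sel[i] / ratio;
    }

  return true;
}
```

As an illustration (the selector is my own example, not taken from the
testcase), an 8 x HI two-operand selector { 0, 1, 2, 3, 12, 13, 14, 15 }
with ratio 4 passes the check and becomes the 2 x DI selector { 0, 3 },
that is, the low part of the first operand and the high part of the
second.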

Doing this for vector float modes might have an issue depending
on the HW, e.g. when using vshufpd on a float vector.  Practically
the FP state doesn't change unless you shuffle sub-parts of the
float, but of course the HW might not be clever enough to detect this.

So I think using larger modes or even smaller modes (we already
try chars in optabs.c unconditionally, even for float modes?)
for integer vector mode shuffles is ok.  For float vector modes
I would avoid this unless we do more research.
