http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607
Marc Glisse <marc.glisse at normalesup dot org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26912|0                           |1
        is obsolete|                            |
--- Comment #17 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-20
21:50:40 UTC ---
Created attachment 26938
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26938
intra-lane shuffle in 3 insn
This (mostly untested) patch reformulates the generic v8sf single-vector
shuffle in 4 insns as a generic intra-lane 2-vector shuffle in at most 3
insns. Rewriting __builtin_shuffle(x,m) as
__builtin_shuffle(x,vperm2f128(x,1),mm) would then guarantee a maximum size of
4 insns: one vperm2f128 for the lane swap plus at most 3 for the intra-lane
shuffle.
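To make the mask rewrite concrete, here is a minimal scalar sketch in portable
C. It is not code from the patch: the helper names (swap_lanes, shuffle2,
make_intra_lane_mask, rewrite_matches) are hypothetical, and the vectors are
emulated with plain arrays so that it runs without AVX. It assumes that
vperm2f128(x,1) swaps the two 128-bit lanes and that a two-operand shuffle
mask uses indices 0..7 for the first operand and 8..15 for the second.

```c
enum { N = 8, LANE = 4 };   /* v8sf: two 128-bit lanes of four floats */

/* Assumed effect of vperm2f128(x,1): swap the two 128-bit lanes. */
static void swap_lanes(const float x[N], float y[N])
{
    for (int i = 0; i < N; i++)
        y[i] = x[i ^ LANE];
}

/* Scalar model of a two-operand shuffle: mask values 0..7 select
   from a, 8..15 select from b. */
static void shuffle2(const float a[N], const float b[N],
                     const int mask[N], float out[N])
{
    for (int i = 0; i < N; i++) {
        int k = mask[i] & (2 * N - 1);
        out[i] = k < N ? a[k] : b[k - N];
    }
}

/* Hypothetical helper: turn the one-operand mask m into a two-operand
   mask mm whose every element stays inside its own lane. */
static void make_intra_lane_mask(const int m[N], int mm[N])
{
    for (int i = 0; i < N; i++) {
        int k = m[i] & (N - 1);
        if ((k ^ i) & LANE)           /* source is in the other lane...   */
            mm[i] = N + (k ^ LANE);   /* ...so read the lane-swapped copy */
        else
            mm[i] = k;                /* already in the right lane        */
    }
}

/* Returns 1 if shuffling (x, swapped x) with mm reproduces the
   one-operand shuffle of x with m and mm never crosses a lane. */
static int rewrite_matches(const int m[N])
{
    const float x[N] = {0, 1, 2, 3, 4, 5, 6, 7};
    float y[N], got[N];
    int mm[N];

    swap_lanes(x, y);
    make_intra_lane_mask(m, mm);
    shuffle2(x, y, mm, got);

    for (int i = 0; i < N; i++) {
        if ((mm[i] ^ i) & LANE)            /* mm must stay intra-lane */
            return 0;
        if (got[i] != x[m[i] & (N - 1)])   /* must match the original */
            return 0;
    }
    return 1;
}
```

For example, calling rewrite_matches with the full reversal mask
{7,6,5,4,3,2,1,0} exercises the case where every element comes from the
other lane.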
Note that the strategy of doing a 2-vector shuffle by shuffling each vector
(not restricted to a single vpermilp*) and blending the results gives a
maximum of 9 insns (at most 4 per vector plus one blend), whereas the current
code often generates twice that number.
By the way, I have trouble understanding this comment:
/* For d->op0 == d->op1 the only useful vperm2f128 permutation
is 0x10. */
Is it really 0x10, or is there a stray 0 at the end and it is really just 1?
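For reference, a small scalar model of the lane selection makes the two
candidate immediates easy to compare. This is only a sketch under an assumed
encoding, the one I understand the Intel instruction reference to describe for
vperm2f128: imm8[1:0] selects the source of the low result lane, imm8[5:4]
that of the high result lane (0 = op0 low, 1 = op0 high, 2 = op1 low,
3 = op1 high), and bits 3 and 7 zero the corresponding lane. The function
names are hypothetical.

```c
/* Scalar model of vperm2f128 on 8-float vectors, under the imm8
   encoding assumed above. */
static void emu_vperm2f128(const float a[8], const float b[8],
                           unsigned imm8, float dst[8])
{
    for (int lane = 0; lane < 2; lane++) {
        unsigned ctl = (imm8 >> (4 * lane)) & 0xf;
        const float *src = (ctl & 2) ? b : a;   /* which operand   */
        int from = (ctl & 1) ? 4 : 0;           /* which of its lanes */
        for (int i = 0; i < 4; i++)
            dst[4 * lane + i] = (ctl & 8) ? 0.0f : src[from + i];
    }
}

/* Classifies imm8 when both operands are the same vector:
   0 = identity, 1 = lane swap, -1 = anything else. */
static int one_operand_effect(unsigned imm8)
{
    const float x[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    float dst[8];
    int is_identity = 1, is_swap = 1;

    emu_vperm2f128(x, x, imm8, dst);
    for (int i = 0; i < 8; i++) {
        if (dst[i] != x[i])
            is_identity = 0;
        if (dst[i] != x[i ^ 4])
            is_swap = 0;
    }
    return is_identity ? 0 : is_swap ? 1 : -1;
}
```

Under that assumed encoding, one_operand_effect(0x10) reports the identity and
one_operand_effect(0x01) reports the lane swap, which is what prompts the
question above.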