http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607
Marc Glisse <marc.glisse at normalesup dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26938|0                           |1
        is obsolete|                            |

--- Comment #18 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-25 13:52:09 UTC ---
Created attachment 26979
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26979
default case

An updated version of this simple, generic-case shuffle (note that I didn't run the generated code; I only checked that it compiled and that the generated instructions looked roughly ok).

With the patch, we have (concerning v4df and v8sf):
- no single-vector shuffle takes more than 4 insn,
- no 2-vector shuffle takes more than 9 insn (or 3 (+ 2 movs for constants...) with AVX2).

I think the current code already guarantees that anything that can be done in a single instruction is done in one.

Some possible goals (making everything optimal may be a bit hard) would be:
- everything that can be done in 2 insn is,
- no single-vector v4df takes more than 3 insn,
- one or two extra optimizations, if they are generic enough.

I do occasionally wonder about allowing wild indexes (jokers, places where you can put anything) in shuffles, whether exposed to users or just as an internal tool.
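
To make the joker idea a bit more concrete, here is a minimal sketch in plain C. It is not GCC code and not part of the attached patch: JOKER, mask_matches and apply_shuffle are hypothetical names, and the encoding of a joker as -1 is an assumption made only for illustration. It models a 4-element, two-vector shuffle mask (as for v4df) in which some lanes may hold anything, and checks whether a fixed pattern satisfies such a request.

```c
#include <assert.h>

#define NELT 4
#define JOKER (-1)  /* hypothetical marker: any value is acceptable in this lane */

/* Return 1 if the concrete mask `have` satisfies the request `want`.
   Lanes of `want` that hold JOKER match any index in `have`. */
static int mask_matches(const int want[NELT], const int have[NELT])
{
    for (int i = 0; i < NELT; i++)
        if (want[i] != JOKER && want[i] != have[i])
            return 0;
    return 1;
}

/* Reference semantics of a two-vector shuffle with a concrete mask:
   indexes 0..NELT-1 select from a, NELT..2*NELT-1 select from b. */
static void apply_shuffle(const double a[NELT], const double b[NELT],
                          const int mask[NELT], double out[NELT])
{
    for (int i = 0; i < NELT; i++) {
        int m = mask[i];
        assert(m >= 0 && m < 2 * NELT);  /* concrete masks only, no jokers */
        out[i] = (m < NELT) ? a[m] : b[m - NELT];
    }
}
```

For example, the request {0, JOKER, 2, JOKER} is satisfied by the fixed pattern {0, 4, 2, 6} (an interleave of the even lanes of both inputs) but not by {1, 5, 3, 7}. An expander that knew about jokers could therefore accept any single-instruction pattern that passes such a check, instead of requiring an exact match on all lanes.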