http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607
--- Comment #27 from Marc Glisse <marc.glisse at normalesup dot org> 2012-04-09 16:50:47 UTC ---
Notes to self (or others):
- Intel's SDE makes it possible to test without the appropriate hardware;
- for V4DF shuffles, there seems to be a very simple generic solution that
  performs two vperm2f128 and then one vshufpd.

permutation (a,b,c,d), input (x,y):
t1 = vperm2f128(x,y,(a/2)+16*(c/2));
t2 = vperm2f128(x,y,(b/2)+16*(d/2));
return vshufpd(t1,t2,(a%2)+2*(b%2)+4*(c%2)+8*(d%2));

(when t1 or t2 is equal to x or y, it generates only 2 insns, in cases that the
current code doesn't detect, like {3,1,2,2})
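
For anyone wanting to check the three-instruction recipe without SDE or AVX hardware, here is a minimal scalar sketch in C. It is not GCC code and uses no intrinsics: model_vperm2f128 and model_vshufpd are hypothetical helpers that model the documented lane selection of the two instructions, and the v4df struct stands in for a 256-bit register. It assumes the indices a,b,c,d range over 0..7, with 0..3 naming elements of x and 4..7 elements of y, which is how the immediates in the comment appear to be built.

```c
#include <assert.h>

/* Scalar stand-in for a 256-bit vector of four doubles (V4DF). */
typedef struct { double e[4]; } v4df;

/* Model of vperm2f128 for immediates with bits 3 and 7 clear.
   Bits 1:0 choose the 128-bit half copied to the low lane, bits 5:4 the
   half copied to the high lane: 0 = x low, 1 = x high, 2 = y low, 3 = y high. */
v4df model_vperm2f128(v4df x, v4df y, int imm)
{
    const double *half[4] = { &x.e[0], &x.e[2], &y.e[0], &y.e[2] };
    int lo = imm & 3, hi = (imm >> 4) & 3;
    v4df r;
    r.e[0] = half[lo][0]; r.e[1] = half[lo][1];
    r.e[2] = half[hi][0]; r.e[3] = half[hi][1];
    return r;
}

/* Model of the 256-bit vshufpd: even result elements come from t1, odd ones
   from t2, and each immediate bit picks the low or high double of the lane. */
v4df model_vshufpd(v4df t1, v4df t2, int imm)
{
    v4df r;
    r.e[0] = t1.e[0 + ((imm >> 0) & 1)];
    r.e[1] = t2.e[0 + ((imm >> 1) & 1)];
    r.e[2] = t1.e[2 + ((imm >> 2) & 1)];
    r.e[3] = t2.e[2 + ((imm >> 3) & 1)];
    return r;
}

/* The recipe from the comment: two vperm2f128, then one vshufpd. */
v4df shuffle_v4df(v4df x, v4df y, int a, int b, int c, int d)
{
    assert(a >= 0 && a < 8 && b >= 0 && b < 8);
    assert(c >= 0 && c < 8 && d >= 0 && d < 8);
    v4df t1 = model_vperm2f128(x, y, (a / 2) + 16 * (c / 2));
    v4df t2 = model_vperm2f128(x, y, (b / 2) + 16 * (d / 2));
    return model_vshufpd(t1, t2, (a % 2) + 2 * (b % 2) + 4 * (c % 2) + 8 * (d % 2));
}

/* Try all 8^4 index tuples; element k of the concatenation (x,y) holds the
   value k, so the expected result is simply (a,b,c,d). Returns the number
   of tuples for which the recipe disagrees with direct indexing. */
int count_mismatches(void)
{
    v4df x = {{0, 1, 2, 3}}, y = {{4, 5, 6, 7}};
    int bad = 0;
    for (int p = 0; p < 8 * 8 * 8 * 8; p++) {
        int a = p & 7, b = (p >> 3) & 7, c = (p >> 6) & 7, d = (p >> 9) & 7;
        v4df r = shuffle_v4df(x, y, a, b, c, d);
        if (r.e[0] != a || r.e[1] != b || r.e[2] != c || r.e[3] != d)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() exercises every index tuple under the model above. For {3,1,2,2} the second immediate is 0+16*1, which selects x low and x high, so t2 is just x and only the first vperm2f128 and the vshufpd are actually needed.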