http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

--- Comment #27 from Marc Glisse <marc.glisse at normalesup dot org> 2012-04-09 16:50:47 UTC ---
Notes to self (or others):
- Intel's SDE makes it possible to test without appropriate hardware;
- for V4DF shuffles, there seems to be a very simple generic solution that
performs two vperm2f128 and then one vshufpd.

permutation (a,b,c,d), input (x,y), where indices 0-3 select an element of x
and indices 4-7 select an element of y:
t1 = vperm2f128(x,y,(a/2)+16*(c/2));
t2 = vperm2f128(x,y,(b/2)+16*(d/2));
return vshufpd(t1,t2,(a%2)+2*(b%2)+4*(c%2)+8*(d%2));

(When t1 or t2 is equal to x or y, the corresponding vperm2f128 is not needed,
so this generates only 2 insns, including in cases that the current code
doesn't detect, like {3,1,2,2}.)
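
The recipe above can be checked without AVX hardware using a scalar model of
the two instructions. The sketch below is only an illustration, not GCC code:
perm2f128, shufpd, shuffle_v4df and count_mismatches are hypothetical helper
names, and the model assumes the usual immediate encodings (for vperm2f128,
each 4-bit field selects a 128-bit lane: 0 = x low, 1 = x high, 2 = y low,
3 = y high; for 256-bit vshufpd, bit i selects the lower or upper double of
the lane for result element i, taken from the first source when i is even
and from the second when i is odd).

```c
#include <assert.h>

/* Scalar model of vperm2f128 (assumed encoding, zeroing bits never set):
   bits 1:0 of imm choose the 128-bit lane copied to the low half of out,
   bits 5:4 choose the lane copied to the high half. */
static void perm2f128(const double x[4], const double y[4], int imm,
                      double out[4])
{
    for (int half = 0; half < 2; half++) {
        int sel = (imm >> (4 * half)) & 3;
        const double *src = (sel & 2) ? y : x;  /* 0,1 -> x; 2,3 -> y */
        int base = (sel & 1) ? 2 : 0;           /* low or high lane */
        out[2 * half] = src[base];
        out[2 * half + 1] = src[base + 1];
    }
}

/* Scalar model of 256-bit vshufpd: result element i comes from t1 when i is
   even and from t2 when i is odd, staying within the same 128-bit lane;
   bit i of imm picks the lower or upper double of that lane. */
static void shufpd(const double t1[4], const double t2[4], int imm,
                   double out[4])
{
    for (int i = 0; i < 4; i++) {
        const double *src = (i & 1) ? t2 : t1;
        int lane = i & 2;                       /* 0 for low, 2 for high */
        out[i] = src[lane + ((imm >> i) & 1)];
    }
}

/* The recipe from the comment: two lane permutes, then one in-lane shuffle.
   p[] holds (a,b,c,d), each in 0..7. */
void shuffle_v4df(const double x[4], const double y[4], const int p[4],
                  double out[4])
{
    double t1[4], t2[4];
    for (int i = 0; i < 4; i++)
        assert(p[i] >= 0 && p[i] < 8);
    perm2f128(x, y, (p[0] / 2) + 16 * (p[2] / 2), t1);
    perm2f128(x, y, (p[1] / 2) + 16 * (p[3] / 2), t2);
    shufpd(t1, t2,
           (p[0] % 2) + 2 * (p[1] % 2) + 4 * (p[2] % 2) + 8 * (p[3] % 2),
           out);
}

/* Try all 8^4 permutations; with x = {0,1,2,3} and y = {4,5,6,7} the
   expected result element i is simply p[i]. Returns the mismatch count. */
int count_mismatches(void)
{
    const double x[4] = {0, 1, 2, 3}, y[4] = {4, 5, 6, 7};
    int bad = 0;
    for (int n = 0; n < 4096; n++) {
        int p[4] = { n & 7, (n >> 3) & 7, (n >> 6) & 7, (n >> 9) & 7 };
        double out[4];
        shuffle_v4df(x, y, p, out);
        for (int i = 0; i < 4; i++)
            if (out[i] != (double)p[i])
                bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a small main exercises every permutation
under this model. It only models the data movement; it says nothing about
what the backend actually emits or about when one of the vperm2f128 can be
dropped.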
