http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572
--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-13 17:57:58 UTC ---
Or for this variant:
#include <immintrin.h>

__m256d f(__m256d *y){
  __m256d x=*y;
  x[0]=0; // or x[3]
  return x;
}
it looks like a vmaskmovpd load with a constant mask (masked-out elements are
zeroed by the load) could replace:
	vmovapd	(%rdi), %ymm0
	vmovapd	%xmm0, %xmm1
	vmovlpd	.LC0(%rip), %xmm1, %xmm1
	vinsertf128	$0x0, %xmm1, %ymm0, %ymm0
(I tried a version with __builtin_shuffle but it wouldn't generate vmaskmovpd
either)
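For reference, here is a rough sketch of the masked load I have in mind, written
with the AVX intrinsic _mm256_maskload_pd. The helper name load_zero_lane0 and
its pointer-in/pointer-out signature are made up for illustration and are not
part of the testcase above; the target attribute is only there so the snippet
builds without -mavx. I make no claim about what code GCC actually emits for it.

```c
#include <immintrin.h>

/* Hypothetical helper (not from the testcase): load four doubles from src
   with element 0 forced to zero, then store the result to dst. */
__attribute__((target("avx")))
void load_zero_lane0(const double *src, double *dst)
{
    /* _mm256_set_epi64x lists elements from highest to lowest, so the last
       argument is the mask for element 0. A mask element with its top bit
       clear means "do not load": the masked load writes zero there. */
    const __m256i mask = _mm256_set_epi64x(-1, -1, -1, 0);

    /* Masked load: elements 1..3 come from memory, element 0 becomes 0.0. */
    __m256d x = _mm256_maskload_pd(src, mask);

    /* Unaligned store, so dst needs no particular alignment. */
    _mm256_storeu_pd(dst, x);
}
```

For the x[3] variant the mask would be _mm256_set_epi64x(0, -1, -1, -1) instead.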
(sorry for the naive suggestions, there are too many possibilities to optimize
them all...)