https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
But when the upper bits are not used, the vpmovdw version seems better.

v4hi
bar_dw_128 (v8hi x)
{
  return __builtin_shufflevector (x, x, 0, 2, 4, 6);// 4, 5, 6, 7);
}

-       vpshufb .LC2(%rip), %xmm0, %xmm0
+       vpmovdw %xmm0, %xmm0
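
For reference, a minimal scalar sketch of why the two sequences agree on the low four elements. This is only an illustration, not GCC code: the helper names even_shuffle and truncate_dw are hypothetical, and the 32-bit lanes are built by hand in x86 little-endian order (low half first) so the model does not depend on the host byte order.

```c
#include <stdint.h>

/* Model of the shuffle in bar_dw_128: pick elements 0, 2, 4, 6
   of an 8 x 16-bit vector.  */
static void
even_shuffle (const uint16_t in[8], uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[2 * i];
}

/* Model of a dword-to-word truncation: view the same 128 bits as
   4 x 32-bit lanes and keep the low 16 bits of each lane.  */
static void
truncate_dw (const uint16_t in[8], uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    {
      /* Lane i holds element 2*i in its low half and element 2*i+1
         in its high half (little-endian layout, assumed here).  */
      uint32_t lane = (uint32_t) in[2 * i]
                      | ((uint32_t) in[2 * i + 1] << 16);
      out[i] = (uint16_t) lane;
    }
}
```

Both helpers produce the same four values for any input, which is why the truncation can stand in for the shuffle when only the low half of the result is consumed.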
