https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to cqwrteur from comment #9)
> (In reply to cqwrteur from comment #8)
> > for sse2 to do the __builtin_convertvector job yeah
> > https://godbolt.org/z/dsf3WK58E
>
> using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> void foo4(temp_vec_type& v) noexcept
> {
>     v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> }
>
> This is even more interesting.
>
> foo4(char __vector(16)&):                # @foo4(char __vector(16)&)
>         movdqa  (%rdi), %xmm0
>         movdqa  %xmm0, %xmm1
>         psrlw   $8, %xmm1
>         psllw   $8, %xmm0
>         por     %xmm1, %xmm0
>         movdqa  %xmm0, (%rdi)
>         retq
>
> clang generates this. by using ror and or

This is an interesting case; similar for psrld/psrlq + pslld/psllq + or.
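
For illustration only, here is a rough sketch of the shift-and-or idea
written with GNU vector extensions. It is not taken from the GCC or clang
sources. The type names (u8x16, u16x8, u32x4) and function names are made
up for this example. The assumption is that a pairwise swap of adjacent
elements equals rotating each double-width lane by half its width, which
on SSE2 would correspond to psrlw/psllw (or psrld/pslld) followed by por.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical type names for this sketch (GNU vector extensions).
typedef std::uint8_t  u8x16 __attribute__((vector_size(16)));
typedef std::uint16_t u16x8 __attribute__((vector_size(16)));
typedef std::uint32_t u32x4 __attribute__((vector_size(16)));

// Swap adjacent bytes: reinterpret as 8 x 16-bit lanes and rotate each
// lane by 8 bits (shift right, shift left, or the two halves together).
inline u8x16 swap_adjacent_bytes(u8x16 v) {
    const u16x8 eight = {8, 8, 8, 8, 8, 8, 8, 8};
    u16x8 w;
    std::memcpy(&w, &v, sizeof w);
    w = (w >> eight) | (w << eight);
    std::memcpy(&v, &w, sizeof v);
    return v;
}

// Same idea one size up: swap adjacent 16-bit elements by rotating each
// 32-bit lane by 16 bits.
inline u16x8 swap_adjacent_words(u16x8 v) {
    const u32x4 sixteen = {16, 16, 16, 16};
    u32x4 d;
    std::memcpy(&d, &v, sizeof d);
    d = (d >> sixteen) | (d << sixteen);
    std::memcpy(&v, &d, sizeof v);
    return v;
}

// Checks both helpers against the shuffle mask 1,0,3,2,... (index i ^ 1).
inline void check_swaps() {
    u8x16 b = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    u8x16 rb = swap_adjacent_bytes(b);
    for (int i = 0; i < 16; ++i) assert(rb[i] == (i ^ 1));

    u16x8 w = {0, 1, 2, 3, 4, 5, 6, 7};
    u16x8 rw = swap_adjacent_words(w);
    for (int i = 0; i < 8; ++i) assert(rw[i] == (i ^ 1));
}
```

The rotation is done on whole lanes, so the result does not depend on byte
order. Whether a given compiler actually emits the psrl/psll/por sequence
for this source is not guaranteed and would need to be checked.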