https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #9 from cqwrteur <unlvsur at live dot com> ---
(In reply to cqwrteur from comment #8)
> for sse2 to do the __builtin_convertvector job

yeah

https://godbolt.org/z/dsf3WK58E

using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;

void foo4(temp_vec_type& v) noexcept
{
    v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
}

This is even more interesting.

foo4(char __vector(16)&):                 # @foo4(char __vector(16)&)
        movdqa  (%rdi), %xmm0
        movdqa  %xmm0, %xmm1
        psrlw   $8, %xmm1
        psllw   $8, %xmm0
        por     %xmm1, %xmm0
        movdqa  %xmm0, (%rdi)
        retq

clang generates this. It uses a pair of 16-bit shifts (psrlw/psllw) and an or
(por), i.e. a rotate by 8 of each 16-bit lane, instead of a byte shuffle.
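
To make the equivalence concrete, here is a minimal scalar sketch (portable C++, no vector extensions) of why swapping adjacent bytes is the same as computing (x >> 8) | (x << 8) on each 16-bit lane. The names bytes16, swap_pairs_shuffle, swap_pairs_shift and forms_agree are made up for illustration and do not come from GCC, clang, or the godbolt link above; it only models what the psrlw/psllw/por sequence computes, not how either compiler arrives at it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16-byte vector; not the type from the report.
using bytes16 = std::array<unsigned char, 16>;

// Reference form: the permutation 1,0,3,2,...,15,14 (index i reads from i^1).
inline bytes16 swap_pairs_shuffle(const bytes16& v) {
    bytes16 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = v[i ^ 1];
    return r;
}

// Shift form: view the bytes as eight 16-bit lanes and compute
// (x >> 8) | (x << 8) per lane, which models psrlw $8 / psllw $8 / por.
inline bytes16 swap_pairs_shift(const bytes16& v) {
    std::uint16_t lanes[8];
    std::memcpy(lanes, v.data(), sizeof lanes);
    for (auto& x : lanes)
        x = static_cast<std::uint16_t>((x >> 8) | (x << 8));
    bytes16 r{};
    std::memcpy(r.data(), lanes, sizeof lanes);
    return r;
}

// Convenience check that both forms give the same bytes for one input.
inline bool forms_agree(const bytes16& v) {
    return swap_pairs_shuffle(v) == swap_pairs_shift(v);
}
```

Swapping the two bytes of a 16-bit value gives the same memory layout on little- and big-endian targets, so the sketch does not depend on byte order.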