https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to cqwrteur from comment #9)
> (In reply to cqwrteur from comment #8)
> > for sse2 to do the __builtin_convertvector job yeah
> 
> https://godbolt.org/z/dsf3WK58E
> 
> using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> void foo4(temp_vec_type& v) noexcept
> {
>       v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> }
> 
> This is even more interesting.
> 
> foo4(char __vector(16)&): # @foo4(char __vector(16)&)
>   movdqa (%rdi), %xmm0
>   movdqa %xmm0, %xmm1
>   psrlw $8, %xmm1
>   psllw $8, %xmm0
>   por %xmm1, %xmm0
>   movdqa %xmm0, (%rdi)
>   retq
> 
> clang generates this, using a pair of shifts and an or (a 16-bit rotate by 8)

This is an interesting case; the same approach applies to psrld/psrlq + pslld/psllq + or.
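
To make the equivalence concrete, here is a minimal sketch (not taken from the GCC or clang sources) that checks it with the generic vector extensions. The type and function names (u8x16, swap_pairs_ref, swap_pairs_rot, swap_pairs16_rot) are made up for illustration. It assumes a GCC- or clang-compatible compiler, and it only shows that the permutation equals a per-lane rotate; which instructions are actually emitted depends on the compiler and flags.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension types, 16 bytes each (illustrative names).
typedef std::uint8_t  u8x16 __attribute__((vector_size(16)));
typedef std::uint16_t u16x8 __attribute__((vector_size(16)));
typedef std::uint32_t u32x4 __attribute__((vector_size(16)));

// Reference: swap every adjacent pair of bytes, element by element.
// Same permutation as the 1,0,3,2,...,15,14 shuffle quoted above.
inline u8x16 swap_pairs_ref(u8x16 v) {
    u8x16 r = v;
    for (int i = 0; i < 16; i += 2) {
        r[i] = v[i + 1];
        r[i + 1] = v[i];
    }
    return r;
}

// Same result as a rotate of each 16-bit lane by 8 bits, i.e. the
// shift-right / shift-left / or shape (psrlw $8, psllw $8, por).
inline u8x16 swap_pairs_rot(u8x16 v) {
    u16x8 w;
    std::memcpy(&w, &v, sizeof w);
    w = (w >> 8) | (w << 8);
    std::memcpy(&v, &w, sizeof v);
    return v;
}

// The 32-bit analogue: swapping adjacent 16-bit elements is a rotate of
// each 32-bit lane by 16 bits (the psrld/pslld + or shape).
inline u16x8 swap_pairs16_rot(u16x8 v) {
    u32x4 w;
    std::memcpy(&w, &v, sizeof w);
    w = (w >> 16) | (w << 16);
    std::memcpy(&v, &w, sizeof v);
    return v;
}
```

The rotate amount is half the lane width in each case, so the result does not depend on byte order within the lane. The psrlq/psllq variant would follow the same pattern with 64-bit lanes and a shift count of 32.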
