https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918
--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> ---
typedef unsigned short v8i16 __attribute__((vector_size(16)));
v8i16 bswap_epi16(v8i16 x) { return (x << 8) | (x >> 8); }

We do recognize a rotate already in GENERIC

  return x r<< 8;

But this is expanded to

        movdqa  %xmm0, %xmm1
        psrlw   $8, %xmm0
        psllw   $8, %xmm1
        por     %xmm1, %xmm0

Probably the target could advertise a rotate insn for that mode, restricted to
an argument of 8?

IIRC, I didn't use vector extensions for the corresponding shift intrinsics
because for large shift amounts they set the result to 0. But for a constant
scalar, we could lower the builtin to a shift (or fold to 0).
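
For illustration only, here is a minimal sketch of the two points above, assuming GCC or Clang vector extensions. bswap_epi16 is the function from the comment; bswap16_ref and shl_epi16_model are hypothetical helpers that are not part of the bug report or of GCC. The first is a scalar per-lane reference for the rotate by 8; the second models the intrinsic-style behaviour described above, where a shift count larger than 15 yields zero instead of being passed to a plain vector shift.

```c
#include <stdint.h>

/* Same vector type as in the comment: eight 16-bit lanes in 16 bytes
   (GCC/Clang vector extension). */
typedef unsigned short v8i16 __attribute__((vector_size(16)));

/* Function from the comment: rotating each 16-bit lane by 8 bits
   swaps the two bytes of that lane. */
v8i16 bswap_epi16(v8i16 x)
{
    return (x << 8) | (x >> 8);
}

/* Hypothetical scalar reference for a single lane. */
uint16_t bswap16_ref(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical model of the shift-intrinsic semantics mentioned in the
   comment: counts above 15 give an all-zero result; smaller counts are
   an ordinary per-lane shift. This is an assumption-labelled sketch,
   not how GCC implements the builtin. */
v8i16 shl_epi16_model(v8i16 x, unsigned int count)
{
    v8i16 zero = {0, 0, 0, 0, 0, 0, 0, 0};
    if (count > 15)
        return zero;
    return x << count;
}
```

With a constant count, the branch in shl_epi16_model can be resolved at compile time, which is the "lower the builtin to a shift (or fold to 0)" case suggested in the comment.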