https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54803
--- Comment #6 from vekumar at gcc dot gnu.org ---
(In reply to vekumar from comment #5)
> On bdver4 when we enable -march=bdver4 and -mno-prefer-avx128 vectorizes
> using YMM
> Otherwise uses vprotq instruction.
>
> .L13:
>         vmovdqa (%r8,%r9), %ymm0
>         incq    %rax
>         vpsrlq  $32, %ymm0, %ymm1
>         vpsllq  $32, %ymm0, %ymm0
>         vpor    %ymm0, %ymm1, %ymm0
>         vmovdqa %ymm0, (%rdx,%r9)
>         addq    $32, %r9
>         cmpq    %rax, %r10
>         ja      .L13

This is with trunk gcc version 6.0.0 20150810 (experimental) (GCC)
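
For readers following the listing: vpsrlq $32, vpsllq $32 and vpor together rotate each 64-bit lane by 32 bits. Below is a minimal C sketch of a loop of that shape. It is an assumption about what the source looks like, not the testcase attached to this PR, and the names rotl64_by_32 and rotate_array are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit value by 32 bits: the scalar equivalent of the
   vpsrlq $32 / vpsllq $32 / vpor sequence in the listing above.
   (Hypothetical helper, not taken from the PR testcase.) */
static inline uint64_t rotl64_by_32(uint64_t x)
{
    return (x << 32) | (x >> 32);
}

/* Apply the rotate to every element. The restrict qualifiers tell the
   compiler that dst and src do not overlap, which helps the vectorizer. */
void rotate_array(uint64_t *restrict dst, const uint64_t *restrict src,
                  size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = rotl64_by_32(src[i]);
}
```

Building this with -O3 -march=bdver4 -mno-prefer-avx128 -S should let you compare against the YMM loop quoted above; the exact output may differ by compiler revision.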