https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118662
--- Comment #15 from Uroš Bizjak <ubizjak at gmail dot com> ---
The testcase now generates (-O2 -ftree-slp-vectorize -fno-vect-cost-model
-msse4):

addup:
        pmovsxbd        (%rdi), %xmm0
        movd    (%rdi), %xmm1
        movdqa  %xmm0, %xmm2
        pextrb  $3, %xmm1, %edx
        ...

One possible improvement would be to move the QImode value to %xmm1 and
sign-extend to %xmm0 from %xmm1. Something like:

addup:
        movd    (%rdi), %xmm1
        pmovsxbd        %xmm1, %xmm0
        movdqa  %xmm0, %xmm2
        pextrb  $3, %xmm1, %edx
        ...

This would save a memory read, since (%rdi) is then loaded only once.
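
For illustration only, the proposed "load once, then extend from the register" shape can be written with SSE4.1 intrinsics. This is a rough sketch, not the testcase from this PR: the helper name load_once_extend and its signature are hypothetical, and the comments naming instructions describe the intended mapping, not guaranteed compiler output.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepi8_epi32, _mm_extract_epi8 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the PR): read four signed bytes from p
   with a single 4-byte load, then derive both results from that register. */
__attribute__((target("sse4.1")))
static void load_once_extend(const int8_t *p, int32_t out[4], int *byte3)
{
    int32_t raw;
    memcpy(&raw, p, sizeof raw);              /* the only read of *p */

    __m128i bytes = _mm_cvtsi32_si128(raw);   /* intended: movd into an xmm reg */
    __m128i wide  = _mm_cvtepi8_epi32(bytes); /* intended: pmovsxbd reg, reg */

    _mm_storeu_si128((__m128i *)out, wide);   /* four sign-extended lanes */
    *byte3 = _mm_extract_epi8(bytes, 3);      /* intended: pextrb $3, zero-extended */
}
```

Calling it on the bytes { -1, 2, -3, -4 } should give out = { -1, 2, -3, -4 } and *byte3 = 252, because pextrb zero-extends the extracted byte. The target attribute is a GCC/Clang extension used here so the sketch builds without -msse4.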