https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
--- Comment #3 from Joel Yliluoma <bisqwit at iki dot fi> ---

For the record, this particular case (an 8-bit checksum of an array, 16 bytes here) has even better SIMD code available. ICC (version 18 and later) generates it automatically:

        vmovups   xmm0, XMMWORD PTR bytes[rip]    #5.9
        vpxor     xmm2, xmm2, xmm2                #4.41
        vpaddb    xmm0, xmm2, xmm0                #4.41
        vpsrldq   xmm1, xmm0, 8                   #4.41
        vpaddb    xmm3, xmm0, xmm1                #4.41
        vpsadbw   xmm4, xmm2, xmm3                #4.41
        vmovd     eax, xmm4                       #4.41
        movsx     rax, al                         #4.41
        ret                                       #7.16
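A rough sketch of the same idea in C with SSE2 intrinsics may help show why this sequence works. It assumes an x86-64 target, where SSE2 is always available. The function names `checksum16_sse2` and `checksum16_scalar` are hypothetical and do not come from the bug report. The sequence has three steps. First, the upper 8 bytes are folded onto the lower 8 with a byte-wise add, which wraps mod 256. Next, `vpsadbw` against a zero vector sums each 8-byte half into a 16-bit value. Finally, only the low byte of the low-half sum is kept. Because every step is exact modulo 256, the result matches a plain byte-wise wrap-around sum.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical reference: plain 8-bit wrap-around sum of 16 bytes. */
static uint8_t checksum16_scalar(const uint8_t bytes[16])
{
    uint8_t sum = 0;
    for (int i = 0; i < 16; i++)
        sum = (uint8_t)(sum + bytes[i]);
    return sum;
}

/* Hypothetical SIMD version mirroring ICC's instruction sequence. */
static uint8_t checksum16_sse2(const uint8_t bytes[16])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)bytes);  /* vmovups: unaligned 16-byte load */
    __m128i hi   = _mm_srli_si128(v, 8);                     /* vpsrldq: upper 8 bytes moved to the low half */
    __m128i fold = _mm_add_epi8(v, hi);                      /* vpaddb: low half now holds (a[i] + a[i+8]) mod 256 */
    __m128i sad  = _mm_sad_epu8(fold, _mm_setzero_si128());  /* vpsadbw vs zero: sum of each 8-byte half */
    /* vmovd: low 32 bits hold the low-half sum (at most 8 * 255), so its low byte is the checksum */
    return (uint8_t)_mm_cvtsi128_si32(sad);
}
```

ICC's trailing `movsx rax, al` only sign-extends that same low byte to suit the function's return type. The sketch returns it as `uint8_t` instead, which avoids implementation-defined signed conversions in C.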