https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201

--- Comment #3 from Joel Yliluoma <bisqwit at iki dot fi> ---
For the record, for this particular case (an 8-bit checksum of an array, here
16 bytes long) there exists even better SIMD code, which ICC (version 18 or
later) generates automatically:

        vmovups   xmm0, XMMWORD PTR bytes[rip]                  #5.9
        vpxor     xmm2, xmm2, xmm2                              #4.41
        vpaddb    xmm0, xmm2, xmm0                              #4.41
        vpsrldq   xmm1, xmm0, 8                                 #4.41
        vpaddb    xmm3, xmm0, xmm1                              #4.41
        vpsadbw   xmm4, xmm2, xmm3                              #4.41
        vmovd     eax, xmm4                                     #4.41
        movsx     rax, al                                       #4.41
        ret                                                     #7.16
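
For readers who prefer intrinsics, the listing above can be sketched roughly as follows. This is only an illustration, not the source ICC compiled: the function name checksum16 is hypothetical, and the return type is unsigned here (the listing's movsx suggests the original returned a signed type). The first vpaddb in the listing adds a zero register, so it has no counterpart below.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: wrapping 8-bit sum of exactly 16 bytes. */
static uint8_t checksum16(const uint8_t *bytes)
{
    /* Unaligned 16-byte load (vmovups in the listing). */
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    /* All-zero register (vpxor). */
    __m128i zero = _mm_setzero_si128();
    /* Shift the upper 8 bytes down into the lower half (vpsrldq by 8). */
    __m128i hi = _mm_srli_si128(v, 8);
    /* Bytewise wrapping add: low lane i now holds bytes[i] + bytes[i+8] mod 256 (vpaddb). */
    __m128i sum8 = _mm_add_epi8(v, hi);
    /* Sum of absolute differences against zero adds up the 8 low bytes
       into the low 16 bits of the result (vpsadbw). */
    __m128i sad = _mm_sad_epu8(zero, sum8);
    /* Take the low 32 bits and truncate to 8 bits (vmovd, then use of al). */
    return (uint8_t)_mm_cvtsi128_si32(sad);
}
```

Truncating at the end is safe because every intermediate step is a sum taken modulo 256 or wider, so the low 8 bits of the result equal the sum of all 16 bytes modulo 256.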
