[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

jakub at gcc dot gnu.org Fri, 07 Dec 2018 01:39:02 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398


--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
No, because that has different behavior.  In particular, it reads all bytes
from 0 to max - 1 from both arrays, rather than stopping at the first mismatch.
If both pointers are aligned the same modulo simd size, then it is in theory
vectorizable just by loading both simd words and comparing all the individual
bytes: if all of them compare equal, continue looping, otherwise determine the
exact one, e.g. in a scalar loop.  If the alignments are different, though, it
is harder: one simd word would need to be read in the previous iteration of the
loop and not read in the current one until you prove that there isn't any
difference.
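
To make the difference in behavior concrete, here is a minimal sketch of the
two forms.  The function names and signatures are made up for illustration and
are not taken from the testcase in this PR.  Both return the same value when
all max bytes of both arrays are readable, but only the first one is
guaranteed not to touch anything past the first mismatch.

```c
#include <stddef.h>

/* Early-exit form: stops at the first mismatch, so it never loads
   a[i] or b[i] for any i beyond that position.  */
size_t
prefix_len_early_exit (const unsigned char *a, const unsigned char *b,
                       size_t max)
{
  size_t pos = 0;
  while (pos < max && a[pos] == b[pos])
    pos++;
  return pos;
}

/* Read-everything form: trivially vectorizable, but it loads every byte
   in [0, max) from both arrays, which is only valid if all of those
   bytes are accessible.  */
size_t
prefix_len_read_all (const unsigned char *a, const unsigned char *b,
                     size_t max)
{
  size_t first = max;
  /* Walk downwards so that FIRST ends up as the lowest mismatching index.  */
  for (size_t i = max; i-- > 0; )
    if (a[i] != b[i])
      first = i;
  return first;
}
```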

Essentially, you are looking for a reimplementation of memcmp, e.g. the one
from glibc (like the C version in
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/memcmp.c;hb=HEAD
), just with the minor tweak that rather than returning the difference of the
bytes you return the length of the common prefix of both strings, and you don't
have the guarantee memcmp has (that all bytes between 0 and max-1 are
accessible).
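
A rough sketch of such a function, comparing one 64-bit word at a time and
falling back to a scalar loop to locate the exact byte.  The name
common_prefix_len is hypothetical, and the sketch ASSUMES the memcmp-style
guarantee that all max bytes of both buffers are readable, which is exactly
the guarantee the original loop does not have; it is not the glibc code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: a[0..max-1] and b[0..max-1] are all accessible.  */
size_t
common_prefix_len (const unsigned char *a, const unsigned char *b,
                   size_t max)
{
  size_t pos = 0;

  /* Compare 8 bytes at a time while a full word remains.  memcpy keeps
     the loads valid for any alignment.  */
  while (max - pos >= sizeof (uint64_t))
    {
      uint64_t wa, wb;
      memcpy (&wa, a + pos, sizeof wa);
      memcpy (&wb, b + pos, sizeof wb);
      if (wa != wb)
        break;          /* Mismatch is inside this word; find it below.  */
      pos += sizeof (uint64_t);
    }

  /* Scalar loop: finds the exact mismatching byte and handles the tail.  */
  while (pos < max && a[pos] == b[pos])
    pos++;
  return pos;
}
```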

E.g. the unrolled loop in glibc's x86_64 memcmp looks like:
        movdqu    (%rdi,%rsi), %xmm0
        pcmpeqb   (%rdi), %xmm0
        pmovmskb  %xmm0, %edx
        subl      $0xffff, %edx
        jnz       L(neq)
        addq      $16, %rdi
repeated a couple of times, but it can be done this way only if both pointers
are the same modulo simd_size (you can even use movdqa after a scalar loop for
alignment).  If they aren't, you need to compare one set of bytes separately
first, and only if they are all equal can you read the next (aligned) word; in
both cases you need to do whole-vector shifts.
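
For illustration, the same load / pcmpeqb / pmovmskb step can be written with
SSE2 intrinsics roughly as below.  This is only a sketch: the function name is
made up, it uses unaligned loads on both inputs instead of the alignment
handling described above, and it ASSUMES all max bytes of both buffers are
readable.  Without SSE2 it degrades to the plain scalar loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* ASSUMPTION: a[0..max-1] and b[0..max-1] are all accessible.  */
size_t
common_prefix_len_sse2 (const unsigned char *a, const unsigned char *b,
                        size_t max)
{
  size_t pos = 0;

#if defined(__SSE2__)
  while (max - pos >= 16)
    {
      /* movdqu-style unaligned loads of 16 bytes from each buffer.  */
      __m128i va = _mm_loadu_si128 ((const __m128i *) (a + pos));
      __m128i vb = _mm_loadu_si128 ((const __m128i *) (b + pos));
      /* pcmpeqb + pmovmskb: one mask bit per byte, set where equal.  */
      unsigned mask = (unsigned) _mm_movemask_epi8 (_mm_cmpeq_epi8 (va, vb));
      if (mask != 0xffffu)
        break;          /* Some byte differs; locate it below.  */
      pos += 16;
    }
#endif

  /* Scalar loop: exact mismatch position and any remaining tail bytes.  */
  while (pos < max && a[pos] == b[pos])
    pos++;
  return pos;
}
```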

All in all, I'd say you want to have a function for this and implement it
efficiently, rather than hoping the compiler will pattern match inefficient
code and turn it into the smart thing for you; it is way too specialized a
thing.
