https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
No, because that has different behavior.  In particular, it reads all bytes from 0 to max - 1 from both arrays, rather than stopping at the first differing one.

If both pointers are aligned the same modulo the simd size, then it is in theory vectorizable just by loading both simd words and comparing all the individual bytes; if all of them compare equal, continue looping, otherwise determine the exact one, e.g. in a scalar loop.  Though, if the alignments are different, it is harder: one simd word would need to be read in the previous iteration of the loop and not read in the current one until you prove that there isn't any difference.

Essentially, you are looking for a reimplementation of memcmp, e.g. from glibc (like the C version in https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/memcmp.c;hb=HEAD ), just with the minor tweak that rather than returning the difference of the bytes you return the length of the common prefix of both strings, and you don't have the guarantee memcmp has (that all bytes between 0 and max - 1 are accessible).

E.g. the glibc x86_64 memcmp unrolled loop looks like:

        movdqu  (%rdi,%rsi), %xmm0
        pcmpeqb (%rdi), %xmm0
        pmovmskb %xmm0, %edx
        subl    $0xffff, %edx
        jnz     L(neq)
        addq    $16, %rdi

repeated a couple of times, but that can be done this way only if both pointers % simd_size are the same (you can even use movdqa after a scalar loop for alignment).  If they aren't, you need to compare one set of bytes separately first, and only if they are all equal can you read the next (aligned) word, and in both cases you need to do the whole vector shifts.

All in all, I'd say you want to have a function for this and implement it efficiently, rather than hoping the compiler will pattern match inefficient code and turn it into the smart thing for you; it is way too specialized a thing.
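As a rough illustration of the "have a function for this" suggestion, here is a minimal sketch of a common-prefix-length helper. The name common_prefix_len and its signature are hypothetical, not taken from glibc or from the code in this bug report. It deliberately sidesteps the hard case described above by assuming the caller guarantees that max bytes are readable from both buffers (the same guarantee memcmp has), so unaligned 16-byte loads are safe regardless of how the two pointers are aligned. It also assumes GCC or Clang for __builtin_ctz, and falls back to the plain scalar loop when SSE2 is not available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: length of the common prefix of a and b, at most max.
   ASSUMPTION: bytes 0 .. max-1 are readable in BOTH buffers.  Without that
   guarantee, the full 16-byte loads below could touch inaccessible memory. */
size_t common_prefix_len(const unsigned char *a, const unsigned char *b,
                         size_t max)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Compare 16 bytes at a time while a full block is still in range. */
    while (max - i >= 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One mask bit per byte; a set bit means the bytes are equal. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (mask != 0xffffu) {
            /* The lowest clear bit is the first mismatching byte. */
            return i + (size_t)__builtin_ctz(~mask);
        }
        i += 16;
    }
#endif

    /* Scalar tail (and the whole job when SSE2 is unavailable). */
    while (i < max && a[i] == b[i])
        ++i;
    return i;
}
```

Handling buffers where only the bytes up to the first difference are accessible would additionally need the alignment prologue and the per-word ordering of loads discussed above; this sketch does not attempt that.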