https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64622
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
without loop header copying we generate
__strcspn_c1:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L8:
        cmpl    %esi, %edx
        je      .L6
        addq    $1, %rax
.L2:
        movsbl  (%rdi,%rax), %edx
        testb   %dl, %dl
        jne     .L8
.L6:
        rep ret
so it would be interesting to investigate how they do this (whether it's a
special hack or some systematic fix). The loop header contains just the
induction variable (IV) increment here.
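
For reference, the listing above appears to correspond to a C loop of roughly the following shape. This is a sketch reconstructed from the instructions alone, not the testcase attached to the bug; the name strcspn_c1_sketch and the parameter names are hypothetical, and it assumes plain char is signed (as the movsbl suggests).

```c
#include <stddef.h>

/* Hypothetical reconstruction of the loop behind the assembly above.
   Counts leading characters of s up to the terminating NUL or the
   first occurrence of reject, whichever comes first. */
size_t strcspn_c1_sketch(const char *s, int reject)
{
    size_t n = 0;                 /* xorl %eax, %eax */
    while (s[n] != '\0'           /* .L2: movsbl / testb / jne .L8 */
           && s[n] != reject)     /* .L8: cmpl %esi, %edx / je .L6 */
        ++n;                      /* addq $1, %rax */
    return n;                     /* .L6: rep ret */
}
```

In this form the only work between the two exit tests and the back edge is the ++n, which matches the single addq in the listing.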