https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Hongtao.liu from comment #4)
> > > (In reply to Hongtao.liu from comment #3)
> > > > (In reply to Hongtao.liu from comment #2)
> > > > > In GIMPLE, there is
> > > > >
> > > > > _1 = __builtin_memcmp_eq (a_5(D), &t[0], 32);
> > > > > _2 = _1 == 0;
> > > > > _6 = (int) _2;
> > > > >
> > > > >
> > > > > So this is about vectorized codegen for
> > > > > __builtin_memcmp_eq. I guess we can start with sizes that are a
> > > > > multiple of 16 bytes?
> > > > >
> > > > There's no optab or target hook that lets the backend participate in
> > > > this optimization.
> > But there's a cbranch_optab check in can_compare_p, and i386 supports
> > cbranch for V8SI/V4DI/V4SI/V2DI but not for OI/TI. Should we add support
> > for them?
> >
> > (define_expand "cbranch<mode>4"
> >   [(set (reg:CC FLAGS_REG)
> >         (compare:CC (match_operand:VI48_AVX 1 "register_operand")
> >                     (match_operand:VI48_AVX 2 "nonimmediate_operand")))
> >    (set (pc) (if_then_else
> >                (match_operator 0 "bt_comparison_operator"
> >                 [(reg:CC FLAGS_REG) (const_int 0)])
> >                (label_ref (match_operand 3))
>
> After supporting cbranchoi4, gcc generates
>
> _Z1fPc:
> .LFB0:
> .cfi_startproc
> vmovdqa .LC1(%rip), %ymm0
> vpxor (%rdi), %ymm0, %ymm0
> vptest %ymm0, %ymm0
> sete %al
> vzeroupper
>
> which is as optimal as what clang/LLVM generates.
Also extend cbranchti4 to use ptest when TARGET_SSE4_1 is enabled and the
comparison code is EQ or NE, so GCC generates

        movdqu  (%rdi), %xmm0
        movdqa  .LC1(%rip), %xmm1
        pxor    %xmm1, %xmm0
        ptest   %xmm0, %xmm0
        sete    %al

for

bool f128(char *a)
{
  char t[] = "012345678901234";
  return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}

The original codegen is

        movabsq $14692989455579448, %rax
        xorq    8(%rdi), %rax
        movabsq $3978425819141910832, %rdx
        xorq    (%rdi), %rdx
        orq     %rdx, %rax
        sete    %al
        ret
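
For reference, here is a rough source-level sketch of the two strategies. It is not taken from the patch. The names f128_scalar, f128_ptest and HAVE_PTEST_SKETCH are made up for illustration, and the vector variant assumes an x86-64 target with a GNU-compatible compiler that accepts the target("sse4.1") function attribute. The scalar version mirrors the xorq/orq sequence; the vector version mirrors pxor + ptest.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_PTEST_SKETCH 1  // hypothetical macro, only used by this sketch
#endif

// Same 16-byte constant as in f128 above: 15 characters plus the NUL.
static const char t[16] = "012345678901234";

// Scalar form: two 64-bit XORs combined with an OR, then a test for zero.
// On little-endian x86-64, t_lo and t_hi hold the two movabsq immediates
// from the original codegen (3978425819141910832 and 14692989455579448).
bool f128_scalar(const char *a)
{
  std::uint64_t a_lo, a_hi, t_lo, t_hi;
  std::memcpy(&a_lo, a, 8);
  std::memcpy(&a_hi, a + 8, 8);
  std::memcpy(&t_lo, t, 8);
  std::memcpy(&t_hi, t + 8, 8);
  return ((a_lo ^ t_lo) | (a_hi ^ t_hi)) == 0;
}

#ifdef HAVE_PTEST_SKETCH
// Vector form: one 128-bit XOR, then ptest on the result.
// _mm_testz_si128(x, x) returns 1 exactly when x is all zeros.
__attribute__((target("sse4.1")))
bool f128_ptest(const char *a)
{
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
  __m128i vt = _mm_loadu_si128(reinterpret_cast<const __m128i *>(t));
  __m128i diff = _mm_xor_si128(va, vt);
  return _mm_testz_si128(diff, diff) != 0;
}
#endif
```

Both functions should agree on any 16-byte input. The point of the cbranchti4 change is that the compiler can pick the second form by itself when expanding __builtin_memcmp_eq, without any intrinsics in the source.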