https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #25 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Qing Zhao from comment #24)
> From the above, we can see:
>     even with n is as big as 20, inlined version is much faster than the
> non-inlined version, both on aarch64 (no hardware string compare insn
> provided) and X86 (hardware string compare insn provided)
> 
> So, it's reasonable to do the inline as much as possible.

Those numbers look too good to be true. I get around a 2x speedup at n=3 (using
an old GLIBC header which does the inlining), whereas you get 7x on AArch64 and
21x on x86...

In general it's not a good idea to inline too much because of code size bloat
and branch mispredictions (a good strcmp implementation processes 8 or 16 chars
at a time rather than requiring 1 branch per character). n=4 seems a reasonable
limit: you need 3 instructions per character on most targets, so the expansion
stays at around a dozen instructions.
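
To make the per-character cost concrete, here is a rough sketch of what an
inlined comparison against a 3-character constant could look like when written
out by hand. The function names and the string "abc" are made up for
illustration, and this is not the code GCC or the old GLIBC header actually
emits; the real instruction count depends on the target and the compiler.

```c
#include <assert.h>
#include <string.h>

/* Hand-written stand-in for what inlining strncmp (s, "abc", 3) could
   expand to.  The name and the exact shape are illustrative assumptions,
   not GCC's actual output.  */
static int
cmp_abc_inlined (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  int d;

  /* Per character: one load, one subtract, one conditional branch.
     A NUL in s also ends the comparison here, because it differs from
     the (non-NUL) constant character.  */
  d = p[0] - 'a';
  if (d != 0)
    return d;
  d = p[1] - 'b';
  if (d != 0)
    return d;
  /* Last character: no branch needed, just return the difference.  */
  return p[2] - 'c';
}

/* Reduce a comparison result to -1, 0 or 1.  */
static int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Check that the expansion agrees in sign with the library call.  */
static void
check_against_strncmp (const char *s)
{
  assert (sign (cmp_abc_inlined (s)) == sign (strncmp (s, "abc", 3)));
}
```

Each extra character adds another load/subtract/branch group, which is where
the code size and the one-branch-per-character cost come from; a library
strcmp that handles 8 or 16 bytes per iteration avoids that for longer strings.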
