https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
--- Comment #25 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Qing Zhao from comment #24)

> From the above, we can see:
> even with n is as big as 20, inlined version is much faster than the
> non-inlined version, both on aarch64 (no hardware string compare insn
> provided) and X86 (hardware string compare insn provided)
>
> So, it's reasonable to do the inline as much as possible.

Those numbers look too good to be true. I get around 2x speed up at n=3
(using old GLIBC header which does the inlining), you get 7x on AArch64
and 21x on x86...

In general it's not a good idea to inline too much because of code size
bloat and branch mispredictions (a good strcmp implementation processes
8 or 16 chars at a time rather than requiring 1 branch per character).
n=4 seems reasonable since you need 3 instructions per character on most
targets.
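
To make the per-character cost concrete, here is a minimal hand-written C sketch of what an inline expansion of strncmp (s, "abc", 3) could look like. The names cmp_abc_inline, sign and check_agreement are hypothetical, and this is not the code GCC or the old GLIBC header actually emits. It only illustrates that each extra character adds roughly a load, a compare/subtract and a conditional branch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce a comparison result to -1, 0 or 1 so it
   can be compared against strncmp, which only guarantees the sign.  */
static int sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Illustrative sketch only (not GCC's real expansion): an unrolled
   strncmp (s, "abc", 3).  Each character costs about one load, one
   subtract and one conditional branch.  Since the literal has no NUL
   in its first 3 bytes, a NUL in s is a mismatch and returns early,
   so s is never read past its terminator.  */
static int cmp_abc_inline (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  int d;

  d = p[0] - 'a';
  if (d != 0)
    return d;
  d = p[1] - 'b';
  if (d != 0)
    return d;
  return p[2] - 'c';
}

/* Check that the sketch agrees in sign with the library strncmp.  */
static void check_agreement (const char *s)
{
  assert (sign (cmp_abc_inline (s)) == sign (strncmp (s, "abc", 3)));
}
```

Calling check_agreement on inputs such as "abc", "abd", "ab" or "" should pass. At n=3 this is two branches plus a final subtract; at n=20 the same pattern would need around 19 branches, which is where the code size and misprediction concern comes from.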