https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
--- Comment #23 from Qing Zhao <qing.zhao at oracle dot com> ---
I have an implementation for part C of this task in my private space:

part C: for strcmp (s1, s2), strncmp (s1, s2, n):
  if the result is NOT used to do a simple equality test against zero, one of
  "s1" or "s2" is a small constant string, n is a constant, and the minimum of
  the length of the constant string and "n" is smaller than a predefined
  threshold T, inline the call as a byte-by-byte comparison sequence to avoid
  the call overhead.

With this implementation, I was able to measure the performance impact of the
inlining transformation for different values of "n", where n is the length of
the string that needs to be compared. The goal was to settle the following two
questions:

A. what's the default value of n.
B. on a platform that supports string-compare hardware insns (for example,
   X86), which should be done first for a call to strcmp/strncmp: the inlining
   or the hardware insns?

On both aarch64 and X86, I tried the following small test case for the
performance experiments:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ cat t_p.c
#include <string.h>

char array[]= "fishiiiiiiiiiiiiiiiiiiiiiiiiiiiii";
#define NUM 1000000000
int __attribute__ ((noinline))
cmp2 (const char *p)
{
  int result = 0;
  int i;
  for (i=0; i< NUM; i++) {
    result |= strcmp (p, "fishiiiii");
  }
  return result;
}

int result = 0;

int main()
{
  for (int i = 0; i < 25; i++)
    result += cmp2 (array);
  return 0;
}

The options I used were:
  -O -fno-tree-loop-im
plus the corresponding option to enable or disable the added inlining.

The following are the performance results:

aarch64 strcmp
n=            3     4     5     6    10    20
inline       31    41    62    72   114   242
no-inline   229   229   229   229   272   333

aarch64 strncmp
n=            3     4     5     6    10    20
inline       41    62    62    76   125   250
no-inline   291   291   291   291   364   427

X86 strcmp
n=            4     5     6    10    20
inline       21    25    31    42   163
no-inline   445   461   488   529   672

X86 strncmp
n=            4     5     6    10    20
inline       21    25    28    43    77
no-inline   412   435   442   495   638

From the above, we can see: