https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27077
Helmut Schellong <var at schellong dot biz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |var at schellong dot biz

--- Comment #6 from Helmut Schellong <var at schellong dot biz> ---
gcc 6.2
.......
builtin strlen: 137.787 ns
libc    strlen:  12.562 ns  (-fno-builtin-strlen)

About 11 times faster!
See table below.

What the slow functions have in common is their use of string instructions.
E.g. libc: memcmp.S (assembler) and so on.
Compare functions in particular mostly return early, on the first to third char!

memcmp/memcmp_F      38.198      2.707 [ns] ta/tb = 14.11  Note: abcd, AbcD, 2 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp_F      42.712      4.211 [ns] ta/tb = 10.14  Note: abcd, abcD, 5 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp_F      52.509      8.725 [ns] ta/tb =  6.02  Note: a-h, a-H, 9 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp_F      27.335     18.058 [ns] ta/tb =  1.51  Note: a-z, a-Z, 27 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp_F      47.842     33.706 [ns] ta/tb =  1.42  Note: a-z, a-Z, 53 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp_F      67.691     71.879 [ns] ta/tb =  0.94  Note: a-z, a-Z, 105 Core2Duo3333/FreeBSD_10.1

memset/memset8       27.088      7.521 [ns] ta/tb =  3.60  | 10
memset/memset8       58.958      8.421 [ns] ta/tb =  7.00  | 20
memset/memset8       35.186      6.863 [ns] ta/tb =  5.13  | 30
memset/memset8       58.961     10.830 [ns] ta/tb =  5.44  | 50
memset/memset8       58.980     17.449 [ns] ta/tb =  3.38  | 100
memset/memset8       39.655     24.378 [ns] ta/tb =  1.63  | 256
memset/memset8       76.700     53.598 [ns] ta/tb =  1.43  | 1000
memset/memset8      228.024    204.360 [ns] ta/tb =  1.12  | 5000
memset/memset8      415.862    391.599 [ns] ta/tb =  1.06  | 10000
memset/memset8     5675.480   5859.560 [ns] ta/tb =  0.97  | 100000
memset/memset8     5667.370   5993.000 [ns] ta/tb =  0.95  Note: arr+3, 100000 Core2Duo3333/FreeBSD_10.1

memcpy/memcpy8    50568.300  47217.600 [ns] ta/tb =  1.07  Note: arr+3, arr+1, 100000 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8    50603.000   8896.300 [ns] ta/tb =  5.69  Note: arr+3, arr+3, 100000 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8     9231.980   9028.020 [ns] ta/tb =  1.02  Note: arr+0, arr+0, 100000 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8       15.343     17.753 [ns] ta/tb =  0.86  Note: arr+3, arr+1, 10 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8       42.720     27.385 [ns] ta/tb =  1.56  Note: arr+3, arr+1, 20 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8       25.781     37.901 [ns] ta/tb =  0.68  Note: arr+3, arr+1, 50 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8       78.229     64.093 [ns] ta/tb =  1.22  Note: arr+3, arr+1, 100 Core2Duo3333/FreeBSD_10.1
memcpy/memcpy8       75.217     18.964 [ns] ta/tb =  3.97  Note: arr+3, arr+3, 100 Core2Duo3333/FreeBSD_10.1

memcmp/memcmp8       37.584      3.309 [ns] ta/tb = 11.36  Note: arr+0, arr+0, 2 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8       42.121      5.506 [ns] ta/tb =  7.65  Note: arr+0, arr+0, 5 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8       43.927      7.518 [ns] ta/tb =  5.84  Note: arr+0, arr+0, 27 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8       49.618     11.421 [ns] ta/tb =  4.34  Note: arr+0, arr+0, 53 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8       67.068     20.441 [ns] ta/tb =  3.28  Note: arr+0, arr+0, 105 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8       67.700     25.281 [ns] ta/tb =  2.68  Note: arr+3, arr+0, 105 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8       71.608     25.272 [ns] ta/tb =  2.83  Note: arr+3, arr+1, 105 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8    15088.468  10363.376 [ns] ta/tb =  1.46  Note: arr+0, arr+0, 100002 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8   151610.600 104768.150 [ns] ta/tb =  1.45  Note: arr+0, arr+0, 1000002 Core2Duo3333/FreeBSD_10.1
memcmp/memcmp8    13964.650  10280.200 [ns] ta/tb =  1.36  Note: arr+0, arr+0, 100002 Core2Duo3333/FreeBSD_10.1

strlen/strlen_F   11648.990  38192.520 [ns] ta/tb =  0.31  Note: string 100000 Core2Duo3333/FreeBSD_10.1
strlen/strlen_F     137.787     46.930 [ns] ta/tb =  2.94  Note: string 100; builtin rep scasb Core2Duo3333/FreeBSD_10.1
strlen/strlen_F      12.562     43.338 [ns] ta/tb =  0.29  Note: string 100 Core2Duo3333/FreeBSD_10.1
strlen/strlen_F       5.323     12.635 [ns] ta/tb =  0.42  Note: string 20 Core2Duo3333/FreeBSD_10.1

Mathematical method: t = (2t + L) - (t + L);  L = loop time
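
The differencing method stated above (time a loop with two calls per iteration, time the same loop with one call, subtract) might be sketched roughly as follows. This is not the harness that produced the numbers in the table; the names len_fn, net_call_time, time_loop_ns and measure_call_ns are hypothetical, and clock() is only assumed here as a portable timer. Note that calling strlen through a function pointer always reaches the libc version, so measuring the builtin would need a direct call in the loop body instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical type: any strlen-like function can be measured. */
typedef size_t (*len_fn)(const char *);

/* t = (2t + L) - (t + L): the loop overhead L cancels out. */
static double net_call_time(double t_two_calls, double t_one_call)
{
    return t_two_calls - t_one_call;
}

/* Average time in ns of one loop iteration that calls f once
   (calls == 1) or twice (calls == 2). Assumption: clock() is
   precise enough once iters is large. */
static double time_loop_ns(len_fn f, const char *s, int calls, long iters)
{
    len_fn volatile fn = f;    /* volatile pointer: every call is really made */
    volatile size_t sink = 0;  /* keeps the results alive */

    assert(iters > 0);
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        sink += fn(s);
        if (calls == 2)        /* the same test runs in both variants */
            sink += fn(s);
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}

/* Net time of a single call to f, with the loop time removed. */
static double measure_call_ns(len_fn f, const char *s, long iters)
{
    double one = time_loop_ns(f, s, 1, iters);  /* t + L  */
    double two = time_loop_ns(f, s, 2, iters);  /* 2t + L */
    return net_call_time(two, one);
}
```

A call such as measure_call_ns(strlen, buf, 10000000L) would then estimate the per-call time of libc strlen on buf. The result is noisy for very short calls and can even come out slightly negative, so several runs should be compared.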