I'll start with the simplest suggestion: precomputing values derived from constant arguments, for example saving the cost of the multiplication in strchr with:

    char *strchr_c (char *x, unsigned long u);
    #define strchr(x, c) \
      (__builtin_constant_p (c) \
       ? strchr_c (x, (unsigned char) (c) * (~0ULL / 255)) \
       : strchr (x, c))

Next, I am working on exploiting constant n for memset and memcpy. This cannot be done in gcc alone, because the implementation has to be chosen based on the cpu, and for different sizes different implementations are best on different cpus. Some users try to always inline, as rte_memcpy does. That works better than the gcc expansion because it is optimized for newer processors. For sizes beyond 64 bytes, trying to fully expand memcpy and memset doesn't make a lot of sense, as the libcall is faster.

To get the benefits of inlining I am now working on the following approach. For sizes < 64 use the builtin. For n in 64-1024 make an indirect jump through a cpu-specific table that you get from libc. That would allow unrolling up to size 1024 into a sequence of movdqa's without increasing the cache footprint much. The same goes for memset(x,0,n), except that you need to pass a 0.0 argument to have a zeroed xmm register.

A separate entry point for aligned input doesn't make a lot of sense. As the input is short, you want to jump into a copy of the header, and all you save is the difference between an aligned and an unaligned load plus the cross-page check. As you need to duplicate the header, the icache cost could be bigger.

What does make sense is inlining the headers instead of expanding the whole function. I am looking at the following expansion of strcmp/memcmp:

    int
    inline_strcmp (const char *x, const char *y)
    {
      int r = *((const unsigned char *) x) - *((const unsigned char *) y);
      return r ? r : strcmp (x, y);
    }

    int
    inline_memcmp (const void *x, const void *y, size_t n)
    {
      if (n == 0)
        return 0;
      int r = *((const unsigned char *) x) - *((const unsigned char *) y);
      /* Cast before the arithmetic; void * arithmetic is a GNU extension.  */
      return r ? r : memcmp ((const char *) x + 1, (const char *) y + 1, n - 1);
    }

Note that the end of the string is not tested in the header, as it is unlikely. The same transformation could be done for strncmp, strcasecmp and strncasecmp, but we at libc would first need to improve the tls access of tolower, which currently requires a call, and that defeats the purpose of inlining.

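A rough way to see how often such a header decides the result on its own is to count both outcomes. The following is only a sketch: the counters and the names counted_strcmp and count_header_hits are made up for illustration and are not part of libc, gcc or dryrun.

```c
#include <string.h>

/* Hypothetical counters for illustration; not part of libc or the proposal. */
static unsigned long header_hits;   /* comparisons decided by the first byte */
static unsigned long libcalls;      /* comparisons that fell through to strcmp */

/* Same first-byte header as inline_strcmp, with counting added. */
static inline int counted_strcmp(const char *x, const char *y)
{
    int r = *(const unsigned char *)x - *(const unsigned char *)y;
    if (r) {
        header_hits++;
        return r;
    }
    libcalls++;
    return strcmp(x, y);
}

/* Compare every ordered pair in a word list and return how many
   comparisons were decided inline, without calling strcmp. */
static unsigned long count_header_hits(const char *const *words, size_t n)
{
    header_hits = libcalls = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            (void)counted_strcmp(words[i], words[j]);
    return header_hits;
}
```

For the list { "apple", "apricot", "banana", "cherry", "cherry" } this decides 16 of the 25 ordered pairs in the header and leaves 9 for the libcall. Real inputs need a real profile, of course.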
The inlined first-byte check gives considerable savings: in my profile 32.4% of strcmp calls and 49.5% of calls of the second routine differ in the first byte. From the profiling data these branches are almost completely predictable, as I see long sequences of calls that differ at byte 0 followed by sequences that differ at some other position. Of the programs measured, this could harm only make. See the attached data.

On x64, adding a check of the first 16 bytes using sse would also make sense. Except for make, all other programs have 90% of calls differ within the first 16 bytes. The same could be done for strchr/memchr headers, where the first 16 bytes also account for the majority. In the case of make, we should check whether x[0] == '/' in strchr(x,'/'), which happens 85.1% of the time. In the generic case the same header would be bigger, so the question of whether it is profitable versus code size becomes more significant.

For similar questions I have adding counters for userspace profiling on my todo list. The decision whether some optimization is profitable depends on details, like the average input size, that cannot be directly determined from a profile. For example, in strstr we would need the digraph that occurs least often. I don't know whether that could be integrated into -fprofile-generate/-fprofile-use, or would have to be done before that since it would change control flow, or could be done just by macros.

If we could convince people to compile with profiling, it would also allow directly precomputing tables like the ones below without large header hacks, and make things like calculating perfect hashes possible without external tools.

For precomputed tables I know two use cases so far. The first is the memchr("abc",x,3) or strchr("abc",x) pattern, which I found in libc used to test membership and which is obviously inefficient. The second is the strpbrk family. What these have in common is that they could benefit from a precomputed table with 1 for bytes that are present and 0 otherwise. While I could create such a table, I couldn't do that without 256 warnings.

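To make the table idea concrete, here is a minimal sketch that builds the 256-entry table at run time, which sidesteps the static-initializer problem at the cost of doing the work on first use. The names build_member_table and is_member are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: mark every byte of set[0..n) with 1, all others 0. */
static void build_member_table(unsigned char table[256], const char *set, size_t n)
{
    for (size_t i = 0; i < 256; i++)
        table[i] = 0;
    for (size_t i = 0; i < n; i++)
        table[(unsigned char)set[i]] = 1;
}

/* Once the table is built, this answers the same question as
   memchr(set, c, n) != NULL with a single load. */
static inline int is_member(const unsigned char table[256], int c)
{
    return table[(unsigned char)c];
}
```

Note that this follows the memchr("abc",x,3) semantics: byte 0 is not a member. strchr("abc",x) also matches the terminating 0, so a table replacing it would have to set table[0] as well.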
The following constructs the table just fine, but produces the warning

    warning: initializer element is not a constant expression

    #include <stdio.h>
    #include <string.h>

    int main()
    {
      static char x[256] = {strchr("aaa", 'a') == NULL, strchr("aaa", 'b') == NULL};
      printf("%i %i %s", x[0], x[1], x);
    }

The same trick could be used for making a bitwise array. It is also weird what you can and cannot do in static initializers. I was surprised that I could use strchr but couldn't evaluate "abc"[2] as 'c'.

When the bug above gets fixed, that will allow these functions to be a lot faster, as most of the time you get a match in the first 8 bytes.

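As an illustration of the bitwise-array variant, the sketch below packs the 256 membership flags into four 64-bit words and uses them in a strpbrk-like scan. It builds the set at run time; byteset, byteset_from, byteset_has and table_strpbrk are names invented for this example, and this is not how libc implements strpbrk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-bit set: bit c is 1 when byte c is in the accept set. */
typedef struct { uint64_t bits[4]; } byteset;

/* Build the set from a NUL-terminated accept string. */
static byteset byteset_from(const char *accept)
{
    byteset s = { { 0, 0, 0, 0 } };
    for (; *accept; accept++) {
        unsigned char c = (unsigned char)*accept;
        s.bits[c >> 6] |= UINT64_C(1) << (c & 63);
    }
    return s;
}

static inline int byteset_has(const byteset *s, unsigned char c)
{
    return (int)((s->bits[c >> 6] >> (c & 63)) & 1);
}

/* strpbrk-like scan: return the first byte of s that is in the set,
   or NULL when none is found before the terminator. */
static char *table_strpbrk(const char *s, const byteset *accept)
{
    for (; *s; s++)
        if (byteset_has(accept, (unsigned char)*s))
            return (char *)s;
    return NULL;
}
```

With a set built from ",;" the scan over "key=value;rest" stops at offset 9. If the set could be emitted as a static constant, the per-call setup would disappear entirely, which is the point of the precomputation discussed above.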
Statistic of comparison routines collected with dryrun, for source see
kam.mff.cuni.cz/~ondra/dryrun.tar.bz2

summary strcmp:

replaying ls
average size 0.2 calls 246 succeed 93.1% latencies 1.1 2.8
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2 aligned to 4 bytes 21.5% aligned to 8 bytes 10.2% aligned to 16 bytes 3.7%
s1-s2 aligned to 4 bytes 21.5% aligned to 8 bytes 10.2% aligned to 16 bytes 3.7%
n <= 0: 88.2% n <= 1: 93.5% n <= 2: 100.0% n <= 3: 100.0% n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying bash
average size 4.0 calls 711 succeed 57.7% latencies -4.6 -4.6
s1 aligned to 4 bytes 65.4% aligned to 8 bytes 56.0% aligned to 16 bytes 2.0%
s2 aligned to 4 bytes 58.6% aligned to 8 bytes 50.8% aligned to 16 bytes 3.4%
s1-s2 aligned to 4 bytes 49.6% aligned to 8 bytes 39.9% aligned to 16 bytes 37.1%
n <= 0: 0.1% n <= 1: 49.9% n <= 2: 60.6% n <= 3: 64.3% n <= 4: 71.6% n <= 8: 81.3% n <= 16: 99.4% n <= 32: 100.0% n <= 64: 100.0%

replaying dircolors
average size 1.0 calls 54 succeed 96.3% latencies -5.1 -6.1
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 98.1%
s2 aligned to 4 bytes 1.9% aligned to 8 bytes 1.9% aligned to 16 bytes 1.9%
s1-s2 aligned to 4 bytes 1.9% aligned to 8 bytes 1.9% aligned to 16 bytes 1.9%
n <= 0: 87.0% n <= 1: 87.0% n <= 2: 87.0% n <= 3: 87.0% n <= 4: 88.9% n <= 8: 94.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying ps
average size 1.7 calls 239 succeed 87.0% latencies 3.6 16.0
s1 aligned to 4 bytes 94.1% aligned to 8 bytes 88.3% aligned to 16 bytes 88.3%
s2 aligned to 4 bytes 28.0% aligned to 8 bytes 13.4% aligned to 16 bytes 11.3%
s1-s2 aligned to 4 bytes 28.5% aligned to 8 bytes 13.0% aligned to 16 bytes 12.6%
n <= 0: 58.6% n <= 1: 77.0% n <= 2: 77.4% n <= 3: 81.6% n <= 4: 84.1% n <= 8: 94.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying ssh-add
average size 11.5 calls 221 succeed 0.9% latencies 1.4 0.8
s1 aligned to 4 bytes 97.3% aligned to 8 bytes 97.3% aligned to 16 bytes 97.3%
s2 aligned to 4 bytes 92.8% aligned to 8 bytes 91.4% aligned to 16 bytes 91.0%
s1-s2 aligned to 4 bytes 94.6% aligned to 8 bytes 93.2% aligned to 16 bytes 92.8%
n <= 0: 1.4% n <= 1: 1.4% n <= 2: 1.8% n <= 3: 12.7% n <= 4: 13.6% n <= 8: 29.0% n <= 16: 83.3% n <= 32: 100.0% n <= 64: 100.0%

replaying ssh-keygen
average size 11.5 calls 222 succeed 0.9% latencies 1.7 2.0
s1 aligned to 4 bytes 97.3% aligned to 8 bytes 97.3% aligned to 16 bytes 97.3%
s2 aligned to 4 bytes 92.3% aligned to 8 bytes 91.0% aligned to 16 bytes 90.5%
s1-s2 aligned to 4 bytes 94.1% aligned to 8 bytes 92.8% aligned to 16 bytes 92.3%
n <= 0: 1.4% n <= 1: 1.4% n <= 2: 1.8% n <= 3: 12.6% n <= 4: 13.5% n <= 8: 29.3% n <= 16: 83.3% n <= 32: 100.0% n <= 64: 100.0%

replaying mc
average size 7.3 calls 16244 succeed 62.2% latencies -182.0 -181.9
s1 aligned to 4 bytes 95.6% aligned to 8 bytes 95.3% aligned to 16 bytes 95.3%
s2 aligned to 4 bytes 80.4% aligned to 8 bytes 78.6% aligned to 16 bytes 77.3%
s1-s2 aligned to 4 bytes 79.6% aligned to 8 bytes 78.2% aligned to 16 bytes 76.9%
n <= 0: 28.6% n <= 1: 32.1% n <= 2: 35.6% n <= 3: 43.6% n <= 4: 48.4% n <= 8: 61.3% n <= 16: 87.1% n <= 32: 99.7% n <= 64: 99.9%

replaying killall
average size 0.1 calls 281 succeed 99.3% latencies 10.9 0.5
s1 aligned to 4 bytes 0.4% aligned to 8 bytes 0.4% aligned to 16 bytes 0.4%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 0.4% aligned to 8 bytes 0.4% aligned to 16 bytes 0.4%
n <= 0: 97.5% n <= 1: 99.6% n <= 2: 99.6% n <= 3: 99.6% n <= 4: 99.6% n <= 8: 99.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying iceweasel
average size 5.8 calls 13136 succeed 86.7% latencies -39.2 -33.5
s1 aligned to 4 bytes 32.5% aligned to 8 bytes 14.5% aligned to 16 bytes 7.6%
s2 aligned to 4 bytes 31.8% aligned to 8 bytes 16.7% aligned to 16 bytes 10.9%
s1-s2 aligned to 4 bytes 28.6% aligned to 8 bytes 14.1% aligned to 16 bytes 6.8%
n <= 0: 33.0% n <= 1: 41.5% n <= 2: 45.8% n <= 3: 54.4% n <= 4: 58.6% n <= 8: 68.4% n <= 16: 92.3% n <= 32: 99.9% n <= 64: 100.0%

replaying mutt
average size 28.3 calls 27644 succeed 39.4% latencies -157.4 -134.1
s1 aligned to 4 bytes 99.8% aligned to 8 bytes 73.0% aligned to 16 bytes 73.0%
s2 aligned to 4 bytes 85.0% aligned to 8 bytes 61.2% aligned to 16 bytes 59.0%
s1-s2 aligned to 4 bytes 84.9% aligned to 8 bytes 76.4% aligned to 16 bytes 74.3%
n <= 0: 19.0% n <= 1: 33.3% n <= 2: 35.0% n <= 3: 35.8% n <= 4: 37.2% n <= 8: 39.2% n <= 16: 40.1% n <= 32: 56.7% n <= 64: 89.3%

replaying irb
average size 3.1 calls 10058 succeed 39.2% latencies -102.7 -98.0
s1 aligned to 4 bytes 0.3% aligned to 8 bytes 0.3% aligned to 16 bytes 0.1%
s2 aligned to 4 bytes 21.4% aligned to 8 bytes 8.2% aligned to 16 bytes 4.4%
s1-s2 aligned to 4 bytes 41.6% aligned to 8 bytes 28.5% aligned to 16 bytes 13.0%
n <= 0: 2.0% n <= 1: 9.2% n <= 2: 33.5% n <= 3: 74.8% n <= 4: 84.7% n <= 8: 99.9% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%

replaying vim
average size 1.5 calls 161275 succeed 84.9% latencies 105.3 124.3
s1 aligned to 4 bytes 75.5% aligned to 8 bytes 71.3% aligned to 16 bytes 70.2%
s2 aligned to 4 bytes 47.0% aligned to 8 bytes 41.2% aligned to 16 bytes 39.8%
s1-s2 aligned to 4 bytes 45.2% aligned to 8 bytes 39.4% aligned to 16 bytes 37.2%
n <= 0: 54.1% n <= 1: 73.1% n <= 2: 81.8% n <= 3: 86.7% n <= 4: 90.6% n <= 8: 96.7% n <= 16: 99.3% n <= 32: 100.0% n <= 64: 100.0%

replaying ar
average size 0.2 calls 1000000 succeed 99.9% latencies 5.0 4.8
s1 aligned to 4 bytes 25.0% aligned to 8 bytes 13.0% aligned to 16 bytes 6.1%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 25.0% aligned to 8 bytes 13.0% aligned to 16 bytes 6.1%
n <= 0: 90.9% n <= 1: 97.6% n <= 2: 98.3% n <= 3: 99.6% n <= 4: 99.7% n <= 8: 99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying make
average size 30.1 calls 1000000 succeed 98.7% latencies 1.2 1.6
s1 aligned to 4 bytes 28.4% aligned to 8 bytes 20.7% aligned to 16 bytes 9.8%
s2 aligned to 4 bytes 26.6% aligned to 8 bytes 18.8% aligned to 16 bytes 7.7%
s1-s2 aligned to 4 bytes 22.2% aligned to 8 bytes 12.3% aligned to 16 bytes 4.5%
n <= 0: 4.2% n <= 1: 4.2% n <= 2: 5.3% n <= 3: 5.3% n <= 4: 5.3% n <= 8: 5.3% n <= 16: 8.9% n <= 32: 77.8% n <= 64: 100.0%

replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1
average size 5.8 calls 15151 succeed 37.9% latencies 2.9 -2.6
s1 aligned to 4 bytes 40.7% aligned to 8 bytes 37.0% aligned to 16 bytes 36.0%
s2 aligned to 4 bytes 97.1% aligned to 8 bytes 96.8% aligned to 16 bytes 45.7%
s1-s2 aligned to 4 bytes 40.3% aligned to 8 bytes 36.6% aligned to 16 bytes 35.1%
n <= 0: 12.5% n <= 1: 14.5% n <= 2: 15.0% n <= 3: 58.5% n <= 4: 68.1% n <= 8: 80.2% n <= 16: 94.6% n <= 32: 98.0% n <= 64: 100.0%

replaying gcc
average size 0.5 calls 235 succeed 93.6% latencies 2.9 4.0
s1 aligned to 4 bytes 30.2% aligned to 8 bytes 17.0% aligned to 16 bytes 9.4%
s2 aligned to 4 bytes 5.5% aligned to 8 bytes 4.7% aligned to 16 bytes 4.7%
s1-s2 aligned to 4 bytes 25.1% aligned to 8 bytes 19.1% aligned to 16 bytes 11.1%
n <= 0: 74.9% n <= 1: 92.3% n <= 2: 93.2% n <= 3: 94.5% n <= 4: 98.7% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying /bin/bash
average size 5.8 calls 2108 succeed 37.3% latencies -39.0 -39.5
s1 aligned to 4 bytes 71.4% aligned to 8 bytes 54.8% aligned to 16 bytes 2.6%
s2 aligned to 4 bytes 59.3% aligned to 8 bytes 43.8% aligned to 16 bytes 1.8%
s1-s2 aligned to 4 bytes 50.9% aligned to 8 bytes 39.0% aligned to 16 bytes 35.7%
n <= 0: 0.1% n <= 1: 34.4% n <= 2: 45.2% n <= 3: 47.9% n <= 4: 59.3% n <= 8: 68.5% n <= 16: 99.1% n <= 32: 100.0% n <= 64: 100.0%

replaying /usr/bin/lsof
average size 9.4 calls 56 succeed 33.9% latencies 29.8 29.5
s1 aligned to 4 bytes 98.2% aligned to 8 bytes 98.2% aligned to 16 bytes 98.2%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 98.2% aligned to 8 bytes 98.2% aligned to 16 bytes 98.2%
n <= 0: 1.8% n <= 1: 30.4% n <= 2: 30.4% n <= 3: 30.4% n <= 4: 37.5% n <= 8: 55.4% n <= 16: 78.6% n <= 32: 100.0% n <= 64: 100.0%

replaying find
average size 0.2 calls 297 succeed 96.3% latencies -0.9 -9.5
s1 aligned to 4 bytes 26.6% aligned to 8 bytes 15.8% aligned to 16 bytes 9.8%
s2 aligned to 4 bytes 20.2% aligned to 8 bytes 0.3% aligned to 16 bytes 0.3%
s1-s2 aligned to 4 bytes 31.3% aligned to 8 bytes 16.2% aligned to 16 bytes 8.8%
n <= 0: 93.6% n <= 1: 97.0% n <= 2: 97.3% n <= 3: 97.6% n <= 4: 98.3% n <= 8: 99.7% n <= 16: 99.7% n <= 32: 100.0% n <= 64: 100.0%

replaying pager
average size 0.8 calls 116 succeed 94.8% latencies -18.6 -18.6
s1 aligned to 4 bytes 93.1% aligned to 8 bytes 92.2% aligned to 16 bytes 91.4%
s2 aligned to 4 bytes 7.8% aligned to 8 bytes 7.8% aligned to 16 bytes 6.9%
s1-s2 aligned to 4 bytes 6.0% aligned to 8 bytes 5.2% aligned to 16 bytes 5.2%
n <= 0: 75.0% n <= 1: 86.2% n <= 2: 87.9% n <= 3: 89.7% n <= 4: 94.0% n <= 8: 98.3% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying man
average size 1.0 calls 1723 succeed 97.6% latencies 1.6 -13.4
s1 aligned to 4 bytes 37.8% aligned to 8 bytes 26.7% aligned to 16 bytes 19.4%
s2 aligned to 4 bytes 56.3% aligned to 8 bytes 47.8% aligned to 16 bytes 38.2%
s1-s2 aligned to 4 bytes 34.5% aligned to 8 bytes 23.6% aligned to 16 bytes 18.7%
n <= 0: 71.7% n <= 1: 92.9% n <= 2: 93.4% n <= 3: 93.6% n <= 4: 93.9% n <= 8: 97.3% n <= 16: 98.8% n <= 32: 99.5% n <= 64: 100.0%

replaying troff
average size 1.3 calls 178664 succeed 94.4% latencies -63.4 -59.8
s1 aligned to 4 bytes 86.8% aligned to 8 bytes 84.8% aligned to 16 bytes 83.9%
s2 aligned to 4 bytes 27.7% aligned to 8 bytes 17.3% aligned to 16 bytes 9.8%
s1-s2 aligned to 4 bytes 27.1% aligned to 8 bytes 16.5% aligned to 16 bytes 9.2%
n <= 0: 57.9% n <= 1: 63.9% n <= 2: 78.9% n <= 3: 90.7% n <= 4: 95.6% n <= 8: 97.3% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%

replaying grotty
average size 6.1 calls 5553 succeed 62.8% latencies -18.7 -31.6
s1 aligned to 4 bytes 99.0% aligned to 8 bytes 98.9% aligned to 16 bytes 98.9%
s2 aligned to 4 bytes 90.2% aligned to 8 bytes 89.6% aligned to 16 bytes 89.4%
s1-s2 aligned to 4 bytes 89.2% aligned to 8 bytes 88.6% aligned to 16 bytes 88.3%
n <= 0: 11.1% n <= 1: 16.4% n <= 2: 31.1% n <= 3: 49.3% n <= 4: 55.3% n <= 8: 56.4% n <= 16: 98.4% n <= 32: 100.0% n <= 64: 100.0%

replaying groff
average size 0.2 calls 696 succeed 98.4% latencies 12.6 10.3
s1 aligned to 4 bytes 91.7% aligned to 8 bytes 90.9% aligned to 16 bytes 90.9%
s2 aligned to 4 bytes 33.5% aligned to 8 bytes 18.4% aligned to 16 bytes 9.1%
s1-s2 aligned to 4 bytes 25.7% aligned to 8 bytes 9.9% aligned to 16 bytes 0.6%
n <= 0: 88.8% n <= 1: 98.3% n <= 2: 99.1% n <= 3: 99.6% n <= 4: 99.7% n <= 8: 99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying as
average size 6.7 calls 5198 succeed 36.7% latencies 24.5 20.4
s1 aligned to 4 bytes 28.9% aligned to 8 bytes 14.7% aligned to 16 bytes 7.4%
s2 aligned to 4 bytes 28.9% aligned to 8 bytes 14.7% aligned to 16 bytes 7.3%
s1-s2 aligned to 4 bytes 74.1% aligned to 8 bytes 67.9% aligned to 16 bytes 64.6%
n <= 0: 4.0% n <= 1: 10.4% n <= 2: 13.8% n <= 3: 18.6% n <= 4: 25.0% n <= 8: 67.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

summary memcmp:

replaying ls
average size 0.4 calls 9641 succeed 100.0% latencies -6.2 -7.0
s1 aligned to 4 bytes 27.2% aligned to 8 bytes 12.3% aligned to 16 bytes 2.5%
s2 aligned to 4 bytes 26.0% aligned to 8 bytes 15.6% aligned to 16 bytes 8.4%
s1-s2 aligned to 4 bytes 25.0% aligned to 8 bytes 12.6% aligned to 16 bytes 6.4%
n <= 0: 63.7% n <= 1: 97.1% n <= 2: 100.0% n <= 3: 100.0% n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying awk
average size 0.5 calls 158 succeed 93.0% latencies 0.9 0.9
s1 aligned to 4 bytes 51.3% aligned to 8 bytes 46.8% aligned to 16 bytes 46.8%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 51.3% aligned to 8 bytes 46.8% aligned to 16 bytes 46.8%
n <= 0: 78.5% n <= 1: 89.9% n <= 2: 93.7% n <= 3: 96.2% n <= 4: 97.5% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying mc
average size 0.4 calls 1942 succeed 98.5% latencies -199.0 -199.0
s1 aligned to 4 bytes 31.3% aligned to 8 bytes 21.9% aligned to 16 bytes 16.3%
s2 aligned to 4 bytes 28.7% aligned to 8 bytes 22.8% aligned to 16 bytes 14.8%
s1-s2 aligned to 4 bytes 28.8% aligned to 8 bytes 19.4% aligned to 16 bytes 14.4%
n <= 0: 79.2% n <= 1: 96.7% n <= 2: 96.7% n <= 3: 96.7% n <= 4: 98.7% n <= 8: 99.4% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%

replaying mutt
average size 4.7 calls 29693 succeed 100.0% latencies -251.6 -253.5
s1 aligned to 4 bytes 99.8% aligned to 8 bytes 1.4% aligned to 16 bytes 1.4%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 99.8% aligned to 16 bytes 99.8%
s1-s2 aligned to 4 bytes 99.8% aligned to 8 bytes 1.3% aligned to 16 bytes 1.3%
n <= 0: 8.7% n <= 1: 8.9% n <= 2: 8.9% n <= 3: 8.9% n <= 4: 8.9% n <= 8: 98.9% n <= 16: 98.9% n <= 32: 100.0% n <= 64: 100.0%

replaying irb
average size 2.9 calls 306 succeed 88.2% latencies -109.3 -112.0
s1 aligned to 4 bytes 34.0% aligned to 8 bytes 19.0% aligned to 16 bytes 13.7%
s2 aligned to 4 bytes 82.4% aligned to 8 bytes 73.5% aligned to 16 bytes 35.9%
s1-s2 aligned to 4 bytes 34.6% aligned to 8 bytes 19.9% aligned to 16 bytes 12.4%
n <= 0: 67.3% n <= 1: 69.9% n <= 2: 80.4% n <= 3: 81.0% n <= 4: 84.6% n <= 8: 87.9% n <= 16: 89.5% n <= 32: 99.3% n <= 64: 100.0%

replaying vim
average size 1.5 calls 467979 succeed 99.1% latencies 101.4 95.6
s1 aligned to 4 bytes 25.6% aligned to 8 bytes 15.6% aligned to 16 bytes 10.0%
s2 aligned to 4 bytes 59.5% aligned to 8 bytes 47.0% aligned to 16 bytes 46.3%
s1-s2 aligned to 4 bytes 20.4% aligned to 8 bytes 8.6% aligned to 16 bytes 3.6%
n <= 0: 6.7% n <= 1: 52.2% n <= 2: 94.6% n <= 3: 98.4% n <= 4: 99.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying make
average size 7.2 calls 1000000 succeed 99.5% latencies 1.3 1.4
s1 aligned to 4 bytes 19.2% aligned to 8 bytes 12.3% aligned to 16 bytes 8.4%
s2 aligned to 4 bytes 27.5% aligned to 8 bytes 15.8% aligned to 16 bytes 6.6%
s1-s2 aligned to 4 bytes 24.8% aligned to 8 bytes 12.2% aligned to 16 bytes 6.0%
n <= 0: 72.1% n <= 1: 75.0% n <= 2: 75.3% n <= 3: 75.3% n <= 4: 75.3% n <= 8: 76.1% n <= 16: 76.6% n <= 32: 100.0% n <= 64: 100.0%

replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1
average size 4.4 calls 6108 succeed 34.0% latencies 0.0 10.5
s1 aligned to 4 bytes 27.7% aligned to 8 bytes 2.2% aligned to 16 bytes 1.5%
s2 aligned to 4 bytes 80.8% aligned to 8 bytes 79.2% aligned to 16 bytes 42.5%
s1-s2 aligned to 4 bytes 27.9% aligned to 8 bytes 3.3% aligned to 16 bytes 2.4%
n <= 0: 23.8% n <= 1: 26.5% n <= 2: 27.2% n <= 3: 27.4% n <= 4: 52.5% n <= 8: 96.1% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%

replaying gcc
average size 0.0 calls 63189 succeed 99.9% latencies 1.6 1.7
s1 aligned to 4 bytes 3.4% aligned to 8 bytes 3.2% aligned to 16 bytes 3.1%
s2 aligned to 4 bytes 26.5% aligned to 8 bytes 11.9% aligned to 16 bytes 6.6%
s1-s2 aligned to 4 bytes 24.7% aligned to 8 bytes 13.2% aligned to 16 bytes 7.7%
n <= 0: 96.3% n <= 1: 99.7% n <= 2: 99.9% n <= 3: 99.9% n <= 4: 99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying pager
average size 0.9 calls 118 succeed 56.8% latencies -18.2 -18.2
s1 aligned to 4 bytes 23.7% aligned to 8 bytes 15.3% aligned to 16 bytes 8.5%
s2 aligned to 4 bytes 21.2% aligned to 8 bytes 16.9% aligned to 16 bytes 13.6%
s1-s2 aligned to 4 bytes 30.5% aligned to 8 bytes 17.8% aligned to 16 bytes 11.9%
n <= 0: 54.2% n <= 1: 56.8% n <= 2: 98.3% n <= 3: 98.3% n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

replaying man
average size 12.3 calls 119 succeed 49.6% latencies -16.9 -5.0
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0: 0.8% n <= 1: 21.8% n <= 2: 21.8% n <= 3: 21.8% n <= 4: 21.8% n <= 8: 50.4% n <= 16: 89.1% n <= 32: 89.1% n <= 64: 100.0%

replaying as
average size 5.3 calls 8968 succeed 2.1% latencies 16.0 4.8
s1 aligned to 4 bytes 42.8% aligned to 8 bytes 39.1% aligned to 16 bytes 38.4%
s2 aligned to 4 bytes 35.4% aligned to 8 bytes 23.9% aligned to 16 bytes 18.8%
s1-s2 aligned to 4 bytes 26.3% aligned to 8 bytes 13.1% aligned to 16 bytes 7.4%
n <= 0: 0.2% n <= 1: 0.3% n <= 2: 1.5% n <= 3: 12.7% n <= 4: 47.8% n <= 8: 98.9% n <= 16: 99.6% n <= 32: 100.0% n <= 64: 100.0%

summary strcasecmp:

replaying mutt
average size 1.2 calls 53965 succeed 100.0% latencies -252.2 -251.1
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2 aligned to 4 bytes 31.7% aligned to 8 bytes 20.8% aligned to 16 bytes 11.9%
s1-s2 aligned to 4 bytes 31.7% aligned to 8 bytes 20.8% aligned to 16 bytes 11.9%
n <= 0: 63.4% n <= 1: 65.3% n <= 2: 65.3% n <= 3: 88.7% n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.581

replaying irb
average size 1.0 calls 693 succeed 94.5% latencies -97.4 -97.9
s1 aligned to 4 bytes 30.4% aligned to 8 bytes 11.4% aligned to 16 bytes 4.2%
s2 aligned to 4 bytes 29.1% aligned to 8 bytes 14.3% aligned to 16 bytes 10.2%
s1-s2 aligned to 4 bytes 27.4% aligned to 8 bytes 13.6% aligned to 16 bytes 5.9%
n <= 0: 84.6% n <= 1: 88.3% n <= 2: 89.0% n <= 3: 89.8% n <= 4: 90.3% n <= 8: 93.8% n <= 16: 99.6% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000

replaying vim
average size 0.5 calls 2194 succeed 95.2% latencies -19.8 -9.9
s1 aligned to 4 bytes 92.7% aligned to 8 bytes 92.6% aligned to 16 bytes 91.7%
s2 aligned to 4 bytes 27.7% aligned to 8 bytes 10.9% aligned to 16 bytes 6.5%
s1-s2 aligned to 4 bytes 26.5% aligned to 8 bytes 10.2% aligned to 16 bytes 5.3%
n <= 0: 87.2% n <= 1: 90.6% n <= 2: 91.3% n <= 3: 94.5% n <= 4: 97.4% n <= 8: 99.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.024

replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1
average size 5.3 calls 108 succeed 4.6% latencies 31.5 -6.5
s1 aligned to 4 bytes 6.5% aligned to 8 bytes 5.6% aligned to 16 bytes 5.6%
s2 aligned to 4 bytes 1.9% aligned to 8 bytes 0.9% aligned to 16 bytes 0.9%
s1-s2 aligned to 4 bytes 93.5% aligned to 8 bytes 93.5% aligned to 16 bytes 93.5%
n <= 0: 0.9% n <= 1: 0.9% n <= 2: 0.9% n <= 3: 0.9% n <= 4: 3.7% n <= 8: 95.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.028

replaying /usr/bin/lsof
average size 0.1 calls 181 succeed 98.9% latencies 32.2 36.8
s1 aligned to 4 bytes 20.4% aligned to 8 bytes 17.1% aligned to 16 bytes 17.1%
s2 aligned to 4 bytes 17.7% aligned to 8 bytes 0.6% aligned to 16 bytes 0.6%
s1-s2 aligned to 4 bytes 26.0% aligned to 8 bytes 12.7% aligned to 16 bytes 6.1%
n <= 0: 97.2% n <= 1: 99.4% n <= 2: 99.4% n <= 3: 99.4% n <= 4: 99.4% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000

replaying man
average size 2.1 calls 70892 succeed 100.0% latencies -353.3 -355.8
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0: 38.8% n <= 1: 63.4% n <= 2: 74.7% n <= 3: 81.3% n <= 4: 86.7% n <= 8: 95.5% n <= 16: 98.4% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.063

replaying preconv
average size 0.6 calls 75 succeed 97.3% latencies -35.2 -6.9
s1 aligned to 4 bytes 97.3% aligned to 8 bytes 96.0% aligned to 16 bytes 96.0%
s2 aligned to 4 bytes 38.7% aligned to 8 bytes 21.3% aligned to 16 bytes 9.3%
s1-s2 aligned to 4 bytes 37.3% aligned to 8 bytes 21.3% aligned to 16 bytes 9.3%
n <= 0: 84.0% n <= 1: 85.3% n <= 2: 85.3% n <= 3: 86.7% n <= 4: 98.7% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.453

summary strncasecmp:

replaying mutt
average size 0.5 calls 233025 succeed 95.9% latencies -260.3 -259.2
s1 aligned to 4 bytes 24.4% aligned to 8 bytes 23.6% aligned to 16 bytes 0.4%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 49.2% aligned to 16 bytes 25.8%
s1-s2 aligned to 4 bytes 24.4% aligned to 8 bytes 13.2% aligned to 16 bytes 7.5%
n <= 0: 81.1% n <= 1: 85.7% n <= 2: 87.6% n <= 3: 100.0% n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000

replaying vim
average size 2.8 calls 10719 succeed 98.3% latencies -20.9 -20.2
s1 aligned to 4 bytes 30.3% aligned to 8 bytes 11.4% aligned to 16 bytes 8.1%
s2 aligned to 4 bytes 20.7% aligned to 8 bytes 5.0% aligned to 16 bytes 3.5%
s1-s2 aligned to 4 bytes 27.9% aligned to 8 bytes 8.1% aligned to 16 bytes 3.7%
n <= 0: 55.5% n <= 1: 57.6% n <= 2: 58.4% n <= 3: 71.2% n <= 4: 72.6% n <= 8: 86.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.002

replaying man
average size 1.3 calls 167 succeed 91.0% latencies -17.1 22.7
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0: 50.3% n <= 1: 64.1% n <= 2: 66.5% n <= 3: 89.8% n <= 4: 98.8% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000

replaying as
average size 0.0 calls 3267 succeed 100.0% latencies 1.5 7.7
s1 aligned to 4 bytes 24.4% aligned to 8 bytes 12.3% aligned to 16 bytes 6.0%
s2 aligned to 4 bytes 0.1% aligned to 8 bytes 0.0% aligned to 16 bytes 0.0%
s1-s2 aligned to 4 bytes 25.3% aligned to 8 bytes 11.6% aligned to 16 bytes 6.0%
n <= 0: 99.9% n <= 1: 100.0% n <= 2: 100.0% n <= 3: 100.0% n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000