On 11/09/13 22:21, Philip Guenther wrote:
>
> Sorry, but I don't really find your tests convincing.
>
> * Only test the worst case of a matching buffer.
> * Unreasonably large example used (are there *any* 256MB memcmp or
> bcmp in the kernel?)
> * Use of fprintf in the inner loop adds large fixed costs including
> syscalls to what should be a microbenchmark.
> * Measurements aren't of just the inner loop.
> * No test showing the suggested compiler options actually have the
> suggested effect.
>
> If you want to show that changing A will have a positive effect, you
> need to have your test be as close a simulation of A as you can. This
> doesn't seem to be that.
>
All valid points. I had already compared the generated code using
'objdump -d'. As soon as GCC is told to optimize, it inlines
bcmp/memcmp as 'repz cmpsb' (see
'/usr/src/gnu/gcc/gcc/config/i386/i386.md' lines 18622ff and the
function 'expand_builtin_memcmp' in
'/usr/src/gnu/gcc/gcc/builtins.c'). The inlined version is slower than
calling the library function even when comparing just a few bytes, and
the gap widens as the number of bytes grows. In each pair below, the
first build uses the builtin and the second disables it; the timing
line is for a run of the resulting binary. See the last pair, for just
128 bytes (0m29.58s vs. 0m5.32s user time).
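
The 'objdump -d' comparison can be repeated on something much smaller
than the benchmark. The file below is only a sketch for that purpose:
the file name cmpcheck.c and the function name cmp_fixed are made up
here, and BSIZ defaults to 16 if not given with -D. Build it once with
'cc -O2 -c cmpcheck.c' and once with
'cc -O2 -fno-builtin-memcmp -c cmpcheck.c', then run
'objdump -d cmpcheck.o' on each. Going by the description above, the
first object should show the inlined 'repz cmpsb' and the second a
call to memcmp; other compilers or GCC versions may emit something
different.

```c
/*
 * cmpcheck.c (hypothetical file name): smallest useful case for
 * looking at what the compiler emits for a fixed-size memcmp().
 */
#include <string.h>

#ifndef BSIZ
#define BSIZ 16		/* assumed default, override with -DBSIZ=n */
#endif

/* Not static, so the function is kept in the object file. */
int
cmp_fixed(const void *a, const void *b)
{
	return memcmp(a, b, BSIZ);
}
```
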
$ cc -DBSIZ=4 -DITERATIONS=1000000000 -O2 bcmp.c
0m23.54s real 0m23.24s user 0m0.00s system
$ cc -DBSIZ=4 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c
0m18.79s real 0m18.76s user 0m0.00s system
$ cc -DBSIZ=8 -DITERATIONS=1000000000 -O2 bcmp.c
0m32.46s real 0m32.15s user 0m0.00s system
$ cc -DBSIZ=8 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c
0m20.03s real 0m20.00s user 0m0.00s system
$ cc -DBSIZ=16 -DITERATIONS=1000000000 -O2 bcmp.c
0m49.81s real 0m49.78s user 0m0.00s system
$ cc -DBSIZ=16 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c
0m22.62s real 0m22.57s user 0m0.00s system
$ cc -DBSIZ=128 -DITERATIONS=100000000 -O2 bcmp.c
0m29.66s real 0m29.58s user 0m0.00s system
$ cc -DBSIZ=128 -DITERATIONS=100000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c
0m5.33s real 0m5.32s user 0m0.00s system
$ cat bcmp.c
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>

/* BSIZ and ITERATIONS are supplied on the command line with -D. */
#define VALUE (0xff)

int
main(int argc, char *argv[])
{
	void *b1, *b2;
	int i;

	/* First buffer: allocate, lock into memory, fill with VALUE. */
	b1 = malloc(BSIZ);
	if (b1 == NULL) {
		fprintf(stderr, "unable to allocate memory: %s\n",
		    strerror(errno));
		return errno;
	}
	if (mlock(b1, BSIZ)) {
		fprintf(stderr, "unable to lock memory: %s\n",
		    strerror(errno));
		return errno;
	}
	memset(b1, VALUE, BSIZ);

	/* Second buffer: identical contents, so bcmp() always matches. */
	b2 = malloc(BSIZ);
	if (b2 == NULL) {
		fprintf(stderr, "unable to allocate memory: %s\n",
		    strerror(errno));
		return errno;
	}
	if (mlock(b2, BSIZ)) {
		fprintf(stderr, "unable to lock memory: %s\n",
		    strerror(errno));
		return errno;
	}
	memset(b2, VALUE, BSIZ);

	/* The loop under test; fprintf is reached only on a mismatch. */
	for (i = 0; i < ITERATIONS; i++) {
		if (bcmp(b1, b2, BSIZ)) {
			fprintf(stderr, "buffers do not match\n");
		}
	}

	if (munlock(b1, BSIZ)) {
		fprintf(stderr, "unable to unlock memory: %s\n",
		    strerror(errno));
	}
	if (munlock(b2, BSIZ)) {
		fprintf(stderr, "unable to unlock memory: %s\n",
		    strerror(errno));
	}
	free(b1);
	free(b2);
	return 0;
}