On Wednesday 27 February 2008 03:06, J.C. Pizarro wrote:
> Compiling and executing the code of Nick Piggin at
> http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html
>
> in my old Athlon64 Venice 3200+ 2.0 GHz,
> 3 GiB DDR400, 32-bit kernel, gcc 3.4.6, i got
>
> $ gcc -O3 -falign-functions=64 -falign-loops=64 -falign-jumps=64
> -falign-labels=64 -march=i686 foo.c -o foo
> $ ./foo
> no deps, predictable -- C code took 10.08ns per iteration
> no deps, predictable -- cmov code took 11.07ns per iteration
> no deps, predictable -- jmp code took 11.25ns per iteration
> has deps, predictable -- C code took 26.66ns per iteration
> has deps, predictable -- cmov code took 35.44ns per iteration
> has deps, predictable -- jmp code took 18.89ns per iteration
> no deps, unpredictable -- C code took 10.17ns per iteration
> no deps, unpredictable -- cmov code took 11.07ns per iteration
> no deps, unpredictable -- jmp code took 22.51ns per iteration
> has deps, unpredictable -- C code took 104.02ns per iteration
> has deps, unpredictable -- cmov code took 107.19ns per iteration
> has deps, unpredictable -- jmp code took 176.18ns per iteration

Thanks for the numbers... just be careful, sometimes the numbers are a
bit funny: e.g. "C" should be very similar code to "cmov", but sometimes
the numbers vary more than I'd like (e.g. the 26 vs 35ns case), which I
guess is due to gcc having better control over the code generation in
the C case.

This should apply to jmp as well, and indeed if I compile for i586
(without cmov), then the C code uses jmp, and can be even slightly
faster than my jmp asm.

So this is really a demonstration / guideline only, and I think it would
have to be implemented natively in gcc before you can look really
closely at the numbers.

Thanks,
Nick
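
To illustrate the point about the "C" case: the sketch below is not the benchmark from the linked message, just a minimal example of a conditional written in plain C, where gcc decides between cmov and a conditional jump. The names select_max and running_max are hypothetical and exist only for this illustration. running_max loosely mirrors the "has deps" situation, since each iteration consumes the previous result.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the original benchmark.
 * The ternary leaves the choice of instruction to the compiler:
 * when targeting i686 gcc may emit cmov, while for i586 (which has
 * no cmov) it has to use a compare and a conditional jump. */
long select_max(long a, long b)
{
    return (a > b) ? a : b;
}

/* Loop-carried dependency: the value of m in each iteration depends
 * on the result of the previous one.
 * Assumes n >= 1 (v[0] is read unconditionally). */
long running_max(const long *v, size_t n)
{
    size_t i;
    long m = v[0];

    for (i = 1; i < n; i++)
        m = select_max(m, v[i]);
    return m;
}
```

Compiling this with -S (for example gcc -O3 -march=i686 -S versus gcc -O3 -march=i586 -S on a 32-bit target) and comparing the assembly is one way to check which form gcc actually picked before reading anything into the timings.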