Re: optimizing predictable branches on x86

Nick Piggin Sun, 02 Mar 2008 20:39:37 -0800

On Wednesday 27 February 2008 03:06, J.C. Pizarro wrote:
> Compiling and executing the code of Nick Piggin at
> http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html
>
> in my old Athlon64 Venice 3200+ 2.0 GHz,
> 3 GiB DDR400, 32-bit kernel, gcc 3.4.6, i got
>
> $ gcc -O3 -falign-functions=64 -falign-loops=64 -falign-jumps=64
> -falign-labels=64 -march=i686 foo.c -o foo
> $ ./foo
>  no deps,   predictable -- C    code took  10.08ns per iteration
>  no deps,   predictable -- cmov code took  11.07ns per iteration
>  no deps,   predictable -- jmp  code took  11.25ns per iteration
> has deps,   predictable -- C    code took  26.66ns per iteration
> has deps,   predictable -- cmov code took  35.44ns per iteration
> has deps,   predictable -- jmp  code took  18.89ns per iteration
>  no deps, unpredictable -- C    code took  10.17ns per iteration
>  no deps, unpredictable -- cmov code took  11.07ns per iteration
>  no deps, unpredictable -- jmp  code took  22.51ns per iteration
> has deps, unpredictable -- C    code took  104.02ns per iteration
> has deps, unpredictable -- cmov code took  107.19ns per iteration
> has deps, unpredictable -- jmp  code took  176.18ns per iteration


Thanks for the numbers... just be careful, sometimes the numbers
are a bit funny: e.g. "C" should be very similar code to "cmov", but
sometimes the numbers vary more than I'd like (e.g. the 26 vs 35ns
"has deps, predictable" case), which I guess is due to gcc having
better control over the code generation in the C case. This should
apply to jmp as well, and indeed if I compile for i586 (without cmov),
then the C code uses jmp, and can be even slightly faster than my jmp
asm.

So this is really a demonstration / guideline only, and I think it
would have to be implemented natively in gcc before you can look
really closely at the numbers.
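
To make the "C" versus "jmp" distinction concrete, here is a minimal
sketch. It is not the benchmark from the linked message: the function
names and the array-based setup are hypothetical, chosen only for
illustration. Whether gcc emits cmov or a conditional jump for either
function depends on the compiler version, -march and optimization
level, so treat the comments as expectations rather than guarantees.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not from the original test. */

/* Ternary select: when the target has cmov (e.g. -march=i686), gcc
 * may turn this into a cmov. With -march=i586 there is no cmov, so
 * it has to use a conditional jump instead. */
long sum_select(const int *cond, const long *a, const long *b, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += cond[i] ? a[i] : b[i];
    return sum;
}

/* Explicit if/else: same result as sum_select(). The compiler is
 * still free to if-convert this into a cmov, so the source form
 * alone does not decide the generated code. */
long sum_branch(const int *cond, const long *a, const long *b, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (cond[i])
            sum += a[i];
        else
            sum += b[i];
    }
    return sum;
}
```

Compiling with "gcc -O2 -S" for -march=i586 and -march=i686 and
comparing the assembly is one way to see which form gcc actually
picked for each function.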

Thanks,
Nick
