Hi,

I did some tests on the following function:

--- CUT HERE ---
int fibo(int n)
{
  if (n < 2) return 1;
  return (fibo(n-2) + fibo(n-1));
}
--- CUT HERE ---

and I discovered that, on some hardware, it is faster at -O2 than at
-O3. This is with GCC 4.9.2.

Looking at the disassembly, I see it is using FP registers (s8 in the
extract below) to hold integer values. The following is a small extract:

.L3:
        fmov    w0, s8
        sub     w25, w25, #1
        cmn     w25, #1
        add     w0, w0, w27
        fmov    s8, w0
        bne     .L19
        add     w0, w0, 1
        b       .L2
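
One plausible reading of this loop, offered as a sketch rather than a
statement of what GCC actually did: the trailing call fibo(n-1) has been
turned into a loop, and the running sum is the value that bounces between
s8 and w0. The names fibo_loop and acc are illustrative only and do not
come from the compiler output.

```c
/* Sketch only: a hand-written C equivalent of fibo() in which the
 * trailing call fibo(n-1) becomes a loop with an accumulator.
 * fibo_loop and acc are hypothetical names. */
int fibo_loop(int n)
{
  int acc = 0;                /* running sum; assumed to be the value kept in s8 */
  while (n >= 2) {
    acc += fibo_loop(n - 2);  /* the recursive call that remains */
    n -= 1;                   /* stands in for the call fibo(n-1) */
  }
  return acc + 1;             /* base case: fibo(n) == 1 for n < 2 */
}
```

If this reading is right, each loop iteration pays for one fmov out of
s8 and one fmov back in, just to perform an integer add.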

Recompiling at -O3 with -mgeneral-regs-only gives a big improvement on
some hardware.
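
For anyone who wants to repeat the comparison without the tarball, a
minimal timing helper might look like the following. This is a sketch
under my own assumptions: time_fibo is a hypothetical name, and it
measures CPU time with clock(), which may not match how the numbers in
this mail were taken.

```c
#include <time.h>

int fibo(int n)
{
  if (n < 2) return 1;
  return (fibo(n-2) + fibo(n-1));
}

/* Hypothetical helper: returns the CPU seconds spent computing fibo(n).
 * The result is stored through a volatile pointer so the compiler
 * cannot discard the call as dead code. */
double time_fibo(int n, volatile int *result)
{
  clock_t start = clock();
  *result = fibo(n);
  clock_t end = clock();
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_fibo() from a small main() and building the same file three
times (with -O2, with -O3, and with -O3 -mgeneral-regs-only) gives the
three configurations compared here.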

The following are the times I get on various partner HW. I have
normalised the -O2 times to 1 second so that I do not disclose actual
partner performance data:

            -O2       -O3       -O3 -mgeneral-regs-only
Partner 1   1.00 s    1.13 s    0.72 s
Partner 2   1.00 s    0.68 s    0.60 s
Partner 3   1.00 s    0.73 s    0.68 s
Partner 4   1.00 s    0.83 s    0.84 s

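So, in general, -O3 does actually do better than -O2 (Partner 1 is the
exception), and in three of the four cases performance is better still
if I stop it using FP registers for int values. On Partner 4 the two -O3
times are essentially the same.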
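
To put numbers on the effect of -mgeneral-regs-only, the relative change
between the two -O3 columns can be computed as below. This is only a
convenience sketch: the struct, the array and the function names are my
own, and the values are the normalised times quoted above, not raw
measurements.

```c
#include <stdio.h>

/* Normalised times from this mail: plain -O3 versus
 * -O3 -mgeneral-regs-only. All names here are hypothetical. */
struct timing { const char *name; double o3; double o3_gpr; };

static const struct timing timings[] = {
  { "Partner 1", 1.13, 0.72 },
  { "Partner 2", 0.68, 0.60 },
  { "Partner 3", 0.73, 0.68 },
  { "Partner 4", 0.83, 0.84 },
};

/* Relative change in run time, in percent; negative means faster. */
double percent_change(double before, double after)
{
  return (after - before) / before * 100.0;
}

/* Prints one line per partner with the signed percentage change. */
void print_changes(void)
{
  for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++)
    printf("%s: %+.1f%%\n", timings[i].name,
           percent_change(timings[i].o3, timings[i].o3_gpr));
}
```

On these figures the run time drops by about 36%, 12% and 7% for
Partners 1 to 3, and rises by about 1% for Partner 4.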

I have put a tarball of the test program, along with 3 binaries and 3
disassemblies, here:

http://people.linaro.org/~edward.nevill/fibo.tar

All the best,
Ed.

