[This is a follow-up on the gcc3 vs. gcc4 discussion. Background: the R
benchmark tests (http://www.sciviews.org/benchmark/index.html) show
a dramatic difference in the "Escoufier's method on a 37x37 matrix
(mixed)" test when comparing binaries for PowerPC compiled with gcc3
vs. gcc4.]
On Oct 16, 2006, at 11:29 AM, René J.V. Bertin wrote:

> Anyway, it has nothing to do with the G4 optimisations, as the
> generic 2.4.0 on CRAN also shows the same performance drop.

Thanks for the example. I think I can explain this. At a higher
level the slowdown occurs in "do_cov", but the underlying issue is
the use of "long double" computations. First, the results.
The timings I get (on 2xG5 2.7GHz) are:
gcc3: 0.8s
gcc4: 4.5s (dynamic libgcc)
gcc4: 4.2s (static libgcc)
Basically any calls that use long double will be affected:
qadd: 4.5s (gcc3 opt), 6.7s (Agcc4 opt), 7.4s (gcc3),
      7.9s (gcc4 opt+dyngcc), 10.5s (Agcc4), 10.6s (gcc4 dyngcc)
(This test basically runs 500x 1M long double additions on an array.
It's even more extreme if you run it on short arrays: 250kx1k will
give 2s on gcc3 and 7.7s on gcc4.)
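
For readers who want to try something similar, here is a minimal sketch of this kind of timing test. It is not the qadd program itself: the function names are hypothetical, and it assumes the test simply accumulates the elements of an array in a long double. The timings it produces depend entirely on the compiler and machine.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Sum n array elements using a long double accumulator. */
long double qadd_long_double(const long double *x, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Same loop in plain double, for comparison. */
double qadd_double(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Run `reps` passes over x and return the elapsed CPU time in seconds.
   The last sum is stored in *result; the volatile sink is there to
   discourage the compiler from discarding the work. */
double time_qadd_long_double(const long double *x, size_t n, int reps,
                             long double *result)
{
    assert(x != NULL && result != NULL && reps > 0);
    volatile long double sink = 0.0L;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = qadd_long_double(x, n);
    clock_t stop = clock();
    *result = sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_qadd_long_double with n = 1000000 and reps = 500 would mirror the 500x 1M configuration, and n = 1000 with reps = 250000 the short-array one.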
Now, the actual reason is that gcc3 simply ignores "long double" and
performs all computation using regular double precision
(sizeof(long double)=8 in gcc3 and 16 in gcc4). What this means is
that you lose precision in gcc3. To illustrate the impact, changing
"long double" to "double" in gcc4 will bring the 250kx1k test down
from 7.7s to 2.1s, which is almost the same as gcc3.
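
A quick way to check what a given compiler does with "long double" is sketched below. The function names are made up for illustration; only sizeof and the standard <float.h> macros are used. The last function adds a term that is too small to change a double-precision 1.0 and reports whether it survives in long double.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Storage size of long double in bytes. */
size_t long_double_bytes(void)
{
    return sizeof(long double);
}

/* 1 if long double carries more mantissa bits than double, else 0. */
int long_double_is_wider(void)
{
    /* The C standard requires long double to be at least as precise
       as double. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* 1 if a term too small for double precision survives an addition
   carried out in long double, else 0. */
int small_term_survives(void)
{
    volatile long double one = 1.0L;
    volatile long double tiny = DBL_EPSILON / 4;  /* below half an ulp of 1.0 in double */
    volatile long double sum = one + tiny;
    return sum != one;
}
```

With a compiler that treats long double as double, small_term_survives() should return 0, which is the precision loss described above.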
Thus, restricting R to double computations I get for the 37x37 test
with gcc 4.0.3:
gcc4nld: 0.7s
which is actually even faster than the gcc3 result.
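
As an illustration only (this is not R's actual source), the kind of switch involved can be sketched as a compile-time choice of accumulator type. HAVE_LONG_DOUBLE is the configure macro named in the legend at the end of this mail; ACC_T and mean_acc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_LONG_DOUBLE would normally come from the build configuration.
   ACC_T is a hypothetical name used only in this sketch. */
#ifdef HAVE_LONG_DOUBLE
typedef long double ACC_T;
#else
typedef double ACC_T;
#endif

/* Mean of n doubles, accumulated in whichever type was selected. */
double mean_acc(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    ACC_T sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return (double)(sum / (ACC_T)n);
}
```

Leaving HAVE_LONG_DOUBLE undefined makes every such accumulation run in plain double, trading the extra precision for speed.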
Attached you will find the R benchmarks 2.3 results (run with R
2.4.0). There is pretty much no difference between the binaries
except for the 37x37 test, and the explanation is above.
Cheers,
Simon
I. Matrix calculation                                         gcc3  gcc4d   CRAN
---------------------
Creation, transp., deformation of a 1500x1500 matrix (sec):   1.42   1.37   1.48
800x800 normal distributed random matrix ^1000______ (sec):   0.10   0.11   0.11
Sorting of 2,000,000 random values__________________ (sec):   0.36   0.34   0.34
700x700 cross-product matrix (b = a' * a)___________ (sec):  0.057  0.057  0.058
Linear regression over a 600x600 matrix (c = a \ b') (sec):  0.090  0.090  0.090
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):                  0.149  0.148  0.149

II. Matrix functions
--------------------
FFT over 800,000 random values______________________ (sec):   0.76   0.76   0.77
Eigenvalues of a 320x320 random matrix______________ (sec):   0.33   0.33   0.32
Determinant of a 650x650 random matrix______________ (sec):  0.068  0.068  0.066
Cholesky decomposition of a 900x900 matrix__________ (sec):  0.102  0.103  0.104
Inverse of a 400x400 random matrix__________________ (sec):  0.051  0.055  0.052
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):                  0.131  0.133  0.131

III. Programmation
------------------
750,000 Fibonacci numbers calculation (vector calc)_ (sec):   0.44   0.44   0.47
Creation of a 2250x2250 Hilbert matrix (matrix calc) (sec):   1.87   1.80   1.97
Grand common divisors of 70,000 pairs (recursion)___ (sec):   0.31   0.32   0.34
Creation of a 220x220 Toeplitz matrix (loops)_______ (sec):   0.36   0.35   0.35
Escoufier's method on a 37x37 matrix (mixed)________ (sec):   2.37   2.27   5.76
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):                  0.667  0.654  0.683

Total time for all 15 tests_________________________ (sec):   8.68   8.47  12.27
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.236  0.234  0.237
--- End of test ---
---
gcc3 = Apple gcc-3.3 + g77 3.4.6 + -O9 + -mtune=G5 (R24-branch 39648)
gcc4d = CRAN gcc 4.0.3 + #undef HAVE_LONG_DOUBLE + -O9 + -mtune=G5 (R24-branch 39648)
CRAN = CRAN binary 2.4.0
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel