------- Comment #24 from whaley at cs dot utsa dot edu 2006-06-27 16:44 -------
Guys,
OK, here is a table summarizing the performance you can see using the
mmbench4s.tar.gz. I believe this covers a strong majority of the x86
architectures in use today (there are some specialty processors such as the
Pentium-M, Turion, Efficeon, etc. missing, but I don't think they are a big %
of the market).
In this table, I report the following for each machine and data precision:
  % Clock: percentage of the clock rate achieved by the best compiled version
           of gemm_atlas.c (rated in MFLOPS). Note that the theoretical peak
           is 1 flop/clock for Intel machines and 2 flops/clock for AMD,
           which would correspond to 100% and 200% respectively.
  gcc4/3 : (gcc 4 x87 performance) / (gcc 3 x87 performance),
           so < 1 indicates a slowdown and > 1 indicates a speedup.
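For anyone wanting to reproduce the two columns from their own timings, here is a minimal sketch of the arithmetic as defined above. The function names, the 2000 MHz clock, and the MFLOPS figures are all made up for illustration and are not taken from the actual runs; taking the maximum over the two compilers as the "best compiled version" is also an assumption.

```python
def percent_clock(mflops: float, clock_mhz: float) -> float:
    """Achieved MFLOPS as a percentage of the clock rate in MHz."""
    return 100.0 * mflops / clock_mhz


def gcc4_over_gcc3(gcc4_mflops: float, gcc3_mflops: float) -> float:
    """gcc 4 / gcc 3 x87 performance ratio (< 1 means gcc 4 is slower)."""
    return gcc4_mflops / gcc3_mflops


# Hypothetical numbers for an imaginary 2000 MHz machine (assumption).
clock_mhz = 2000.0
gcc3_mflops = 3200.0
gcc4_mflops = 1856.0

# Assumption: "best compiled version" is the faster of the two builds.
best_mflops = max(gcc3_mflops, gcc4_mflops)

print(f"%CLOCK = {percent_clock(best_mflops, clock_mhz):.1f}")
print(f"gcc4/3 = {gcc4_over_gcc3(gcc4_mflops, gcc3_mflops):.2f}")
```

With a peak of 2 flops/clock, a %CLOCK of 200 would mean the kernel is running at theoretical peak.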
NOTES:
(1) Pentium 4 is a model=2, while Pentium 4E is model=3.
(2) PPRO, PIII & P4e get bad % clock for double: this is because the
static blocking factor in the benchmark (nb=60) exceeds the cache,
which makes the gcc 4 #s look better than they are.
(3) In general, the % peak achieved by this kernel is large enough that
I think it is truly indicative of the computational efficiency of the
generated code.
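To give a feel for note (2), here is a rough sketch of how the footprint of a single nb x nb block compares with a cache size in each precision. The 16 KB cache size is only a placeholder, and so is the assumption that exactly one block needs to stay resident; the note above does not say which cache level or how many blocks are involved.

```python
def block_bytes(nb: int, elem_bytes: int) -> int:
    """Bytes occupied by one nb x nb block of matrix elements."""
    return nb * nb * elem_bytes


NB = 60                 # static blocking factor used by the benchmark
CACHE_BYTES = 16 * 1024  # hypothetical cache size (assumption, not measured)

for precision, elem_bytes in (("double", 8), ("single", 4)):
    footprint = block_bytes(NB, elem_bytes)
    verdict = "fits in" if footprint <= CACHE_BYTES else "exceeds"
    print(f"{precision}: {footprint} bytes, {verdict} {CACHE_BYTES} bytes")
```

Under these assumptions, the double-precision block is twice the size of the single-precision one, which is consistent with only the double numbers being hurt on the machines listed in note (2).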
                   double            single
               --------------    --------------
MACHINES       %CLOCK  gcc4/3    %CLOCK  gcc4/3
============   ======  ======    ======  ======
PentiumPRO       67.5    0.77      78.5    0.71
PentiumIII       47.6    0.95      81.4    0.69
Pentium 4        93.8    0.92      95.7    1.00
Pentium4e        72.8    0.75      80.4    0.80
Pentium-D        86.7    0.83      94.1    0.91
CoreDuo          85.8    1.01      94.9    1.11
Athlon-K7       137.8    0.62     139.1    0.63
Athlon-64 X2    160.0    0.58     165.5    0.60
Opteron         164.6    0.57     164.6    0.61
The CoreDuo numbers above were generated by me on an OS X machine, where I
hand-translated the Linux assembly to run, since I could not compile stock gccs.
I have a request out for results from a guy who has Linux/CoreDuo, and when I
get those I will update the results if necessary. At that time, I will also post
an attachment with all the raw timing runs that I generated the table from.
Thanks,
Clint
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827