------- Comment #24 from whaley at cs dot utsa dot edu 2006-06-27 16:44 -------

Guys,
OK, here is a table summarizing the performance you can see using the
mmbench4s.tar.gz. I believe this covers a strong majority of the x86
architectures in use today (there are some specialty processors such as the
Pentium-M, Turion, Efficeon, etc. missing, but I don't think they are a big %
of the market).

In this table, I report the following for each machine and data precision:
   % Clock: % of clock rate achieved by best compiled version of gemm_atlas.c
            (rated in mflop). Note, theoretical peak for Intel machines is
            1 flop/clock, and is 2 flops/clock for AMD, which would correspond
            to 100% and 200% respectively.
   gcc4/3 : (gcc 4 x87 performance) / (gcc 3 x87 performance)
            so < 1 indicates slowdown, > 1 indicates speedup

NOTES:
(1) Pentium 4 is a model=2, while Pentium 4E is model=3.
(2) PPRO, PIII & P4e get bad % clock for double: this is because the static
    blocking factor in the benchmark (nb=60) exceeds the cache, which makes
    the gcc 4 #s look better than they are.
(3) In general, the % peak achieved by this kernel is large enough that I
    think it is truly indicative of the computational efficiency of the
    generated code.

                   double             single
               --------------     --------------
MACHINES       %CLOCK  gcc4/3     %CLOCK  gcc4/3
============   ======  ======     ======  ======
PentiumPRO       67.5    0.77       78.5    0.71
PentiumIII       47.6    0.95       81.4    0.69
Pentium 4        93.8    0.92       95.7    1.00
Pentium4e        72.8    0.75       80.4    0.80
Pentium-D        86.7    0.83       94.1    0.91
CoreDuo          85.8    1.01       94.9    1.11
Athlon-K7       137.8    0.62      139.1    0.63
Athlon-64 X2    160.0    0.58      165.5    0.60
Opteron         164.6    0.57      164.6    0.61

The CoreDuo numbers above were generated by me on an OS X machine, where I
hand-translated Linux assembly to run, since I could not compile stock gccs.
I have a request out for results from a guy who has Linux/CoreDuo, and when I
get those I will update the results if necessary. At that time, I will also
post an attachment with all the raw timing runs that I generated the table
from.

Thanks,
Clint

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827