------- Comment #24 from whaley at cs dot utsa dot edu  2006-06-27 16:44 -------
Guys,

OK, here is a table summarizing the performance you can see using
mmbench4s.tar.gz.  I believe this covers a strong majority of the x86
architectures in use today (some specialty processors such as the
Pentium-M, Turion, Efficeon, etc. are missing, but I don't think they make up
a big % of the market).

In this table, I report the following for each machine and data precision:
  % Clock: MFLOPS achieved by the best compiled version of gemm_atlas.c,
           as a % of the clock rate in MHz.  Note that theoretical peak is
           1 flop/clock for the Intel machines and 2 flops/clock for the AMD
           machines, which correspond to 100% and 200% respectively.
  gcc4/3 : (gcc 4 x87 performance) / (gcc 3 x87 performance),
           so < 1 indicates a gcc 4 slowdown and > 1 a gcc 4 speedup

NOTES:
(1) Pentium 4 is a model=2, while Pentium 4E is model=3.
(2) PPRO, PIII & P4e get poor % clock for double because the static
    blocking factor in the benchmark (nb=60) exceeds the cache, which makes
    the gcc 4 numbers look better than they are.
(3) In general, the % peak achieved by this kernel is large enough that
    I think it is truly indicative of the computational efficiency of the
    generated code.

                        double                 single
                    --------------         ---------------
MACHINES            %CLOCK  gcc4/3         %CLOCK  gcc4/3
===========         ======  ======         ======  ======
PentiumPRO            67.5    0.77           78.5    0.71
PentiumIII            47.6    0.95           81.4    0.69
Pentium 4             93.8    0.92           95.7    1.00
Pentium4e             72.8    0.75           80.4    0.80
Pentium-D             86.7    0.83           94.1    0.91
CoreDuo               85.8    1.01           94.9    1.11
Athlon-K7            137.8    0.62          139.1    0.63
Athlon-64 X2         160.0    0.58          165.5    0.60
Opteron              164.6    0.57          164.6    0.61

I generated the CoreDuo numbers above on an OS X machine, where I
hand-translated the Linux assembly so it would run, since I could not compile
stock gccs.  I have a request out for results from someone with a Linux/CoreDuo
machine, and when I get those I will update the results if necessary.  At that
time, I will also post an attachment with all the raw timing runs that I
generated the table from.

Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
