Joshua,
Great, thanks. That was clear, and the takeaway is that I should pay
attention to the number of memory channels per core (which may be less than
1.0), in addition to the number of cores and the RAM per core.

What is the "ncpu" column in Table 1 (for example)? Does the 4 refer to 4
cores, with the 1 and 2 cases not using all the cores on the motherboard? Or
is "ncpu" an application parameter? I read it as "number of CPUs". I noted
that the heart simulation didn't have an ncpu column, but that was why I
thought you had multiple nodes going.

Thanks very much,
Peter

P.S. and then where does the billiard cue go?


On 3/8/07, Joshua Baker-LePain <[EMAIL PROTECTED]> wrote:

On Thu, 8 Mar 2007 at 11:33am, Peter St. John wrote

> Those benchmarks are quite interesting and I wonder if I interpret them at
> all correctly.
> It would seem that the Intel outperforms its advantage in clockspeed (1/6th
> faster, but ballpark 1/3 better performance?) so the question would be
> performance gain per dollar cost (which is fine); however, for that heart
> simulation towards the end, it looks like the AMD scales up with increasing
> nodecount enormously better, and with several nodes actually outperforms the
> faster Intel.
> Should I guess at relatively poor performance of the networking on the
> motherboard used with the Intel chip or does that have anything to do with
> the CPU itself?

Each benchmark was run on a single system with 4 CPUs (or, rather, 4 cores
in 2 sockets) -- there was no network involved.  The difference (IMO) lies
in the memory subsystems of the 2 architectures.

Opterons have 1 memory controller per socket (on the CPU, shared by the 2
cores) attached to a dedicated bank of memory via a HyperTransport link
(referred to from here on as HT).  That socket is connected to the other
CPU socket (and its HT-connected memory bank) by HT.

Xeons (still) have a single memory controller hub to which the CPUs
communicate via the front side bus (FSB).  That single hub has 2 channels
to memory.

So, yes, clock-for-clock (and for my usage) Xeon 51xxs are faster than
Opterons.  But, if your code hits memory *really hard* (which that heart
model does), then the multiple paths to memory available to the Opterons
allow them to scale better.
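
A toy model of that last point, as a rough sketch only.  It assumes each
independent path to memory has a fixed capacity and that every core wants the
same share of it.  The function name and the DEMAND value are made up for
illustration, not measured from either machine, and treating the Xeon's hub
as a single shared path (despite its 2 channels) is a simplification.

```python
def memory_bound_speedup(cores_per_path, demand, capacity=1.0):
    """Toy speedup over a single core for a memory-bound code.

    cores_per_path: active core count on each independent memory path.
    demand: bandwidth one core wants, as a fraction of one path's capacity
            (a hypothetical number, not a measurement).
    """
    # Each path delivers what its cores ask for, capped at its capacity.
    delivered = sum(min(n * demand, capacity) for n in cores_per_path)
    # Baseline: one core running alone on one path.
    single = min(demand, capacity)
    return delivered / single


DEMAND = 0.6  # hypothetical: one core uses 60% of one path's bandwidth

# 4 cores all going through one shared hub (the Xeon-like layout).
shared = memory_bound_speedup([4], DEMAND)
# 2 cores on each of 2 per-socket controllers (the Opteron-like layout).
per_socket = memory_bound_speedup([2, 2], DEMAND)

print(f"shared hub:  {shared:.2f}x on 4 cores")
print(f"per socket:  {per_socket:.2f}x on 4 cores")
```

With a small demand per core (say 0.1) both layouts scale to about 4x in
this model, so the gap only opens up for codes that hit memory hard.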

This situation has existed for a long time on the Intel side.  For P4-based
Xeons it was crippling.  The new Core-based Xeons, however, don't
suffer nearly as badly (due to their big cache, maybe?).  E.g. the thermal
simulations in that same file are pretty memory intensive themselves, and
P4-based Xeons scaled *horribly* on them.  But the 51xx Xeons still scale
very well on them (which surprised me).

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
