On Thu, 23 Aug 2007, Li, Bo wrote:
Doug has made many valuable points, I think.
Current applications using MPI or OpenMP on a multi-core machine run
in a simple SMP fashion, which means almost nothing is done
specifically for multi-core optimization. IMHO, multi-core processors
are at least equipped with a different internal interconnect than a
general SMP system: we can keep much of the data exchange within one
socket, leaving the external bandwidth free for other threads.
For SMP systems or multi-core systems, main memory bandwidth can be
the critical bottleneck. In some of my experiments, 8 cores can eat
all of it, and in extreme cases 4 cores eat it up and you can hardly
find any improvement going from 4 cores to 8. Under these conditions,
data transfers should be planned well and done efficiently.
By the way, I prefer to use OpenMP within an SMP box and MPI between
boxes in a cluster, as sketched below. Multi-core processors have
saved me much money for the same peak performance, but tuning and
optimization can help us do better.
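Something like this minimal hybrid sketch is what I mean -- one MPI
rank per box, OpenMP threads across the cores inside it (the
summation kernel and the loop counts are just placeholders):

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* OpenMP threads share this box's memory; no messages inside the box */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < 10000000; i++)
        local += 1.0 / (double)(i + 1);

    /* MPI only crosses the network, between boxes */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks x %d threads: sum = %f\n",
               nranks, omp_get_max_threads(), total);
    MPI_Finalize();
    return 0;
}

Compile with something like mpicc -fopenmp and launch one rank per
node; how you get one rank per node depends on your mpirun.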
Regards,
Li, Bo
More to the point (and perhaps more useful for cluster/systems
engineers) -- as one adds cores to a CPU without increasing the
bandwidth of the memory it shares or otherwise multiplexing the pathways
to that memory, it is almost inevitable that a memory bottleneck will
appear. It seems reasonable to probe that memory bottleneck as directly
as possible with e.g. multiple copies of stream or stream-like
benchmarks that also permit shuffled/nonstreaming/random access to
memory blocks for varying sizes and strides of read and written data to
get an idea of the BASELINE rates and nonlinearities as one runs over
the cache boundaries and so on.
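A minimal sketch of such a probe might look like the following
(sizes, strides, and rep counts are arbitrary; note that with stride
greater than 1 whole cache lines are still fetched, so the reported
rate counts touched data only). Run several concurrent copies to load
all the cores:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    long n      = (argc > 1) ? atol(argv[1]) : (1L << 22); /* elements */
    long stride = (argc > 2) ? atol(argv[2]) : 1;          /* in elements */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    for (long i = 0; i < n; i++) b[i] = (double)i;

    struct timespec t0, t1;
    long touched = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int rep = 0; rep < 100; rep++)
        for (long i = 0; i < n; i += stride) { /* strided copy + scale */
            a[i] = 2.0 * b[i];
            touched++;
        }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    /* one double read plus one double write per touched element */
    printf("size %ld, stride %ld: %.1f MB/s\n",
           n, stride, 2.0 * touched * sizeof(double) / secs / 1e6);
    free(a); free(b);
    return 0;
}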
This gives people without access to a quad core at least the
opportunity to meditate on whether or not there is any hope of it
being useful, to them, compared to (say) a dual dual-core box or two
single-socket dual-core machines with a network connection. A
quad-core CPU "is" a little mini-beowulf in some sense and faces very
similar issues when one contemplates its task scaling: just as
network-based IPCs are the crux of the issue for COTS cluster design,
core-to-core and core-to-memory latencies and bandwidths in different
load configurations are the crux for multicores, where putting LOTS
of cores on a single die can just make things MUCH worse. (Remember
that the task scaling curve can easily turn DOWN if you put too many
CPUs on a task with inadequate IPC capacity.)
Has anyone published a simple set of these numbers? One would expect to
be able to understand most other benchmarks in terms of them...
rgb
----- Original Message -----
From: "Douglas Eadline" <[EMAIL PROTECTED]>
To: "Ruhollah Moussavi Baygi" <[EMAIL PROTECTED]>
Cc: <beowulf@beowulf.org>
Sent: Thursday, August 23, 2007 9:09 PM
Subject: Re: [Beowulf] Intel Quad-Core or AMD Opteron
Multi-core, I lie awake at night thinking about this stuff.
There seem to be no quick answers.
The thing that amazes me about multi-core is how many people
consider the performance of a single process to be a good measure
of total processor performance. If you are going to buy a quad-core
CPU to run one process at a time, then this is a good test;
otherwise it is like predicting the performance of your code
on a cluster by running it on the head node as a single
serial job.
Over the past 8-10 months I have had the chance to test
Intel quad-core and AMD dual-core systems (soon I'll have some
Barcelonas), and here are my conclusions. The details of what I
found are in my columns in Linux Magazine, which are slowly making
their way to the LM web site (and eventually ClusterMonkey):
- how well multiple processes run (use memory) on a quad-core
is very application specific. I have a simple test script
that calculates what I call "effective cores" (a sketch of
the idea appears after this list). I have seen these results
range from about 2-7 on a dual-socket quad-core Intel system
(8 cores total) and a quad-socket dual-core AMD system
(8 cores total).
- running a single copy of the NAS FT benchmark on a Clovertown
was much faster than on a comparable Opteron. But running a parallel
MPI version of FT on 8 cores showed the AMD system to be faster.
- on Intel quad-cores, where a process is placed can have a
large effect on performance. This is largely due to the
fact that you have four dual-core Woodcrest dies, each with
its own cache. Naturally, if you have four processes running,
it is best if each one gets its own Woodcrest. To the OS
they all look the same. Other than Intel MPI, I don't
know of any other MPI that attempts to optimize this.
Open MPI has some processor affinity support, but it is
not all that sophisticated (yet). (A placement sketch
also appears after this list.)
- again, depending on the application, GigE may not be
sufficient to support the amount of traffic that
multi-core can generate (eight cores sharing one ~125 MB/s
GigE link leaves only about 15 MB/s per core). So if your
code ran well on GigE, it may not on a multi-core cluster.
Things like IB or Myrinet 10GigE may be needed.
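My effective-cores script isn't published yet, but the gist is
roughly this hypothetical C version (the kernel and sizes are my
stand-ins, not the script itself): time one copy of a memory-hungry
kernel, then N concurrent copies, and call N * t_1 / t_N the
effective core count:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define N_ELEMS (1L << 23)   /* 64 MB of doubles: bigger than any cache */

static void kernel(void)     /* memory-bound copy + scale */
{
    double *a = malloc(N_ELEMS * sizeof *a);
    double *b = malloc(N_ELEMS * sizeof *b);
    for (long i = 0; i < N_ELEMS; i++) b[i] = 1.0;
    for (int rep = 0; rep < 10; rep++)
        for (long i = 0; i < N_ELEMS; i++)
            a[i] = 2.5 * b[i];
    free(a); free(b);
}

static double run_copies(int n)   /* wall time for n concurrent copies */
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < n; i++)
        if (fork() == 0) { kernel(); _exit(0); }
    for (int i = 0; i < n; i++)
        wait(NULL);
    gettimeofday(&t1, NULL);
    return (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
}

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 8;   /* physical cores to test */
    double t1 = run_copies(1);
    double tn = run_copies(n);
    /* perfect scaling gives tn == t1, i.e. n effective cores */
    printf("effective cores at n=%d: %.2f\n", n, n * t1 / tn);
    return 0;
}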
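And on placement itself: under Linux you can experiment directly
with sched_setaffinity(2), roughly like this (which core numbers
share a die/cache is BIOS- and kernel-dependent, so treat the
numbering as an assumption to verify):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int core = (argc > 1) ? atoi(argv[1]) : 0;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 means the calling process */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core %d\n", core);
    if (argc > 2)                 /* optionally exec a pinned workload */
        execvp(argv[2], &argv[2]);
    return 0;
}

Run copies pinned to cores that do or don't share a cache and compare
timings; the taskset(1) utility does the same thing from the shell.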
Please note, I am not trying to pick a winner, were that
even possible. I want to state that, more than ever, testing
your code(s) in parallel on these systems is critical if
you want to get optimal performance.
One other thing I found as well. I recently ran the NAS
parallel benchmarks on a dual-socket quad-core Intel system
(8 cores total) using both the OpenMP (GNU 4.2) and MPI (LAM)
versions. Anyone want to guess which produced the best results?
--
Doug
Hi everybody,
As you may be aware, Intel has dramatically reduced the price of its
quad-core CPUs.
Does anyone have any experience using Intel quad-core CPUs in a
Beowulf cluster?
Do you prefer them over the AMD Opteron?
Essentially, do Intel quad-core CPUs really have FOUR cores? Are they
really 64-bit processors, as Opterons are?
Thanks for any comment on each of my questions.
Wishes,
rmb
--
Best,
Ruhollah Moussavi Baygi
--
Doug
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: [EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf