In message from Craig Tierney <craig.tier...@noaa.gov> (Tue, 11 Aug 2009 11:40:03 -0600):

Rahul Nabar wrote:
On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho <couti...@dcc.ufmg.br> wrote:

This is often caused by cache competition or memory bandwidth saturation. If it were cache competition, going from 4 to 6 threads would have made it worse. Since the code got faster with DDR3-1600 and much slower on the Xeon 5400, it is memory bandwidth bound.
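
A quick way to test that diagnosis is a STREAM-style triad loop: if aggregate MB/s stops rising as you add threads, the sockets' memory bandwidth is saturated. A rough OpenMP sketch (file name, array size, and build line are illustrative only, not tuned):

/* STREAM-triad-style bandwidth probe.
 * Build:  gcc -O2 -fopenmp triad.c -o triad
 * Run with OMP_NUM_THREADS=1,2,4,8 and compare the MB/s figures.
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (64L * 1024 * 1024)  /* ~512 MB per array, far beyond any cache */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;

    /* First-touch initialization in parallel, so each thread's pages
     * land on its own NUMA node. */
#pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
#pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];       /* triad: 2 loads + 1 store */
    double t1 = omp_get_wtime();

    /* 3 arrays of N doubles cross the memory bus per sweep. */
    double mbytes = 3.0 * N * sizeof(double) / 1e6;
    printf("%d threads: %.0f MB/s\n",
           omp_get_max_threads(), mbytes / (t1 - t0));
    return 0;
}

If the 8-thread figure is no better than the 4-thread figure, the cores are starved for memory no matter what affinity you set.
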
Tweaking CPU affinity to keep threads from jumping between cores will not help much, as the big bottleneck is memory bandwidth. For this code, CPU affinity will only help on NUMA machines, by keeping memory accesses in local memory. If the machine has enough bandwidth to feed the cores, it will scale.
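
For what it's worth, the pinning itself is only a few lines on Linux; a glibc-specific sketch (pinning to core 0 is just an example):

/* Pin the calling thread to one core so it cannot migrate away
 * from the memory it first-touched.  Linux/glibc specific.
 * Build: gcc -O2 pin.c -o pin
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int pin_to_core(int core)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);  /* pid 0 = this thread */
}

int main(void)
{
    if (pin_to_core(0) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core 0\n");
    return 0;
}

For an MPI code like VASP it is usually easier to let the launcher or numactl do this, e.g. numactl --cpunodebind=0 --membind=0 ./vasp, which binds both the CPUs and the memory of the process to node 0.
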
Exactly! But I thought that was the big advance with Nehalem: that it removed the CPU<->cache<->RAM bottleneck. So if the code scaled on the AMD Barcelona, it should continue to scale on Nehalem, right?

I'm posting a copy of my scaling plot here in case it helps:
http://dl.getdropbox.com/u/118481/nehalem_scaling.jpg

To remove as many confounding factors as possible, this particular Nehalem plot was produced with the following settings:

Hyperthreading OFF
24GB memory, i.e. 6 banks of 4GB (the optimum memory configuration)
X5550

Even if we attribute the bizarre performance of the 4-core case to the Turbo effect, what is most confusing is how the 8-core data point could be so much slower than the corresponding 8-core point on an old AMD Barcelona. Something's wrong here that I just do not understand. BTW, any other VASP users here? Anybody have any Nehalem experience?

Rahul,
What are you doing to ensure that you have both memory and processor
affinity enabled?
Craig
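
One way to answer that from inside a job is to read back the mask the kernel actually applied (the memory-policy side can be checked with numactl --show). A small sketch:

/* Print the cores the current process is allowed to run on, to
 * verify that processor affinity really took effect.
 * Build: gcc -O2 showmask.c -o showmask
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }
    printf("allowed cores:");
    for (int i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, &mask))
            printf(" %d", i);
    printf("\n");
    return 0;
}
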
As I mentioned in the "numactl & SuSE 11.1" thread, some kernels behave incorrectly on Nehalem (bad /sys/devices/system/node directory content). This bug is present, in particular, in the default OpenSuSE 11 kernels (2.6.27.7-9 and 2.6.29-6) and, as was written in the corresponding thread discussion, in the FC11 2.6.29 kernel.
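
That is easy to check from userland: on a healthy dual-socket Nehalem you should see node0 and node1 under /sys/devices/system/node, and numactl --hardware should agree. A trivial listing sketch:

/* List the NUMA nodes the kernel exposes.  On an affected kernel
 * the nodeN entries are wrong or missing.
 * Build: gcc -O2 nodes.c -o nodes
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *d = opendir("/sys/devices/system/node");
    if (!d) {
        perror("/sys/devices/system/node");
        return 1;
    }
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (strncmp(e->d_name, "node", 4) == 0)
            printf("%s\n", e->d_name);
    closedir(d);
    return 0;
}
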
I found that in that situation, disabling NUMA in the BIOS only increased STREAM throughput. Therefore I think this problem of Rahul's is not due to BIOS settings. Unfortunately, I have no data on VASP itself.

It would be interesting to know whether anybody has kernels that work correctly with Nehalem in the NUMA sense. AFAIK the older 2.6 kernels (from SuSE 10.3) work OK, but I haven't checked. Maybe an error in NUMA support is the reason for Rahul's problem?

Mikhail
--
üÔÏ ÓÏÏÂÝÅÎÉÅ ÂÙÌÏ ÐÒÏ×ÅÒÅÎÏ ÎÁ ÎÁÌÉÞÉÅ × ÎÅÍ ×ÉÒÕÓÏ×
É ÉÎÏÇÏ ÏÐÁÓÎÏÇÏ ÓÏÄÅÒÖÉÍÏÇÏ ÐÏÓÒÅÄÓÔ×ÏÍ
MailScanner, É ÍÙ ÎÁÄÅÅÍÓÑ
ÞÔÏ ÏÎÏ ÎÅ ÓÏÄÅÒÖÉÔ ×ÒÅÄÏÎÏÓÎÏÇÏ ËÏÄÁ.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf