Mikhail Kuzminsky wrote:
> In message from Craig Tierney <craig.tier...@noaa.gov> (Tue, 11 Aug 2009
> 11:40:03 -0600):
>> Rahul Nabar wrote:
>>> On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho <couti...@dcc.ufmg.br> wrote:
>>>> This is often caused by cache competition or memory bandwidth
>>>> saturation. If it were cache competition, going from 4 to 6 threads
>>>> would make it worse. As the code became faster with DDR3-1600 and
>>>> much slower on the Xeon 5400, this code is memory bandwidth bound.
>>>> Tweaking CPU affinity to stop threads jumping among cores will not
>>>> help much, as the big bottleneck is memory bandwidth. For this code,
>>>> CPU affinity only helps on NUMA machines, by keeping memory accesses
>>>> local.
>>>>
>>>> If the machine has enough bandwidth to feed the cores, it will scale.
>>>
>>> Exactly! But I thought this was the big advance with the Nehalem: that
>>> it removed the CPU<->Cache<->RAM bottleneck. So if the code scaled on
>>> the AMD Barcelona, it should continue to scale on the Nehalem, right?
>>>
>>> I'm posting a copy of my scaling plot here if it helps:
>>>
>>> http://dl.getdropbox.com/u/118481/nehalem_scaling.jpg
>>>
>>> To remove as many confounding factors as possible, this particular
>>> Nehalem plot was produced with the following settings:
>>>
>>> Hyperthreading OFF
>>> 24 GB memory, i.e. 6 banks of 4 GB, i.e. the optimum memory configuration
>>> X5550
>>>
>>> Even if we explain away the bizarre performance of the 4-node case by
>>> the Turbo effect, what is most confusing is how the 8-core data point
>>> could be so much slower than the corresponding 8-core point on an old
>>> AMD Barcelona.
>>>
>>> Something's wrong here that I just do not understand. BTW, any other
>>> VASP users here? Anybody have any Nehalem experience?
>>
>> Rahul,
>> What are you doing to ensure that you have both memory and processor
>> affinity enabled?
>> Craig
>
> As I mentioned here in the "numactl & SuSE 11.1" thread, some kernels
> show wrong behaviour on Nehalem (bad /sys/devices/system/node directory
> contents). This bug is present, in particular, in the default OpenSuSE 11
> kernels (2.6.27.7-9 and 2.6.29-6) and, as was written in that thread, in
> the FC11 2.6.29 kernel.
>
> I found that in this situation disabling NUMA in the BIOS only increases
> STREAM throughput. Therefore I think this problem (Rahul's) is not due to
> BIOS settings. Unfortunately I have no data for VASP itself.
>
> It would be interesting to know whether anybody has kernels that work
> normally with Nehalem, in the NUMA sense. AFAIK older 2.6 kernels (from
> SuSE 10.3) work OK, but I didn't check. Maybe an error in NUMA support is
> the reason for Rahul's problem?
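A minimal sketch of one way to check what a given kernel thinks the
Nehalem topology looks like: it prints the NUMA node that each online
core maps to, plus the current process affinity mask, so a box with the
broken /sys/devices/system/node contents described above (or with no
affinity set up at all) shows up immediately. This is only an
illustration under stated assumptions: it assumes libnuma 2.x and its
headers (numactl-devel) are installed, and the file name is made up for
this example.

    /* numa_check.c -- illustrative only; assumes libnuma 2.x is installed.
     * Build: gcc -std=gnu99 -O2 numa_check.c -o numa_check -lnuma */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sched.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0) {
            printf("libnuma reports no NUMA support on this kernel\n");
            return 0;
        }
        int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        printf("kernel reports %d NUMA node(s)\n", numa_max_node() + 1);

        /* A dual-socket Nehalem should show two nodes with the cores
         * split evenly between them; anything else points at the bad
         * /sys/devices/system/node contents mentioned above. */
        for (int cpu = 0; cpu < ncpus; cpu++)
            printf("cpu %2d -> node %d\n", cpu, numa_node_of_cpu(cpu));

        /* Which cores is this process currently allowed to run on? */
        cpu_set_t mask;
        if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
            printf("affinity mask:");
            for (int cpu = 0; cpu < ncpus; cpu++)
                if (CPU_ISSET(cpu, &mask))
                    printf(" %d", cpu);
            printf("\n");
        }
        return 0;
    }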
What do you mean by "normally"? I am running CentOS 5.3 with
2.6.18-128.2.1 right now on a 448-node Nehalem cluster, and so far I am
happy with how things work. The original CentOS 5.3 kernel,
2.6.18-128.1.10, had bugs in Nehalem support where nodes would just
start running slowly at random; upgrading the kernel fixed that. But
that performance problem was all or nothing, and I don't recall it
showing up in the way Rahul described.

Craig

--
Craig Tierney (craig.tier...@noaa.gov)
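P.S. One quick way to tell a genuinely bandwidth-starved node from the
kernel problem above is to run a STREAM-style triad at 1, 2, 4, and 8
threads and see where the GB/s curve flattens. What follows is only a
rough sketch under stated assumptions: it assumes gcc with OpenMP, and
the file name, array size, and timing scheme are illustrative rather
than anything posted in this thread.

    /* triad.c -- illustrative STREAM-style triad, not the real STREAM benchmark.
     * Build: gcc -std=gnu99 -O3 -fopenmp triad.c -o triad
     * Run:   OMP_NUM_THREADS=1 ./triad   (then 2, 4, 8 and compare GB/s) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 40000000L   /* three double arrays, roughly 0.96 GB in total */

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        if (!a || !b || !c) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }

        /* First-touch initialization in parallel so pages land on each
         * thread's local NUMA node. */
        #pragma omp parallel for
        for (long i = 0; i < N; i++) {
            a[i] = 0.0; b[i] = 1.0; c[i] = 2.0;
        }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];               /* the triad kernel */
        double t1 = omp_get_wtime();

        double gb = 3.0 * N * sizeof(double) / 1e9; /* bytes moved per sweep */
        printf("threads=%d  %.2f GB/s  (check %.1f)\n",
               omp_get_max_threads(), gb / (t1 - t0), a[N / 2]);
        free(a); free(b); free(c);
        return 0;
    }

If the aggregate keeps climbing well past four threads, the memory
system is keeping up and the slowdown is probably elsewhere; if it
flattens early, the node really is bandwidth bound, which is what the
Xeon 5400 comparison above suggests for this VASP workload.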