On Tue, Mar 25, 2008 at 12:40 PM, <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Mar 25, 2008 at 12:17 AM, Eric Thibodeau <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> Mark Hahn wrote:
> >> >> NUMA is an acronym meaning Non Uniform Memory Access. This is a
> >> >> hardware constraint and is not a "performance" switch you turn on.
> >> >> Under the Linux
> >> >
> >> > I don't agree.  NUMA is indeed a description of hardware.  I'm not
> >> > sure what you meant by "constraint" - NUMA is not some kind of
> >> > shortcoming.
> >> Mark is right, my choice of words is misleading. By constraint I meant
> >> that you have to be conscious of what ends up where (that was the point
> >> of the link I added in my e-mail ;P )
> >>
> >> >> kernel there is an option that tells the kernel to be conscious of
> >> >> that hardware fact and to try to optimize the way it maps each task's
> >> >> memory allocations to the processor that task will be running on
> >> >> (processor affinity; check out taskset, in recent util-linux
> >> >> implementations, i.e. 2.13+).
> >> > the kernel has had various forms of NUMA and socket affinity for a
> >> > long time, and I suspect most any distro will install a kernel which
> >> > has the appropriate support (surely any x86_64 kernel would have NUMA
> >> > support).
> >> My point of view on distro kernels is that they are to be scrutinized
> >> unless they are specifically meant to be used as computation nodes
> >> (i.e. don't expect CONFIG_HZ=100 to be set on "typical" distros).
> >> Also, NUMA is only applicable to the Opteron architecture (integrated
> >> memory controller with HyperTransport), not the Intel flavor of
> >> multi-core CPUs (external memory controller, which can be a single bus
> >> or any memory access scheme as dictated by the motherboard
> >> manufacturer).
> >>
> >> >
> >> > I usually use numactl rather than taskset.  I'm not sure of the
> >> > history of those tools.  as far as I can tell, taskset only addresses
> >> > numactl --cpubind, though they obviously approach things differently.
> >> > if you're going to use taskset, you'll want to set cpu affinity to
> >> > multiple cpus (those local to a socket, or 'node' in numactl terms.)
> >> >
> >> >> In your specific case, you would have 4 GB per CPU and would want
> >> >> to make sure each task (assuming one per CPU) stays on the same CPU
> >> >> all the time and would want to make sure each task fits within the
> >> >> "local" 4 GB.
> >> >
> >> > "numactl --localalloc".
> >> >
> >> > but you should first verify that your machines actually do have the
> >> > 8 GB split across both nodes.  it's not that uncommon to see an
> >> > inexperienced assembler fill up one node before going onto the next,
> >> > and there have even been some boards which provided no memory to the
> >> > second node.
> >> Mark (Hahn) is right (again!), I ASSumed the tech would load the memory
> >> banks appropriately, don't make that mistake ;) And numactl is indeed
> >> more appropriate in this case (thanks Mr. Hahn ;) ). Note that the
> >> kernel (configured with NUMA) _will_ attempt to allocate memory on
> >> "local" nodes before offloading to memory "abroad".
> >>
> >> Eric
> >>
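(Noting the commands for myself - if I follow the above correctly, something
along these lines should keep a job and its memory on one socket.  The binary
name ./my_app is just a stand-in for my own code, and I still need to confirm
how my kernel numbers the cpus on each socket:)

    # pin to cpu 0 only; on a multi-core board you would list every core
    # on that socket, e.g. -c 0,1
    taskset -c 0 ./my_app

    # or, with numactl: run on node 0's cpus and satisfy allocations from
    # node 0's memory before going off-node
    numactl --cpubind=0 --localalloc ./my_app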
> > The memory will be installed by myself correctly - that is,
> > distributing the memory according to cpu.  However, it appears that
> > one of my nodes (my first Opteron machine) may well be one that has
> > only one bank of four DIMM slots assigned to cpu 0 and shared by
> > cpu 1.  It uses a Tyan K8W Tiger s2875 motherboard.  My other two
> > nodes use Arima HDAMA motherboards with SATA support - each cpu has
> > a bank of 4 DIMMs associated with it.  The Tyan node is getting
> > 4 @ 2 GB DIMMs, one of the HDAMA nodes is getting 8 @ 1 GB (both
> > instances fully populating the available DIMM slots) and the last
> > machine is going to get 4 @ 1 GB DIMMs for one cpu and 2 @ 2 GB for
> > the other.
>
> That last scheme might give you some unbalanced performance, but that is
> something to look up in the MB's instruction manual (i.e. you might be
> better off installing the RAM as 1 GB + 1 GB + 2 GB for both CPUs instead
> of 4 x 1 GB + 2 x 2 GB).

On my Opteron systems, wouldn't 3 DIMMs per CPU drop me into 64-bit memory
bandwidth rather than the allowed 128-bit memory bandwidth when each CPU has
an even number of DIMMs?
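(Another note for myself: once the RAM is in, it should be possible to check
how much memory the kernel actually sees on each node - if I understand the
tools correctly, something like the following; the /sys path is how I believe
a NUMA-enabled 2.6 kernel exposes it:)

    # per-node cpu and memory summary
    numactl --hardware

    # per-node totals as reported by the kernel
    cat /sys/devices/system/node/node0/meminfo
    cat /sys/devices/system/node/node1/meminfo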
> >
> > It looks like I may want to upgrade my motherboard before exploring
> > NUMA / affinity then.
>
> If you're getting into "upgrading" (i.e. throwing money at) anything, then
> you're getting into the slippery slope of the hardware selection debate ;)

Slippery indeed.  At this point, I think I may just install the RAM to bring
my current calculation out of swap and be done with the cluster for now.
Given that I think one of my nodes uses HyperTransport for all of cpu 1's
memory access, would it hurt anything to use affinity when only 2 out of 3
nodes can benefit from it?

> >
> > This discussion, as well as reading about NUMA and affinity elsewhere,
> > leads to another question - what is the difference between using
> > numactl and using the affinity options of my parallelization software
> > (in my case OpenMPI)?
>
> numactl is an application to help nudge processes in the correct
> direction.  Implementing cpu affinity within your code makes your code
> explicitly aware that it will run on an SMP machine (i.e. it's hardcoded
> and you don't need to call a script to change your process's affinity).
>
> In that regard, Chris Samuel replied with a mention of Torque and PBS,
> which support affinity assignment.  IMHO, that would be the most logical
> place to control affinity (as long as one can provide some memory access
> hints, i.e. the same options as seen in numactl's manpage).
>
> > Thanks,
> >
> > Mark (Kosmowski)
>
> Eric Thibodeau

Again, thank you for this discussion - I'm learning quite a bit!
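P.S. For my own reference on that last question - as far as I can tell from
the Open MPI documentation for the version I have installed (which I still
need to double-check), asking Open MPI itself to handle the pinning looks
roughly like the first line below, versus wrapping the job in numactl as in
the second.  ./my_mpi_app is just a placeholder for my own binary.

    # let Open MPI set processor affinity for each rank
    mpirun -np 4 --mca mpi_paffinity_alone 1 ./my_mpi_app

    # or leave affinity alone and just ask each rank to allocate memory on
    # whichever node it happens to be running on
    mpirun -np 4 numactl --localalloc ./my_mpi_app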