On Mon, 8 Oct 2007, Chris Samuel wrote:

> If I then run 2 x 4 CPU jobs of the *same* problem, they all run at
> 50% CPU.
With big thanks to Mark Hahn, this problem is solved. InfiniBand is exonerated; it was the MPI stack that was the problem!

Mark suggested that this sounded like a CPU affinity problem, and he was right. It turns out that when you build MVAPICH2 (in our case mvapich2-0.9.8p3) on an AMD64 or EM64T system, it defaults to compiling in and enabling CPU affinity support. If we take the example of 4 x 2 CPU jobs, that has the unfortunate effect of binding all of those MPI processes to the first 2 cores in the system - hence we see only 25% CPU utilisation per process (watched via top, and evident from the comparative run times).

Fortunately it does check the user's environment for the variable MV2_ENABLE_AFFINITY, and if that is set to 0 the affinity setting is bypassed. So simply modifying my PBS script to include:

export MV2_ENABLE_AFFINITY=0

before using mpiexec [1] to launch the jobs results in a properly performing system again. I'm currently running 4 x 2 CPU NAMD jobs and they're back to properly consuming 100% CPU per process. Phew!
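For reference, here's roughly what the relevant part of the PBS script looks like now. The resource requests and the NAMD input file name are just placeholders for this sketch; the only real change from our usual scripts is the export line before mpiexec:

    #!/bin/bash
    #PBS -l nodes=1:ppn=2
    #PBS -l walltime=24:00:00

    cd $PBS_O_WORKDIR

    # Stop MVAPICH2 pinning this job's processes to the first cores
    # on the node (its compiled-in CPU affinity support).
    export MV2_ENABLE_AFFINITY=0

    # Placeholder NAMD input file - substitute your own job here.
    mpiexec namd2 myjob.namd > namd.log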
Chris

-- 
 Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
 VPAC is a not-for-profit Registered Research Agency