Re: [Beowulf] Scaling issues on Xeon E5-2680

2016-02-28 Thread John Hearns
Backing up what everyone else says. Do you know how the 24-core jobs are being distributed between sockets and cores? Also, as a very naive diagnostic, just run 'top' or, even better, 'htop' while the jobs are running. I KNOW this isn't a great diagnostic for parallel programs, but sometimes you c[...]
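
A concrete way to see that distribution while the job runs (the solver name below is a placeholder; exact output depends on your Open MPI and hwloc versions):

    # Ask Open MPI to print where each rank ends up bound
    mpirun --report-bindings -np 24 ./yourSolver

    # While it runs, press '1' in top to see per-core load, or use htop
    top
    htop

    # Placement of already-running processes
    hwloc-ps              # processes and the cores/NUMA nodes they are bound to
    taskset -cp <pid>     # affinity list of a single rank (replace <pid>)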

Re: [Beowulf] Scaling issues on Xeon E5-2680

2016-02-28 Thread Peter St. John
I come from the math side, not the electronics side, so this may be an ill-posed question, but could you try running the job with 12 cores on just one node, and then with 6 cores on each of two nodes? I'm thinking the 24-core version may get assigned to more nodes than your 12-core one, and it's communica[...]
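
A sketch of how that comparison could be launched with Open MPI (hostnames, hostfile, and binary are hypothetical; slot handling differs slightly between Open MPI versions):

    # hosts file, one line per node:
    #   node01 slots=24
    #   node02 slots=24

    # 12 ranks packed onto a single node (fills node01 first)
    mpirun -np 12 -hostfile ./hosts --bind-to core ./yourSolver

    # 12 ranks spread as 6 per node across two nodes
    mpirun -np 12 -npernode 6 -hostfile ./hosts --bind-to core ./yourSolver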

Re: [Beowulf] Scaling issues on Xeon E5-2680

2016-02-28 Thread Jon Tegner
Thanks! We are using openmpi-1.10.2, and I believe bind-to-core and bind-to-socket are on by default here. Open MPI is built using:

./configure \
  --build=x86_64-redhat-linux-gnu \
  --host=x86_64-redhat-linux-gnu \
  --disable-dependency-tracking \
  --prefix=/remote/soft/OpenM[...]
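
One way to double-check what those defaults actually are in this particular build (the solver name is a placeholder):

    # Compiled-in default mapping/binding policies
    ompi_info --all | grep -E 'binding_policy|mapping_policy'

    # Or let mpirun report the bindings it actually applies
    mpirun --report-bindings -np 24 ./yourSolver 2>&1 | grep 'bound to'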

Re: [Beowulf] Scaling issues on Xeon E5-2680

2016-02-28 Thread Joe Landman
On 02/28/2016 10:27 AM, Jon Tegner wrote: Hi, we have issues with performance on the E5-2680. Each of the nodes has 2 of these 12-core CPUs on a SuperMicro SuperServer 1028R-WMR (i.e., 24 cores on each node). For one of our applications (CFD/OpenFOAM) we have noticed that the calculation runs fas[...]

Re: [Beowulf] Scaling issues on Xeon E5-2680

2016-02-28 Thread cbergstrom
Do you have CPU affinity enabled? Is this OpenMP + MPI? Which compilers and flags? Give more details about the software stack.

Original Message
From: Jon Tegner
Sent: Sunday, February 28, 2016 22:28
To: beowulf@beowulf.org
Subject: [Beowulf] Scaling issues on Xeon E5-2680

Hi, we have issues with p[...]
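
If it does turn out to be a hybrid OpenMP + MPI run, a hedged way to inspect what each rank is actually allowed to run on (the process name and thread settings below are assumptions, not the poster's setup):

    # Cores each solver process is allowed to use
    for pid in $(pgrep -f yourSolver); do
        echo -n "$pid: "
        grep Cpus_allowed_list /proc/$pid/status
    done

    # If OpenMP is in play, make the thread placement explicit as well
    export OMP_NUM_THREADS=1
    export OMP_PROC_BIND=true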

[Beowulf] Scaling issues on Xeon E5-2680

2016-02-28 Thread Jon Tegner
Hi, we have issues with performance on the E5-2680. Each of the nodes has 2 of these 12-core CPUs on a SuperMicro SuperServer 1028R-WMR (i.e., 24 cores on each node). For one of our applications (CFD/OpenFOAM) we have noticed that the calculation runs faster using 12 cores on 4 nodes compared to whe[...]
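
To confirm the two-socket, 12-cores-per-socket layout the scheduler and MPI actually see on one of these nodes, something like the following could be run (standard hwloc, numactl, and util-linux tools):

    lstopo-no-graphics                      # sockets, cores and NUMA nodes as hwloc sees them
    numactl --hardware                      # NUMA nodes and per-node memory
    lscpu | grep -E 'Socket|Core|Thread'    # quick core/socket count sanity check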