Thanks!

We are using Open MPI 1.10.2, and I believe bind-to core and socket are
on by default here (a quick way to verify is sketched below, after the
build options). Open MPI is built using

./configure \
        --build=x86_64-redhat-linux-gnu \
        --host=x86_64-redhat-linux-gnu \
        --disable-dependency-tracking \
        --prefix=/remote/soft/OpenMPI/1.10.2 \
        --disable-mpi-profile \
        --enable-shared \
        --with-tm \
        --with-sge \
        --with-verbs

gcc is used throughout.
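
To double-check what the defaults actually do, something like the
following should print the bindings Open MPI applies at launch
(./myApp is just a placeholder for the real solver):

# Print where each rank ends up bound (output goes to stderr)
mpirun --report-bindings -np 24 ./myApp

# Or request the intended layout explicitly instead of relying on defaults
mpirun --map-by socket --bind-to core --report-bindings -np 24 ./myApp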

The code should be using RDMA. I have verified that InfiniBand
performance (using Open MPI 1.10.2) is what is to be expected (through
mpitests-osu_bw and mpitests-osu_latency).
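
For reference, a point-to-point run between two nodes can be done
roughly like this (n001 and n002 are placeholder host names):

# One rank on each of two nodes, so the traffic goes over the fabric
mpirun -np 2 -host n001,n002 mpitests-osu_bw
mpirun -np 2 -host n001,n002 mpitests-osu_latency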

Hugepagesize hasn't been touched ("grep Hugepagesize /proc/meminfo"
gives 2048 kB). Is this something worth changing?
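
For completeness, the rest of the hugepage state can be inspected with
something like this (standard locations on CentOS 7, as far as I know):

# Static hugepages: Hugepagesize, HugePages_Total/Free
grep -i huge /proc/meminfo

# Transparent hugepages: shows [always], [madvise] or [never]
cat /sys/kernel/mm/transparent_hugepage/enabled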

numastat gives

                           node0           node1
numa_hit                 6288694         1193663
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit             66539           66287
local_node               6288301         1126078
other_node                   393           67585

on one of the nodes, but these numbers seem to be fairly representative
across the nodes. However, I haven't used numastat before, and maybe it
should be run while the job is actually running (which it wasn't here)?
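
If sampling while the job runs is the way to go, I guess something like
the following would do (the solver name is a placeholder, and -p needs
the numastat that ships with numactl 2.x):

# System-wide counters every 5 seconds during the run
watch -n 5 numastat

# Per-process view, matching the running solver by name
numastat -p simpleFoam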

Thanks!

/jon

On 02/28/2016 04:30 PM, cbergst...@pathscale.com wrote:
Do you have CPU affinity enabled? Is this OpenMP + MPI? Which compilers
and flags? Please give more details about the software stack.

   Original Message
From: Jon Tegner
Sent: Sunday, February 28, 2016 22:28
To: beowulf@beowulf.org
Subject: [Beowulf] Scaling issues on Xeon E5-2680

Hi,

We have issues with performance on the E5-2680. Each of the nodes has 2
of these 12-core CPUs on a SuperMicro SuperServer 1028R-WMR (i.e., 24
cores on each node).

For one of our applications (CFD/OpenFOAM) we have noticed that the
calculation runs faster using 12 cores on 4 nodes than when using 24
cores on 4 nodes.

In our environment we also have older AMD hardware (nodes with 4 CPUs
of 12 cores each), and there we don't see these strange scaling issues.

The system is CentOS 7, and communication is over FDR InfiniBand. The
BIOS was recently updated, and hyperthreading is disabled.

I feel a bit lost here, and any hints on how to proceed with this are
greatly appreciated!

Thanks,

/jon
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
