Re: [Beowulf] NUMA zone weirdness

2016-12-16 Thread Peter St. John
I notice an odd thing that maybe someone more hardware-clueful could explain? Node 1 has 32 GB (that is, 32 * 1024 = 32768 MB) but Node 0 is an odd number (very odd, to me): 32673 MB, which is 95 MB short. It doesn't make sense to me that a bank of bad memory would be such a funny number short. Peter On Fri, Dec 16, 201
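For illustration, the arithmetic behind that observation as a minimal C sketch (the 32673 MB figure is taken from John's numactl output later in the thread):

    #include <stdio.h>

    int main(void)
    {
        int full_mb  = 32 * 1024; /* 32 GB expressed in MB = 32768 */
        int node0_mb = 32673;     /* size reported for node 0 */
        /* prints 95 */
        printf("node 0 is %d MB short of 32 GB\n", full_mb - node0_mb);
        return 0;
    }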

Re: [Beowulf] NUMA zone weirdness

2016-12-16 Thread Elken, Tom
Hi John and Greg, You showed Nodes 0 & 2 (no node 1) and a strange CPU assignment to nodes! Even though you had Cluster On Die (CoD) Enabled in your BIOS, I have never seen that arrangement of NUMA nodes and CPUs. You may have a bug in your BIOS or OS? With CoD enabled, I would have expe
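One way to dump the node/CPU arrangement for comparison is the libnuma API; a minimal sketch, assuming libnuma v2 (compile with gcc -lnuma):

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not available\n");
            return 1;
        }
        struct bitmask *cpus = numa_allocate_cpumask();
        for (int node = 0; node <= numa_max_node(); node++) {
            /* gaps like the missing node 1 show up here */
            if (!numa_bitmask_isbitset(numa_all_nodes_ptr, node))
                continue;
            long long free_b;
            long long size_b = numa_node_size64(node, &free_b);
            numa_node_to_cpus(node, cpus);
            printf("node %d: %lld MB, cpus:", node, size_b >> 20);
            for (int cpu = 0; cpu < numa_num_configured_cpus(); cpu++)
                if (numa_bitmask_isbitset(cpus, cpu))
                    printf(" %d", cpu);
            printf("\n");
        }
        numa_free_cpumask(cpus);
        return 0;
    }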

Re: [Beowulf] NUMA zone weirdness

2016-12-16 Thread Greg Lindahl
Wow, that's pretty obscure! I'd recommend reporting it to Intel so that they can add it to the descendants of ipath_checkout / ipath_debug. It's exactly the kind of hidden gotcha that leads to unhappy systems! -- greg On Fri, Dec 16, 2016 at 03:52:34PM +, John Hearns wrote: > Problem solved.

Re: [Beowulf] NUMA zone weirdness

2016-12-16 Thread John Hearns
Problem solved. I have changed the QPI Snoop Mode on these servers from Cluster On Die Enabled to Disabled and they display what I take to be correct behaviour, i.e.:
[root@comp006 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 32673 MB
node 0 free:
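The node numbering numactl reports comes from sysfs, so it can also be read directly; a minimal sketch using the standard Linux path:

    #include <stdio.h>

    int main(void)
    {
        char buf[64];
        FILE *f = fopen("/sys/devices/system/node/online", "r");
        if (!f || !fgets(buf, sizeof buf, f)) {
            perror("node/online");
            return 1;
        }
        printf("online nodes: %s", buf); /* e.g. "0-1" after the BIOS change */
        fclose(f);
        return 0;
    }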

Re: [Beowulf] NUMA zone weirdness

2016-12-16 Thread John Hearns
hwloc is finding weirdness also. I am going to find I have done something stupid, right?
[johnh@comp006 ~]$ lstopo
* hwloc 1.11.3 has encountered what looks like an error from the operating system.
*
* Package (P#1 cpus
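The topology lstopo walks can be queried from C as well; a minimal sketch against the hwloc 1.11 API, counting packages and NUMA nodes:

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);
        int npkg  = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PACKAGE);
        int nnuma = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE);
        printf("%d package(s), %d NUMA node(s)\n", npkg, nnuma);
        hwloc_topology_destroy(topo);
        return 0;
    }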

[Beowulf] NUMA zone weirdness

2016-12-16 Thread John Hearns
This is in the context of Omni-Path cards and the hfi1 driver. In the file pio.c there is a check on the NUMA zones being online: num_numa = num_online_nodes(); (pio.c line 1711)
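A hedged sketch of the assumption such a check can encode (this is not the actual hfi1 code): num_online_nodes() counts the online nodes, but with Cluster On Die the online nodes can be 0 and 2, so treating them as indices 0..num_numa-1 breaks:

    #include <linux/errno.h>
    #include <linux/nodemask.h>

    /* Sketch only: verify online NUMA nodes are numbered contiguously
     * from 0 before using them as array indices. */
    static int check_numa_nodes(void)
    {
        int node;
        int num_numa = num_online_nodes();

        for_each_online_node(node) {
            if (node >= num_numa)
                return -EINVAL; /* e.g. nodes 0 and 2 with CoD enabled */
        }
        return num_numa;
    }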