So, Brice,

Can hwloc detect this and give a suitably large complaint?
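As a concrete illustration of what such a check might look like (a minimal sketch against the hwloc 2.x C API, where a NUMA node's local memory is reported under its attr; field names differ slightly in 1.x, and the warning text is only an example, not anything hwloc is claimed to print today):

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t node = NULL;

    if (hwloc_topology_init(&topo) < 0 || hwloc_topology_load(topo) < 0) {
        fprintf(stderr, "failed to load topology\n");
        return 1;
    }

    printf("detected %d NUMA node(s)\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE));

    /* Walk the NUMA nodes and complain about any that report zero local
       memory, i.e. a Cluster-on-Die half whose DIMM slots are all empty. */
    while ((node = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_NUMANODE, node)) != NULL) {
        if (node->attr->numanode.local_memory == 0)
            fprintf(stderr, "warning: NUMA node %u reports no local memory\n",
                    node->os_index);
    }

    hwloc_topology_destroy(topo);
    return 0;
}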

--
-- Jim
--
James Cownie <[email protected]> (and Intel, as I’m sure most of you know…)

> On 18 Dec 2016, at 21:28, Brice Goglin <[email protected]> wrote:
> 
> Hello
> Do you know if all your CPU memory channels are populated? CoD requires
> that each half of the CPU has some memory DIMMs (so that each NUMA node
> actually contains some memory). If both channels of one half are empty,
> the NUMA node might somehow disappear.
> Brice
> 
> 
> 
> 
> On 16/12/2016 at 23:26, Elken, Tom wrote:
>> Hi John and Greg,
>> 
>> You showed nodes 0 & 2 (no node 1) and a strange CPU assignment to nodes!
>> Even though you had Cluster On Die (CoD) enabled in your BIOS, I have never
>> seen that arrangement of NUMA nodes and CPUs. You may have a bug in your
>> BIOS or OS?
>> With CoD enabled, I would have expected 4 NUMA nodes, 0-3, and 6 cores 
>> assigned to each one.
>> 
>> The Omni-Path Performance Tuning User Guide 
>> http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Performance_Tuning_UG_H93143_v6_0.pdf
>>  
>> does recommend disabling CoD in Xeon BIOSes (Table 2 on p. 12), but it's
>> not considered a hard prohibition.
>> Disabling it improves some fabric performance benchmarks, while enabling it
>> helps the performance of some single-node applications, which could outweigh
>> the fabric performance aspects.
>> 
>> -Tom
>> 
>>> -----Original Message-----
>>> From: Beowulf [mailto:[email protected]] On Behalf Of Greg
>>> Lindahl
>>> Sent: Friday, December 16, 2016 2:00 PM
>>> To: John Hearns
>>> Cc: Beowulf Mailing List
>>> Subject: Re: [Beowulf] NUMA zone weirdness
>>> 
>>> Wow, that's pretty obscure!
>>> 
>>> I'd recommend reporting it to Intel so that they can add it to the
>>> descendants of ipath_checkout / ipath_debug. It's exactly the kind of
>>> hidden gotcha that leads to unhappy systems!
>>> 
>>> -- greg
>>> 
>>> On Fri, Dec 16, 2016 at 03:52:34PM +0000, John Hearns wrote:
>>>> Problem solved.
>>>> I have changed the QPI Snoop Mode on these servers from
>>>> Cluster On Die Enabled to Disabled, and they now display what I take to be
>>>> the correct behaviour, i.e.
>>>> 
>>>> [root@comp006 ~]# numactl --hardware
>>>> available: 2 nodes (0-1)
>>>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
>>>> node 0 size: 32673 MB
>>>> node 0 free: 31541 MB
>>>> node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
>>>> node 1 size: 32768 MB
>>>> node 1 free: 31860 MB
>>>> node distances:
>>>> node   0   1
>>>>  0:  10  21
>>>>  1:  21  10
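For completeness, a minimal sketch (not from the thread) showing how the same node sizes and distance matrix can be read programmatically with libnuma; link with -lnuma. The values printed are whatever the running system reports, not the numbers above.

#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int maxnode = numa_max_node();

    /* Per-node total and free memory, as in the top of `numactl --hardware`. */
    for (int n = 0; n <= maxnode; n++) {
        long long freemem = 0;
        long long size = numa_node_size64(n, &freemem);
        printf("node %d size: %lld MB free: %lld MB\n",
               n, size >> 20, freemem >> 20);
    }

    /* Node distance matrix, as in the table at the end of the output above. */
    for (int a = 0; a <= maxnode; a++) {
        for (int b = 0; b <= maxnode; b++)
            printf("%4d", numa_distance(a, b));
        printf("\n");
    }
    return 0;
}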



_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
