Hi Keith,

On Tuesday, 11 September 2018 7:46:14 AM AEST Keith Ball wrote:

> 1.) Slurm seems to be incapable of recognizing sockets/cores/threads on
> these systems.
[...]
> Anyone know if there is a way to get Slurm to recognize the true topology
> for POWER nodes?

IIRC Slurm uses hwloc for discovering topology, so "lstopo-no-graphics" might
give you some insight into whether it's showing you the right config. I'd be
curious to see what "lscpu" and "slurmd -C" say as well.

> 2.) Another concern is the gres.conf. Slurm seems to have trouble taking
> processor ID's that are > "#Sockets". The true processor ID as given by
> nvidia-smi topo -m output will range up to 159, and slurm doesn't like
> this. Are we to use "Cores=" entries in gres.conf, and use the number of
> the physical cores, instead of what nvidia-smi outputs?

Again, I *think* Slurm is using hwloc's logical CPU numbering for this, so
lstopo should help. Here's a quick snippet from my local PC (HT enabled):

  Package L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#4)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#5)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#6)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#7)

You can see that the logical numbering (L#0 and L#1 on the first core) is
contiguous, unlike the physical numbering (P#0 and P#4) that reflects how
the firmware has enumerated the CPUs.

> 3.) A related gres.conf question: there seems to be no documentation of
> using "CPUs=" instead of "Cores=", yet I have seen several online examples
> using "CPUs=" (and I myself have used it on an x86 system without issue).
> Should one use "Cores" instead of "CPUs", when specifying binding to
> specific GPUs?

I think CPUs= was the older syntax, which has been replaced with Cores=.

The gres.conf we use on our HPC cluster uses Cores= quite happily:

  Name=gpu Type=p100 File=/dev/nvidia0 Cores=0-17
  Name=gpu Type=p100 File=/dev/nvidia1 Cores=18-35

All the best!
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
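
P.S. In case it helps when comparing hwloc's numbering against what
nvidia-smi reports, here is a minimal Python sketch that pulls the logical
(L#) to physical (P#) PU mapping out of lstopo text output. The function
name, the regex and the embedded sample are my own illustration, assuming
the "PU L#n (P#m)" line format shown in the snippet above; it is not
something Slurm or hwloc ships, and it says nothing about how Slurm itself
interprets Cores=.

```python
import re

# Sample lines copied from the lstopo snippet in this mail (first two cores).
SAMPLE = """\
Core L#0
  PU L#0 (P#0)
  PU L#1 (P#4)
Core L#1
  PU L#2 (P#1)
  PU L#3 (P#5)
"""

# Matches lines such as "PU L#1 (P#4)": logical index, then physical index.
PU_RE = re.compile(r"PU L#(\d+) \(P#(\d+)\)")


def logical_to_physical(lstopo_text):
    """Return {logical PU index: physical (OS) PU index} from lstopo text."""
    return {int(l): int(p) for l, p in PU_RE.findall(lstopo_text)}


if __name__ == "__main__":
    # Print the mapping in logical order, one PU per line.
    for logical, physical in sorted(logical_to_physical(SAMPLE).items()):
        print(f"L#{logical} -> P#{physical}")
```

To try it on a real node you could feed it the output of
"lstopo-no-graphics" instead of SAMPLE.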