On 05/06/20 15:29, Riebs, Andy wrote:

Thanks for the answer.

> I'm *guessing* that you are tripping over the use of "--tasks 32" on a 
> heterogeneous cluster,
If you mean that using "--tasks 32" forces the job onto a second node,
then no: the node has two AMD Opteron 6274 CPUs (16 cores each, so 32
cores in total).
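For concreteness, the failing case boils down to something like this
(the binary name is just a placeholder):

-8<--
# 32 tasks should fit on the 2x16 cores of a single node
srun --nodes=1 --ntasks=32 ./my_mpi_program
-8<--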

> though your comment about the node without InfiniBand troubles me. If you 
> drain that node, or exclude it in your command line, that might correct the 
> problem. I wonder if OMPI and PMIx have decided that IB is the way to go, and 
> are failing when they try to set up on the node without IB.
The job uses a single node. On another node (identical HW: they're two
servers in the same 1U chassis) the same job works with 32 tasks. The
nodes are configured via a script, so the configuration should be
exactly the same, but maybe something fell out of sync (continuous
updates without a reinstall since Debian 8!). I couldn't find anything
obviously different, though.
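For reference, a quick way to compare the two nodes would be something
like this (hostnames are placeholders):

-8<--
# dump the installed-package list from each node...
ssh node-ok  "dpkg -l" > pkgs.ok
ssh node-bad "dpkg -l" > pkgs.bad
# ...then look for MPI/PMIx/Slurm-related differences
diff pkgs.ok pkgs.bad | grep -Ei 'mpi|pmix|slurm|psm|ucx|ibverbs'
-8<--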

> If that's not it, I'd try
> 0. Check sacct for the node lists for the successful and unsuccessful runs -- 
> a problem node might jump out.
> 1. Running your job with explicit node lists. Again, you may find a problem 
> node this way.
I'm already running it with an explicit node list targeting the
problematic node: the goal is to identify and resolve the problem, not
to work around it by leaving a node unused...
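Roughly along these lines (job id and node name are placeholders):

-8<--
# check which node(s) a failing job actually ran on
sacct -j 12345 --format=JobID,NodeList,State,ExitCode
# re-run pinned to the suspect node
srun --nodelist=node-bad --nodes=1 --ntasks=32 ./my_mpi_program
-8<--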

> p.s. If this doesn't fix it, please include the Slurm and OMPI versions, and 
> a copy of your slurm.conf file (with identifying information like node names 
> removed) in your next note to this list.
I'm using Debian-packaged versions:
slurm-client/stable,stable,now 18.08.5.2-1+deb10u1 amd64
openmpi-bin/stable,now 3.1.3-11 amd64

slurm.conf (nodes and partitions omitted):
-8<--
SlurmCtldHost=str957-cluster(#.#.#.#)
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
EnforcePartLimits=YES
MpiDefault=none
MpiParams=ports=12000-12999
ProctrackType=proctrack/cgroup
PrologSlurmctld=/etc/slurm-llnl/SlurmCtldProlog.sh
ReturnToService=2
SlurmctldPidFile=/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/cgroup
TmpFS=/mnt/local_data/
UsePAM=1
GetEnvTimeout=20
InactiveLimit=0
KillWait=120
MinJobAge=300
SlurmctldTimeout=20
SlurmdTimeout=30
Waittime=10
FastSchedule=0
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PreemptMode=CANCEL
PreemptType=preempt/partition_prio
AccountingStorageEnforce=safe
AccountingStorageHost=str957-cluster
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
AcctGatherNodeFreq=300
ClusterName=oph
JobCompLoc=/var/spool/slurm/jobscompleted.txt
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
-8<--

I've had a similar problem while adding new nodes in a new partition. I
(probably) "solved" it by adding the line
mtl = psm2
to /etc/openmpi/openmpi-mca-params.conf.
But those were nodes with IB.
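If it's useful, I suppose one can check whether the psm2 MTL is actually
available and picked up with something like (just a guess at the
relevant ompi_info incantations):

-8<--
# list the MTL components this Open MPI build ships with
ompi_info | grep -i "MCA mtl"
# show the mtl framework parameters and their current values
ompi_info --param mtl all --level 9 | grep -i psm2
-8<--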

Since I'm quite ignorant about the whole MPI and IB ecosystem, it's
mostly guesswork...

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it
