Re: [slurm-users] Intermittent problem at 32 CPUs

2020-06-09 Thread Diego Zuccato
On 08/06/20 12:16, Diego Zuccato wrote: > I have another partition on these new nodes. 4 identical machines, new > installation, ConnectX-5 card, dual Intel Xeon 5120 (14 core dual > thread). No problem running a job requiring 112 threads (on 4 nodes), > but can't run a single-node job with 5

Re: [slurm-users] Intermittent problem at 32 CPUs

2020-06-08 Thread Diego Zuccato
On 07/06/20 09:44, Diego Zuccato wrote: >> I'm *guessing* that you are tripping over the use of "--tasks 32" on a >> heterogeneous cluster, > If you mean that using "--tasks 32" trips the use of a second node, then > no. The node does have two AMD Opteron 6274. [...] > I've had a similar pr

Re: [slurm-users] Intermittent problem at 32 CPUs

2020-06-07 Thread Diego Zuccato
On 05/06/20 15:29, Riebs, Andy wrote: Thanks for the answer. > I'm *guessing* that you are tripping over the use of "--tasks 32" on a > heterogeneous cluster, If you mean that using "--tasks 32" trips the use of a second node, then no. The node does have two AMD Opteron 6274. > though your commen

Re: [slurm-users] Intermittent problem at 32 CPUs

2020-06-05 Thread Riebs, Andy
lurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Diego Zuccato Sent: Friday, June 5, 2020 9:08 AM To: Slurm User Community List Subject: [slurm-users] Intermittent problem at 32 CPUs Hello all. I already tried for some weeks to debug this problem, but it seems I'm stil

[slurm-users] Intermittent problem at 32 CPUs

2020-06-05 Thread Diego Zuccato
Hello all. I have been trying for some weeks to debug this problem, but it seems I'm still missing something. I have a small, (very) heterogeneous cluster. After upgrading to Debian 10 and packaged versions of Slurm and IB drivers/tools, I noticed that *sometimes* jobs requesting 32 or more threads f