On 08/06/20 12:16, Diego Zuccato wrote:
> I have another partition on these new nodes. 4 identical machines, new
> installation, ConnectX-5 card, dual Intel Xeon 5120 (14 core dual
> thread). No problem running a job requiring 112 threads (on 4 nodes),
> but can't run a single-node job with 5
On 07/06/20 09:44, Diego Zuccato wrote:
>> I'm *guessing* that you are tripping over the use of "--tasks 32" on a
>> heterogeneous cluster,
> If you mean that using "--tasks 32" trips the use of a second node, then
> no. The node does have two AMD Opteron 6274.
[...]
> I've had a similar pr
On 05/06/20 15:29, Riebs, Andy wrote:
Thanks for the answer.
> I'm *guessing* that you are tripping over the use of "--tasks 32" on a
> heterogeneous cluster,
If you mean that using "--tasks 32" trips the use of a second node, then
no. The node does have two AMD Opteron 6274.
> though your commen
slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Diego Zuccato
Sent: Friday, June 5, 2020 9:08 AM
To: Slurm User Community List
Subject: [slurm-users] Intermittent problem at 32 CPUs
Hello all.
I have been trying for some weeks to debug this problem, but it seems I'm
still missing something.
I have a small, (very) heterogeneous cluster. After upgrading to Debian
10 and packaged versions of Slurm and IB drivers/tools, I noticed that
*sometimes* jobs requesting 32 or more threads f