Re: [slurm-users] [Long] Why are tasks started on a 30 second clock?

2019-07-25 Thread Benjamin Redling
On 25.07.19 20:11, Kirill Katsnelson wrote: On Thu, Jul 25, 2019 at 8:16 AM Mark Hahn wrote: how about a timeout from elsewhere? for instance, when I see a 30s delay, I normally at least check DNS, which can introduce such quantized delays. Thanks

Re: [slurm-users] [Long] Why are tasks started on a 30 second clock?

2019-07-25 Thread Kirill Katsnelson
On Thu, Jul 25, 2019 at 8:16 AM Mark Hahn wrote: > how about a timeout from elsewhere? for instance, when I see a 30s delay, > I normally at least check DNS, which can introduce such quantized delays. > Thanks, it's a good guess, but it is very unlikely to be the case. The Google Cloud is quite differe

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Ryan Novosielski
My understanding is that the topology plug-in will overrule this, and that may or may not be a problem depending on your environment. I had a ticket in to SchedMD about this, because it looked like our nodes were getting allocated in the exact reverse order. I suspected this was because our high

Re: [slurm-users] [Long] Why are tasks started on a 30 second clock?

2019-07-25 Thread Mark Hahn
I'll be very grateful if anyone can explain where the 30-second clock hides! how about a timeout from elsewhere? for instance, when I see a 30s delay, I normally at least check DNS, which can introduce such quantized delays. regards, mark hahn.
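A quick way to test the DNS hypothesis on a suspect node is to time a lookup directly; a minimal sketch, where the node name is only a placeholder:

  time getent hosts burst-node-001   # resolves through nsswitch, the path the daemons use
  time dig +short burst-node-001     # queries DNS directly, bypassing nsswitch

A lookup that blocks for tens of seconds before returning points at a resolver timeout rather than at Slurm itself.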

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Sarlo, Jeffrey S
I think it would be the slurm-slurmctld rpm. I'm not sure on the timing of updating and restarting. We noticed the issue when we were testing 18.08.01 and so didn't have any users/jobs at the time and just modified and rebuilt. Jeff From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.

[slurm-users] [Long] Why are tasks started on a 30 second clock?

2019-07-25 Thread Kirill Katsnelson
I am setting up and debugging a little (up to 100 nodes) elastic cluster in the Google Compute Engine (GCE). Our compute load is embarrassingly parallelizable, and I am just packing nodes with either a task per core for CPU loads, or a task per node for GPU loads, and the node VMs are started and deleted
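For reference, an elastic/cloud setup along these lines is usually wired up through Slurm's power-save hooks; a rough slurm.conf sketch, where the script paths, node names, and sizes are placeholders rather than details from the original post:

  SuspendProgram=/usr/local/bin/gce-delete-node.sh   # deletes the backing VM
  ResumeProgram=/usr/local/bin/gce-create-node.sh    # creates and boots the VM
  SuspendTime=300          # idle seconds before a node is powered down
  ResumeTimeout=600        # seconds allowed for a VM to boot and register
  NodeName=burst-[001-100] State=CLOUD CPUs=16 RealMemory=60000
  PartitionName=batch Nodes=burst-[001-100] Default=YES State=UP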

Re: [slurm-users] Slurm node weights

2019-07-25 Thread David Baker
Hi Jeff, Thank you for these details. so far we have never implemented any Slurm fixes. I suspect the node weights feature is quite important and useful, and it's probably worth me investigating this fix. In this respect could you please advise me? If I use the fix to regenerate the "slurm-s

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Sarlo, Jeffrey S
This is the fix if you want to modify the code and rebuild: https://github.com/SchedMD/slurm/commit/f66a2a3e2064 I think 18.08.04 and later have it fixed. Jeff From: slurm-users on behalf of David Baker Sent: Thursday, July 25, 2019 6:53 AM To: Slurm User Co
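For anyone applying that commit by hand, the usual route is to patch the source tree and rebuild the RPMs; a rough sketch, assuming an RPM-based install (the tarball version shown is only illustrative):

  tar xjf slurm-18.08.3.tar.bz2
  cd slurm-18.08.3
  curl -L https://github.com/SchedMD/slurm/commit/f66a2a3e2064.patch | patch -p1
  cd .. && tar cjf slurm-18.08.3.tar.bz2 slurm-18.08.3
  rpmbuild -ta slurm-18.08.3.tar.bz2   # produces slurm-slurmctld and the other packages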

Re: [slurm-users] Slurm node weights

2019-07-25 Thread David Baker
Hello, Thank you for the replies. We're running an early version of Slurm 18.08, and it does appear that the node weights are being ignored due to the bug. We're experimenting with Slurm 19*; however, we don't expect to deploy that new version for quite a while. In the meantime, does anyone know if

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Sarlo, Jeffrey S
Which version of Slurm are you running? I know some of the earlier versions of 18.08 had a bug and node weights were not working. Jeff From: slurm-users on behalf of David Baker Sent: Thursday, July 25, 2019 6:09 AM To: slurm-users@lists.schedmd.com Subjec

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Sean Crosby
Hi David, What does: scontrol show node orange01 scontrol show node orange02 show? Just to see if there's a default node weight hanging around, and if your weight changes have been picked up. Sean -- Sean Crosby Senior DevOps/HPC Engineer and HPC Team Lead | Research Platform Services Research
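The Weight field shows up in the node's State line, so a quick check might look like this (the output below is only illustrative):

  $ scontrol show node orange01 | grep -i weight
     State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A

Weight=1 is the default, so a node whose weight never made it into the running configuration will report 1 here.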

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Viviano, Brad
Did you try assigning a weight to orange[02-03]? I've found with Slurm it's better to be exact in your slurm.conf and not rely on the defaults. This is what I use on my cluster: NodeName=r1n[01-32] weight=10 CoresPerSocket=16 Sockets=2 ThreadsPerCore=1 RealMemory=257000 NodeName=r1n[33-64
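A compact way to confirm which weights the controller has actually picked up across the nodes is sinfo's weight field; a sketch, where the partition name and values are placeholders:

  $ sinfo -N -p orange -o "%N %w"
  NODELIST WEIGHT
  orange01 1
  orange02 10
  orange03 10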

Re: [slurm-users] Slurm node weights

2019-07-25 Thread David Baker
Hello, As an update, I note that I have tried restarting the slurmctld; however, that doesn't help. Best regards, David From: slurm-users on behalf of David Baker Sent: 25 July 2019 11:47:35 To: slurm-users@lists.schedmd.com Subject: [slurm-users] Slurm nod
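For completeness, the usual ways to push a slurm.conf change out to the controller are (run on the controller host):

  scontrol reconfigure           # asks the daemons to re-read slurm.conf
  systemctl restart slurmctld    # heavier option if a reconfigure is not enough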

[slurm-users] Slurm node weights

2019-07-25 Thread David Baker
Hello, I'm experimenting with node weights and I'm very puzzled by what I see. Looking at the documentation, I gathered that jobs will be allocated to the nodes with the lowest weight that satisfies their requirements. I have 3 nodes in a partition and I have defined the nodes like so.. Node
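For context, a three-node weighted partition along the lines being discussed would look roughly like this in slurm.conf (node names echo the thread; the partition name, CPU counts, and memory figures are placeholders):

  NodeName=orange01      Weight=1  CPUs=40 RealMemory=190000 State=UNKNOWN
  NodeName=orange[02-03] Weight=10 CPUs=40 RealMemory=190000 State=UNKNOWN
  PartitionName=orange Nodes=orange[01-03] MaxTime=INFINITE State=UP
  # Lower weight is preferred: jobs should land on orange01 first whenever it can satisfy them.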