Re: [slurm-users] restart slurmd on nodes w/ running jobs?

2018-07-27 Thread Paul Edmon
Restarting slurmd should be fine, assuming the nodes come back before the communications time out. I restart slurmd daemons all the time and haven't had any real problems. -Paul Edmon- On 7/27/2018 6:51 PM, Chris Harwell wrote: It is possible, but double check your config for timeouts first. On Fri, J

Re: [slurm-users] restart slurmd on nodes w/ running jobs?

2018-07-27 Thread Chris Harwell
It is possible, but double check your config for timeouts first. On Fri, Jul 27, 2018, 15:31 Prentice Bisbal wrote: > Slurm-users, > > I'm still learning Slurm, so I have what I think is a basic question. > Can you restart slurmd on nodes where jobs are running, or will that > kill the jobs? I r
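
The advice above to check the configured timeouts before restarting can be sketched roughly as follows. This is only an illustration: the thread does not say which timeout is meant, so the sketch assumes it is `SlurmdTimeout` in slurm.conf (the number of seconds slurmctld waits for a slurmd to respond before marking the node DOWN, 300 by default). The helper names `slurmd_timeout` and `restart_fits` and the sample configuration text are hypothetical.

```python
import re

def slurmd_timeout(conf_text, default=300):
    """Return SlurmdTimeout (seconds) from slurm.conf text.

    Falls back to `default` when the parameter is not set; 300 s is the
    documented Slurm default. Assumption: this is the timeout that matters.
    """
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = re.match(r"\s*SlurmdTimeout\s*=\s*(\d+)", line, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return default

def restart_fits(expected_downtime_s, conf_text):
    """True if slurmd is expected to be back before the timeout expires."""
    return expected_downtime_s < slurmd_timeout(conf_text)

# Hypothetical slurm.conf fragment, for illustration only.
sample_conf = """
ClusterName=example
SlurmdTimeout=120   # seconds
"""

print(slurmd_timeout(sample_conf))    # 120
print(restart_fits(30, sample_conf))  # True
```

On a live cluster the effective value can be read from the output of `scontrol show config` instead of parsing the file by hand.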

[slurm-users] restart slurmd on nodes w/ running jobs?

2018-07-27 Thread Prentice Bisbal
Slurm-users, I'm still learning Slurm, so I have what I think is a basic question. Can you restart slurmd on nodes where jobs are running, or will that kill the jobs? I ran into the same problem as described here: https://bugs.schedmd.com/show_bug.cgi?id=3535 I believe the best way to fix th

Re: [slurm-users] Fwd: srun: error: Unable to allocate resources: Invalid partition name specified

2018-07-27 Thread Brian Andrus
You show you still have more than one partition with Default=YES. There should be one and only one that is set to YES. That is the partition used when none is specified. Brian Andrus On 7/27/2018 6:34 AM, valeri...@cbpf.br wrote: Hi Merlin Do you accidentally have more than one par
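
The "one and only one Default=YES" rule above can be checked mechanically. Below is a minimal sketch, assuming the `Key=Value` layout that `scontrol show partition` prints elsewhere in this thread. The function name `default_partitions` is hypothetical, and the sample text is invented for illustration: only the `course` partition name comes from the thread, while `debug` and `batch` are made up.

```python
from typing import List

def default_partitions(scontrol_output: str) -> List[str]:
    """List the partitions reported with Default=YES.

    Expects whitespace-separated Key=Value tokens, where each partition
    record starts with a PartitionName=<name> token.
    """
    defaults = []
    current = None
    for token in scontrol_output.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a Key=Value token
        if key == "PartitionName":
            current = value
        elif key == "Default" and value.upper() == "YES" and current is not None:
            defaults.append(current)
    return defaults

# Invented sample output; "debug" and "batch" are hypothetical partitions.
sample = """
PartitionName=course AllowGroups=courseit Default=NO QoS=N/A
PartitionName=debug AllowGroups=ALL Default=YES QoS=N/A
PartitionName=batch AllowGroups=ALL Default=YES QoS=N/A
"""

found = default_partitions(sample)
print(found)  # ['debug', 'batch']
if len(found) != 1:
    print("expected exactly one default partition, found", len(found))
```

With zero or several entries in the returned list, a job submitted without `-p`/`--partition` has no unambiguous partition to fall back on.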

[slurm-users] Fwd: srun: error: Unable to allocate resources: Invalid partition name specified

2018-07-27 Thread valeriana
Hi Merlin Do you accidentally have more than one partition with Default=YES? I did. I changed it to NO and I still get the same error. [root@master ~]# scontrol show partition PartitionName=course AllowGroups=courseit AllowAccounts=ALL AllowQos=ALL AllocNodes=ALL Default=NO QoS=N/A

[slurm-users] Fwd: srun: error: Unable to allocate resources: Invalid partition name specified

2018-07-27 Thread valeriana
Hi Merlin [root@masters3 ~]# scontrol show partition PartitionName=course AllowGroups=courseit AllowAccounts=ALL AllowQos=ALL AllocNodes=ALL Default=NO QoS=N/A DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LL