Restarting slurmd should be fine, assuming the daemons come back before the
communication timeout expires. I restart slurmds all the time and haven't
had any real problems.
-Paul Edmon-
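For context, the timeout being referred to is set in slurm.conf. A minimal sketch of the relevant settings (the values below are illustrative defaults, not taken from this thread):

# slurm.conf (illustrative values)
SlurmdTimeout=300    # seconds slurmctld waits for slurmd to respond before marking the node DOWN
MessageTimeout=10    # seconds allowed for a round-trip RPC to complete

As long as slurmd is back up well within SlurmdTimeout, the controller never marks the node DOWN and running jobs are left alone.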
On 7/27/2018 6:51 PM, Chris Harwell wrote:
It is possible, but double-check your config for timeouts first.
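A quick way to see which timeouts are currently in effect is to query the live configuration; this is a generic sketch, not a command taken from the thread:

# dump the running config and filter the timeout-related settings
scontrol show config | grep -i timeout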
On Fri, Jul 27, 2018, 15:31 Prentice Bisbal wrote:
Slurm-users,
I'm still learning Slurm, so I have what I think is a basic question.
Can you restart slurmd on nodes where jobs are running, or will that
kill the jobs? I ran into the same problem as described here:
https://bugs.schedmd.com/show_bug.cgi?id=3535
I believe the best way to fix th
Your output shows you still have more than one partition with Default=YES.
There should be one and only one partition set to YES; that is the
partition used when a job does not specify one.
Brian Andrus
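In slurm.conf terms, that means something like the following; the partition and node names here are purely illustrative:

# slurm.conf: exactly one PartitionName line may carry Default=YES
PartitionName=course Nodes=node[01-10] Default=NO  MaxTime=INFINITE State=UP
PartitionName=batch  Nodes=node[01-10] Default=YES MaxTime=INFINITE State=UP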
On 7/27/2018 6:34 AM, valeri...@cbpf.br wrote:
Hi Merlin
> Do you accidentally have more than one partition with Default=YES?
It was. I changed it to NO, but I still get the same error.
[root@master ~]# scontrol show partition
PartitionName=course
AllowGroups=courseit AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LL
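One way to double-check which partition the controller is actually treating as the default is sinfo, which flags the default partition with a trailing asterisk; these are generic commands, not output from the thread:

# the default partition is shown with '*' appended to its name
sinfo -o "%P"
# or list the Default= flag for every partition
scontrol show partition | grep -E "PartitionName|Default="

If more than one partition still shows Default=YES here, the slurm.conf change has probably not been pushed to the controller yet (e.g. with scontrol reconfigure).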