[slurm-users] Re: Temporarily bypassing pam_slurm_adopt.so

2024-07-09 Thread Timony, Mick via slurm-users
At HMS we do the same as Paul's cluster and specify the groups we want to have access to all our compute nodes, we allow two groups that represent our DevOps team and our Research Computing consultants to have access and then corresponding sudo rules for each group to allow different command se

[slurm-users] Re: Job submitted to multiple partitions not running when any partition is full

2024-07-09 Thread Timony, Mick via slurm-users
Hi Paul, There could be multiple reasons why the job isn't running, from the user's QOS to your cluster hitting MaxJobCount. This page might help: https://slurm.schedmd.com/high_throughput.html The output of the following command might help: scontrol show job 465072​ Regards -- Mick Timony Se

[slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds

2024-02-12 Thread Timony, Mick via slurm-users
We set SlurmdTimeout=600​. The docs say not to go any higher than 65533 seconds: https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdTimeout The FAQ has info about SlurmdTimeout also. The worst thing that could happen is will take longer to set nodes as being down: >A node is set DOWN when the s