Re: [slurm-users] Cgroup not restricting GPUs acces with ssh

2023-02-10 Thread Brian Johanson
Guillaume, Check out the slurm-users thread from 2018 "pam_slurm_adopt does not constrain memory?" which explains the issues with systemd-logind. Also: https://bugs.schedmd.com/show_bug.cgi?id=5920 -b On 2/9/23 7:09 AM, Guillaume Lechantre wrote: Hi everyone, I'm in charge of the new clust

Re: [slurm-users] node health check

2023-01-31 Thread Brian Johanson
On 1/30/23 10:35 PM, Ratnasamy, Fritz wrote: Hi, Currently, some of our nodes are overloaded. The nhc installed used to check the load and drain the node when it is overloaded. However, for the past few days, it is not showing the state of the node. When I run /usr/sbin/nhc manually, it sa

Re: [slurm-users] Node can't run simple job when STATUS is up and STATE is idle

2020-01-21 Thread Brian Johanson
On 1/21/2020 12:32 AM, Chris Samuel wrote: On 20/1/20 3:00 pm, Dean Schulze wrote: There's either a problem with the source code I cloned from github, or there is a problem when the controller runs on Ubuntu 19 and the node runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to see

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Brian Johanson
SallocDefaultCommand specified in slurm.conf will change the default behavior when salloc is executed without appending a command, and also explains conflicting behavior between installations. SallocDefaultCommand Normally, salloc(1) will run the user's default shell when a

Re: [slurm-users] Restore Last JOBID After Reinstall of Slurm Master Node?

2018-12-24 Thread Brian Johanson
If that is lost, you can manually set it in slurm.conf with FirstJobId -b On 12/24/2018 1:09 AM, Sean Caron wrote: On Mon, Dec 24, 2018 at 12:13 AM Hanby, Mike wrote: Howdy, We installed a new server to take over the duties of the Slurm master. I imported