Re: [slurm-users] cgroup limits not created for jobs

2020-07-24 Thread Chris Samuel
On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:
> But when I run a job on the node it runs I can find no
> evidence in cgroups of any limits being set
>
> Example job:
>
> mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
> salloc: Granted job allocation 17
>

Re: [slurm-users] Restart Job after sudden reboot of the node

2020-07-24 Thread Christopher Samuel
On 7/24/20 12:28 pm, Saikat Roy wrote:
> If SLURM restarts automatically, is there any way to stop it?
If you would rather Slurm not start scheduling jobs when it is restarted then you can set your partitions to have `State=DOWN` in slurm.conf. That way should the node running slurmctld reboo

Re: [slurm-users] Restart Job after sudden reboot of the node

2020-07-24 Thread Steven Dick
Both. See man sbatch, --requeue. The default is to not requeue (unless it was changed in slurm.conf), and your job can check $SLURM_RESTART_COUNT to see if it has been restarted. This is handy if your job can checkpoint / restart. On Fri, Jul 24, 2020 at 3:33 PM Saikat Roy wrote: > Hello, > > I
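As a rough illustration of the $SLURM_RESTART_COUNT check mentioned above, here is a minimal Python sketch of what a job's own startup code might do. The function names `restart_count` and `should_resume` are made up for this example, and treating a missing or non-numeric variable as "not restarted" is an assumption, not documented Slurm behaviour.

```python
import os


def restart_count(env=None):
    """Return how many times Slurm has restarted this job.

    Assumption: if SLURM_RESTART_COUNT is unset or not a number,
    the job is treated as a first run (count 0).
    """
    env = os.environ if env is None else env
    raw = env.get("SLURM_RESTART_COUNT", "0")
    try:
        return int(raw)
    except ValueError:
        return 0


def should_resume(env=None):
    """True when the job has been requeued or restarted at least once."""
    return restart_count(env) > 0


if __name__ == "__main__":
    if should_resume():
        print("restarted job: load the latest checkpoint here")
    else:
        print("first run: start from scratch")
```

A job that writes its own checkpoints could call `should_resume()` at startup to decide whether to look for one; how the checkpoint is written and located is up to the application.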

[slurm-users] Restart Job after sudden reboot of the node

2020-07-24 Thread Saikat Roy
Hello, I have recently installed SLURM on our Ubuntu cluster. I have one question: if the system restarts unexpectedly, for example due to a power failure, what happens to the running jobs? Are they resumed automatically, or do we have to restart them manually? If SLURM restarts automatically, is th

[slurm-users] cgroup limits not created for jobs

2020-07-24 Thread Paul Raines
I am not seeing any cgroup limits being put in place on the nodes when jobs run. I have Slurm 20.02 running on CentOS 7.8.

In slurm.conf I have

ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/cgroup

and c
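One way to look for a job's memory limit on a node is to read the job's cgroup file directly. The sketch below assumes the cgroup v1 layout that Slurm typically uses on CentOS 7 (`<mount>/memory/slurm/uid_<uid>/job_<jobid>/memory.limit_in_bytes`). The mount point and the helper names are assumptions for illustration, and the path may differ on other setups.

```python
from pathlib import Path, PurePosixPath


def job_memory_limit_path(uid, job_id, mount="/sys/fs/cgroup"):
    """Build the assumed cgroup v1 path of a job's memory limit file."""
    return (
        PurePosixPath(mount)
        / "memory"
        / "slurm"
        / f"uid_{uid}"
        / f"job_{job_id}"
        / "memory.limit_in_bytes"
    )


def read_job_memory_limit(uid, job_id, mount="/sys/fs/cgroup"):
    """Return the limit in bytes, or None if the file is missing or unreadable."""
    path = Path(str(job_memory_limit_path(uid, job_id, mount)))
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # uid 1000 is a placeholder; job 17 is the allocation from the example job.
    print(job_memory_limit_path(1000, 17))
    print(read_job_memory_limit(1000, 17))
```

If `read_job_memory_limit` returns None on the node while the job is running, that matches the symptom described: no per-job cgroup was created under the assumed path.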

Re: [slurm-users] Fwd: Slurm MySQL database configuration

2020-07-24 Thread Tina Friedrich
Hi Peter, is this an actual NFS server, or something exporting NFS (like a NetApp)? This might be a silly question but - if it's an actual server, could you run the slurmdb server on the NFS server? There would then be no need for any clustered DB service or anything; it would simply make the

[slurm-users] Jenkins integration

2020-07-24 Thread Brian Andrus
All, I am wondering what approaches folks have used to integrate Jenkins with Slurm. In particular, the ability for Jenkins to submit jobs as different users, if that is how it is being done. It occurs to me that the ability to use "--uid" for the sbatch command by users other than root could be