On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:
> But when I run a job on the node it runs I can find no
> evidence in cgroups of any limits being set
>
> Example job:
>
> mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
> salloc: Granted job allocation 17
>
On 7/24/20 12:28 pm, Saikat Roy wrote:
> If SLURM restarts automatically, is there any way to stop it?
If you would rather Slurm not start scheduling jobs when it is restarted,
then you can set your partitions to have `State=DOWN` in slurm.conf.
That way, should the node running slurmctld reboot
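To double-check that every partition really would come up with `State=DOWN`, a small script can scan slurm.conf. The following is a minimal sketch under simplifying assumptions: one `PartitionName=` record per line, no backslash line continuations, no `Include` files, and the function name `partitions_not_down` is hypothetical. It honours `PartitionName=DEFAULT` records (which set defaults for later partition lines) and Slurm's default partition state of `UP`.

```python
from typing import Dict, List


def partitions_not_down(conf_text: str) -> List[str]:
    """Return names of partitions in slurm.conf text that would NOT start DOWN.

    Simplified parser (assumption): one PartitionName record per line,
    no line continuations and no Include directives.
    """
    defaults: Dict[str, str] = {"state": "UP"}  # Slurm's default partition State
    not_down: List[str] = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields: Dict[str, str] = {}
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                fields[key.lower()] = value  # parameter names are case-insensitive
        name = fields.get("partitionname")
        if name is None:
            continue  # not a partition record
        if name.upper() == "DEFAULT":
            # PartitionName=DEFAULT sets defaults for subsequent partition lines
            defaults.update({k: v for k, v in fields.items() if k != "partitionname"})
            continue
        state = fields.get("state", defaults["state"])
        if state.upper() != "DOWN":
            not_down.append(name)
    return not_down


example_conf = """
PartitionName=DEFAULT State=DOWN
PartitionName=batch Nodes=node[01-04] Default=YES
PartitionName=debug Nodes=node05 State=UP
"""
print(partitions_not_down(example_conf))  # ['debug']
```

Once the cluster has been checked after a restart, an administrator can reopen a partition by hand, e.g. `scontrol update PartitionName=batch State=UP`.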
Both
See `man sbatch`, specifically the `--requeue` option.
The default is to not requeue (unless it was changed in slurm.conf) and
your job can check `$SLURM_RESTART_COUNT` to see if it has been restarted.
This is handy if your job can checkpoint/restart.
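A job submitted with `sbatch --requeue` can use `SLURM_RESTART_COUNT` to decide whether to resume. The sketch below is illustrative only: the JSON checkpoint format, file name, and the helper names (`restart_count`, `load_start_step`, `save_checkpoint`, `run`) are hypothetical, and the "work" is a placeholder loop. Since the variable is only set once the job has been restarted or requeued, an unset variable is treated as a first run.

```python
import json
import os
from pathlib import Path
from typing import List, Mapping


def restart_count(env: Mapping[str, str] = os.environ) -> int:
    """Number of times Slurm has restarted/requeued this job (0 if unset)."""
    return int(env.get("SLURM_RESTART_COUNT", "0") or 0)


def load_start_step(checkpoint: Path, env: Mapping[str, str] = os.environ) -> int:
    """Resume from the checkpoint only if this run is a Slurm restart."""
    if restart_count(env) > 0 and checkpoint.exists():
        return int(json.loads(checkpoint.read_text())["next_step"])
    return 0  # first run: ignore any stale checkpoint from an earlier job


def save_checkpoint(checkpoint: Path, next_step: int) -> None:
    """Write the checkpoint atomically so a crash never leaves half a file."""
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_step": next_step}))
    os.replace(tmp, checkpoint)  # atomic rename on POSIX (same filesystem)


def run(checkpoint: Path, total_steps: int,
        env: Mapping[str, str] = os.environ) -> List[int]:
    """Do the (placeholder) work, checkpointing after every step."""
    done: List[int] = []
    for step in range(load_start_step(checkpoint, env), total_steps):
        done.append(step)  # stand-in for real work
        save_checkpoint(checkpoint, step + 1)
    return done
```

Only resuming when the restart count is non-zero is a design choice: it keeps a fresh submission from silently picking up a checkpoint left behind by a previous, unrelated job.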
On Fri, Jul 24, 2020 at 3:33 PM Saikat Roy wrote:
Hello,
I have recently installed SLURM on our Ubuntu cluster. I have one question:
if the system restarts automatically after a power failure, what will
happen to the running jobs? Will they resume automatically, or do we have
to restart them manually? If SLURM restarts automatically, is there any
way to stop it?
I am not seeing any cgroup limits being put in place on the nodes
when jobs run. I have Slurm 20.02 running on CentOS 7.8.
In slurm.conf I have:
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/cgroup
and c
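One way to look for "evidence in cgroups" on the compute node is to read the memory limit of the job's cgroup directly. This is a minimal sketch that assumes the cgroup v1 hierarchy (the default on CentOS 7), the default mount point `/sys/fs/cgroup`, and Slurm's `slurm/uid_<uid>/job_<jobid>` layout; the function name and the "unlimited" threshold are hypothetical choices. Note that limits are only written when cgroup.conf enables constraints such as `ConstrainRAMSpace=yes`.

```python
from pathlib import Path
from typing import Optional

# Assumed cgroup v1 mount point; adjust if CgroupMountpoint differs.
CGROUP_ROOT = Path("/sys/fs/cgroup")

# The kernel reports "no limit" as a value near 2**63; treat anything at or
# above this threshold as unlimited (heuristic, an assumption of this sketch).
UNLIMITED_THRESHOLD = 1 << 62


def job_memory_limit(job_id: int, uid: int, root: Path = CGROUP_ROOT) -> Optional[int]:
    """Return the job's memory limit in bytes, or None if no limit is in place.

    None covers both "no Slurm cgroup exists for this job" and
    "the cgroup exists but is unlimited".
    """
    path = (root / "memory" / "slurm" / f"uid_{uid}" / f"job_{job_id}"
            / "memory.limit_in_bytes")
    if not path.exists():
        return None  # Slurm never created a memory cgroup for this job
    limit = int(path.read_text().strip())
    return None if limit >= UNLIMITED_THRESHOLD else limit
```

For the `salloc ... --mem=1G` example above (job 17), running `job_memory_limit(17, os.getuid())` on the allocated node would be expected to report roughly 1 GiB (1073741824 bytes) if the memory constraint were active; `None` suggests the limit is not being applied.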
Hi Peter,
is this an actual NFS server, or something exporting NFS (like a NetApp)?
This might be a silly question but - if it's an actual server, could you
run the slurmdb server on the NFS server? There would then be no need
for any clustered DB service or anything; it would simply make the
All,
I am wondering what approaches folks have used to integrate Jenkins with
Slurm.
In particular, I am interested in how Jenkins can submit jobs as different
users, if that is how it is being done.
It occurs to me that the ability to use "--uid" with the sbatch command
by users other than root could be
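Since `--uid` is effectively reserved for root, one common alternative is to have Jenkins call `sbatch` through `sudo` as the target user, so the job is submitted with that user's own credentials. The sketch below assumes a site sudoers rule (hypothetical, e.g. allowing the Jenkins account to run `/usr/bin/sbatch` as selected users without a password); the helper names are illustrative. It uses `sbatch --parsable`, which prints `jobid` or `jobid;cluster`, so the job ID is easy to capture.

```python
import subprocess
from typing import List, Sequence


def sbatch_as_user(user: str, script: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv to submit `script` as `user` via sudo (assumed sudoers rule)."""
    # -n: fail instead of prompting for a password (Jenkins is non-interactive)
    return ["sudo", "-n", "-u", user, "sbatch", "--parsable", *extra_args, script]


def parse_job_id(stdout: str) -> int:
    """`sbatch --parsable` prints 'jobid' or 'jobid;cluster'."""
    return int(stdout.strip().split(";", 1)[0])


def submit(user: str, script: str, extra_args: Sequence[str] = ()) -> int:
    """Run the submission and return the new job ID (raises on failure)."""
    result = subprocess.run(sbatch_as_user(user, script, extra_args),
                            capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

A Jenkins step could then call, for example, `submit("alice", "build.sh", ["-p", "batch"])` and poll the returned job ID. Which users Jenkins may impersonate is then controlled by sudoers rather than by Slurm.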