[slurm-users] BB plugin changes

2020-01-23 Thread subodhp
Dear all, I want to make changes to Slurm's BB generic plugin skeleton. Please tell me where I need to make the changes. Do I need to change the burst_buffer_generic.c file or the example files that Slurm provides? - Subodh Pandey

Re: [slurm-users] Can't get node out of drain state

2020-01-23 Thread Chris Samuel
On 23/1/20 7:09 pm, Dean Schulze wrote: Pretty strange that having a Gres= property on a node that doesn't have a gpu would get it stuck in the drain state. Slurm verifies that nodes have the capabilities you say they have so that should a node boot with less RAM than it should have, or a soc

Re: [slurm-users] Can't get node out of drain state

2020-01-23 Thread Dean Schulze
The problem turned out to be that I had Gres=gpu:gp100:1 on the NodeName line for that node and it didn't have a gpu or a gres.conf. Once I moved that to the correct NodeName line in slurm.conf that node came out of the drain state and became usable again. Pretty strange that having a Gres= prope

[slurm-users] setting default resource limits with sacctmgr dump/load ?

2020-01-23 Thread Grigory Shamov
Hi All, I have tried to use a script that would manage SLURM accounts and users with sacctmgr dump flat files. I am using SLURM 19.05.4, on CentOS 7.6. Our accounting scheme is rather flat: there is one level of accounting groups and users that belong to the groups. It looks like with sacctmgr dum

[slurm-users] Useful script: estimating how long until the next blocked job starts

2020-01-23 Thread Renfro, Michael
Hey, folks. Some of my users submit job after job with no recognition of our 1000 CPU-day TRES limit, and thus their later jobs get blocked with the reason AssocGrpCPURunMinutesLimit. I’ve written up a script [1] using Ole Holm Nielsen’s showuserlimits script [2] that will identify a user’s sm

Re: [slurm-users] Can't get node out of drain state

2020-01-23 Thread Alex Chekholko
Hey Dean, Does 'scontrol show node https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons Also check that slurmd daemons on the compute nodes can talk to each other (not just to the master). e.g. bottom of https://slurm.schedmd.com/big_sys.html Regards, Alex

Re: [slurm-users] Issues with HA config and AllocNodes

2020-01-23 Thread Dave Sizer
Bumping this thread. This issue persists even after the upgrade to 19.05.4. Does anyone have an HA setup that could provide some insight? From: Dave Sizer Date: Thursday, December 19, 2019 at 9:44 AM To: Slurm User Community List, Brian Andrus Subject: Re: [slurm-users] Issues with HA config

[slurm-users] Can't get node out of drain state

2020-01-23 Thread Dean Schulze
I've tried the normal things with scontrol (https://blog.redbranch.net/2015/12/26/resetting-drained-slurm-node/), but I have a node that will not come out of the drain state. I've also done a hard reboot and tried again. Are there any other remedies? Thanks.

Re: [slurm-users] Implementation of generic plugin

2020-01-23 Thread subodhp
Dear all, I wish to know where I need to make the changes. Running > srun -bb="capacity=1G access=striped type=scratch" a.out executes, but then running the command below > scontrol show burst doesn't show anything. Regards, Subodh On January 22, 2020 at 3:53 PM subodhp wrote: > Dear