Re: [slurm-users] Moving Slurmctld and slurmdbd to a new host

2021-01-15 Thread Ryan Novosielski
My understanding is that it is the job state directory. Theoretically, if you back it up, then screw up and lose it, you can restore it and try again. There’s some mention of this in the upgrade docs, if I’m not mistaken (they suggest backing it up in case you mess up during the upgrade).

[slurm-users] Moving Slurmctld and slurmdbd to a new host

2021-01-15 Thread Prentice Bisbal
Slurm users, I'm planning on moving slurmctld and slurmdbd to a new host. I know how to dump the MySQL DB from the old server and import it to the new slurmdbd host, and I know how to copy the job state directories to the new host. I plan on doing this during our next maintenance window when

Re: [slurm-users] Parent account in AllowAccounts

2021-01-15 Thread Fulcomer, Samuel
Durai, There is no inheritance in "AllowAccounts". You need to specify each account explicitly. There _is_ inheritance in fairshare calculation. On Fri, Jan 15, 2021 at 2:17 PM Brian Andrus wrote: > As I understand it, the parents are really meant for reporting, so you > can run reports that a
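Since AllowAccounts has no inheritance, one way to avoid typing every child by hand is to expand the hierarchy into an explicit list before writing the partition line. The following is only a minimal sketch: the `hierarchy` mapping (child to parent) and both function names are hypothetical, the account names are the ones used in the original question, and how the mapping is obtained from your accounting database is left out.

```python
from collections import defaultdict


def expand_accounts(parent_of, roots):
    """Return the given root accounts plus all of their descendants, sorted."""
    # Invert the child -> parent mapping into parent -> [children].
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    seen = set()
    stack = list(roots)
    while stack:
        account = stack.pop()
        if account in seen:
            continue  # guards against duplicates or accidental cycles
        seen.add(account)
        stack.extend(children[account])
    return sorted(seen)


def allow_accounts_value(parent_of, roots):
    """Build the comma-separated value for a partition's AllowAccounts."""
    return ",".join(expand_accounts(parent_of, roots))


# Hypothetical hierarchy (child -> parent), using the names from the question.
hierarchy = {"child1": "parent1", "child2": "parent1"}
print("AllowAccounts=" + allow_accounts_value(hierarchy, ["parent1"]))
```

The printed value would still have to be placed on the partition definition by hand and regenerated whenever a child account is added, because the list is static.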

Re: [slurm-users] Parent account in AllowAccounts

2021-01-15 Thread Brian Andrus
As I understand it, the parents are really meant for reporting, so you can run reports that aggregate the usage among children. Useful for a chargeback model. As far as permissions, that is on a per account basis, regardless of hierarchy. Just because a parent can go to the bar, doesn't mean

Re: [slurm-users] Parent account in AllowAccounts

2021-01-15 Thread Michael Gutteridge
I've only ever seen the parent-child account relationship discussed in the context of usage and fairshare. I think for the allow/deny controls you have to specify each account individually. I did find this enhancement request: https://bugs.schedmd.com/show_bug.cgi?id=1398 which would support that

Re: [slurm-users] [EXT] GPU Jobs with Slurm

2021-01-15 Thread Ryan Novosielski
Do you have any more information about that? I think that’s the bug I alluded to earlier in the conversation, and I believe I’m affected by it, but don’t know how to tell, how to fix it, or how to refer to it if I wanted to ask SchedMD (we have a contract).

[slurm-users] GPU process accounting information

2021-01-15 Thread Ole Holm Nielsen
Hi, We have installed some new GPU nodes, and now users are asking for some sort of monitoring of GPU utilisation and GPU memory utilisation at the end of a job, like what Slurm already provides for CPU and memory usage. I haven't found any pages describing how to perform GPU accounting withi

[slurm-users] error: DBD_SEND_MULT_MSG message from invalid uid 9920

2021-01-15 Thread Michael Smith
I’m new to SLURM and attempting to set up a new installation. I’ve built the 20.11.2 tools on CentOS 7, and I now have MariaDB running, but the slurmdbd log file is full of: [2021-01-15T09:34:25.002] error: Processing last message from connection 10(192.168.1.16) uid(9920) [2021-01-15T09:3

Re: [slurm-users] GPU process accounting information

2021-01-15 Thread Michael Di Domenico
I would imagine that Slurm should be able to pull that data through NVML, but I'd bet the hooks aren't in place. On Fri, Jan 15, 2021 at 7:44 AM Ole Holm Nielsen wrote: > > Hi, > > We have installed some new GPU nodes, and now users are asking for some > sort of monitoring of GPU utilisation and
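Until such hooks exist, one stopgap is to sample the GPUs outside Slurm, for example from a job epilog or a wrapper script. The sketch below is not Slurm accounting and not something proposed in the thread. It only parses the text that `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits` prints, and it gives a point-in-time reading, not a per-job average. The function name and the sample text are made up for illustration.

```python
import csv
import io


def parse_gpu_samples(text):
    """Parse 'index, utilization.gpu, memory.used' CSV rows into dicts.

    Assumes the noheader,nounits format, so every field is a bare integer.
    Fields reported as N/A by some GPUs are not handled in this sketch.
    """
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, util, mem = (field.strip() for field in row)
        samples.append(
            {"index": int(index), "util_pct": int(util), "mem_mib": int(mem)}
        )
    return samples


# Made-up sample output for two GPUs, not taken from a real node.
sample_text = "0, 87, 10240\n1, 3, 512\n"
for sample in parse_gpu_samples(sample_text):
    print(sample)
```

On a real node the text would come from running the command above, for instance with `subprocess.run(..., capture_output=True, text=True)`.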

[slurm-users] Parent account in AllowAccounts

2021-01-15 Thread Durai Arasan
Hi, As you know, for each partition you can specify AllowAccounts=account1,account2... I have a parent account, say "parent1", with two child accounts, "child1" and "child2". I expected that setting AllowAccounts=parent1 would allow parent1, child1, and child2 to submit jobs to that partition. But unfor

Re: [slurm-users] pam_slurm_adopt always claims now active jobs even when they do

2021-01-15 Thread William Brown
I encountered the same problem, and as with munge I created a .te file that can be built to create a policy to add to the compute nodes to fix this: my-pam_slurm_adopt.te: --- module my-pam_slurm_adopt 1.0; require {

Re: [slurm-users] pam_slurm_adopt always claims now active jobs even when they do

2021-01-15 Thread Ole Holm Nielsen
On 10/29/20 12:56 PM, Paul Raines wrote: The debugging was useful.  The problem turned out to be that I am running with SELINUX enabled due to corporate policy.  The issue was SELINUX is blocking sshd access to /var/slurm/spool/d socket files: The documentation https://slurm.schedmd.com/pam_slu