My understanding is that it’s the job state directory. Theoretically, if you
back it up and then screw up and lose it, you can restore it and try again.
There’s some mention of this in the upgrade docs, if I’m not mistaken (they
suggest backing it up in case you mess up during the upgrade).
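The back-up-and-restore idea above can be sketched in Python. This is a minimal sketch that assumes the directory in question is the one named by StateSaveLocation in slurm.conf and that slurmctld is stopped while it runs. The function names are hypothetical, and the demo uses a throwaway temporary directory rather than a real Slurm path.

```python
import shutil
import tempfile
from pathlib import Path


def backup_state_dir(state_dir: Path, backup_root: Path) -> Path:
    """Copy the whole state directory to backup_root/<name>.bak.

    Run only while slurmctld is stopped so files are not rewritten mid-copy.
    """
    dest = backup_root / (state_dir.name + ".bak")
    if dest.exists():
        raise FileExistsError(f"refusing to overwrite existing backup {dest}")
    # copytree uses copy2, which keeps mode bits and timestamps but NOT
    # owner/group: run as the Slurm user or chown the result afterwards.
    shutil.copytree(state_dir, dest, symlinks=True)
    return dest


def restore_state_dir(backup: Path, state_dir: Path) -> None:
    """Replace state_dir with the contents of an earlier backup."""
    if state_dir.exists():
        shutil.rmtree(state_dir)
    shutil.copytree(backup, state_dir, symlinks=True)


if __name__ == "__main__":
    # Demo on a temporary directory; "job_state" here is just a dummy file.
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "slurmctld"
        state.mkdir()
        (state / "job_state").write_text("demo")
        bak = backup_state_dir(state, Path(tmp))
        shutil.rmtree(state)  # simulate "screw up and lose it"
        restore_state_dir(bak, state)
        print((state / "job_state").read_text())  # prints "demo"
```

The backup step refuses to overwrite an existing backup, so an earlier good copy can't be clobbered by accident.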
--
#BlackLivesMatter
|| \\UTGER
Slurm users,
I'm planning on moving slurmctld and slurmdbd to a new host. I know how
to dump the MySQL DB from the old server and import it to the new
slurmdbd host, and I know how to copy the job state directories to the
new host. I plan on doing this during our next maintenance window when
Durai,
There is no inheritance in "AllowAccounts". You need to specify each
account explicitly.
There _is_ inheritance in fairshare calculation.
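Because AllowAccounts does not inherit, the parent and every child have to be listed. A minimal sketch that expands an account tree into an explicit AllowAccounts= value. The tree below is a hypothetical hard-coded mapping that reuses the parent1/child1/child2 names from this thread. In practice the parent/child relationships would come from sacctmgr's association listing.

```python
from typing import Dict, List


def expand_accounts(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return root plus all of its descendants, depth first, without duplicates."""
    seen: List[str] = []
    stack = [root]
    while stack:
        acct = stack.pop()
        if acct in seen:
            continue
        seen.append(acct)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(children.get(acct, [])))
    return seen


def allow_accounts_line(children: Dict[str, List[str]], root: str) -> str:
    """Build the partition setting with every account spelled out."""
    return "AllowAccounts=" + ",".join(expand_accounts(children, root))


if __name__ == "__main__":
    tree = {"parent1": ["child1", "child2"]}  # hypothetical hierarchy
    print(allow_accounts_line(tree, "parent1"))
    # prints: AllowAccounts=parent1,child1,child2
```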
On Fri, Jan 15, 2021 at 2:17 PM Brian Andrus wrote:
As I understand it, the parents are really meant for reporting, so you
can run reports that aggregate the usage among children. This is useful
for a chargeback model.
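To illustrate the roll-up Brian describes, here is a minimal sketch of how a parent's usage is the sum of its own usage and all of its children's. The account names and usage figures (in hypothetical CPU-hours) are made up. In a real deployment, Slurm's sreport reports are the tool for this. The sketch only shows the arithmetic behind a chargeback total.

```python
from typing import Dict, List


def rollup_usage(children: Dict[str, List[str]],
                 usage: Dict[str, float], root: str) -> float:
    """Total usage of root plus every account below it in the tree."""
    total = usage.get(root, 0.0)
    for child in children.get(root, []):
        total += rollup_usage(children, usage, child)
    return total


if __name__ == "__main__":
    tree = {"parent1": ["child1", "child2"]}                   # hypothetical
    used = {"parent1": 0.0, "child1": 120.0, "child2": 30.5}   # made-up CPU-hours
    print(rollup_usage(tree, used, "parent1"))  # prints 150.5
```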
As far as permissions go, those are set on a per-account basis, regardless
of hierarchy.
Just because a parent can go to the bar doesn't mean
I've only ever seen the parent-child account relationship discussed in the
context of usage and fairshare. I think for the allow/deny controls you
have to specify each account individually.
I did find this enhancement request:
https://bugs.schedmd.com/show_bug.cgi?id=1398 which would support that
Do you have any more information about that? I think that’s the bug I alluded
to earlier in the conversation, and I believe I’m affected by it, but I don’t
know how to tell, how to fix it, or how to refer to it if I wanted to ask
SchedMD (we have a contract).
Hi,
We have installed some new GPU nodes, and now users are asking for some
sort of monitoring of GPU utilisation and GPU memory utilisation at the
end of a job, like what Slurm already provides for CPU and memory usage.
I haven't found any pages describing how to perform GPU accounting withi
I’m new to SLURM and attempting to set up a new installation. I’ve built the
20.11.2 tools on CentOS 7, and now I’ve got MariaDB running, but the
slurmdbd log file is full of:
[2021-01-15T09:34:25.002] error: Processing last message from connection 10(192.168.1.16) uid(9920)
[2021-01-15T09:3
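When a log is "full of" one message, a quick first step is to count which host and uid it comes from. A minimal sketch, assuming every occurrence follows exactly the one line quoted above. It doesn't diagnose the cause. It only shows which connecting uid to look up (for example with `getent passwd 9920`).

```python
import re
from collections import Counter
from typing import Iterable

# Built from the single slurmdbd line quoted above; other messages are ignored.
PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\] error: Processing last message from connection "
    r"(?P<conn>\d+)\((?P<ip>[^)]+)\) uid\((?P<uid>\d+)\)"
)


def count_by_source(lines: Iterable[str]) -> Counter:
    """Count occurrences of the error per (ip, uid) pair."""
    counts: Counter = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            counts[(m.group("ip"), int(m.group("uid")))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2021-01-15T09:34:25.002] error: Processing last message from "
        "connection 10(192.168.1.16) uid(9920)",
        "[2021-01-15T09:34:26.000] some unrelated message",
    ]
    for (ip, uid), n in count_by_source(sample).items():
        print(f"{ip} uid={uid}: {n}")
```

To run it on a real log, pass an open file object instead of the sample list. The log path depends on LogFile in slurmdbd.conf.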
I would imagine that Slurm should be able to pull that data through
NVML, but I'd bet the hooks aren't in place.
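For the NVML route, the following is a minimal sketch of what a hypothetical per-job wrapper could sample and summarise at job end. It is not a Slurm feature. It assumes an NVIDIA driver and the nvidia-ml-py package (imported as `pynvml`). The `GpuSample` type and the sampling schedule are illustrative assumptions. `pynvml` is imported lazily so the summary part still works on a machine without a GPU.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class GpuSample:
    gpu_util: int   # percent, nvmlDeviceGetUtilizationRates(...).gpu
    mem_util: int   # percent, nvmlDeviceGetUtilizationRates(...).memory
    mem_used: int   # bytes, nvmlDeviceGetMemoryInfo(...).used


def sample_gpus() -> List[GpuSample]:
    """Take one sample per visible GPU via NVML (needs a working driver)."""
    import pynvml  # lazy import: only needed on GPU nodes

    pynvml.nvmlInit()
    try:
        samples = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            samples.append(GpuSample(util.gpu, util.memory, mem.used))
        return samples
    finally:
        pynvml.nvmlShutdown()


def summarize(samples: List[GpuSample]) -> Dict[str, float]:
    """End-of-job summary over all samples passed in (e.g. one GPU over time)."""
    return {
        "mean_gpu_util": mean(s.gpu_util for s in samples),
        "mean_mem_util": mean(s.mem_util for s in samples),
        "max_mem_used": max(s.mem_used for s in samples),
    }
```

A wrapper would call `sample_gpus()` periodically while the job runs, keep the samples for the job's GPUs, and print `summarize(...)` when the job finishes.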
On Fri, Jan 15, 2021 at 7:44 AM Ole Holm Nielsen
wrote:
>
> Hi,
>
> We have installed some new GPU nodes, and now users are asking for some
> sort of monitoring of GPU utilisation and
Hi,
As you know, for each partition you can specify
AllowAccounts=account1,account2...
I have a parent account, say "parent1", with two child accounts, "child1"
and "child2".
I expected that setting AllowAccounts=parent1 would allow parent1, child1,
and child2 to submit jobs to that partition. But unfor
I encountered the same problem, and, as with munge, I created a .te file that
can be compiled into a policy module and installed on the compute nodes to fix this:
my-pam_slurm_adopt.te:
---
module my-pam_slurm_adopt 1.0;
require {
On 10/29/20 12:56 PM, Paul Raines wrote:
The debugging was useful. The problem turned out to be that I am running
with SELinux enabled due to corporate policy. The issue was that SELinux is
blocking sshd's access to the socket files in /var/slurm/spool/d:
The documentation https://slurm.schedmd.com/pam_slu
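A .te file like the one above has to be compiled and loaded on each compute node before it takes effect. The following is a minimal sketch that drives the usual SELinux toolchain (`checkmodule`, `semodule_package`, `semodule`) from Python. It assumes those tools are installed, that the script runs as root, and that it runs in the directory holding the .te file. The function names are hypothetical.

```python
import subprocess
from typing import List


def build_commands(name: str) -> List[List[str]]:
    """Commands that compile <name>.te into a policy package and load it."""
    return [
        # Compile the type-enforcement source into a binary (non-base) module.
        ["checkmodule", "-M", "-m", "-o", f"{name}.mod", f"{name}.te"],
        # Wrap the module into a loadable policy package.
        ["semodule_package", "-o", f"{name}.pp", "-m", f"{name}.mod"],
        # Install the package into the running policy (requires root).
        ["semodule", "-i", f"{name}.pp"],
    ]


def install_policy(name: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in build_commands(name):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands("my-pam_slurm_adopt"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to paste them into whatever configuration management pushes the module to the compute nodes.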