OK, feeling a bit silly about having sent this after re-re-reading the
man page for slurm.conf... and discovering the
AccountingStorageBackupHost setting.
Sorry for wasting the time of anyone who read that :)
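For reference, a minimal sketch of what that might look like in slurm.conf, assuming accounting goes through slurmdbd. The host names dbd1 and dbd2 are placeholders, not anything from a real setup:

```
# slurm.conf fragment (host names are hypothetical)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd1
AccountingStorageBackupHost=dbd2
```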
Xand
On 15/02/2022 15:46, Xand Meaden wrote:
Hello,
I'm wondering what others a
There hasn't been as much effort to make slurmdbd as resilient as you
are hinting at because there has been no need.
The database itself can be made resilient for keeping the data safe.
Data that cannot be written to the database is cached until the database becomes
available, even if that is to failov
It doesn’t affect the use case of connecting via srun afterwards as no new job
is submitted so the job_submit.lua logic is never called.
$ srun --pty /bin/bash
srun: error: submit_job: ERROR: interactive jobs are not allowed in the CPU or
GPU partitions. Use the interactive partition
srun: error
...that would interfere with users 'logging in' to a job to check on it
though, wouldn't it? I mean we do have pam_slurm_adopt configured but I
still tell people it's preferable to use 'srun --jobid=<jobid> --pty
/bin/bash' to check what a specific job is doing as pam_slurm_adopt
doesn't seem to i
Hi Peter,
as Rémi said, the way to do this in Slurm is via a job submit plugin. For
example in our job_submit.lua we have
    if (job_desc.partition == "cpu" or job_desc.partition == "gpu") and
       job_desc.qos ~= "admin" then
        if job_desc.script == nil or job_desc.script == '' then
            slurm.l
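
For anyone wanting to try the idea, here is a minimal self-contained sketch of a job_submit.lua along the same lines. It is not the poster's actual file: the BATCH_ONLY table and the stub `slurm` table at the top are assumptions added so the script can be exercised outside slurmctld, and the message text is copied from the srun output quoted earlier in the thread. Inside slurmctld the job_submit/lua plugin provides the real `slurm` table, so the stub is skipped.

```lua
-- Stub for running this sketch outside slurmctld (assumption: at runtime the
-- job_submit/lua plugin supplies the real `slurm` table and this is skipped).
slurm = slurm or {
    SUCCESS = 0,
    ERROR = -1,
    log_user = function(fmt, ...)
        io.stderr:write(string.format(fmt, ...), "\n")
    end,
}

-- Partitions that should only accept batch jobs (names taken from the thread).
local BATCH_ONLY = { cpu = true, gpu = true }

function slurm_job_submit(job_desc, part_list, submit_uid)
    if BATCH_ONLY[job_desc.partition] and job_desc.qos ~= "admin" then
        -- srun/salloc submissions carry no batch script, sbatch ones do.
        if job_desc.script == nil or job_desc.script == '' then
            slurm.log_user("interactive jobs are not allowed in the CPU or GPU partitions. Use the interactive partition")
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end

-- The plugin also expects a modify hook; this sketch accepts every change.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Note that a job submitted without an explicit partition arrives with job_desc.partition set to nil, so this check only catches jobs that name cpu or gpu directly. As said above, attaching to a running job with srun --jobid does not submit a new job, so this function is not called in that case.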