Hello guys,
I'm wondering whether SLUG'21 will be available as an archive after the
broadcast.
I'm really looking forward to watching the virtual SLUG, but the US Mountain
Time zone is a bit difficult for those of us in the Asia-Pacific region.
Thanks,
Kota
One last reminder: the Slurm User Group Meeting will be starting at 9am
(Mountain) on Tuesday. Hope to (virtually) see you there!
- Tim
On 9/15/21 2:50 PM, Tim Wickberg wrote:
One more reminder that the Slurm User Group Meeting (SLUG'21) will be
held on Tuesday, streaming through YouTube Live.
Uhm... Writing it down triggered an alarm bell.
What if, at boot, slurmctld is started before home gets mounted? It
wouldn't find the file.
That would explain the killing of jobs at reboot, but not the one when
restarting slurmctld (w/ slurmdbd running). But probably worth more
testing...
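
One way to test this hypothesis is to check, at the moment slurmctld starts, whether the spool path already resolves onto the mounted home filesystem. What follows is only a minimal Python sketch and not part of Slurm: the helper names `find_mount_point` and `is_on_expected_mount` are invented for illustration, and the paths are the ones mentioned in this thread.

```python
import os


def find_mount_point(path):
    """Walk upward from the symlink-resolved path to the nearest mount point."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the top of the tree
            break
        current = parent
    return current


def is_on_expected_mount(path, expected_mount):
    """True if path exists and lives on the filesystem mounted at expected_mount."""
    resolved = os.path.realpath(path)
    if not os.path.exists(resolved):
        return False  # e.g. a symlink target on a filesystem that is not mounted yet
    return find_mount_point(resolved) == os.path.realpath(expected_mount)


if __name__ == "__main__":
    # Paths as described in this thread (assumption: /home is the NFS mount).
    print(is_on_expected_mount("/var/spool/slurm", "/home"))
```

If this prints False early in boot and True later, the start order is the likely culprit. On a systemd host, an ordering directive such as `RequiresMountsFor=` in a drop-in for the slurmctld unit might be one way to address it, assuming the service is managed by systemd.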
Il
Tks. Checked it: it's on the home filesystem, NFS-shared between the
nodes. Well, actually a bit more involved than that: JobCompLoc points
to /var/spool/jobscompleted.txt but /var/spool/slurm is actually a
symlink to /home/conf/slurm_spool .
root@str957-cluster:/# grep spool /etc/slurm.conf
J
Hi;
Please check the StateSaveLocation directory: it should be readable and
writable by both slurmctld nodes, and it should be a shared directory,
not two local directories.
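
The check described above can be sketched roughly as follows. This is an illustrative Python snippet, not a Slurm tool: `read_conf_value` and `dir_is_usable` are hypothetical helpers, the parser assumes one `Key=Value` setting per line, and `os.access` only reports permissions for the user running the script, so it would need to be run as the Slurm user on each controller.

```python
import os


def read_conf_value(conf_text, key):
    """Return the last value assigned to key in slurm.conf-style text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, _, rest = line.partition("=")
        if name.strip().lower() == key.lower():  # keys are matched case-insensitively
            value = rest.strip()
    return value


def dir_is_usable(path):
    """True if path is a directory the current user can read, write, and enter."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    conf_path = "/etc/slurm.conf"  # config path as shown in this thread
    if os.path.exists(conf_path):
        with open(conf_path) as fh:
            state_dir = read_conf_value(fh.read(), "StateSaveLocation")
        if state_dir is None:
            print("StateSaveLocation is not set in", conf_path)
        else:
            print(state_dir, "usable:", dir_is_usable(state_dir))
```

Running it on both controllers only shows that each one can access a directory at that path. To confirm the directory is actually shared, one could create a marker file from one node and look for it from the other.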
The explanation below is taken from the Slurm website:
"The backup controller recovers state information from the
StateSa
Hello all.
After summer break, I noticed that rebooting one of the two slurmctld
nodes kills & requeues all running jobs. Before the break it did not
impact running jobs and nobody changed config during the break... Duh?
Today I just restarted slurmctld and slurmd: same kill&requeue.
I'm cur