Re: [slurm-users] Slurm User Group Meeting (SLUG'21) will be held on YouTube on September 21st

2021-09-20 Thread Kota Tsuyuzaki
Hello guys, can I expect SLUG'21 to be available as an archive after the broadcast? I'm very much looking forward to watching the virtual SLUG, but the US Mountain Time zone is a little hard for the Asia-Pacific region. Thanks, Kota

Re: [slurm-users] Slurm User Group Meeting (SLUG'21) will be held on YouTube on September 21st

2021-09-20 Thread Tim Wickberg
One last reminder: the Slurm User Group Meeting will be starting at 9am (Mountain) on Tuesday. Hope to (virtually) see you there! - Tim On 9/15/21 2:50 PM, Tim Wickberg wrote: One more reminder that the Slurm User Group Meeting (SLUG'21) will be held on Tuesday, streaming through YouTube Live.

Re: [slurm-users] restarting slurmctld restarts jobs???

2021-09-20 Thread Diego Zuccato
Uhm... Writing it down triggered an alarm bell. What if, at boot, slurmctld is started before home gets mounted? It wouldn't find the file. That would explain the killing of jobs at reboot, but not the killing when restarting slurmctld (with slurmdbd running). But it's probably worth more testing... Il
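
One way to test the "started before home is mounted" hypothesis is to check which filesystem a path actually lives on at the moment of the check. The following is only a minimal sketch, not something from the thread: `find_mount_point` is a hypothetical helper name, and it relies only on the standard-library `os.path.realpath` and `os.path.ismount`.

```python
import os


def find_mount_point(path):
    """Return the mount point of the filesystem that contains `path`.

    Symlinks are resolved first, then the function walks up the
    directory tree until it reaches a mount point. The path itself
    does not need to exist.
    """
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current
```

If a spool directory that is expected to sit on the NFS-shared home filesystem reports the root filesystem as its mount point instead, the share was probably not mounted when the check ran. The same check would have to run at the point in the boot sequence where slurmctld starts for it to say anything about the race itself.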

Re: [slurm-users] restarting slurmctld restarts jobs???

2021-09-20 Thread Diego Zuccato
Thanks. Checked it: it's on the home filesystem, NFS-shared between the nodes. Well, actually it's a bit more involved than that: JobCompLoc points to /var/spool/jobscompleted.txt, but /var/spool/slurm is actually a symlink to /home/conf/slurm_spool. root@str957-cluster:/# grep spool /etc/slurm.conf J

Re: [slurm-users] restarting slurmctld restarts jobs???

2021-09-20 Thread mercan
Hi, please check the StateSaveLocation directory: it should be readable and writable by both slurmctld nodes, and it should be a single shared directory, not two local directories. The explanation below is taken from the Slurm website: "The backup controller recovers state information from the StateSa
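
The check suggested above can be sketched in a few lines, to be run on each slurmctld node. This is an illustration under stated assumptions, not an official Slurm tool: `parse_slurm_conf` and `check_state_dir` are hypothetical helper names, the parser handles only simple one-setting-per-line `Key=Value` entries, and the access test reflects the user running the script, so it should be run as the user slurmctld runs as.

```python
import os


def parse_slurm_conf(text):
    """Return simple Key=Value settings from slurm.conf text.

    Keys are lower-cased because slurm.conf parameter names are
    case-insensitive. Assumption: one setting per line; multi-field
    lines (e.g. NodeName=...) are not split further.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def check_state_dir(path):
    """Report whether `path` resolves to a readable, writable directory."""
    real = os.path.realpath(path)  # follow symlinks to the real target
    return {
        "resolved": real,
        "is_dir": os.path.isdir(real),
        "readable": os.access(real, os.R_OK),
        "writable": os.access(real, os.W_OK),
    }
```

For example, `check_state_dir(parse_slurm_conf(open("/etc/slurm.conf").read())["statesavelocation"])` reports where the directory really resolves to on that node. If the two controllers report different underlying filesystems for the same configured path, they are not sharing state.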

[slurm-users] restarting slurmctld restarts jobs???

2021-09-20 Thread Diego Zuccato
Hello all. After summer break, I noticed that rebooting one of the two slurmctld nodes kills & requeues all running jobs. Before the break it did not impact running jobs and nobody changed config during the break... Duh? Today I just restarted slurmctld and slurmd: same kill&requeue. I'm cur