Hi, I've just finished setting up a single-node "cluster" with Slurm on Ubuntu 20.04. Infrastructure limitations prevent me from running it 24/7, so it's only powered on during business hours.
Currently, a cron job hibernates the sole node before closing time. The hibernation is done with standard systemd and goes to the swap partition, which is larger than my RAM. I have not run any lengthy Slurm jobs on it yet. Before I do, can I get some thoughts on a couple of things?

If the machine hibernated while Slurm still had jobs running or queued, would they resume properly when it powers back on?

Is it necessary to set up a pre-hibernate hook for systemd that loops over the running jobs with scontrol to suspend them before hibernating, and resumes them after waking (roughly the first sketch below)?

What about wall times? I'm guessing Slurm will count the downtime as elapsed time for each job. Is there a way to configure this, or is the only alternative a post-hibernate script that again loops over the running jobs with scontrol and extends their time limits (the second sketch below)?

Thanks for your attention.

Regards
AR
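Sketch 1: for concreteness, this is the kind of hook I have in mind. systemd runs every executable it finds in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument and the sleep action as the second. The script name and the state-file path are placeholders I made up, and the whole thing is untested:

#!/bin/bash
# /usr/lib/systemd/system-sleep/slurm-jobs  (hypothetical name; must be executable)
# systemd calls this with $1 = "pre" or "post" and $2 = the action,
# e.g. "hibernate". Only plain hibernation is handled here.

STATEFILE=/run/slurm-hibernated-jobs   # hypothetical scratch file

case "$1/$2" in
  pre/hibernate)
    # Record the IDs of all currently running jobs, then suspend them
    # so their processes are stopped before the memory image is written.
    squeue --noheader --states=RUNNING --format=%A > "$STATEFILE"
    while read -r jobid; do
      scontrol suspend "$jobid"
    done < "$STATEFILE"
    ;;
  post/hibernate)
    # Resume exactly the jobs we suspended before hibernating.
    if [ -f "$STATEFILE" ]; then
      while read -r jobid; do
        scontrol resume "$jobid"
      done < "$STATEFILE"
      rm -f "$STATEFILE"
    fi
    ;;
esac

Queued (pending) jobs would presumably need no handling here, since they hold no processes to freeze.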
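Sketch 2: for the wall times, an untested post-resume helper. The downtime figure would have to come from somewhere, e.g. a timestamp written by the pre-hibernate hook; here it's just a command-line argument. As I understand the scontrol man page, a leading "+" on TimeLimit increments the current limit by that many minutes:

#!/bin/bash
# Hypothetical post-resume helper: credit the hibernation downtime back
# to every running job by extending its time limit.

DOWNTIME_MIN=${1:?usage: $0 <downtime-in-minutes>}

for jobid in $(squeue --noheader --states=RUNNING --format=%A); do
  # "TimeLimit=+N" adds N minutes to the job's existing limit.
  scontrol update JobId="$jobid" TimeLimit=+"$DOWNTIME_MIN"
done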