Hi,

I've just finished setting up a single-node "cluster" with Slurm on Ubuntu
20.04. Infrastructure limitations prevent me from running it 24/7, so it is
only powered on during business hours.


Currently, a cron job hibernates the sole node before closing time.
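For context, the cron job is nothing fancy; a minimal sketch of the crontab entry (the 18:30 weekday schedule is just an example, not my actual closing time):

```
# Hibernate the node at 18:30, Monday through Friday.
30 18 * * 1-5 /usr/bin/systemctl hibernate
```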

Hibernation is handled by standard systemd and writes to the swap
partition.

I have not run any lengthy Slurm jobs on it yet. Before I do, can I get
some thoughts on a couple of things?

If the node hibernated while Slurm still had jobs running or queued, would
they resume properly when the machine powers back on?

Note that my swap space is bigger than my RAM.

Or is it necessary to set up a pre-hibernate script for systemd that
iterates over the jobs with scontrol, suspending them all before
hibernation and resuming them after wake-up?

What about the wall times? I'm guessing that Slurm will count the downtime
as elapsed time for each job. Can this be configured, or is the only
alternative a post-hibernate script that again uses scontrol to extend the
time limits of the running jobs one by one?
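If it comes to that, I imagine the post-resume step would look roughly like this; a sketch assuming the pre-hibernate hook saved a timestamp to /var/run/slurm-hibernate.ts (a file name I made up), and relying on scontrol's documented TimeLimit=+N syntax, which adds N minutes to a job's current limit:

```shell
#!/bin/sh
# Hypothetical post-resume step: credit the downtime back to each job.

extend_time_limits() {
    # $1: downtime in minutes to add to every running/suspended job.
    for jobid in $(squeue -h -t R,S -o %i); do
        # TimeLimit accepts a +N prefix to increment the limit by N minutes.
        scontrol update JobId="$jobid" TimeLimit=+"$1"
    done
}

# Only run if the pre-hibernate hook actually recorded a timestamp.
if [ -r /var/run/slurm-hibernate.ts ]; then
    down_min=$(( ($(date +%s) - $(cat /var/run/slurm-hibernate.ts)) / 60 ))
    extend_time_limits "$down_min"
fi
```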

Thanks for your attention.
Regards
AR
