That's probably not optimal, but it could work. I'd go with brutal
preemption: swapping out 90+ GB can be quite time-consuming.
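
If by "brutal preemption" we mean just kicking the running jobs back into
the queue before the downtime, so their memory is freed instead of swapped,
a one-liner along these lines would do it (untested, and it assumes the
jobs are requeueable, i.e. batch jobs submitted with --requeue or
JobRequeue=1 in slurm.conf):

   # Requeue every running job; it restarts from scratch after the downtime.
   squeue --noheader --states=RUNNING --format="scontrol requeue %i" | sh
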
Diego
On 07/02/2023 14:18, Analabha Roy wrote:
On Tue, 7 Feb 2023, 18:12 Diego Zuccato, <diego.zucc...@unibo.it> wrote:
RAM used by a suspended job is not released. At most it can be swapped
out (if enough swap is available).

There should be enough swap available. I have 93 GB of RAM and an
equally large swap partition. I can top it off with swap files if needed.
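
Topping it off with a swap file would just be the usual dance, something
like this, run as root (the size is arbitrary):

   # Create and enable an extra 32 GB swap file next to the swap partition.
   fallocate -l 32G /swapfile
   chmod 600 /swapfile
   mkswap /swapfile
   swapon /swapfile
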
On 07/02/2023 13:14, Analabha Roy wrote:
> Hi Sean,
>
> Thanks for your awesome suggestion! I'm going through the
> reservation docs now. At first glance, it seems like a daily
> reservation would turn down jobs that are too big for the
> reservation. It'd be nice if Slurm could suspend (in the manner of
> 'scontrol suspend') jobs during reserved downtime and resume them
> after. That way, folks can submit large jobs without having to
> worry about the downtimes. Perhaps the FLEX option in reservations
> can accomplish this somehow?
>
>
> I suppose that I can do it using a shell script iterator and a
> cron job, but that seems like an ugly hack. I was hoping there is
> a way to configure this in Slurm itself?
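>
> For concreteness, the iterator I have in mind would be something like
> this (untested; the script name is made up, and it would have to run
> as root or the SlurmUser, once from cron before the hibernate and once
> after power-on):
>
>     #!/bin/bash
>     # slurm-jobs.sh -- pause or resume all jobs around the downtime.
>     case "$1" in
>       suspend)
>         # Pause every running job so it isn't killed by the hibernate.
>         for jobid in $(squeue --noheader --states=RUNNING --format=%i); do
>           scontrol suspend "$jobid"
>         done
>         ;;
>       resume)
>         # Let the paused jobs continue once the node is back up.
>         for jobid in $(squeue --noheader --states=SUSPENDED --format=%i); do
>           scontrol resume "$jobid"
>         done
>         ;;
>       *)
>         echo "usage: $0 suspend|resume" >&2
>         exit 1
>         ;;
>     esac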
>
> AR
>
> On Tue, 7 Feb 2023 at 16:06, Sean Mc Grath <smcg...@tcd.ie> wrote:
>
> Hi Analabha,
>
> Could you do something like create a daily reservation for 8
> hours that starts at 9am, or whatever times work for you, like
> the following untested command:
>
> scontrol create reservation starttime=09:00:00 duration=8:00:00 nodecnt=1 flags=daily ReservationName=daily
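>
> Jobs would then be submitted into it with something like the line
> below (also untested; note that scontrol will probably also want a
> users= or accounts= list when the reservation is created):
>
>     sbatch --reservation=daily jobscript.sh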
>
> Daily option at https://slurm.schedmd.com/scontrol.html#OPT_DAILY
>
> Some more possibly helpful documentation at
> https://slurm.schedmd.com/reservations.html, search for "daily".
>
> My idea being that jobs can only run in that reservation (that
> would have to be configured separately, not sure how off the top
> of my head), which is only active during the times you want the
> node to be working. So the cronjob that hibernates/shuts it down
> will do so when there are no jobs running. At least in theory.
>
> Hope that helps.
>
> Sean
>
> ---
> Sean McGrath
> Senior Systems Administrator, IT Services
>
>
------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Analabha Roy <hariseldo...@gmail.com>
> *Sent:* Tuesday 7 February 2023 10:05
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] [External] Hibernating a whole cluster
> Hi,
>
> Thanks. I had read the Slurm Power Saving Guide before. I believe
> the configs enable slurmctld to check other nodes for idleness and
> suspend/resume them. Slurmctld must run on a separate, always-on
> server for this to work, right?
>
> My issue might be a little different. I literally have only one
> node that runs everything: slurmctld, slurmd, slurmdbd, everything.
>
> This node must be set to "sudo systemctl hibernate" after business
> hours, regardless of whether jobs are queued or running. The next
> business day, it can be switched on manually.
>
> systemctl hibernate is supposed to save the entire run state of
> the sole node to swap and power off. When powered on again, it
> should restore everything to its previous running state.
>
> When the job queue is empty, this works well. I'm not sure how
> well this hibernate/resume will work with running jobs and would
> appreciate any suggestions or insights.
>
> AR
>
>
> On Tue, 7 Feb 2023 at 01:39, Florian Zillner <fzill...@lenovo.com> wrote:
>
> Hi,
>
> follow this guide: https://slurm.schedmd.com/power_save.html
>
> Create poweroff / poweron scripts and configure Slurm to do the
> poweroff after X minutes. Works well for us. Make sure to set an
> appropriate time (ResumeTimeout) to allow the node to come back
> into service.
> Note that we did not achieve good power savings by suspending
> the nodes; powering them off and on saves way more power. The
> downside is that it takes ~5 mins to resume (= power on) the
> nodes when needed.
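>
> For reference, the relevant slurm.conf settings look roughly like
> the sketch below; the script paths and timings are placeholders,
> not our exact config:
>
>     # Power a node off after 10 minutes of idleness; allow up to
>     # 10 minutes for it to boot and rejoin before it is marked down.
>     SuspendProgram=/etc/slurm/node_poweroff.sh
>     ResumeProgram=/etc/slurm/node_poweron.sh
>     SuspendTime=600
>     SuspendTimeout=120
>     ResumeTimeout=600
>
> Both scripts receive the affected node list as their argument and
> just have to trigger the actual power-off / power-on (IPMI,
> systemctl poweroff, wake-on-LAN, ...).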
>
> Cheers,
> Florian
>
------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Analabha Roy <hariseldo...@gmail.com>
> *Sent:* Monday, 6 February 2023 18:21
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [External] [slurm-users] Hibernating a whole cluster
> Hi,
>
> I've just finished setting up a single-node "cluster" with Slurm
> on Ubuntu 20.04. Infrastructural limitations prevent me from
> running it 24/7, and it's only powered on during business hours.
>
>
> Currently, I have a cron job running that hibernates that sole
> node before closing time.
>
> The hibernation is done with standard systemd and hibernates to
> the swap partition.
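>
> Concretely, such a cron job can be as simple as a root crontab
> entry like this one (the schedule is just an example):
>
>     # Hibernate the node every weekday evening at 18:30.
>     30 18 * * 1-5 /usr/bin/systemctl hibernate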
>
> I have not run any lengthy Slurm jobs on it yet. Before I do,
> can I get some thoughts on a couple of things?
>
> If it hibernated when Slurm still had jobs running/queued, would
> they resume properly when the machine powers back on?
>
> Note that my swap space is bigger than my RAM.
>
> Is it necessary to perhaps set up a pre-hibernate script for
> systemd that iterates scontrol to suspend all the jobs before
> hibernating and resumes them post-resume?
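>
> The mechanism I'm thinking of is a systemd system-sleep hook;
> systemd runs everything in /lib/systemd/system-sleep/ with
> "pre"/"post" and the sleep type as arguments, so a sketch
> (untested) would be:
>
>     #!/bin/bash
>     # /lib/systemd/system-sleep/90-slurm-jobs
>     # $1 is "pre" before sleeping and "post" after waking up;
>     # $2 is the action ("hibernate", "suspend", ...).
>     case "$1" in
>       pre)  squeue --noheader --states=RUNNING   --format="scontrol suspend %i" | sh ;;
>       post) squeue --noheader --states=SUSPENDED --format="scontrol resume %i"  | sh ;;
>     esac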
>
> What about the wall times? I'm guessing that Slurm will count the
> downtime as elapsed time for each job. Is there a way to configure
> this, or is the only alternative a post-hibernate script that
> iteratively updates the wall times of the running jobs using
> scontrol again?
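>
> The per-job update would presumably be something like the command
> below (untested; scontrol accepts a TimeLimit prefixed with + or -
> to adjust the current limit):
>
>     # Give job 1234 two extra hours to make up for the downtime.
>     scontrol update jobid=1234 timelimit=+02:00:00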
>
> Thanks for your attention.
> Regards
> AR
>
>
>
> --
> Analabha Roy
> Assistant Professor
> Department of Physics <http://www.buruniv.ac.in/academics/department/physics>
> The University of Burdwan <http://www.buruniv.ac.in/>
> Golapbag Campus, Barddhaman 713104
> West Bengal, India
> Emails: dan...@utexas.edu, a...@phys.buruniv.ac.in, hariseldo...@gmail.com
> Webpage: http://www.ph.utexas.edu/~daneel/
>
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786