IIUC it's not possible to increase resource usage once the job has
started: it would probably mess up the scheduler and MPI comms.
But I also think you're trying to find a problem for a "solution". Just
state the problem you're facing instead of proposing a solution :)
What software are you running? How does it detect that a resize is
needed? How would it handle the expansion?
Diego
On 28/06/2023 13:02, Rahmanpour Koushki, Maysam wrote:
Dear Slurm Mailing List,
I hope this email finds you well. I am currently working on a project
that requires the ability to dynamically shrink or expand nodes for
running jobs in Slurm. However, I am facing some challenges and would
greatly appreciate your assistance and expertise in finding a solution.
In my research, I came across the following resources:
1. Slurm Advanced Usage Tutorial: I found a tutorial
   (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
   that discusses advanced features of Slurm. It mentions the possibility
   of assigning and deassigning nodes to a job, which is exactly what I
   need. However, the tutorial refers to the FAQ for more detailed
   information.
2. Stack Overflow Question: I also came across a related question on
   Stack Overflow
   (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
   that discusses updating the node number for a job in Slurm. The answer
   suggests that it is indeed possible, but again, it refers to the FAQ
   for further details.
Upon reviewing the current FAQ, I found that it states node shrinking is
only possible for pending jobs. Unfortunately, it does not provide
additional information or examples to clarify if this functionality can
be extended to running jobs.
I would be grateful if anyone could provide insight into the following:
1. Is it possible to dynamically shrink or expand nodes for running
   jobs in Slurm? If so, how can it be achieved?
2. Are there any alternative methods or workarounds to accomplish
   dynamic node scaling for running jobs in Slurm?
I kindly request your guidance, personal experiences, or any relevant
resources that could shed light on this topic. Your expertise and
assistance would greatly help me in successfully completing my project.
Thank you in advance for your time and support.
Best regards,
Maysam
Johannes Gutenberg University of Mainz
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786