Thank you for the responses.

In response to some of the suggestions, I would like to provide further details
on my specific use case. I am currently focused on exploring the concept of
malleable jobs, i.e. jobs that can adapt their computing resources at runtime.

To tackle the MPI incompatibility issues associated with malleable jobs, there
are solutions such as Flex-MPI, which extends MPI to support resource
adaptivity for malleable jobs at runtime. Furthermore, there are scheduling
algorithms tailored to malleable jobs, which aim to allocate resources
efficiently and optimize scheduling based on the dynamic nature of such jobs.
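
For context, Flex-MPI has its own API that I am still studying, but the
mechanism it builds on is standard MPI dynamic process management. Below is a
minimal, hypothetical sketch using mpi4py's Spawn/Merge (plain MPI-3, not
Flex-MPI's actual API) just to illustrate how an application could grow its
rank count at runtime; the file name and worker count are made up:

    # spawn_demo.py - minimal sketch of MPI dynamic process management
    # (plain MPI-3 Spawn/Merge via mpi4py, *not* Flex-MPI's own API).
    from mpi4py import MPI
    import sys

    parent = MPI.Comm.Get_parent()

    if parent == MPI.COMM_NULL:
        # Parent side: decide at runtime that more ranks are needed.
        n_new = 2  # hypothetical number of extra workers
        inter = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=[sys.argv[0]], maxprocs=n_new)
        world = inter.Merge(high=False)  # one intracommunicator for all ranks
        print("parent now sees", world.Get_size(), "ranks")
        world.Free()
        inter.Disconnect()
    else:
        # Child side: join the merged communicator created by the parent.
        world = parent.Merge(high=True)
        print("child rank", world.Get_rank(), "of", world.Get_size())
        world.Free()
        parent.Disconnect()

Note that under Slurm any spawned ranks still have to fit inside the job's
current allocation, which is exactly why I am interested in growing and
shrinking the allocation itself (see the sketch further below).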

My primary objective is to understand how Slurm can effectively support
malleable jobs, so I am investigating how Slurm can expand and shrink a job's
node allocation at runtime.
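
So far, the closest thing I have found is the grow/shrink procedure that (at
least in older versions) was described in the Slurm FAQ. I am not sure whether
it still works in current releases, so please treat the following as an
unverified sketch rather than a recipe; <orig_jobid> is a placeholder:

    # Shrink a running job to 2 nodes; Slurm then writes
    # slurm_job_<orig_jobid>_resize.sh into the job's working directory,
    # which resets SLURM_NNODES, SLURM_JOB_NODELIST, etc.
    scontrol update JobId=<orig_jobid> NumNodes=2
    . ./slurm_job_<orig_jobid>_resize.sh

    # Expand: submit a helper job that only exists to hand its allocation
    # over to the running job (expand.sh would contain something like
    # "scontrol update JobId=$SLURM_JOB_ID NumNodes=0"), then refresh the
    # original job's environment again.
    sbatch --dependency=expand:<orig_jobid> -N2 expand.sh
    . ./slurm_job_<orig_jobid>_resize.sh

Again, I am not certain these commands behave the same way in current
releases; I am mainly including them to make concrete what I mean by
expanding and shrinking a running job.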


Best Regards


Maysam


________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Diego 
Zuccato <diego.zucc...@unibo.it>
Sent: Wednesday, June 28, 2023 4:15:44 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in 
Slurm

IIUC it's not possible to increase resource usage once the job is
started: it would (probably) mess up the scheduler and MPI comms.

But I also think you're trying to find a problem for a "solution". Just
state the problem you're facing instead of proposing a solution :)
What software are you running? How does it detect that a resize is
needed? How would it handle the expansion?

Diego

On 28/06/2023 13:02, Rahmanpour Koushki, Maysam wrote:
> Dear Slurm Mailing List,
>
>
> I hope this email finds you well. I am currently working on a project
> that requires the ability to dynamically shrink or expand nodes for
> running jobs in Slurm. However, I am facing some challenges and would
> greatly appreciate your assistance and expertise in finding a solution.
>
> In my research, I came across the following resources:
>
>  1. Slurm Advanced Usage Tutorial: I found a tutorial
>     (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
>     that discusses advanced features of Slurm. It mentions the possibility
>     of assigning and deassigning nodes to a job, which is exactly what I
>     need. However, the tutorial refers to the FAQ for more detailed
>     information.
>
>  2. Stack Overflow Question: I also came across a related question on
>     Stack Overflow
>     (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
>     that discusses updating the node number for a job in Slurm. The answer
>     suggests that it is indeed possible, but again, it refers to the FAQ
>     for further details.
>
> Upon reviewing the current FAQ, I found that it states node shrinking is
> only possible for pending jobs. Unfortunately, it does not provide
> additional information or examples to clarify if this functionality can
> be extended to running jobs.
>
> I would be grateful if anyone could provide insight into the following:
>
>  1.
>
>     Is it possible to dynamically shrink or expand nodes for running
>     jobs in Slurm? If so, how can it be achieved?
>
>  2.
>
>     Are there any alternative methods or workarounds to accomplish
>     dynamic node scaling for running jobs in Slurm?
>
> I kindly request your guidance, personal experiences, or any relevant
> resources that could shed light on this topic. Your expertise and
> assistance would greatly help me in successfully completing my project.
>
> Thank you in advance for your time and support.
>
> Best regards,
>
>
> Maysam
>
>
> Johannes Gutenberg University of Mainz
>
>

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
