IIUC it's not possible to increase a job's resource usage once the job has started: it would (probably) mess up the scheduler and the MPI communications.

But I also think you're trying to find a problem for a "solution". Just state the problem you're facing instead of proposing a solution :) What software are you running? How does it detect that a resize is needed? How would it handle the expansion?

Diego

On 28/06/2023 13:02, Rahmanpour Koushki, Maysam wrote:
Dear Slurm Mailing List,


I hope this email finds you well. I am currently working on a project that requires the ability to dynamically shrink or expand nodes for running jobs in Slurm. However, I am facing some challenges and would greatly appreciate your assistance and expertise in finding a solution.

In my research, I came across the following resources:

 1. Slurm Advanced Usage Tutorial: I found a tutorial
    (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
    that discusses advanced features of Slurm. It mentions the
    possibility of assigning and deassigning nodes to a job, which is
    exactly what I need. However, the tutorial refers to the FAQ for
    more detailed information.

 2. Stack Overflow Question: I also came across a related question on
    Stack Overflow
    (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
    that discusses updating the node number for a job in Slurm. The
    answer suggests that it is indeed possible, but again, it refers to
    the FAQ for further details.

Upon reviewing the current FAQ, I found that it states node shrinking is only possible for pending jobs. Unfortunately, it does not provide additional information or examples to clarify if this functionality can be extended to running jobs.

I would be grateful if anyone could provide insight into the following:

 1. Is it possible to dynamically shrink or expand nodes for running
    jobs in Slurm? If so, how can it be achieved?

 2. Are there any alternative methods or workarounds to accomplish
    dynamic node scaling for running jobs in Slurm?

I kindly request your guidance, personal experiences, or any relevant resources that could shed light on this topic. Your expertise and assistance would greatly help me in successfully completing my project.

Thank you in advance for your time and support.

Best regards,


Maysam


Johannes Gutenberg University of Mainz



--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
