Hello everyone,

I set up a SLURM cluster based on this post and plugin:
https://aws.amazon.com/blogs/compute/deploying-a-burstable-and-event-driven-hpc-cluster-on-aws-using-slurm-part-1/

When I submit jobs to the queue, the AWS instances start configuring. Because I 
have so many potential instances, one instance spools up for each job: if I 
submit 10 jobs, AWS will configure 10 instances. What would be ideal is a 
slurm.conf option I'm missing that tells the power-save plugin to configure 
only N nodes, even though there are hundreds of "available" nodes to configure 
in the cloud. Here are some potential solutions I have thought of:

1. Have the scheduler fill up nodes even while they are in the configuring 
state. SLURM knows how many CPUs are available on the nodes being configured. 
Is there a way to have jobs fill up a node even while it's still configuring? 
That way, a queued job would not trigger a "power save resume" of a new node.

2. Some parameter in slurm.conf that sets the maximum number of nodes that can 
be available at once.

3. Modify my slurm_resume script to check how many nodes are already 
configured. If that number is greater than the N nodes I want spun up, do 
nothing. Hopefully the job will just go back to the queue to await one of the 
already-configured nodes.
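In case it helps the discussion, idea 3 could be sketched as a thin gate in 
front of the real resume logic. This is only a sketch under my own assumptions: 
the MAX_POWERED_UP cap, the comma-separated node list, and using `sinfo -o %t` 
output as a proxy for "nodes already up or powering up" are all mine, not from 
the blog post.

```python
#!/usr/bin/env python3
"""Sketch of a capped ResumeProgram wrapper (idea 3 above).

Assumptions (mine): a hypothetical MAX_POWERED_UP cap, and that
`sinfo -h -N -o %t` output is a usable proxy for how many cloud nodes
are already up or coming up. In SLURM's power-save notation, a `~`
suffix on the compact state marks a powered-down cloud node.
"""
import subprocess
import sys

MAX_POWERED_UP = 8  # hypothetical cap; tune for your budget


def count_active(states):
    """Count nodes whose compact state suggests they are up or powering up
    (i.e. anything without the powered-down `~` flag)."""
    return sum(1 for s in states if not s.rstrip("*").endswith("~"))


def nodes_to_resume(requested, states, cap=MAX_POWERED_UP):
    """Return the subset of `requested` we can resume without exceeding `cap`;
    the rest are skipped so their jobs wait in the queue."""
    budget = max(0, cap - count_active(states))
    return requested[:budget]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes the node list arrives comma-separated; a real ResumeProgram
    # may receive a hostlist expression that needs expanding first.
    requested = sys.argv[1].split(",")
    out = subprocess.run(["sinfo", "-h", "-N", "-o", "%t"],
                         capture_output=True, text=True, check=True)
    states = out.stdout.split()
    for node in nodes_to_resume(requested, states):
        pass  # ... call the real spin-up logic for `node` here
```

The skipped nodes simply never come up, so (if I understand the power-save 
flow correctly) their jobs stay queued until a configured node frees up, which 
is the behavior I'm after.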

I hope I'm making sense. I know elastic computing is a new feature.

Jordan


