Hi John, 

Thanks for the tips. I got it to work. The trick was to use 

SchedulerParameters=bf_busy_nodes

With this set, the scheduler only spins up a new node once the previous node is filled up.
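
For anyone who finds this later, the parameter goes in the scheduler section of slurm.conf, roughly like this (bf_interval is shown with its default value just for context; bf_busy_nodes is the part that matters):

# scheduler section of slurm.conf (illustrative)
SchedulerType=sched/backfill
# prefer nodes that are already in use over powering up idle cloud nodes
SchedulerParameters=bf_busy_nodes,bf_interval=30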

Jordan

> On Oct 26, 2018, at 1:13 AM, John Hearns <hear...@googlemail.com> wrote:
> 
> Hi Jordan.
> Regarding filling up the nodes look at
> https://slurm.schedmd.com/elastic_computing.html
> 
> SelectType
> Generally must be "select/linear". If Slurm is configured to allocate 
> individual CPUs to jobs rather than whole nodes (e.g. 
> SelectType=select/cons_res rather than SelectType=select/linear), then Slurm 
> maintains bitmaps to track the state of every CPU in the system. If the 
> number of CPUs to be allocated on each node is not known when the slurmctld 
> daemon is started, one must allocate whole nodes to jobs rather than 
> individual processors. The use of "select/cons_res" requires each node to 
> have a CPU count set and the node eventually selected must have at least that 
> number of CPUs.
> 
> 
> If I am not wrong, you can configure the number of CPUs per node as a fixed 
> amount - if you select a fixed instance type.
> 
> 
> NOTE: This demo uses c4.2xlarge instance types for the compute nodes, which 
> have statically set the number of CPUs=8 in slurm_nodes.conf. If you want to 
> experiment with different instance types (in slurm-aws-startup.sh), ensure you 
> change the CPUs in slurm_nodes.conf.
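> 
> For illustration, a fixed-CPU cloud node definition in slurm_nodes.conf could look something like this (the node name range and partition name are made up, not taken from the demo):
> 
> NodeName=ip-10-0-1-[6-250] CPUs=8 State=CLOUD
> PartitionName=aws Nodes=ip-10-0-1-[6-250] Default=yes MaxTime=INFINITE State=UP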
> 
> On Fri, 26 Oct 2018 at 07:13, J.R. W <jwillis0...@gmail.com> wrote:
> 
> Hello everyone,
> 
> I set up a SLURM cluster based on this post and plugin: 
> https://aws.amazon.com/blogs/compute/deploying-a-burstable-and-event-driven-hpc-cluster-on-aws-using-slurm-part-1/
> 
> When I submit jobs to the queue, the AWS instances start configuring. Because 
> I have so many potential instances, each job spools up its own instance. 
> For example, if I submit 10 jobs, AWS will configure 10 instances. What would 
> be ideal is a slurm.conf option I’m missing that tells the 
> power-save plugin to only configure N nodes, even though there are 
> hundreds of “available” nodes to configure in the cloud. Some potential 
> solutions I have thought of:
> 
> 1. Have the scheduler fill up nodes even if they are in the configuring 
> state. SLURM knows how many CPUs are available for the nodes that are being 
> configured. Is there a way to have jobs all fill up a node, even if it’s in 
> the configuring state? That way, a queued job will not trigger the “power 
> save resume” of a new node. 
> 
> 2. Some parameter in slurm.conf that sets the maximum number of nodes that can be available.
> 
> 3. Modify my slurm_resume script to check how many nodes are configured. 
> If that number is greater than the N nodes I want spun up, then do 
> nothing. Hopefully that will just send the job back to the queue to await one 
> of those configured nodes (a rough sketch of what I mean is below).
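> 
> Roughly what I have in mind for the top of that resume script (the 10-node cap and the sinfo parsing are just guesses; powered-down cloud nodes show a “~” suffix in sinfo, so I count the nodes without it):
> 
> #!/bin/bash
> # Hypothetical guard for the power-save resume script: if enough cloud
> # nodes are already up (or powering up), skip launching new instances.
> MAX_CLOUD_NODES=10
> # Count nodes whose compact state does not carry the powered-down "~" suffix.
> ACTIVE=$(sinfo -h -N -o "%t" | grep -cv '~')
> if [ "${ACTIVE:-0}" -ge "$MAX_CLOUD_NODES" ]; then
>     # Start nothing and hope the job simply stays queued until one of the
>     # already-configured nodes frees up (this is the part I'd need to test).
>     exit 0
> fi
> # ...otherwise fall through to the normal AWS instance start-up logic.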
> 
> I hope I’m making sense. I know elastic computing is a new feature.
> 
> Jordan
> 