I have an application group whose throughput would improve if the scheduler 
packed jobs two to a node (each job still starting and finishing at its own 
time), rather than spreading them out one per node and only overlapping them 
once the partition is fully loaded. The users' workflow is such that expecting 
individuals to do things like running multiple sruns inside the same batch 
script isn't going to work.
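
For reference, the kind of per-user workaround we are ruling out would look 
roughly like the sketch below (application name, inputs, and task counts are 
purely illustrative):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=2
    # Two application instances sharing one allocation, each as a single task;
    # depending on the Slurm version, srun may need --exact or --exclusive to
    # split the allocation cleanly between the two concurrent steps.
    srun --ntasks=1 ./app input1 &
    srun --ntasks=1 ./app input2 &
    wait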


Currently, select/linear with OverSubscribe=FORCE:2 first assigns jobs 
round-robin to all idle nodes, and only then starts doubling up.
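
For context, the relevant configuration pieces are along these lines (other 
options omitted; OverSubscribe is a per-partition setting, shown here against 
the 'batch' definition given in full further below):

    SelectType=select/linear
    PartitionName=batch ... OverSubscribe=FORCE:2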

Is there a script or plugin way to change this so that the scheduler doubles 
up first and only round-robins job assignments afterwards?


The use case in more detail:


PartitionName=batch   Nodes=cluster[17-100] State=UP RootOnly=NO Default=YES 
MaxTime=2880 MaxNodes=60  DefaultTime=5 QoS=batch

PartitionName=long  Nodes=cluster[37-100] State=UP RootOnly=NO Default=NO 
MaxTime=100000 MaxNodes=10  DefaultTime=5


Users who want to run for a very long time without manual restarts can use the 
'long' partition, but we don't want to round-robin fill the machine (note the 
overlapping node sets) with 'long' jobs before doubling them up. The threading 
and memory behavior of the application (large serial sections) makes this a 
reasonable policy.


Making the partition node lists non-overlapping instead just leads to idle 
nodes in both batch and long.
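
That is, a static split along the lines of the following (node ranges purely 
illustrative) would strand capacity on whichever side happens to be 
underloaded at the time:

    PartitionName=batch Nodes=cluster[17-60]  ...
    PartitionName=long  Nodes=cluster[61-100] ...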


What's the right path to achieve such a policy?

Ben
