On Thu, Oct 18, 2018 at 1:03 PM Daniel Letai <d...@letai.org.il> wrote:
>
>
> Hello all,
>
>
> We need to run a large number of job arrays (~10k arrays, each with up to 
> 8M elements), all at the same priority, with minimal starvation of any 
> array - we don't want to wait for each array to complete before starting 
> the next one. To achieve this "interleaving" between arrays, we came up 
> with the following scheme:
>
>
> 1. Start all arrays in this partition in a "Hold" state.
>
> 2. Release a predefined number of elements (e.g., 200).
>
> From this point a slurmctld prolog takes over:
>
> 3. When the 200th job starts, run squeue and note the next job array 
> (the array id following the currently executing one).
>
> 4. Release the next predefined batch of elements (e.g., 200) from that 
> array.
>
> 5. Repeat steps 3-4.
>
>
> This might produce a very large number of release requests to the 
> scheduler in a short time frame, and one concern is that the scheduler 
> loop will be overwhelmed by them.
>
> Can you think of other issues that might come up with this approach?
>
>
> Do you have any recommendations, or might suggest a better approach to solve 
> this problem?
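For reference, the hold/release scheme described above boils down to a round-robin release order across the arrays. Here is a minimal sketch of just that ordering logic (plain Python, no Slurm calls; `interleaved_batches` and `array_sizes` are hypothetical names, and in practice each yielded batch would be turned into an `scontrol release` request):

```python
def interleaved_batches(array_sizes, batch=200):
    """Yield (array_id, first_task, last_task) release batches,
    round-robin across arrays, so no single array starves.

    array_sizes: map of array job id -> number of elements.
    Each yielded tuple would become something like
    `scontrol release <array_id>_[first-last]` on a real cluster.
    """
    cursors = {aid: 0 for aid in array_sizes}  # next unreleased task index
    pending = list(array_sizes)                # arrays with tasks left
    while pending:
        for aid in list(pending):
            start = cursors[aid]
            end = min(start + batch, array_sizes[aid])
            yield (aid, start, end - 1)
            cursors[aid] = end
            if end == array_sizes[aid]:
                pending.remove(aid)
```

With two arrays of 300 and 250 elements and batch=200, this alternates between the arrays instead of finishing one before starting the next, which is the interleaving behavior you describe.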

I can't comment on the scalability issues, but if possible, using %200
on the array submission seems like the simplest solution. From the
sbatch man page:
For  example "--array=0-15%4" will limit the number of simultaneously
running tasks from this job array to 4.
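For example, to submit an array while capping it at 200 concurrently running tasks (the script name and array size here are just placeholders):

```shell
# At most 200 of this array's tasks run at once; the rest stay
# pending until running tasks finish and free up slots.
sbatch --array=0-9999%200 my_job.sh
```

This pushes the throttling into the scheduler itself, so no prolog-driven release loop is needed.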

>
> We have considered fairshare, but all arrays are from the same account and 
> user. We have considered creating accounts on the fly (one for each array) 
> but we get an error ("This should never happen") after creating a few 
> thousand accounts.
>
> To my understanding, fairshare is only viable between accounts.
