On Thu, Oct 18, 2018 at 1:03 PM Daniel Letai <d...@letai.org.il> wrote:
>
> Hello all,
>
> To solve a requirement where a large number of job arrays (~10k arrays,
> each with at most 8M elements), all with the same priority, should be
> executed with minimal starvation of any array - we don't want to wait for
> each array to complete before starting the next one - we wish to implement
> "interleaving" between arrays. We came up with the following scheme:
>
> 1. Start all arrays in this partition in a "Hold" state.
> 2. Release a predefined number of elements (e.g., 200).
> 3. From this point a slurmctld prolog takes over: on the 200th job, run
>    squeue and note the next job array (the array ID following the
>    currently executing array ID).
> 4. Release a predefined number of elements (e.g., 200), and repeat.
>
> This might produce a very large number of release requests to the
> scheduler in a short time frame, and one concern is the scheduler loop
> receiving too many requests.
>
> Can you think of other issues that might come up with this approach?
>
> Do you have any recommendations, or can you suggest a better approach to
> solve this problem?
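The hold-and-release scheme above could be driven with scontrol. A minimal sketch, assuming a hypothetical batch script job.sh, an illustrative job ID (12345), and that your scontrol version accepts array-task range expressions:

```shell
# Submit each array in a held state, so no element starts until it is
# explicitly released.
sbatch --hold --array=0-999999 job.sh   # job.sh and the bounds are illustrative

# Release the first 200 elements of the held array (job ID 12345 here).
# The 12345_[0-199] array-task range syntax is what the prolog would use
# to release the next batch of the following array.
scontrol release 12345_[0-199]
```

Each such scontrol invocation is an RPC to slurmctld, which is exactly the volume concern raised above: releasing 10k arrays in 200-element batches multiplies the request count considerably.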
I can't comment on the scalability issues, but if possible, using %200 on
the array submission seems like the simplest solution. From the sbatch man
page:

    For example "--array=0-15%4" will limit the number of simultaneously
    running tasks from this job array to 4.

> We have considered fairshare, but all arrays are from the same account
> and user. We have considered creating accounts on the fly (one for each
> array), but we get an error ("This should never happen") after creating a
> few thousand accounts.
>
> To my understanding, fairshare is only viable between accounts.
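The %N throttle is applied once at submission time and enforced by the scheduler itself, so it avoids the stream of release RPCs entirely. A sketch, assuming a hypothetical job script job.sh and illustrative array bounds (site limits such as MaxArraySize may cap the usable range):

```shell
# Submit the array but let at most 200 of its tasks run at once; the
# scheduler then interleaves same-priority arrays on its own.
sbatch --array=0-999999%200 job.sh

# In recent Slurm releases the throttle of a pending array can also be
# changed after submission (12345 is an illustrative job ID):
scontrol update JobId=12345 ArrayTaskThrottle=400
```

With every array capped the same way, no single array can monopolize the partition, which addresses the original starvation concern without a prolog-driven release loop.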