Loris Bennett <loris.benn...@fu-berlin.de> writes:

> Hi David,
>
> (Thanks for changing the subject to something more appropriate).
>
> David Laehnemann <david.laehnem...@hhu.de> writes:
>
>> Yes, but only to an extent. The linked conversation ends with this:
>>
>>>> Do you have any best practice about setting MaxJobCount to a proper
>>>> number?
>>
>>> That depends upon your workload. You could probably set MaxJobCount
>>> to at least 50000 with most systems (assuming you have at least a few
>>> gigabytes of memory). Some sites run with a value of 1000000 or more.
>>
>> So, it is configurable. But this has a limit. And if you have lots of
>> users on a system submitting lots of jobs, even a value of 1000000 can
>> get exhausted.
>
> Yes, but start a lot more jobs and stay within the limit if you use job

.. but you can start ...

> arrays. When you submit individual jobs, a job ID for each one needs to
> be written to the Slurm job database. This can cause the database to
> become unresponsive if the number submitted at one time, whether by
> snakemake or just a bash script looping over 'sbatch', is too high. If,
> on the other hand, you submit a job array, only one entry needs to be
> made in the database immediately, with entries for the elements of the
> array only being made when a job can actually start.
>
> This is also why a large number of individual jobs with the same
> resource requirements prevents backfill from working properly. The
> mechanism only considers a certain (configurable) number of pending
> jobs to see whether they qualify for backfilling. In this context, a
> job array is counted as a single job, regardless of how large the array
> actually is. This will degrade the throughput of the system and thus
> negatively impact all users. Therefore, on our system we would not
> allow users to employ a mechanism which generates a large number of
> jobs but does not employ job arrays.
>
>> And in either case, this is not something that speaks against a
>> workflow management system giving you additional control over things.
>> So I'm not sure what exactly we are arguing about, right here...
>
> I just wanted to point out that, whereas for some users approaches such
> as snakemake obviously scratch a very important itch, for people
> running HPC systems, and indeed for users who don't use such
> mechanisms, they may cause issues.
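To make the contrast above concrete, here is a sketch of the two
submission styles (the script name and array size are placeholders, not
anything from this thread):

```shell
# Anti-pattern: a loop over 'sbatch' creates one database entry per job
# immediately, and leaves thousands of pending jobs for the backfill
# scheduler to wade through.
for i in $(seq 1 10000); do
    sbatch --wrap "./process_sample.sh ${i}"
done

# Job array: a single sbatch call, one immediate database entry, and the
# backfill scheduler counts it as one pending job. Each element reads
# its own index from $SLURM_ARRAY_TASK_ID at run time (hence the single
# quotes, which defer expansion to the job itself).
sbatch --array=1-10000 --wrap './process_sample.sh ${SLURM_ARRAY_TASK_ID}'
```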
>
> Cheers,
>
> Loris
>
>> cheers,
>> david
>>
>>
>>
>> On Thu, 2023-02-23 at 17:41 +0100, Ole Holm Nielsen wrote:
>>> On 2/23/23 17:07, David Laehnemann wrote:
>>> > In addition, there are very clear limits to how many jobs slurm can
>>> > handle in its queue, see for example this discussion:
>>> > https://bugs.schedmd.com/show_bug.cgi?id=2366
>>>
>>> My 2 cents: Slurm's job limits are configurable, see this Wiki page:
>>> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#maxjobcount-limit
>>>
>>> /Ole

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
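For reference, both limits touched on in this thread are set in
slurm.conf; a minimal sketch with purely illustrative values, not
recommendations for any particular site:

```
# Upper bound on the number of jobs slurmctld keeps in memory
# (pending plus running); submissions beyond this are rejected.
MaxJobCount=1000000

# Cap on how many pending jobs the backfill scheduler examines per
# cycle -- the "configurable number" mentioned above.
SchedulerParameters=bf_max_job_test=1000
```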