On 7/19/22 08:15, Ole Holm Nielsen wrote:
On 7/19/22 00:45, gphipps wrote:
Every so often one of our users accidentally writes a “fork-bomb”
that submits thousands of sbatch and srun requests per second. It is a
giant DDoS attack on our scheduler. Is there a way of rate-limiting
these requests before they reach the daemon? I could imagine writing a
shim in front of sbatch/srun, but I was hoping there was an official way
to do this.
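For illustration only, the rate-limiting core of such a shim could be a token bucket. This is a minimal sketch, not an official Slurm mechanism: the class name TokenBucket and its parameters are hypothetical, and a real wrapper around sbatch/srun would also have to keep this state somewhere shared (for example a per-user file under a lock), since every sbatch call is a separate process.

```python
import time


class TokenBucket:
    """Allow about `rate` submissions per second, with bursts up to `burst`.

    Hypothetical helper for illustration; not part of Slurm.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)        # tokens added per second
        self.capacity = float(burst)   # maximum tokens that can accumulate
        self.tokens = float(burst)     # start with a full bucket
        self.clock = clock             # injectable clock, handy for testing
        self.last = clock()

    def allow(self):
        """Return True if one more submission may proceed right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A wrapper would call allow() before handing the arguments to the real sbatch, and sleep or exit with an error when it returns False.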
Perhaps setting MaxSubmitJobs and MaxJobs on associations and QOSes would
do the trick?
You may also want to increase the default MaxJobCount in slurm.conf.
See my Wiki page for the details:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#maxjobcount-limit
Another possibility would be to write a job submit Lua plugin to reject
jobs before they get submitted. Of course, you would have to be able to
define some logic which somehow detects the "fork-bomb" situation, which
may not be so easy to do. See
https://slurm.schedmd.com/job_submit_plugins.html
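One possible form of that detection logic is a per-user sliding-window counter. The sketch below is in Python purely to show the idea; an actual plugin would be written in Lua (job_submit.lua), and the class name SubmitRateDetector, its parameters, and the thresholds are all made up for the example.

```python
import time
from collections import defaultdict, deque


class SubmitRateDetector:
    """Flag users who submit more than `max_submits` jobs per window.

    Hypothetical illustration of the logic only; not Slurm code.
    """

    def __init__(self, max_submits, window_seconds, clock=time.monotonic):
        self.max_submits = max_submits
        self.window = window_seconds
        self.clock = clock                  # injectable clock for testing
        self.history = defaultdict(deque)   # user -> accepted submit times

    def should_reject(self, user):
        """Return True if this submission exceeds the user's allowance."""
        now = self.clock()
        stamps = self.history[user]
        # Drop submissions that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_submits:
            return True
        # Only accepted submissions are recorded, so a user recovers
        # as soon as the earlier ones leave the window.
        stamps.append(now)
        return False
```

With, say, max_submits=100 and window_seconds=10, an ordinary user would never notice the limit, while a runaway submit loop would start being rejected almost immediately.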
I have some additional pointers to job submit plugins at
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#job-submit-plugins
/Ole