We've hit this several times before. The tricks we've used to deal with it are:

1. Being on the latest release: A lot of work has gone into improving RPC throughput, so if you aren't running the latest 20.11 release I highly recommend upgrading. 20.02 was also fairly good in this regard.

2. max_rpc_cnt/defer: I recommend using either of these SchedulerParameters settings, as they give the scheduler more time to breathe between RPC bursts (see the slurm.conf sketch after this list).

3. Make sure your MySQL settings let the database be fully cached in memory rather than hitting disk. I also recommend running the database on the same server as your slurmctld; we've found that this can improve throughput (a my.cnf sketch follows this list).

4. We put a caching version of squeue in place, which gives users almost-live data rather than live data. This additional buffer layer helps cut down traffic to slurmctld. It's something we rolled in-house, backed by a database that updates every 30 seconds (a rough sketch of the idea follows this list).

5. Encourage users to submit jobs that run for more than 10 minutes and to use job arrays instead of looping over sbatch; this reduces thrashing (see the example after this list).
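
For #2, a minimal slurm.conf sketch; the values are only illustrative and the max_rpc_cnt threshold should be tuned for your site:

    # slurm.conf -- throttle scheduling while slurmctld is busy with RPCs
    # defer:           skip the per-job scheduling attempt at submit time
    # max_rpc_cnt=150: hold off scheduling while slurmctld has 150 or more
    #                  active RPC threads
    SchedulerParameters=defer,max_rpc_cnt=150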
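
For #3, the usual knobs are the InnoDB settings that the slurmdbd/accounting documentation points at; the numbers below are illustrative and should be sized so the whole database fits in the buffer pool:

    # my.cnf (illustrative values -- size to your own database)
    [mysqld]
    innodb_buffer_pool_size=8G    # large enough to hold slurm_acct_db in RAM
    innodb_log_file_size=64M
    innodb_lock_wait_timeout=900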
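
For #4, our in-house tool is backed by a database, but a stripped-down, file-based sketch of the same idea looks roughly like this (the path and script name are made up):

    #!/bin/bash
    # squeue-cache.sh -- refresh a shared squeue snapshot every 30 seconds.
    # Run one copy of this on a login node and point users at the wrapper
    # below instead of the real squeue binary.
    CACHE=/var/cache/slurm/squeue.out
    while true; do
        # One real RPC to slurmctld per interval, shared by every reader.
        squeue --all > "${CACHE}.tmp" && mv "${CACHE}.tmp" "$CACHE"
        sleep 30
    done

The wrapper that users actually call is then just:

    #!/bin/bash
    # squeue (wrapper) -- serve the cached snapshot, never hit slurmctld.
    cat /var/cache/slurm/squeue.out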
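
For #5, the difference in RPC load is essentially this (job.sh is a placeholder for the user's batch script):

    # 1000 sbatch calls -> 1000 submit RPCs; avoid this:
    #   for i in $(seq 1 1000); do sbatch job.sh "$i"; done
    # One array submission -> a single submit RPC for 1000 tasks; the script
    # can read $SLURM_ARRAY_TASK_ID to pick its input:
    sbatch --array=1-1000 job.sh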

Those are my recommendations for how to deal with this.

-Paul Edmon-

On 2/9/2021 7:59 PM, Kota Tsuyuzaki wrote:
Hello guys,

In our cluster, a new member sometimes accidentally generates too many Slurm RPC calls (sbatch, sacct, etc.), and then slurmctld,
slurmdbd, and mysql may become overloaded.
To prevent such a situation, I'm looking for something like a per-user RPC rate limit.
Does Slurm support such a rate-limit feature?
If not, is there a way to conserve Slurm server-side resources?

Best,
Kota

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsuyuzaki...@hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------




