Why do you have SchedulerParameters = (null)?
Is that even allowed?
https://slurm.schedmd.com/sched_config.html

On Thu, Jan 11, 2018 at 1:39 PM, Colas Rivière <rivi...@umdgrb.umd.edu> wrote:

> Hello,
>
> I'm managing a small cluster (one head node, 24 workers, 1160 total worker
> threads). The head node has two E5-2680 v3 CPUs (hyper-threaded), ~100 GB
> of memory and spinning disks.
> The head node becomes occasionally less responsive when there are more
> than 10k jobs in queue, and becomes really unmanageable when reaching 100k
> jobs in queue, with error messages such as:
>
>> sbatch: error: Slurm temporarily unable to accept job, sleeping and
>> retrying.
>>
> or
>
>> Running: slurm_load_jobs error: Socket timed out on send/recv operation
>>
> Is that normal to experience slowdowns when the queue reaches this few 10k
> jobs? What limit should I expect? Would adding a SSD drive for
> SlurmdSpoolDir help? What can be done to push this limit?
>
> The cluster runs Slurm 17.02.4 on CentOS 6 and the config is attached
> (from `scontrol show config`).
>
> Thanks,
> Colas
>

--
Nick Santucci
santu...@uci.edu