Hello all
We have a single-node, simple Slurm installation with the following hardware configuration:

NodeName=node01 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=102 CPUErr=0 CPUTot=160 CPULoad=67.09
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=biotec01 NodeHostName=biotec01 Version=16.05
   OS=Linux RealMemory=1200000 AllocMem=1093632 FreeMem=36066
   Sockets=160 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2020-04-19T17:22:31 SlurmdStartTime=2020-04-20T13:54:34
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

The Slurm version is 16.05 (we are about to upgrade to Debian 10 and Slurm 18.08 from the repo).

Everything is working as expected, but we have the following "problem": users submit their jobs with sbatch but usually reserve far more RAM than the jobs actually need, so other jobs sit queued waiting for memory even though actual RAM usage is very low.

Is there a recommended solution for this? Is there a way to tell Slurm to start a job by "overbooking" some RAM, say by 20%?

Thanks for any recommendation.

slurm.conf:

ControlMachine=node01
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/cgroup
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=cluster
JobAcctGatherType=jobacct_gather/linux
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
DebugFlags=NO_CONF_HASH
NodeName=biotec01 CPUs=160 RealMemory=1200000 State=UNKNOWN
PartitionName=short Nodes=node01 Default=YES MaxTime=24:00:00 State=UP Priority=30
PartitionName=long Nodes=node01 MaxTime=30-00:00:00 State=UP Priority=20
PartitionName=test Nodes=node01 MaxTime=1 State=UP MaxCPUsPerNode=3 Priority=30
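
PS: In case it helps to quantify what I mean, below is a rough sketch of how we compare requested memory with peak usage from the accounting data. It is only illustrative: it assumes sacct is recording MaxRSS for job steps, that ReqMem/MaxRSS are printed with the usual K/M/G suffixes (ReqMem with a trailing "n" or "c"), and the one-week window and parsing details are just examples, not exactly what we run.

#!/usr/bin/env python3
# Rough sketch: compare requested memory (ReqMem) with peak usage (MaxRSS)
# for recently completed jobs, via sacct. Adjust fields/suffixes for your
# Slurm version; this matches what our 16.05 sacct prints.

import subprocess
from collections import defaultdict

def to_mb(value):
    # Convert sacct memory strings like '4000Mn', '2Gc' or '123456K' to MB.
    # Per-CPU requests ('c' suffix) are not scaled by CPU count here;
    # good enough for a rough look. Returns None if empty/unparsable.
    value = value.strip().rstrip('nc')
    if not value:
        return None
    factors = {'K': 1.0 / 1024, 'M': 1.0, 'G': 1024.0, 'T': 1024.0 * 1024}
    if value[-1] in factors:
        try:
            return float(value[:-1]) * factors[value[-1]]
        except ValueError:
            return None
    try:
        return float(value) / (1024 * 1024)   # plain bytes
    except ValueError:
        return None

# Completed jobs from the last week, all users (needs accounting access).
out = subprocess.run(
    ['sacct', '-a', '--starttime', 'now-7days', '--state', 'COMPLETED',
     '--format', 'JobID,ReqMem,MaxRSS', '--parsable2', '--noheader'],
    stdout=subprocess.PIPE, check=True, universal_newlines=True).stdout

requested = {}                # job id -> requested MB (from the job line)
used = defaultdict(float)     # job id -> peak MaxRSS MB over all its steps

for line in out.splitlines():
    jobid, reqmem, maxrss = line.split('|')
    base = jobid.split('.')[0]
    if '.' not in jobid:
        req_mb = to_mb(reqmem)
        if req_mb:
            requested[base] = req_mb
    rss_mb = to_mb(maxrss)
    if rss_mb:
        used[base] = max(used[base], rss_mb)

for jobid, req_mb in sorted(requested.items()):
    if used[jobid]:
        print('%-12s requested %8.0f MB, peak use %8.0f MB (%.0f%%)' %
              (jobid, req_mb, used[jobid], 100 * used[jobid] / req_mb))

The output is one line per completed job with the requested MB, the peak MB, and the percentage of the reservation that was actually used, which is how we see how large the gap is.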