Hi,

Your jobs do not request any specific amount of memory, so each one gets the default request. I believe the default is all of the RAM in the node, so the first job reserves the node's entire memory and the remaining jobs have to wait even though CPUs are still free.
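If that is indeed the cause, here is a sketch of two possible fixes; the memory values are only examples that I sized for your 4-CPU, 7860 MB test node (my assumption), not recommendations. Either set a per-CPU default in slurm.conf:

# slurm.conf: default memory (MB) per allocated CPU
# example value only: roughly 7860 MB / 4 CPUs
DefMemPerCPU=1900

(and run `scontrol reconfigure` afterwards), or have every job request its memory explicitly at submit time:

$ sbatch -n1 -N1 --mem=1900 job.sh

With either approach, four single-CPU jobs should fit on the node at the same time. First, though, it is worth checking what your current default actually is.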
Try something like:

$ scontrol show config | grep -i defmem
DefMemPerNode = 64000

Regards,
Alex

On Mon, Nov 23, 2020 at 12:33 PM Jan van der Laan <sl...@eoos.dds.nl> wrote:
> Hi,
>
> I am having issues getting slurm to run multiple jobs in parallel on the
> same machine.
>
> Most of our jobs are either (relatively) low on CPU and high on memory
> (data processing) or low on memory and high on CPU (simulations). The
> server we have is generally big enough (256 GB memory; 16 cores) to
> accommodate multiple jobs running at the same time, and we would like to
> use slurm to schedule these jobs. However, testing on a small (4 CPU)
> Amazon server, I am unable to get this working. As far as I know, I need
> `SelectType=select/cons_res` and `SelectTypeParameters=CR_CPU_Memory`.
> However, when I start multiple jobs that each use a single CPU, they run
> sequentially rather than in parallel.
>
> My `slurm.conf`:
>
> ===
> ControlMachine=ip-172-31-37-52
>
> MpiDefault=none
> ProctrackType=proctrack/pgid
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
> SlurmUser=slurm
> StateSaveLocation=/var/lib/slurm-llnl/slurmctld
> SwitchType=switch/none
> TaskPlugin=task/none
>
> # SCHEDULING
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
>
> # LOGGING AND ACCOUNTING
> AccountingStorageType=accounting_storage/none
> ClusterName=cluster
> JobAcctGatherType=jobacct_gather/none
> SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
> SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
>
> # COMPUTE NODES
> NodeName=ip-172-31-37-52 CPUs=4 RealMemory=7860 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=test Nodes=ip-172-31-37-52 Default=YES MaxTime=INFINITE State=UP
> ===
>
> `job.sh`:
> ===
> #!/bin/bash
> sleep 30
> env
> ===
>
> Output when running jobs:
> ===
> ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
> Submitted batch job 2
> ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
> Submitted batch job 3
> ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
> Submitted batch job 4
> ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
> Submitted batch job 5
> ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
> Submitted batch job 6
> ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
> Submitted batch job 7
> ubuntu@ip-172-31-37-52:~$ squeue
>   JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
>       3      test   job.sh   ubuntu PD   0:00      1 (Resources)
>       4      test   job.sh   ubuntu PD   0:00      1 (Priority)
>       5      test   job.sh   ubuntu PD   0:00      1 (Priority)
>       6      test   job.sh   ubuntu PD   0:00      1 (Priority)
>       7      test   job.sh   ubuntu PD   0:00      1 (Priority)
>       2      test   job.sh   ubuntu R    0:03      1 ip-172-31-37-52
> ===
>
> The jobs are run sequentially, while in principle it should be possible
> to run 4 jobs in parallel. I am probably missing something simple. How
> do I get this to work?
>
> Best,
> Jan