Hi Alex,
Thanks a lot. I suspected it was something trivial.
ubuntu@ip-172-31-12-211:~$ scontrol show config | grep -i defmem
DefMemPerNode = UNLIMITED
Specifying `sbatch --mem=1M job.sh` works. I will probably also set a
default value in slurm.conf (just tried; that helps as well).
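For reference, a minimal sketch of what I mean in slurm.conf (the 2048 MB
value is just an illustration, not a tuned recommendation):
===
# Default memory (MB) per allocated CPU when a job requests none;
# jobs can still override this with --mem or --mem-per-cpu.
DefMemPerCPU=2048
===
With CR_CPU_Memory a per-CPU default seems more natural than DefMemPerNode,
since the default request then scales with the number of CPUs a job asks for.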
Best,
Jan
On 23-11-2020 22:15, Alex Chekholko wrote:
Hi,
Your job does not request any specific amount of memory, so it gets the
default request. I believe the default request is all the RAM in the node.
Try something like:
$ scontrol show config | grep -i defmem
DefMemPerNode = 64000
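If that turns out to be the whole node, something like the following should
let several jobs share the node (1G is just an illustrative value):
===
$ sbatch -n1 -N1 --mem=1G job.sh
===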
Regards,
Alex
On Mon, Nov 23, 2020 at 12:33 PM Jan van der Laan <sl...@eoos.dds.nl> wrote:
Hi,
I am having issues getting slurm to run multiple jobs in parallel on
the
same machine.
Most of our jobs are either (relatively) low on CPU and high on memory
(data processing) or low on memory and high on CPU (simulations). The
server we have is generally big enough (256 GB memory; 16 cores) to
accommodate multiple jobs running at the same time, and we would like to
use Slurm to schedule these jobs. However, testing on a small (4 CPU)
Amazon server, I am unable to get this working. As far as I know, I need
`SelectType=select/cons_res` and `SelectTypeParameters=CR_CPU_Memory` for
this. However, when I start multiple jobs that each use a single CPU, they
run sequentially rather than in parallel.
My `slurm.conf`
===
ControlMachine=ip-172-31-37-52
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
# COMPUTE NODES
NodeName=ip-172-31-37-52 CPUs=4 RealMemory=7860 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
PartitionName=test Nodes=ip-172-31-37-52 Default=YES MaxTime=INFINITE State=UP
===
`job.sh`
===
#!/bin/bash
sleep 30
env
===
Output when running jobs:
===
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 2
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 3
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 4
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 5
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 6
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 7
ubuntu@ip-172-31-37-52:~$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 3      test   job.sh   ubuntu PD       0:00      1 (Resources)
                 4      test   job.sh   ubuntu PD       0:00      1 (Priority)
                 5      test   job.sh   ubuntu PD       0:00      1 (Priority)
                 6      test   job.sh   ubuntu PD       0:00      1 (Priority)
                 7      test   job.sh   ubuntu PD       0:00      1 (Priority)
                 2      test   job.sh   ubuntu  R       0:03      1 ip-172-31-37-52
===
The jobs are run sequentially, while in principle it should be possible
to run 4 jobs in parallel. I am probably missing something simple. How
do I get this to work?
Best,
Jan