Re: [slurm-users] NoDecay on accounts (or on GrpTRESMins in general)
On Fri, Nov 20, 2020 at 12:11 AM Sebastian T Smith wrote:
> Hi,
>
> We're setting GrpTRESMins on the account association and have NoDecay
> QOS's for different user classes. All user associations with a
> GrpTRESMins-limited account are assigned a NoDecay QOS. I'm not sure if
> it's a better approach... but it's an option.

If I follow correctly, your GrpTRESMins usage on the accounts will still get decayed. From tests I ran here, when running with a NoDecay QOS the GrpTRESMins usage of the account still gets decayed, while that of the QOS doesn't.

So do you also have a GrpTRESMins limit on the QOS itself? And if so, why do you need it both on the QOS and on the account? Or am I missing something?

Thanks,

Yair.
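For concreteness, a minimal sketch of the setup Sebastian describes; the QOS, account, and user names and the cpu-minutes limit below are hypothetical, chosen only for illustration:

===
# Hypothetical names/limits, for illustration only.
# A QOS whose recorded usage is never decayed:
$ sacctmgr add qos longrun Flags=NoDecay GrpTRESMins=cpu=100000
# A GrpTRESMins limit on the account association itself (per the tests
# described above, this usage still decays):
$ sacctmgr modify account where name=projA set GrpTRESMins=cpu=100000
# Assign the NoDecay QOS to a user association under that account:
$ sacctmgr modify user where name=alice account=projA set QOS=longrun
===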
[slurm-users] MinJobAge
All,

I always thought that MinJobAge affected how long a job will show up when doing 'squeue'. That does not seem to be the case for me. I have MinJobAge=900, but if I do 'squeue --me' as soon as I finish an interactive job, there is nothing in the queue.

I swear I used to see jobs in a completed state for a period of time, but they are not showing up at all on our cluster. How does one get completed jobs to show up?

Brian Andrus
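For reference, a hedged sketch of where to look; the squeue/sacct flags are standard options, but whether sacct returns anything depends on accounting being configured on the cluster:

===
# slurm.conf: seconds slurmctld keeps the record of a finished job
MinJobAge=900
# Look for jobs still tracked by slurmctld in the COMPLETED state:
$ squeue --me --states=CD
# Or query finished jobs from the accounting database (if enabled):
$ sacct -X --starttime=today
===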
[slurm-users] Simultaneously running multiple jobs on same node
Hi,

I am having issues getting Slurm to run multiple jobs in parallel on the same machine.

Most of our jobs are either (relatively) low on CPU and high on memory (data processing) or low on memory and high on CPU (simulations). The server we have is generally big enough (256 GB memory; 16 cores) to accommodate multiple jobs running at the same time, and we would like to use Slurm to schedule these jobs. However, testing on a small (4 CPU) Amazon server, I am unable to get this working. As far as I know, I have to use `SelectType=select/cons_res` and `SelectTypeParameters=CR_CPU_Memory`. However, when starting multiple jobs that each use a single CPU, they are started sequentially and not in parallel.

My `slurm.conf`:

===
ControlMachine=ip-172-31-37-52

MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none

# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

# COMPUTE NODES
NodeName=ip-172-31-37-52 CPUs=4 RealMemory=7860 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
PartitionName=test Nodes=ip-172-31-37-52 Default=YES MaxTime=INFINITE State=UP
===

`job.sh`:

===
#!/bin/bash
sleep 30
env
===

Output when running jobs:

===
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 2
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 3
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 4
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 5
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 6
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 7
ubuntu@ip-172-31-37-52:~$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     3      test   job.sh   ubuntu PD       0:00      1 (Resources)
     4      test   job.sh   ubuntu PD       0:00      1 (Priority)
     5      test   job.sh   ubuntu PD       0:00      1 (Priority)
     6      test   job.sh   ubuntu PD       0:00      1 (Priority)
     7      test   job.sh   ubuntu PD       0:00      1 (Priority)
     2      test   job.sh   ubuntu R        0:03      1 ip-172-31-37-52
===

The jobs are run sequentially, while in principle it should be possible to run 4 jobs in parallel. I am probably missing something simple. How do I get this to work?

Best,
Jan
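A hedged aside on the configuration above: with SelectTypeParameters=CR_CPU_Memory, memory is also a consumable resource, so each job's memory request (explicit or defaulted) determines how many jobs can share the node. One way to probe this is to submit with an explicit request; the size here is purely illustrative:

===
$ sbatch -n1 -N1 --mem=1G job.sh
===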
Re: [slurm-users] Simultaneously running multiple jobs on same node
Hi,

Your job does not request any specific amount of memory, so it gets the default request. I believe the default request is all the RAM in the node. Try something like:

===
$ scontrol show config | grep -i defmem
DefMemPerNode   = 64000
===

Regards,
Alex

On Mon, Nov 23, 2020 at 12:33 PM Jan van der Laan wrote:
> Hi,
>
> I am having issues getting Slurm to run multiple jobs in parallel on
> the same machine.
> [...]
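Following Alex's diagnosis, one possible fix is a sketch like the following; the values are derived from the node definition in the original post (7860 MB over 4 CPUs) and should be treated as illustrative, not prescriptive:

===
# slurm.conf: default memory request per allocated CPU, instead of the
# whole node (7860 MB / 4 CPUs, rounded down):
DefMemPerCPU=1965
# ...or have each job request memory explicitly at submit time:
$ sbatch -n1 --mem=1965M job.sh
===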