Re: [slurm-users] How to create a partition where only one job can run concurrently?

2019-10-18 Thread Brian Andrus
If you have accounting implemented, just set MaxJobs and it will do the trick: MaxJobs= The total number of jobs able to run at any given time for the given association. If this limit is reached, new jobs will be queued but only allowed to run after previous jobs complete from the association.
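As a minimal sketch of that answer (the account name "restricted" is a placeholder, not from this thread), the limit could be set and checked with sacctmgr roughly like this:

  # limit the association for account "restricted" to one running job at a time
  $ sacctmgr modify account name=restricted set MaxJobs=1
  # verify the limit on the affected associations
  $ sacctmgr show assoc account=restricted format=Account,User,MaxJobs

Note that MaxJobs is a per-association limit, so it applies per account/user pairing rather than to the partition as a whole.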

[slurm-users] How to create a partition where only one job can run concurrently?

2019-10-18 Thread bbenedetto
Greetings! I am trying to set up a partition that will only allow one job at a time to run, regardless of who submits it. So multiple jobs from multiple users can be in the queue, but I only want the partition to run one at a time. I also need to set up an additional partition with the
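One common way to get a hard "one running job for the whole partition" limit, regardless of user, is to attach a QOS with GrpJobs=1 to the partition. This is a hedged sketch, not from this thread; the QOS, partition, and node names below are placeholders:

  # create a QOS that allows only one running job in total
  $ sacctmgr add qos onejob
  $ sacctmgr modify qos onejob set GrpJobs=1

  # slurm.conf: apply the QOS to the partition and make sure limits are enforced
  AccountingStorageEnforce=limits,qos
  PartitionName=serial Nodes=node[01-04] QOS=onejob State=UP

With GrpJobs=1 on the partition QOS, additional jobs stay pending no matter which user submitted them.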

Re: [slurm-users] [EXT] Re: How to find core count per job per node

2019-10-18 Thread Tom Wurgler
Thanks for the replies! This is exactly what I need. -tom From: slurm-users on behalf of Ole Holm Nielsen Sent: Friday, October 18, 2019 2:15 PM To: slurm-users@lists.schedmd.com Subject: [EXT] Re: [slurm-users] How to find core count per job per node WARNING

Re: [slurm-users] How to find core count per job per node

2019-10-18 Thread Mark Hahn
$ scontrol --details show job 1653838
JobId=1653838 JobName=v1.20
...
  Nodes=r00g01 CPU_IDs=31-35 Mem=5120 GRES_IDX=
  Nodes=r00n16 CPU_IDs=34-35 Mem=2048 GRES_IDX=
  Nodes=r00n20 CPU_IDs=12-17,30-35 Mem=12288 GRES_IDX=
  Nodes=r01n16 CPU_IDs=15 Mem=1024 GRES_IDX=
thanks for sharing t
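If only the per-node allocation lines are of interest, they can be filtered out of that output; a small sketch (the job id is just the example from the message above):

  $ scontrol --details show job 1653838 | grep -o 'Nodes=[^ ]* CPU_IDs=[^ ]*'

Counting the core IDs in each CPU_IDs range (e.g. 12-17,30-35 is 12 cores) then gives the core count per node.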

Re: [slurm-users] How to find core count per job per node

2019-10-18 Thread Jeffrey Frey
Adding the "--details" flag to scontrol lookup of the job: $ scontrol --details show job 1636832 JobId=1636832 JobName=R3_L2d : NodeList=r00g01,r00n09 BatchHost=r00g01 NumNodes=2 NumCPUs=60 NumTasks=60 CPUs/Task=1 ReqB:S:C:T=0:0:*:* TRES=cpu=60,mem=60G,node=2,billing=55350 Sock

Re: [slurm-users] How to find core count per job per node

2019-10-18 Thread Ole Holm Nielsen
On 18-10-2019 19:56, Tom Wurgler wrote: I need to know how many cores a given job is using per node. Say my nodes have 24 cores each and I run a 36-way job. It takes a node and a half. scontrol show job id shows me 36 cores, and the 2 nodes it is running on. But I want to know how it split the job

[slurm-users] How to find core count per job per node

2019-10-18 Thread Tom Wurgler
I need to know how many cores a given job is using per node. Say my nodes have 24 cores each and I run a 36-way job. It takes a node and a half. scontrol show job id shows me 36 cores, and the 2 nodes it is running on. But I want to know how it split the job up between the nodes. Thanks for any inf

Re: [slurm-users] [External] Re: Upgrading SLURM from 17.02.7 to 18.08.8 - Job ID gets reset

2019-10-18 Thread Florian Zillner
Hi Lech, Thanks for the hint. I didn't know about that option. Another way would be to just retain the StateSaveLocation files and move those over to the sandbox in which I've tested the upgrade. Once I copied the files and re-did the upgrade from scratch, the IDs were consecutive as expected.
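A rough sketch of that approach (the path and the sandbox host are examples, not from the thread; the real path can be read with "scontrol show config | grep StateSaveLocation"):

  # stop the controller so the state files are quiescent, then copy them over
  $ systemctl stop slurmctld
  $ rsync -a /var/spool/slurmctld/ sandbox:/var/spool/slurmctld/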

Re: [slurm-users] Upgrading SLURM from 17.02.7 to 18.08.8 - Job ID gets reset

2019-10-18 Thread Lech Nieroda
Hi Florian, You can use the FirstJobId option from slurm.conf to continue the JobIds seamlessly. Kind Regards, Lech > On 18.10.2019 at 11:47, Florian Zillner wrote: > > Hi all, > > we're using OpenHPC packages to run SLURM. Current OpenHPC Version is 1.3.8 > (SLURM 18.08.8), though we're
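As a hedged example (the value below is a placeholder; it should be set above the highest job id from the old installation, with slurmctld restarted afterwards):

  # slurm.conf
  FirstJobId=123456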

[slurm-users] Upgrading SLURM from 17.02.7 to 18.08.8 - Job ID gets reset

2019-10-18 Thread Florian Zillner
Hi all, we're using OpenHPC packages to run SLURM. The current OpenHPC version is 1.3.8 (SLURM 18.08.8), though we're still at 1.3.3 (SLURM 17.02.7) for now. I've successfully tested an upgrade in a separate testing environment, which works fine once you adhere to the upgrading notes... So the