Re: [slurm-users] backfill scheduler does not work for heterogeneous jobs (version 17.11)

2018-12-03 Thread Kenneth Roberts
Hi – The time stamps show that your 1st sbatch job components start at the same time and then run for 1 minute. 30 seconds after the simultaneous end of all three components of the 1st sbatch, the two components of the 3rd sbatch and the three components of the 2nd all start. The two com
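For reference, the per-component start and end times of a heterogeneous job can be checked with sacct (the job ID below is hypothetical); components are typically listed as <jobid>+<offset>:

    sacct -j 1234 --format=JobID,Start,End,Elapsed,State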

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
Made the change in the gres.conf file on the local server and restarted slurmd, and slurmctld on the master. Unfortunately, same error... Distributed the corrected gres.conf to all k20 servers, restarted slurmd and slurmctld... Still the same error... On Mon, Dec 3, 2018 at 4:04 PM Brian W. Johanson wrot
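A minimal sketch of distributing a corrected gres.conf and restarting the daemons, assuming systemd and hypothetical hostnames:

    # push the file to each GPU node and restart its slurmd
    for h in k20-01 k20-02 k20-03; do
        scp /etc/slurm/gres.conf $h:/etc/slurm/gres.conf
        ssh $h 'systemctl restart slurmd'
    done
    # restart the controller daemon on the master
    systemctl restart slurmctld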

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Brian W. Johanson
Is that a lowercase k in k20 specified in the batch script and NodeName, and an uppercase K specified in gres.conf? On 12/03/2018 09:13 AM, Lou Nicotra wrote: Hi All, I have recently set up a slurm cluster with my servers and I'm running into an issue while submitting GPU jobs. It has something t
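If case is the issue, the GRES type string has to match exactly across slurm.conf, gres.conf and the job request; a minimal sketch with hypothetical node and device names:

    # slurm.conf
    GresTypes=gpu
    NodeName=k20-01 Gres=gpu:k20:2

    # gres.conf on the node
    Name=gpu Type=k20 File=/dev/nvidia[0-1]

    # batch script
    #SBATCH --gres=gpu:k20:1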

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
Here you go... Thanks for looking into this... lnicotra@tiger11 run# scontrol show config Configuration data as of 2018-12-03T15:39:51 AccountingStorageBackupHost = (null) AccountingStorageEnforce = none AccountingStorageHost = panther02 AccountingStorageLoc= N/A AccountingStoragePort = 681

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Michael Di Domenico
are you willing to paste an `scontrol show config` from the machine having trouble? On Mon, Dec 3, 2018 at 12:10 PM Lou Nicotra wrote: > > I'm running slurmd version 18.08.0... > > It seems that the system recognizes the GPUs after a slurmd restart. I tuned debug to 5, restarted and then submit

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
I'm running slurmd version 18.08.0... It seems that the system recognizes the GPUs after a slurmd restart. I tuned debug to 5, restarted, and then submitted the job. Nothing gets logged to the log file on the local server... [2018-12-03T11:55:18.442] Slurmd shutdown completing [2018-12-03T11:55:18.484] debug:
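For reference, slurmd log verbosity is set by SlurmdDebug in slurm.conf (values range from info and verbose up to debug5), and the daemon can also be run in the foreground while troubleshooting; the paths below are assumptions:

    # slurm.conf
    SlurmdDebug=debug5
    SlurmdLogFile=/var/log/slurm/slurmd.log

    # or run slurmd in the foreground with extra verbosity
    slurmd -D -vvvvv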

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Michael Di Domenico
do you get anything additional in the slurm logs? have you tried adding gres to the debugflags? what version of slurm are you running? On Mon, Dec 3, 2018 at 9:18 AM Lou Nicotra wrote: > > Hi All, I have recently set up a slurm cluster with my servers and I'm > running into an issue while submi
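The GRES debug flag can be enabled either in slurm.conf or on the fly from the controller; a sketch, not taken from the thread:

    # slurm.conf
    DebugFlags=Gres

    # or at runtime
    scontrol setdebugflags +gres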

Re: [slurm-users] Account not permitted to use this partition

2018-12-03 Thread Renfro, Michael
What does scontrol show partition EMERALD give you? I’m assuming its AllowAccounts output won’t match your /etc/slurm/parts settings. > On Dec 2, 2018, at 12:34 AM, Mahmood Naderan wrote: > > Hi > Although I have created an account and associated that to a partition, but > the submitted job re
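A quick way to compare the live partition definition with the intended one; partition, node and account names below are illustrative:

    scontrol show partition EMERALD | grep -i allowaccounts

    # slurm.conf / parts file
    PartitionName=EMERALD Nodes=node[01-04] AllowAccounts=myacct State=UP

    # make slurmctld reread the configuration after editing
    scontrol reconfigure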

[slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
Hi All, I have recently set up a slurm cluster with my servers and I'm running into an issue while submitting GPU jobs. It has something to do with gres configurations, but I just can't seem to figure out what is wrong. Non-GPU jobs run fine. The error is as follows: sbatch: error: Batch job submi
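For context, a typed GPU request and a check of what GRES a node actually advertises might look like the following (node and type names are assumptions):

    sbatch --gres=gpu:k20:1 job.sh
    scontrol show node k20-01 | grep -i gres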

Re: [slurm-users] backfill scheduler does not work for heterogeneous jobs (version 17.11)

2018-12-03 Thread Ana Jokanović
Hi Ken, I have read this page and I understood that, in the case of my example, the third job should be backfilled. The second job can start after 15 minutes, but the third job requires only two nodes and 2 minutes, so it should be able to start immediately, yet this does not happen. In the page that you referred
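For context, a heterogeneous job in 17.11 is submitted with component specifications separated by ':', and backfill is provided by the sched/backfill plugin; the sketch below is illustrative only, not the poster's actual configuration:

    # slurm.conf
    SchedulerType=sched/backfill
    SchedulerParameters=bf_interval=30,bf_window=1440

    # heterogeneous submission: two single-node components, 2 minutes each
    sbatch --time=2 -N1 : --time=2 -N1 job.sh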