Re: [slurm-users] GPUs as resources which SLURM can control

2019-03-20 Thread Nicholas Yue
Thanks Michael. I noticed a couple of questions on the mailing list mentioning GRES lately. I will share that information with our SLURM administrators. Cheers

On Thu, 21 Mar 2019 at 12:56, Renfro, Michael wrote:
> I think all you’re looking for is Generic Resource (GRES) scheduling,
> starting a

Re: [slurm-users] GPUs as resources which SLURM can control

2019-03-20 Thread Renfro, Michael
I think all you’re looking for is Generic Resource (GRES) scheduling, starting at https://slurm.schedmd.com/gres.html — if you’ve already seen that, then more details would be helpful. If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run up to 4 of those jobs and leave the
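For illustration, a minimal sketch of that kind of submission, assuming one GPU per job and a made-up script name:

    # request one GPU per job; with 4 GPUs on the node, up to 4 such jobs
    # can run there at once and further copies wait in the queue
    sbatch --gres=gpu:1 scriptname.sh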

[slurm-users] GPUs as resources which SLURM can control

2019-03-20 Thread Nicholas Yue
Hi, I am new to SLURM. I have access to a cluster where one of the nodes has 4 GPUs. We are running SLURM version 17.11.12. Is there some SBATCH token=value pair I can use to submit jobs (each of which runs an application that is only able to utilize 1 GPU) so that if I submit 6 copies
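For context, a minimal batch-script sketch of the GRES request discussed in the reply above; the job name and application path are made up for the example:

    #!/bin/bash
    #SBATCH --job-name=gpu-job     # hypothetical job name
    #SBATCH --gres=gpu:1           # ask for exactly one of the node's 4 GPUs
    srun ./my_gpu_application      # placeholder for the single-GPU application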

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-20 Thread Christopher Samuel
On 3/20/19 4:20 AM, Frava wrote:
> Hi Chris, thank you for the reply. The team that manages that cluster is not very fond of upgrading SLURM, which I understand.
Do be aware that Slurm 17.11 will stop being maintained once 19.05 is released in May.
> So basically my heterogeneous job that only

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Christopher Samuel
On 3/20/19 9:09 AM, Peter Steinbach wrote:
> Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf file, everything stops working again. :/ Should I send around scontrol outputs? And yes, I made sure to set the --mem flag for the job submission this time.
Well there you've sa

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf file, everything stops working again. :/ Should I send around scontrol outputs? And yes, I made sure to set the --mem flag for the job submission this time. Best, Peter
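For reference, a sketch of what a gres.conf with that kind of core binding typically looks like; the device files and two-GPU layout are assumptions, not taken from Peter's actual setup:

    # gres.conf (sketch): bind each GPU to the cores closest to it
    NodeName=g1 Name=gpu File=/dev/nvidia0 Cores=0-1
    NodeName=g1 Name=gpu File=/dev/nvidia1 Cores=2-3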

[slurm-users] Preemption vs. backfill

2019-03-20 Thread Jeffrey Frey
Config details:
- Slurm v17.11.8
- QOS-based preemption
- Backfill scheduler (default parameters)
- QOS:
  - "normal" = PreemptMode=CANCEL, GraceTime=5 minutes
  - Per-stakeholder = Preempt=normal GrpTRES=
- Partitions:
  - "standard" (default) = QOS=normal
  - Per-stakeholder = QOS=
When users need prio
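A rough sketch of how a setup like that is usually expressed; the stakeholder QOS name, partition names and TRES limits below are made up to illustrate the described design, not Jeffrey's actual values:

    # slurm.conf (sketch; node lists omitted)
    PreemptType=preempt/qos
    PreemptMode=CANCEL
    SchedulerType=sched/backfill
    PartitionName=standard Default=YES QOS=normal
    PartitionName=labX QOS=labX

    # QOS side, via sacctmgr (sketch)
    sacctmgr add qos labX
    sacctmgr modify qos normal set PreemptMode=cancel GraceTime=300   # 5-minute grace
    sacctmgr modify qos labX set Preempt=normal GrpTRES=cpu=128       # made-up limit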

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Hi Philippe, thanks for spotting this. This indeed appears to solve the first issue. Now I can try to make the GPUs available and play with pinning etc. Superb - if you happen to be at ISC, let me know. I'd buy you a coffee/beer! ;) Peter

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Philippe Dos Santos
Hello Peter, In order to run non-gres and gres jobs on the g1 node, you should try to limit the memory requested for each job, for instance if suitable:

# sbatch --wrap="sleep 100 && env" -o singlecpu.log -c 3 -J cpu --mem=100
# sbatch --wrap="sleep 6 && env" -o gres.log -c 1 -J gpu --gres=g
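A quick way to check that both job types actually share the node once explicit memory requests are in place (node name from the thread; only the relevant output fields are shown):

    # allocated vs. configured resources on the node
    scontrol show node g1 | grep -E "CfgTRES|AllocTRES"
    # jobs on g1 with their CPU, memory and GRES requests
    squeue -w g1 -o "%.10i %.8j %.4C %.10m %b"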

[slurm-users] Error when the stdout or stderror path does not exist

2019-03-20 Thread Andrés Marín Díaz
Hello, When I redirect stdout or stderr to files in a folder that does not exist, the job fails instead of creating the folder. I do not know if this is the expected behavior and, if so, whether it can be changed (with some config parameter or environment variable). I have another installatio
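For illustration, the behavior being described and the usual workaround, with made-up paths; per the report above, sbatch does not create missing output directories, so the directory has to exist before the job writes to it:

    # fails if /scratch/$USER/logs does not exist:
    #   sbatch -o /scratch/$USER/logs/job-%j.out job.sh
    # common workaround: create the directory before submitting
    mkdir -p /scratch/$USER/logs
    sbatch -o /scratch/$USER/logs/job-%j.out -e /scratch/$USER/logs/job-%j.err job.sh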

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-20 Thread Frava
Hi Chris, thank you for the reply. The team that manages that cluster is not very fond of upgrading SLURM, which I understand. So basically my heterogeneous job that only has one step is considered to have multiple steps, and that's a bug in SLURM 17.11.12?

On Wed, 20 Mar 2019 at 07:02, Chris Samuel wrote:
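For readers following along, a minimal sketch of a heterogeneous ("pack") job with a single step, in the pre-19.05 syntax; the resource numbers and program names are invented and only serve to show the shape of such a submission:

    #!/bin/bash
    #SBATCH --ntasks=1 --cpus-per-task=4 --mem=8G
    #SBATCH packjob
    #SBATCH --ntasks=8 --cpus-per-task=1 --mem=2G
    # one srun call spanning both components = one heterogeneous job step
    srun ./coordinator : ./worker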

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Hi Chris, I changed the initial state a bit (the number of cores per node was misconfigured): https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf

But that doesn't change things. Initially, I see this:

# sinfo -N -l
Wed Mar 20 09:03:26 2019
NODELIST NO

Re: [slurm-users] Can one specify attributes on a GRES resource?

2019-03-20 Thread Quirin Lohr
Hi Will, I solved this by creating a new GRES:

Some nodes have VRAM:no_consume:12G
Some nodes have VRAM:no_consume:24G

"no_consume" because otherwise it would be consumed for the whole node. It only works because each node has only one type of GPU. It is then requested with --gres=gpu:1,VRAM:16G
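A sketch of how such a VRAM GRES can be declared; the node names and GPU counts below are made up, only the VRAM:no_consume pattern comes from Quirin's description:

    # slurm.conf (sketch)
    GresTypes=gpu,VRAM
    NodeName=gpu-12g Gres=gpu:4,VRAM:no_consume:12G
    NodeName=gpu-24g Gres=gpu:4,VRAM:no_consume:24G
    # (depending on the setup, a matching Name=VRAM line in gres.conf may also be needed)

    # job submission: only nodes advertising at least 16G of VRAM qualify
    sbatch --gres=gpu:1,VRAM:16G job.sh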