Dear Valantis,

thanks for the explanation. However, I have to correct you about the second alternative approach:

    srun -ppartition -N1 -n4 --gres=gpu:0 --time=00:30:00 --mem=1G -Jjobname --pty /bin/bash -il
    srun --gres=gpu:1 -l hostname

Naturally, this does not work: the "inner" srun job step fails with an error saying that the generic resource is not available/allocatable:

    user@frontend02#-bash_4.2:~:[2]$ srun -pgpu -N1 -n4 --time=00:30:00 --mem=5G --gres=gpu:0 -Jjobname --pty /bin/bash -il
    user@gpu006#bash_4.2:~:[1]$ srun --gres=gpu:1 hostname
    srun: error: Unable to create step for job 18044554: Invalid generic resource (gres) specification

Test it yourself. ;-)

Best
Sebastian

Sebastian Kraus
Team IT am Institut für Chemie
Gebäude C, Straße des 17. Juni 115, Raum C7

Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin

Tel.: +49 30 314 22263
Fax: +49 30 314 29309
Email: sebastian.kr...@tu-berlin.de

________________________________________
From: Chrysovalantis Paschoulas <c.paschou...@fz-juelich.de>
Sent: Friday, December 13, 2019 13:05
To: Kraus, Sebastian
Subject: Re: [slurm-users] srun: job steps and generic resources

Hi Sebastian,

the first srun uses the gres you requested and the second waits for it to be available again.

You have to do either
```
srun -ppartition -N1 -n4 --gres=gpu:1 --time=00:30:00 --mem=1G -Jjobname --pty /bin/bash -il
srun --gres=gpu:0 -l hostname
```
or
```
srun -ppartition -N1 -n4 --gres=gpu:0 --time=00:30:00 --mem=1G -Jjobname --pty /bin/bash -il
srun --gres=gpu:1 -l hostname
```

Best Regards,
Valantis

On 13.12.19 12:44, Kraus, Sebastian wrote:
> Dear all,
> I am facing the following nasty problem.
> I usually start interactive batch jobs via:
> srun -ppartition -N1 -n4 --time=00:30:00 --mem=1G -Jjobname --pty /bin/bash -il
> Then, explicitly starting a job step within such a session via:
> srun -l hostname
> works fine.
> But as soon as I add a generic resource to the job allocation, as with:
> srun -ppartition -N1 -n4 --gres=gpu:1 --time=00:30:00 --mem=1G -Jjobname --pty /bin/bash -il
> an explicit job step launched as above via:
> srun -l hostname
> stalls/blocks indefinitely.
> I hope someone out there can explain this behavior to me.
>
> Thanks and best
> Sebastian
>
>
> Sebastian Kraus
> Team IT am Institut für Chemie
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin
>
> Email: sebastian.kr...@tu-berlin.de
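
For anyone who wants to reproduce the behavior discussed in this thread, here is a minimal sketch of the first variant Valantis suggested: the outer allocation holds the GPU (`--gres=gpu:1`) and the inner job step explicitly asks for none (`--gres=gpu:0`). The helper `step_cmd` is a hypothetical name introduced only for illustration and is not part of Slurm. The script only launches the step when it finds itself inside a Slurm allocation, and it assumes the GRES is named `gpu` as in the commands above.

```shell
#!/bin/sh
# Sketch only: build and (optionally) run an inner job step that requests
# zero GPUs, following the first variant suggested in the thread.
# Assumption: the outer interactive job was started with --gres=gpu:1.

# step_cmd (hypothetical helper): print the srun command line for a job
# step that requests the given number of GPUs.
step_cmd() {
    gpus=$1
    shift
    printf 'srun --gres=gpu:%s -l %s\n' "$gpus" "$*"
}

cmd=$(step_cmd 0 hostname)
echo "would run: $cmd"

# Only launch the step when we are really inside a Slurm allocation.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    # Show which generic resources the job itself was granted.
    scontrol show job -d "$SLURM_JOB_ID" | grep -i gres
    # Launch the step with an explicit zero GPU request.
    sh -c "$cmd"
fi
```

Outside an allocation the script only prints the command it would run. Inside the interactive shell started by the outer `srun`, it prints the job's GRES information and then starts the step. Whether the step returns immediately or blocks depends on the Slurm version and site configuration, so treat it as a test harness rather than a guaranteed fix.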