Hello Bjørn-Helge.
Sigh ... First of all, of course, many thanks! This indeed helped a lot! Two comments:

a) Why are the interfaces of the Slurm tools changed? I once learned that interfaces should be designed to be as stable as possible; otherwise, users get frustrated and go away.

b) This only works if I specify --mem for each task. Although manageable, I wonder why one needs to be that restrictive. In the use case outlined, one task could use a bit less memory, and the other may require a bit more than half of the node's available memory. (So clearly this isn't always predictable.) I only hope that in such cases the second task does not die from OOM ... (I will know soon, I guess.)

Really, thank you! That was a very helpful hint!

Cheers,
Martin

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Bjørn-Helge Mevik <b.h.me...@usit.uio.no>
Sent: Wednesday, 18 January 2023 13:49
To: slurm-us...@schedmd.com
Subject: Re: [slurm-users] srun jobfarming hassle question

"Ohlerich, Martin" <martin.ohler...@lrz.de> writes:

> Dear Colleagues,
>
> Already for quite some years now we have again and again been facing issues on our
> clusters with so-called job-farming (or task-farming) concepts in Slurm jobs
> using srun. And it bothers me that we can hardly help users with requests in
> this regard.
>
> From the documentation
> (https://slurm.schedmd.com/srun.html#SECTION_EXAMPLES), it reads like this:
>
> ------------------------------------------->
> ...
> #SBATCH --nodes=??
> ...
> srun -N 1 -n 2 ... prog1 &> log.1 &
> srun -N 1 -n 1 ... prog2 &> log.2 &

Unfortunately, that part of the documentation is not quite up to date.
The semantics of srun have changed a little over the last couple of
years/Slurm versions, so today you have to use "srun --exact ...".

From "man srun" (version 21.08):

       --exact
              Allow a step access to only the resources requested for the
              step. By default, all non-GRES resources on each node in the
              step allocation will be used. This option only applies to
              step allocations.
              NOTE: Parallel steps will either be blocked or rejected until
              requested step resources are available unless --overlap is
              specified. Job resources can be held after the completion of
              an srun command while Slurm does job cleanup. Step epilogs
              and/or SPANK plugins can further delay the release of step
              resources.

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
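For reference, a minimal job-farming skeleton along the lines discussed above might look like the following. This is only a sketch: the node/task counts, memory sizes, and program names are placeholders, and the exact --exact/--mem interaction can differ between Slurm versions, so it should be adapted and tested on the cluster in question.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=3

# Start both steps in the background. --exact restricts each step to
# exactly the resources it requests, so parallel steps can share the
# node instead of one step grabbing all non-GRES resources.
# --mem partitions the node's memory explicitly per step
# (the 4G/2G values here are placeholders, not a recommendation).
srun --exact -N 1 -n 2 --mem=4G ./prog1 &> log.1 &
srun --exact -N 1 -n 1 --mem=2G ./prog2 &> log.2 &

# Wait for all background steps to finish before the job script exits.
wait
```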