I am pretty sure this is impossible with vanilla Slurm.

What might be possible (maybe) is submitting 5-core jobs and using some
pre/post scripts which, immediately before the job starts, change the
requested number of cores to "however many are currently idle on the node
where it is scheduled to run". That feels like a nightmare script to write,
prone to race conditions (e.g. what if Slurm has scheduled another job to
start on the same node at almost the same time?). It may also be
impractical (the modified job will probably need to be rescheduled,
possibly landing on another node with a different number of idle cores) or
impossible (maybe Slurm does not offer the possibility of changing the
requested core count after the job has been assigned a node, only at other
times, such as submission time).
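For reference, the only knob I know of in this direction is `scontrol
update`, which can change a job's CPU request while it is still pending.
Below is a minimal dry-run sketch of what such a hook might build; the job
id and idle-core count are placeholders, and the command is only printed,
not executed:

```shell
#!/bin/bash
# Sketch of the hypothetical pre-start hook: bump a pending job's CPU
# request to match the idle cores on its planned node. All values are
# placeholders; on a real cluster JOBID would come from squeue and IDLE
# from sinfo.
JOBID=123456     # hypothetical pending job id
IDLE=52          # hypothetical idle-core count on the target node
# scontrol accepts NumCPUs changes only while the job is still pending,
# which is exactly the race-condition window described above.
CMD="scontrol update JobId=$JOBID NumCPUs=$IDLE"
echo "$CMD"      # dry run: print instead of executing
```

Even if the update succeeds, nothing guarantees the scheduler keeps the
same node assignment afterwards, which is the impracticality noted above.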

What is theoretically possible would be to use Slurm only as a "dummy bean
counter": submit the job as a 5-core job and let it land and start on a
node. The job itself does nothing other than counting the number of idle
cores on that node and submitting *another* Slurm job of the highest
priority, targeting that specific node (option -w) and that number of
cores. If the second job starts, then by some other mechanism, probably
external to Slurm, the actual computational job starts on the appropriate
cores. If that happens outside of Slurm, it would be very hard to get right
(with the appropriate cgroup, for example). If it happens inside Slurm, it
needs some functionality which I am not aware exists, but that sounds more
likely than "changing the number of cores at the moment the job starts".
For example, the two jobs could merge into one. Or the two jobs could stay
separate but share an MPI communicator or thread space (though again with
trouble from the separate cgroups they live in).
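To make the "bean counter" idea concrete, here is a dry-run sketch of what
the pilot job might do. The node name, idle count, and `real_job.sh` are
all hypothetical, and note that `--nice` with a negative value requires
scheduler privileges; the submission command is printed rather than run:

```shell
#!/bin/bash
# Pilot ("bean counter") job sketch. On a real cluster these would be:
#   NODE=$(hostname -s)
#   IDLE=$(sinfo -h -n "$NODE" -o %C | cut -d/ -f2)  # %C = alloc/idle/other/total
NODE="node042"   # placeholder for the node the pilot landed on
IDLE=52          # placeholder for the idle cores counted there
# Submit the real job pinned to this node (-w) with all idle cores,
# at raised priority so it starts before anything else grabs them.
CMD="sbatch --nice=-100 -w $NODE -n $IDLE real_job.sh"
echo "$CMD"      # dry run: print instead of submitting
```

Even with raised priority there is no guarantee the second job starts
before another job claims those cores, which is the same race condition as
before.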

So, in conclusion: if this is just a few jobs where you are trying to be
more efficient, I think it's better to give up. If this is something of
really large scale and importance, then my recommendation would be to
purchase official Slurm support and get assistance from them.

On Fri, Aug 2, 2024 at 8:37 AM Laura Hild via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> My read is that Henrique wants to specify a job to require a variable
> number of CPUs on one node, so that when the job is at the front of the
> queue, it will run opportunistically on however many happen to be available
> on a single node as long as there are at least five.
>
> I don't personally know of a way to specify such a job, and wouldn't be
> surprised if there isn't one, since as other posters have suggested,
> usually there's a core-count sweet spot that should be used, achieving a
> performance goal while making efficient use of resources.  A cluster
> administrator may in fact not want you using extra cores, even if there's a
> bit more speed-up to be had, when those cores could be used more
> efficiently by another job.  I'm also not sure how one would set a
> judicious TimeLimit on a job that would have such a variable wall-time.
>
> So there is the question of whether it is possible, and whether it is
> advisable.
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>