Hello list,
We have just enabled "gres/shard" to allow GPU sharing on our
cluster. I am now looking for examples of user-facing documentation on this
feature. If anyone has something and can send a URL or other example, I'd
appreciate it.
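
For illustration, a user-facing example for this feature might look roughly
like the sketch below: a batch script that requests one shard rather than a
whole GPU. The job name, time limit, and the `report_gpu_env` helper are
hypothetical, and it assumes the site has defined shards on its GPU nodes and
that `CUDA_VISIBLE_DEVICES` is exported for shard jobs as it is for gpu jobs.

```shell
#!/bin/sh
#SBATCH --job-name=shard-demo     # hypothetical job name
#SBATCH --gres=shard:1            # ask for one shard of a GPU, not a whole GPU
#SBATCH --time=00:10:00           # hypothetical time limit

# Hypothetical helper: print the job ID and the GPU(s) the job can see.
# Outside of a Slurm job both variables are unset, so placeholders are shown.
report_gpu_env() {
    echo "job=${SLURM_JOB_ID:-none} gpus=${CUDA_VISIBLE_DEVICES:-unset}"
}

report_gpu_env
```

Users would submit this with `sbatch`, and the same `--gres=shard:1` request
can be passed to `srun` or `salloc` for interactive work.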
Thanks,
Will
--
slurm-users mailing li
Hi Reed,
Unfortunately, we had the same issue with 22.05.9. SchedMD's advice was to
upgrade to 23.11.x, which appears to have resolved the issue for us.
SchedMD support told us, "We did a lot of work regarding shards in the 23.11
release."
HTH,
Will
Hi list,
At our institution, our instructions to users who want to spawn an interactive
job (for us, a bash shell) have always been to run "srun ..." from the login
node, which has always worked well for us. But when we had a recent Slurm
training, the SchedMD folks advised us to use "sall
Thanks for the logical explanation, Paul. So when I rewrite my user
documentation, I'll mention using `salloc` instead of `srun`.
Yes, we do have `LaunchParameters=use_interactive_step` set on our cluster, so
salloc gives a shell on the allocated host.
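
A rough sketch of what the revised documentation entry could show, assuming
`LaunchParameters=use_interactive_step` is set as described above. The
`interactive` wrapper name, the `DRY_RUN` variable, and the default resource
values are hypothetical; only the `salloc` options themselves are standard.

```shell
# Hypothetical convenience wrapper for user docs: request an interactive
# allocation with salloc instead of launching a shell via srun.
# Extra salloc options (e.g. --mem=4G) are passed through unchanged.
interactive() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print the command that would be executed.
        echo "salloc --ntasks=1 --time=01:00:00 $*"
    else
        salloc --ntasks=1 --time=01:00:00 "$@"
    fi
}
```

With `use_interactive_step` set, running `interactive --mem=4G` from the login
node should leave the user in a shell on the allocated host; exiting that
shell releases the allocation.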
Best,
Will
Hi all,
I have a single-partition Slurm cluster (the partition is named
"default_queue"), and I now want to split its resources across several
queues. Say the new default queue is "queue1": should I set
the "default_queue" to `State=INACTIVE` and then use `Alter
Can I ask whether this replaces the work on "SUNK" that was previously
announced? (It was never released as open source on GitHub as planned; it
looks like it is only available on CoreWeave Cloud?)
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users
We are getting a few requests to support container workloads on our Slurm
cluster; I want to support these users' use cases, so I am looking into it now.
The problem for me is, I'm not super-familiar with container runtimes other than
(regular rootful) Docker... I see that any Slurm-compatible runtime