[slurm-users] User-facing documentation on shard use

2024-02-28 Thread wdennis--- via slurm-users
Hello list, We have just enabled "gres/shard" in order to enable sharing of GPUs on our cluster. I am now looking for examples of user-facing documentation on this feature. If anyone has something, and can send a URL or other example, I'd appreciate it. Thanks, Will -- slurm-users mailing li

[slurm-users] Re: GPU shards not exclusive

2024-02-28 Thread wdennis--- via slurm-users
Hi Reed, Unfortunately, we had the same issue with 22.05.9; SchedMD's advice was to upgrade to 23.11.x, and this appears to have resolved the issue for us. SchedMD support told us, "We did a lot of work regarding shards in the 23.11 release." HTH, Will

[slurm-users] salloc+srun vs just srun

2024-02-28 Thread wdennis--- via slurm-users
Hi list, At our institution, our instructions to users who want to spawn an interactive job (for us, a bash shell) have always been to run "srun ..." from the login node, which has always worked well for us. But at a recent Slurm training, the SchedMD folks advised us to use "sall

[slurm-users] Re: salloc+srun vs just srun

2024-02-28 Thread wdennis--- via slurm-users
Thanks for the logical explanation, Paul. So when I rewrite my user documentation, I'll mention using `salloc` instead of `srun`. Yes, we do have `LaunchParameters=use_interactive_step` set on our cluster, so salloc gives a shell on the allocated host. Best, Will
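
For readers unfamiliar with the setting named in the message above, here is a minimal sketch of the relevant `slurm.conf` line. The setting name comes from the message itself, and the comment only restates the behavior reported there. The rest of a site's configuration is assumed to exist elsewhere and is not shown.

```
# slurm.conf (fragment, illustrative only)
# As reported in the message above: with this set, salloc gives the user
# a shell on the allocated host.
LaunchParameters=use_interactive_step
```

With this in place, a user would run `salloc` from the login node with whatever resource options the site documents. The thread excerpt does not give those options, so none are assumed here.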

[slurm-users] Redirect jobs submitted to old partition to new

2024-04-16 Thread wdennis--- via slurm-users
Hi all, I have a single-partition Slurm cluster (the single partition being named "default_queue") on which I now want to implement multiple queues to subdivide the resources. Say the new default queue is "queue1"; should I set the "default_queue" to `State=INACTIVE` and then use `Alter

[slurm-users] Re: Convergence of Kube and Slurm?

2024-07-29 Thread wdennis--- via slurm-users
Can I ask if this replaces the work on "SUNK" that was previously announced? (but never released as open-source on GitHub as was planned; looks like it is only available on CoreWeave Cloud?) -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users

[slurm-users] How to select a container runtime system?

2024-08-23 Thread wdennis--- via slurm-users
We are getting a few calls to support container workloads on our Slurm cluster; I want to support these users' use cases, so I am looking into it now. The problem for me is, I'm not super-familiar with container runtimes other than (regular rootful) Docker... I see that any Slurm-compatible runtime