Hi,
I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and the NVIDIA
DeepOps framework a couple of years ago. It is based on Ubuntu 20.04 and
makes use of the NVIDIA pyxis/enroot container solution. For operational
validation I used the nccl-tests application in a container. nccl-tests
On 4/4/25 5:23 am, Michael Milton via slurm-users wrote:
Plain srun re-uses the existing Slurm allocation, and specifying
resources like --mem will just request them from the current job rather
than submitting a new one
srun does that as it sees all the various SLURM_* environment variables
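As a rough illustration of the point above, a wrapper could check for the SLURM_* variables itself before calling srun. This is only a sketch: the helper names inside_slurm_job and without_slurm_vars are made up for this example, and whether stripping these variables is enough to make srun request a fresh allocation on a given site is an assumption that would need testing.

```python
import os


def inside_slurm_job(env):
    """Return True if the environment looks like it belongs to a Slurm job.

    SLURM_JOB_ID is set by Slurm inside an allocation; treating its presence
    as the only signal is a simplification made for this sketch.
    """
    return "SLURM_JOB_ID" in env


def without_slurm_vars(env):
    """Return a copy of env with every SLURM_* variable removed.

    Assumption: a child srun started with this environment no longer sees
    the enclosing job. This has not been verified against any cluster.
    """
    return {k: v for k, v in env.items() if not k.startswith("SLURM_")}


if __name__ == "__main__":
    print("inside a Slurm job:", inside_slurm_job(os.environ))
    clean = without_slurm_vars(os.environ)
    print("SLURM_* variables dropped:", len(os.environ) - len(clean))
```

The cleaned dictionary could then be passed as env= to subprocess.run when launching srun, leaving the parent process environment untouched.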
Thanks Davide,
It's true that srun will create an allocation if you aren't inside a job,
but if you are inside a job and you request more resources than it has,
then srun will just fail. This is the key issue that I want to avoid.
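One way to catch that failure early might be to compare the request with what the current job appears to hold before calling srun. The sketch below is hypothetical: fits_current_job is a name invented here, it assumes SLURM_MEM_PER_NODE is expressed in megabytes (and that variable is only present when the job was submitted with --mem), and it only looks at a single node.

```python
def fits_current_job(request_cpus, request_mem_mb, env):
    """Guess whether a request fits inside the enclosing Slurm job.

    Returns True or False when the job's limits can be read from env, and
    None when we are not inside a job or the limits are not exposed.
    Assumption: SLURM_MEM_PER_NODE is in megabytes.
    """
    if "SLURM_JOB_ID" not in env:
        return None  # not inside a job, so srun would create an allocation
    cpus = env.get("SLURM_CPUS_ON_NODE")
    mem_mb = env.get("SLURM_MEM_PER_NODE")
    if cpus is None or mem_mb is None:
        return None  # limits unknown, caller has to decide what to do
    return request_cpus <= int(cpus) and request_mem_mb <= int(mem_mb)


if __name__ == "__main__":
    # Example environment of a job holding 8 CPUs and 16000 MB on its node.
    job_env = {
        "SLURM_JOB_ID": "42",
        "SLURM_CPUS_ON_NODE": "8",
        "SLURM_MEM_PER_NODE": "16000",
    }
    print(fits_current_job(4, 8000, job_env))
    print(fits_current_job(16, 8000, job_env))
```

A caller could fall back to submitting a separate job whenever the function returns False, instead of letting srun fail inside the existing allocation.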
On Sat, Apr 5, 2025 at 11:48 AM Davide DelVento
wrote:
> The pla
You can set a partition QoS which specifies a minimum. We have such a qos on
our large-gpu partition; we don’t want people scheduling small stuff to it, so
we have this qos:
$ sacctmgr show qos large-gpu --json | jq '.QOS[] | { name: .name, min_limits: .limits.min }'
{
"name": "large-gpu
Hello David,
thank you, this might be a simple and viable solution to this problem. I'll
test both solutions (yours and Megan's) and then decide.
Kind regards
--
On Sun, Mar 30, 2025 at 08:19:12AM -0600, Davide DelVento via slurm-users wrote:
Hi Kamil,
I don't use QoS, so I don't have a dire
Ciao Massimo,
How about creating another queue cpus_in_the_gpu_nodes (or something less
silly) which targets the GPU nodes but does not allow the allocation of the
GPUs with gres and allocates 96-8 (or whatever other number you deem
appropriate) of the CPUs (and similarly with memory)? Actually it
I'm helping with a workflow manager that needs to submit Slurm jobs. For
logging and management reasons, the job (e.g. srun python) needs to be run
as though it were a regular subprocess (python):
- stdin, stdout and stderr for the command should be connected to the
process inside the job
- s