Hello,
I've recently adopted setting AutoDetect=nvml in our GPU nodes' gres.conf
files to automatically populate Cores and Links for GPUs, which has been
working well.
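For context, the relevant part of the setup looks roughly like the sketch below. This is illustrative only, not a copy of our actual files, and it assumes Slurm was built with NVML support on the GPU nodes:

```
# gres.conf on each GPU node (illustrative sketch, not the actual file)
# With AutoDetect=nvml, slurmd queries NVML at startup and fills in the
# File, Cores, and Links fields for each GPU, so they need not be
# listed by hand.
AutoDetect=nvml
```

The GPUs still have to be declared as GRES for the node in slurm.conf; running slurmd -G on a node prints the GRES it detected.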
I'm now wondering if I can prioritize having single-GPU jobs scheduled on
NVLink pairs (these are PCIe A6000s) where one of the G

Hello,
I have a test cluster consisting of two nodes: one is the controller and the
other is a compute node. I followed all the steps in the Slurm documentation
and want to run jobs as containers, but I get the following error when running
podman run hello-world on the controller node:
time="2024-08-06T12:02