Hi,
I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and the NVIDIA
DeepOps framework a couple of years ago. It is based on Ubuntu 20.04 and
makes use of the NVIDIA pyxis/enroot container solution. For operational
validation I used the nccl-tests application in a container. nccl-tests
[...]
jobs with "mpirun --mca smsc xpmem -n $tasks whatever-else-you-need"
(which obviously may or may not be relevant for your case).
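As a small illustration of the command line quoted above, here is a sketch that assembles the same mpirun argument list in Python. The names build_mpirun_cmd and extra_args are hypothetical and stand in for "whatever-else-you-need"; only "--mca smsc xpmem" and "-n $tasks" come from the message, and ./my_app is a placeholder.

```python
import shlex


def build_mpirun_cmd(tasks: int, extra_args: list[str]) -> list[str]:
    """Assemble the mpirun argument list quoted in the message.

    extra_args is a hypothetical stand-in for "whatever-else-you-need".
    """
    if tasks < 1:
        raise ValueError("tasks must be a positive integer")
    return ["mpirun", "--mca", "smsc", "xpmem", "-n", str(tasks), *extra_args]


# Placeholder application name, for illustration only.
cmd = build_mpirun_cmd(4, ["./my_app"])
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to subprocess.run.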
Cheers,
Davide
On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users
<slurm-users@lists.schedmd.com> wrote:
> Hi,
> I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and N
Thanks for all the replies. I'll take on board the hints about running
slurmctld and slurmdbd on separate nodes and disabling the systemd units
when upgrading (I had thought of that).
Matthias
On 06.03.25 at 17:04, Matthias Leopold via slurm-users wrote:
> Hi,
> I'm building Slurm Debian packages fr
Hi,
I'm building Slurm Debian packages from SchedMD sources using this
tutorial https://www.schedmd.com/slurm/installation-tutorial/.
Now I tried upgrading (minor release upgrade within 24.05) using these
packages. https://slurm.schedmd.com/upgrades.html tells me to upgrade
(a) slurmdbd (b) sl
Hi,
I compiled and installed Slurm 24.05 on Ubuntu 22.04 following this
tutorial: https://www.schedmd.com/slurm/installation-tutorial/
Systemd service files are from deb packages that result from this.
Do I have to worry that slurmctld and slurmd don't write PID files
although SlurmctldPidFil
On Wed, Nov 13, 2024 at 10:21 AM Matthias Leopold via slurm-users
<slurm-users@lists.schedmd.com> wrote:
> Hi,
> I'm trying to compile Slurm with NVIDIA NVML support, but the result is
> unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so,
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
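The ldd check described above can be scripted. The following is a minimal sketch, not part of the original message: needed_libs, links_against and run_ldd are hypothetical helper names, and SAMPLE is made-up ldd-style output used only to show the parsing. Keep in mind that a library loaded at run time via dlopen does not appear in ldd output; whether that is what happens here is not established by this message.

```python
import subprocess


def needed_libs(ldd_output: str) -> list[str]:
    """Return the first token (library name or path) of each ldd line."""
    names = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each ldd line starts with the library name, e.g.
        # "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)".
        names.append(line.split()[0])
    return names


def links_against(ldd_output: str, lib_prefix: str) -> bool:
    """True if any listed library's file name starts with lib_prefix."""
    return any(name.rsplit("/", 1)[-1].startswith(lib_prefix)
               for name in needed_libs(ldd_output))


def run_ldd(path: str) -> str:
    """Run ldd on path and return its stdout (requires ldd on PATH)."""
    return subprocess.run(["ldd", path], capture_output=True,
                          text=True, check=True).stdout


# Made-up sample output, for illustration only.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd0b5f2000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1a2b400000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1a2b800000)
"""
```

On a real system the check would be called as links_against(run_ldd("/usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so"), "libnvidia-ml").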
Hi,
I need to take care of a 17.02 Slurm cluster (I'm preparing it for
upgrades). I see that slurmdbd logs various "cluster not registered"
messages at startup (DBD_CLUSTER_TRES,DBD_JOB_START,DBD_STEP_START), but
I don't see a real problem. Accounting works. Do I have to worry? Can
this be re