[slurm-users] Slurm 24.05 and OpenMPI

2025-04-04 Thread Matthias Leopold via slurm-users
Hi, I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and NVIDIA deepops framework a couple of years ago. It is based on Ubuntu 20.04 and makes use of the NVIDIA pyxis/enroot container solution. For operational validation I used the nccl-tests application in a container. nccl-tests

[slurm-users] Re: [EXTERNAL] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-28 Thread Matthias Leopold via slurm-users
jobs with "mpirun --mca smsc xpmem -n $tasks whatever-else-you-need" (which obviously may or may not be relevant for your case). Cheers, Davide On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users <slurm-users@lists.schedmd.com>

[slurm-users] Re: [EXTERNAL] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-27 Thread Matthias Leopold via slurm-users
un --mca smsc xpmem -n $tasks whatever-else-you-need" (which obviously may or may not be relevant for your case). Cheers, Davide On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users mailto:slurm- us...@

[slurm-users] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-27 Thread Matthias Leopold via slurm-users
ver-else-you-need" (which obviously may or may not be relevant for your case). Cheers, Davide On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users <slurm-users@lists.schedmd.com> wrote: Hi, I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and N

[slurm-users] Re: [EXTERN] Slurm upgrade using Debian packages

2025-03-09 Thread Matthias Leopold via slurm-users
Thanks for all replies. I'll take on board the hints about running slurmctld/slurmdbd on separate nodes and disabling systemd units when upgrading (I had thought of that). Matthias On 06.03.25 at 17:04, Matthias Leopold via slurm-users wrote: Hi, I'm building Slurm Debian packages fr

[slurm-users] Slurm upgrade using Debian packages

2025-03-06 Thread Matthias Leopold via slurm-users
Hi, I'm building Slurm Debian packages from SchedMD sources using this tutorial https://www.schedmd.com/slurm/installation-tutorial/. Now I tried upgrading (minor release upgrade within 24.05) using these packages. https://slurm.schedmd.com/upgrades.html tells me to upgrade (a) slurmdbd (b) sl

[slurm-users] Slurm PID Files

2024-11-20 Thread Matthias Leopold via slurm-users
Hi, I compiled and installed Slurm 24.05 on Ubuntu 22.04 following this tutorial: https://www.schedmd.com/slurm/installation-tutorial/ Systemd service files are from deb packages that result from this. Do I have to worry that slurmctld and slurmd don't write PID files although SlurmctldPidFil

[slurm-users] Re: [EXTERN] Re: Slurm and NVIDIA NVML

2024-11-13 Thread Matthias Leopold via slurm-users
@altoslabs.com> On Wed, Nov 13, 2024 at 10:21 AM Matthias Leopold via slurm-users <slurm-users@lists.schedmd.com> wrote: Hi, I'm trying to compile Slurm with NVIDIA NVML support, but the result is unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so,

[slurm-users] Slurm and NVIDIA NVML

2024-11-13 Thread Matthias Leopold via slurm-users
Hi, I'm trying to compile Slurm with NVIDIA NVML support, but the result is unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would expect).

[slurm-users] slurmdbd 17.02: "cluster not registered" (but things work)

2024-02-19 Thread Matthias Leopold via slurm-users
Hi, I need to take care of a 17.02 Slurm cluster (I'm preparing it for upgrades). I see that slurmdbd logs various "cluster not registered" messages at startup (DBD_CLUSTER_TRES, DBD_JOB_START, DBD_STEP_START), but I don't see a real problem. Accounting works. Do I have to worry? Can this be re