[slurm-users] Re: slurmctld HA ; backup controller doesn't schedule and start any job

2025-04-09 Thread Hiromasa Watanabe via slurm-users
Hi all, Fortunately I solved this problem by upgrading Slurm from version 22.05.9 to 23.11.10. With Slurm 23.11.10, after stopping the primary slurmctld and slurmdbd, when I submit a job with sbatch while the backup slurmctld and slurmdbd are running, the job is scheduled and runs. I don't know why

[slurm-users] errors while trying to setup slurmdbd.

2025-04-09 Thread Steven Jones via slurm-users
[root@vuwunicohpcdbp1 ~]# systemctl status slurmdbd × slurmdbd.service - Slurm DBD accounting daemon Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; preset: disabled) Active: failed (Result: exit-code) since Thu 2025-04-10 10:28:52 NZST; 2min 33s ago Duration: 85ms

[slurm-users] Re: errors while trying to setup slurmdbd.

2025-04-09 Thread Steven Jones via slurm-users
Like duh, ty. Regards, Steven Jones B.Eng (Hons) Technical Specialist - Linux RHCE Victoria University, Digital Solutions, Level 8 Rankin Brown Building, Wellington, NZ 6012 0064 4 463 6272 From: Christopher Samuel via slurm-users Sent: Thursday, 10 Apr

[slurm-users] Re: errors while trying to setup slurmdbd.

2025-04-09 Thread Christopher Samuel via slurm-users
Hi Steven, On 4/9/25 5:00 pm, Steven Jones via slurm-users wrote: Apr 10 10:28:52 vuwunicohpcdbp1.ods.vuw.ac.nz slurmdbd[2413]: slurmdbd: fatal: This host not configured to run SlurmDBD ((vuwunicohpcdbp1 or vuwunicohp> ^^^ that's the critical error message, and it's reporting that because s

[slurm-users] pam error, related to accounting?

2025-04-09 Thread David Bremner via slurm-users
Recently I enabled accounting on my tiny (1 compute node, one head node) slurm cluster. slurmdbd.conf looks like AuthType=auth/munge DbdHost=vertex DbdPort=6819 SlurmUser=slurm StorageHost=localhost StorageType=accounting_storage/mysql StorageUser=slurm StoragePa

[slurm-users] recommended freeIPMI version

2025-04-09 Thread Heckes, Frank via slurm-users
Hi all, I’d like to update to SLURM version 24.11.4. I was searching for a recommendation for freeIPMI (power measurement), but couldn’t find one on the schedMD web-pages (sorry, if I overlooked it). The latest available freeIPMI is 1.6.15 (https://ftp.gnu.org/gnu/freeipmi/freeipmi-1.6.15.tar.

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Ole Holm Nielsen via slurm-users
On 09-04-2025 18:23, Daniel Letai via slurm-users wrote: Although 1.6.15 is latest and greatest, there is already a patch https://lists.gnu.org/archive/html/freeipmi-devel/2025-02/msg0.html for an issue that was severe enough to fail to build on fedora42 https://bugzilla.redhat.com/show_bug

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Heckes, Frank via slurm-users
Hello Ole, Hello Daniel, Many thanks for the quick reply and the information. That was what I was looking for. Thanks a lot. Cheers, -Frank From: Daniel Letai via slurm-users Sent: Wednesday, 9 April 2025 18:24 To: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: recommended freeI

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Daniel Letai via slurm-users
Although 1.6.15 is latest and greatest, there is already a patch https://lists.gnu.org/archive/html/freeipmi-devel/2025-02/msg0.html for an issue that was severe enough to fail to build on fedora42 https://bugzilla.redhat.com/show_bug.cgi?id=2340176. The patch is a

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Ole Holm Nielsen via slurm-users
Hi Frank, The latest and greatest FreeIPMI version is 1.6.15, but you must build the RPM including Systemd support, see https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#optional-prerequisite Some additional remarks are in https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configurati

[slurm-users] slurmctld HA ; backup controller doesn't schedule and start any job

2025-04-09 Thread hiromasa.watanabe--- via slurm-users
Hi all, I am trying a slurmctld HA configuration on two servers, using slurm version 22.05.9 of AlmaLinux 9.4. The problem is, after stopping the primary slurmctld and slurmdbd, when I submit a job with sbatch while backup slur