[slurm-users] Re: single node configuration

2024-04-10 Thread Steffen Grunewald via slurm-users
On Tue, 2024-04-09 at 11:07:32 -0700, Slurm users wrote: > Hi everyone, I'm conducting some tests. I've just set up SLURM on the head > node and haven't added any compute nodes yet. I'm trying to test it to > ensure it's working, but I'm encountering an error: 'Nodes required for the > job are DOWN
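
For a head-node-only test, the head node itself has to appear as a compute node in slurm.conf, otherwise every submission waits on nodes that don't exist. A minimal sketch, assuming the head node is called headnode (take the CPU/memory figures from the output of `slurmd -C` rather than these placeholder values):

```
# slurm.conf (excerpt) - hypothetical single-node layout
SlurmctldHost=headnode
NodeName=headnode CPUs=4 RealMemory=7900 State=UNKNOWN
PartitionName=debug Nodes=headnode Default=YES MaxTime=INFINITE State=UP
```

slurmd must be running on the head node as well; `scontrol show node headnode` should then report it as idle rather than down.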

[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Steffen Grunewald via slurm-users
On Mon, 2024-05-06 at 11:38:30 +0100, Slurm users wrote: > Hello, > > I instructed port to use binutils from ports (version 2.40 native) instead > of base: > > `/usr/local/bin/ld: unrecognised emulation mode: elf_aarch64` > > ``` > /usr/local/bin/ld -V |grep aarch64 >aarch64cloudabi >aar
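
A rough way to see which emulation names the chosen linker actually accepts, and to override the one the build passes, assuming GNU ld from ports is in use (the aarch64fbsd name below is a guess to be checked against the listing):

```
# list the emulations this ld supports
/usr/local/bin/ld -V | grep -i aarch64
# if elf_aarch64 is not among them, hand the build a listed name instead, e.g.:
# LDFLAGS="-Wl,-m,aarch64fbsd" ./configure ...
```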

[slurm-users] Re: How to exclude master from computing? Set to DRAINED?

2024-06-24 Thread Steffen Grunewald via slurm-users
On Mon, 2024-06-24 at 13:54:43 +0200, Slurm users wrote: > Dear Slurm users, > > in our project we exclude the master from computing before starting > Slurmctld. We used to exclude the master from computing by simply not > mentioning it in the configuration i.e. just not having: > >     Partition
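
If the head node is listed as a compute node but should not run jobs, draining it via scontrol is the usual lightweight approach; a sketch, with master standing in for the real node name:

```
scontrol update NodeName=master State=DRAIN Reason="head node, no jobs"
sinfo -N -l | grep master      # verify the drained state
```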

[slurm-users] Background tasks in Slurm scripts?

2024-07-26 Thread Steffen Grunewald via slurm-users
Good morning, yesterday I came across a Slurm (sbatch) script that, after doing some stuff in the foreground, runs another executable in the background - and doesn't "wait" for it to finish - literally the last line of the script is executable & (and that executable is supposed to take several 1
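
A stripped-down sbatch sketch of the pattern in question (the executable names are placeholders): without the final `wait`, the batch step exits as soon as the foreground part is done and Slurm tears the allocation down, taking the backgrounded executable with it.

```
#!/bin/bash
#SBATCH --time=12:00:00
./prepare_input              # foreground work
./long_running_tool &        # backgrounded - the job ends without this finishing
wait                         # keeps the batch step alive until it does
```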

[slurm-users] Re: Background tasks in Slurm scripts?

2024-07-26 Thread Steffen Grunewald via slurm-users
On Fri, 2024-07-26 at 10:42:45 +0300, Slurm users wrote: > Good Morning; > > This is not a slurm issue. This is a default shell script feature. If you > want to wait to finish until all background processes, you should use wait > command after all. Thank you - I already knew this in principle, an

[slurm-users] Re: Slurm fails before nvidia-smi command

2024-07-29 Thread Steffen Grunewald via slurm-users
On Mon, 2024-07-29 at 11:23:12 +0300, Slurm users wrote: > Hi there all, > > We have Dell server with 2 x Nvidia H100 and running slurm on it. After > restart server if we do not write nvidia-smi command slurm fails. When we > run nvidia-smi && systemctl restart slurmd && systemctl restart slurmct
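
One way to express "run nvidia-smi before slurmd" without doing it by hand is a systemd drop-in; a sketch, assuming this really is an initialization-order problem with the NVIDIA devices (the nvidia-smi path may differ):

```
# systemctl edit slurmd   (opens an override file; add:)
[Service]
ExecStartPre=/usr/bin/nvidia-smi
```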

[slurm-users] Find out submit host of past job?

2024-08-07 Thread Steffen Grunewald via slurm-users
Hello everyone, I've grepped the manual pages and crawled the 'net, but couldn't find any answer to the following problem: I can see that the ctld keeps a record of it below /var/spool/slurm - as long as the job is running or waiting (and shown by "squeue") - and that this stores the environment
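
If the job environment is stored in the accounting database (AccountingStoreFlags=job_env in slurm.conf, available in recent releases), the submit host can be pulled back out afterwards with sacct; a sketch, assuming a Slurm version whose sacct offers --env-vars, with 12345 standing in for a real job id:

```
# slurm.conf: AccountingStoreFlags=job_env
sacct -j 12345 --env-vars | grep SLURM_SUBMIT_HOST
```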

[slurm-users] Re: Find out submit host of past job?

2024-08-07 Thread Steffen Grunewald via slurm-users
On Wed, 2024-08-07 at 08:55:21 -0400, Slurm users wrote: > Warning on that one, it can eat up a ton of database space (depending on > size of environment, uniqueness of environment between jobs, and number of > jobs). We had it on and it nearly ran us out of space on our database host. > That said

[slurm-users] Re: error: Unable to contact slurm controller (connect failure)

2024-11-18 Thread Steffen Grunewald via slurm-users
Hi Daniel, >  error: Unable to contact slurm controller (connect failure) > > I appreciate any insight on what could be the cause. Can you check that the slurmctld is up and running, and that the said commands work on the controller machine itself? If the slurmctld cannot be started as a service
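
A few quick checks on the controller host itself, along the lines suggested above (service name and config path assume a typical systemd install):

```
systemctl status slurmctld        # is the daemon running at all?
scontrol ping                     # can the local commands reach it?
grep -E 'SlurmctldHost|SlurmctldPort' /etc/slurm/slurm.conf
```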

[slurm-users] Re: slurm nodes showing down*

2024-12-09 Thread Steffen Grunewald via slurm-users
Hi, On Sun, 2024-12-08 at 21:57:11 +, Slurm users wrote: > I have just rebuilt all my nodes and I see Did they ever work before with Slurm? (Which version?) > Only 1 & 2 seem available? > While 3~6 are not Either you didn't wait long enough (5 minutes should be sufficient), or the "down*"
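
The trailing asterisk means the slurmd on those nodes is not responding to the controller; something along these lines usually narrows it down (node03 is a placeholder name):

```
scontrol show node node03 | grep -i reason    # why Slurm marked it down
systemctl status slurmd                       # run on the node itself
scontrol update NodeName=node03 State=RESUME  # once slurmd is reachable again
```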

[slurm-users] Re: Permission denied for slurmdbd.conf

2025-01-07 Thread Steffen Grunewald via slurm-users
On Sat, 2024-12-28 at 22:59:45 -, Slurm users wrote: > ls -ls /usr/local/slurm/etc/slurmdbd.conf > 4 -rw--- 1 slurm slurm 497 Dec 28 16:34 /usr/local/slurm/etc/slurmdbd.conf > > sudo -u slurm /usr/local/slurm/sbin/slurmdbd -Dvvv > > slurmdbd: error: s_p_parse_file: unable to read > "/
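
When the file itself is mode 600 and owned by the right user but slurmdbd still cannot read it, the directories above it are the usual suspect; a quick check (namei ships with util-linux, so it may be absent on some systems):

```
namei -l /usr/local/slurm/etc/slurmdbd.conf   # every path component must be traversable by the slurm user
sudo -u slurm cat /usr/local/slurm/etc/slurmdbd.conf > /dev/null && echo readable
```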

[slurm-users] Re: formatting node names

2025-01-07 Thread Steffen Grunewald via slurm-users
On Mon, 2025-01-06 at 12:55:12 -0700, Slurm users wrote: > Hi all, > I remember seeing on this list a slurm command to change a slurm-friendly > list such as > > gpu[01-02],node[03-04,12-22,27-32,36] > > into a bash friendly list such as > > gpu01 > gpu02 > node03 > node04 > node12 > etc I alwa
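
The usual answer is scontrol's built-in hostlist handling, which prints one name per line:

```
scontrol show hostnames "gpu[01-02],node[03-04,12-22,27-32,36]"
# and the reverse direction:
scontrol show hostlist gpu01,gpu02,node03
```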

[slurm-users] Re: Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions

2025-01-07 Thread Steffen Grunewald via slurm-users
On Sat, 2025-01-04 at 08:11:21 -, Slurm users wrote: > JOBID PARTITION NAME USER ST TIME NODES > NODELIST(REASON) > 26 cpu myscript user1 PD 0:00 4 > (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher >
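
To see which nodes are behind that pending reason, the reason list and the per-node state of the requested partition are a good start (partition name cpu taken from the squeue output above):

```
sinfo -R                 # down/drained nodes, with the recorded reasons
sinfo -p cpu -N -l       # state of every node in the partition the job asked for
```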

[slurm-users] Re: Unexpected node got allocation

2025-01-09 Thread Steffen Grunewald via slurm-users
On Thu, 2025-01-09 at 07:51:40 -0500, Slurm users wrote: > Hello there and good morning from Baltimore. > > I have a small cluster with 100 nodes. When the cluster is completely empty > of all jobs, the first job gets allocated to node 41. In other clusters, > the first job gets allocated to node
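
If the intent is to steer which nodes fill up first, the Weight parameter on the node definitions is the knob Slurm consults (lower weight is allocated first); whether that is what is happening on this cluster is a guess. A sketch with made-up names and numbers:

```
# slurm.conf (excerpt) - hypothetical weights
NodeName=node[01-40]  CPUs=32 Weight=1
NodeName=node[41-100] CPUs=32 Weight=10
```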

[slurm-users] Re: setting up slurmdbd (fail)

2025-03-04 Thread Steffen Grunewald via slurm-users
On Tue, 2025-03-04 at 01:03:00 +, Slurm users wrote: > I am trying to add slurmdbd to my first attempt of slurmctld. > > I have mariadb 10.11 running and permissions set. > > MariaDB [(none)]> CREATE DATABASE slurm_acct_db; > Query OK, 1 row affected (0.000 sec) > > MariaDB [(none)]> show da
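
After creating the database, the slurm user also needs privileges on it, and slurmdbd.conf has to point at the same credentials; a sketch, with the password and database name as placeholders to be adjusted:

```
MariaDB [(none)]> GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost' IDENTIFIED BY 'some_pass';
MariaDB [(none)]> FLUSH PRIVILEGES;

# slurmdbd.conf must then match:
StorageType=accounting_storage/mysql
StorageUser=slurm
StoragePass=some_pass
StorageLoc=slurm_acct_db
```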