[slurm-users] Re: error: Unable to contact slurm controller (connect failure)

2024-11-18 Thread Sid Young via slurm-users
A few things to look at: make sure DNS/hostname resolution works, disable any firewalls for testing (you can lock them down afterwards), and make sure the slurm.conf file is the same on all nodes. I've just done a 20.11.9 to 24.05.2 upgrade along with a CentOS 7.9 to RHEL 9.10 upgrade on all my nodes. Sid
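The first and last checks above can be sketched in Python. This is only an illustrative sketch, not a Slurm tool: the function names (`resolves`, `config_digest`, `mismatched_nodes`) are made up for this example, and it assumes the contents of each node's slurm.conf have already been collected by some other means.

```python
import hashlib
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def config_digest(text: bytes) -> str:
    """SHA-256 digest of a slurm.conf body, used only for comparison."""
    return hashlib.sha256(text).hexdigest()


def mismatched_nodes(configs: dict[str, bytes], reference: str) -> list[str]:
    """List nodes whose slurm.conf differs byte-for-byte from the reference node's copy.

    `configs` maps a node name to the raw bytes of its slurm.conf
    (hypothetical input; gather it however suits your site).
    """
    ref = config_digest(configs[reference])
    return sorted(
        node for node, body in configs.items() if config_digest(body) != ref
    )
```

Running `resolves()` on the controller for every compute node name, and on each compute node for the controller name, covers both directions of the lookup.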

[slurm-users] Re: Fwd: What is the safe upgrade path when upgrading from slurm21.08 and mariadb5.5?

2024-10-29 Thread Sid Young via slurm-users
I recently upgraded from 20.11 to 24.05.2, before moving the cluster from CentOS 7.9 to Oracle Linux 8.10. The DB upgrade should be pretty simple: do a mysqldump first, then uninstall the old DB, change the repos and install the new DB version. It should recognise the DB files on disk and access t
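The "mysqldump first" step could look something like the following sketch. It only builds the command line; the helper name `mysqldump_command` is hypothetical, and the database name `slurm_acct_db` and user `slurm` in the usage note are assumptions, so check StorageLoc and StorageUser in your slurmdbd.conf before relying on them.

```python
import subprocess


def mysqldump_command(database: str, user: str, outfile: str) -> list[str]:
    """Build a mysqldump argv for backing up one database before an upgrade."""
    return [
        "mysqldump",
        "--single-transaction",      # consistent snapshot for InnoDB tables
        f"--user={user}",
        "--password",                # no value given, so mysqldump prompts
        f"--result-file={outfile}",  # write the dump here instead of stdout
        "--databases",
        database,
    ]


def run_backup(database: str, user: str, outfile: str) -> None:
    """Run the dump and raise CalledProcessError if mysqldump fails."""
    subprocess.run(mysqldump_command(database, user, outfile), check=True)
```

For example, `run_backup("slurm_acct_db", "slurm", "/root/slurm_acct_db.sql")` with slurmdbd stopped, then confirm the dump file is non-empty before uninstalling the old DB packages.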

[slurm-users] ResumeAfterTime - Lacking Info

2024-08-28 Thread Sid Young via slurm-users
G'Day all, Can anyone shed light on the parameter "ResumeAfterTime" returned from the command "scontrol show node XXX"? Can it be used to automatically resume a "Down"ed node? Sid -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@li

[slurm-users] Manually initiating a DB purge

2024-08-21 Thread Sid Young via slurm-users
G'Day all, I have 3 years' worth of job records in the slurm DB and we do not have any need to actually track anything at this stage. I would like to keep 12 months' worth of jobs, so I need to purge 2 years' worth at some point. Is there a command to issue via scontrol to kick off a SlurmPurge using

[slurm-users] Upgrade compute node to 24.05.2

2024-08-14 Thread Sid Young via slurm-users
G'Day all, I've been upgrading my cluster from 20.11.0 in small steps to get to 24.05.2. Currently I have all nodes on 23.02.8, the controller on 24.05.2 and a single test node on 24.05.2. All are CentOS 7.9 (upgrade to Oracle Linux 8.10 is Phase 2 of the upgrades). When I check the slurmd statu

[slurm-users] Re: Upgrade node while jobs running

2024-08-02 Thread Sid Young via slurm-users
if it goes wrong? 😊 > > Regards, > > Tim > > -- > > Tim Cutts > > Scientific Computing Platform Lead > > AstraZeneca

[slurm-users] Upgrade node while jobs running

2024-07-31 Thread Sid Young via slurm-users
G'day all, I've been waiting for nodes to become idle before upgrading them; however, some jobs take a long time. If I try to remove all the packages, I assume that kills the slurmstepd program and with it the job. Sid

[slurm-users] Re: Slurm management of dual-node server trays?

2024-02-23 Thread Sid Young via slurm-users
That's a very interesting design, and looking at the SD665 V3 documentation, am I correct that each node has dual 25 Gb/s SFP28 interfaces? If so, then despite dual nodes in a 1U configuration, you actually have 2 separate servers? Sid On Fri, 23 Feb 2024, 22:40 Ole Holm Nielsen via slurm-users, < slurm-u