A few things to look at: make sure DNS/hostname resolution works, disable
any firewalls for testing (you can lock them down again afterwards), and
make sure the slurm.conf file is the same on all nodes.
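A quick way to sanity-check those three things (a rough sketch; the host
names are just examples):

# hostname resolution: every node should resolve the others consistently
getent hosts controller-node compute-node01
# firewall: disable for testing, re-enable and open 6817/6818 later
systemctl stop firewalld
# slurm.conf consistency: compare checksums across nodes
md5sum /etc/slurm/slurm.conf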
I've just done a 20.11.9 to 24.05.2 upgrade along with a CentOS 7.9 to RHEL
9.10 upgrade on all my nodes.
Sid
r you end up
with a situation where the slurmd can't talk to the running slurmstepd and
the job(s) get lost (it shows up as a "Protocol Error").
Ole sent me a link to this guide which mostly worked.
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurmd-on-nodes
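The general shape of that kind of in-place slurmd upgrade is roughly as
follows (a sketch assuming an RPM-based install, not the guide's exact
steps; the running slurmstepd processes keep the jobs alive as long as the
old and new versions stay within Slurm's supported upgrade range):

systemctl stop slurmd     # jobs keep running under the old slurmstepd
dnf upgrade slurm\*       # or yum; install the new packages in place
systemctl daemon-reload
systemctl start slurmd    # new slurmd reconnects to the old slurmstepd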
G'Day all,
Can anyone shed light on the parameter "ResumeAfterTime" returned from the
command
"scontrol show node XXX"
Can it be used to automatically resume a "Down"ed node?
Sid
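I believe ResumeAfterTime gets populated when a node is downed or drained
with a resume timer. A hedged sketch (the node name is an example, and the
ResumeAfter option is an assumption to verify against your version):

# down a node and, on versions that support it, schedule an automatic resume
scontrol update NodeName=node001 State=DOWN Reason="cable check" ResumeAfter=3600
# manual resume, which works on any recent version
scontrol update NodeName=node001 State=RESUME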
G'Day all,
I have 3 years' worth of job records in the Slurm DB and we do not have any
need to actually track anything at this stage. I would like to keep 12
months' worth of jobs, so I need to purge 2 years' worth at some point.
Is there a command to issue via scontrol to kick off a SlurmPurge using
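As far as I know the purging is driven by slurmdbd rather than scontrol:
you set the Purge* options in slurmdbd.conf and slurmdbd trims old records
itself. A sketch keeping 12 months of data (adjust to taste, take a DB
backup first, and expect the first purge of a large database to be slow):

# slurmdbd.conf
PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=12months
PurgeStepAfter=12months
PurgeSuspendAfter=12months

Then restart slurmdbd to pick up the new settings.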
G'Day all,
I've been upgrading my cluster from 20.11.0 in small steps to get to
24.05.2. Currently I have all nodes on 23.02.8, the controller on 24.05.2,
and a single test node on 24.05.2. All are CentOS 7.9 (an upgrade to Oracle
Linux 8.10 is Phase 2 of the upgrades).
When I check the slurmd statu
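For checking a node during a staged upgrade like this, the basics are
something like (node name is an example):

systemctl status slurmd
slurmd -V                                  # version actually installed on the node
scontrol show node node001 | grep -i version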
if it goes wrong? 😊
> Regards,
>
> Tim
>
> --
> *Tim Cutts*
> Scientific Computing Platform Lead
> AstraZeneca
>
> Find out more about R&D IT Data, Analytics & AI and how we can support you
> by visiting our Service
G'day all,
I've been waiting for nodes to become idle before upgrading them, however
some jobs take a long time. If I try to remove all the packages, I assume
that kills the slurmstepd process and with it the job.
Sid
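Rather than waiting for nodes to go idle on their own, draining them lets
running jobs finish while keeping new ones off (standard commands; the node
names are examples):

scontrol update NodeName=node[001-010] State=DRAIN Reason="slurm upgrade"
sinfo -R            # drained/draining nodes and the reason
squeue -w node001   # what is still running on a given node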
That's a very interesting design. Looking at the SD665 V3 documentation,
am I correct that each node has dual 25 Gb/s SFP28 interfaces?
If so, then despite the dual nodes in a 1U configuration, you actually have
2 separate servers?
Sid
On Fri, 23 Feb 2024, 22:40 Ole Holm Nielsen via slurm-users, <
slurm-u
Is there a direct upgrade path from 20.11.0 to 22.05.6, or does it have to
be done in multiple steps?
Sid Young
On Fri, Nov 11, 2022 at 7:53 AM Marshall Garey wrote:
> We are pleased to announce the availability of Slurm version 22.05.6.
>
> This includes a fix to core selection for steps which cou
Brian / Christopher, that looks like a good process. Thanks guys, I will do
some testing and let you know.
If I mark a partition down and it has running jobs, what happens to those
jobs? Do they keep running?
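For reference, the command in question (the partition name is an example);
my understanding is that a DOWN partition stops new jobs being scheduled
there but does not, by itself, kill jobs that are already running:

scontrol update PartitionName=batch State=DOWN
scontrol update PartitionName=batch State=UP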
Sid Young
W: https://off-grid-engineering.com
W: (personal) https://sidyoung.com/
W
Sid Young
W: https://off-grid-engineering.com
W: (personal) https://sidyoung.com/
W: (personal) https://z900collector.wordpress.com/
On Tue, Feb 1, 2022 at 3:02 PM Christopher Samuel wrote:
> On 1/31/22 4:41 pm, Sid Young wrote:
>
> > I need to replace a faulty DIMM chip in our log
20-30 minutes, scheduler is a separate node and I could email back any
users who try to SSH while the node is down.
Sid Young
W: https://off-grid-engineering.com
W: (personal) https://sidyoung.com/
W: (personal) https://z900collector.wordpress.com/
What's wrong with just using the tools as-is?
Sid Young
On Thu, Sep 16, 2021 at 5:54 AM Ondrej Valousek
wrote:
> Hi list,
> I am wondering if there is a plugin that allows submitting jobs via systemd
> (i.e. using systemd-run) on exec nodes.
>
> I have actually modified SGE source
00%|100.00%
#trihpc|energy|0.00%|0.00%|0.00%|0.00%|0.00%|0.00%
#trihpc|billing|14.62%|4.78%|0.00%|80.60%|0.00%|100.00%
#trihpc|fs/disk|0.00%|0.00%|0.00%|0.00%|0.00%|0.00%
#trihpc|vmem|0.00%|0.00%|0.00%|0.00%|0.00%|0.00%
#trihpc|pages|0.00%|0.00%|0.00%|0.00%|0.00%|0.00%
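Those rows look like per-TRES utilization percentages; for context, output
in that shape can be produced with sreport's parsable/percent options,
roughly like this (the dates are placeholders):

sreport -t percent -P cluster utilization \
    --tres=cpu,mem,energy,billing,fs/disk,vmem,pages \
    start=2021-07-01 end=2021-08-01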
Sid Young
W: https://off-grid-engin
Why not spin them up as virtual machines... then you could build real
(separate) clusters.
Sid Young
W: https://off-grid-engineering.com
W: (personal) https://sidyoung.com/
W: (personal) https://z900collector.wordpress.com/
On Wed, Jul 28, 2021 at 12:07 AM Brian Andrus wrote:
> You can
Hi Luis,
I have exactly the same issue with a user who needs the reported cores to
reflect the requested cores. If you find a solution that works please
share. :)
Thanks
Sid Young
Translational Research Institute
Sid Young
W: https://off-grid-engineering.com
W: (personal) https
Thanks for the reply... I will look into how to configure it.
Sid Young
Translational Research Institute
On Wed, Jun 23, 2021 at 7:06 AM Prentice Bisbal wrote:
> Yes,
>
> You need to use the cgroups plugin.
>
>
> On Fri, Jun 18, 2021, 12:29 AM Sid Young wrote:
>
>>
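For context, enabling cgroup-based containment generally involves something
like the following (a minimal sketch, not the exact settings from this
thread; the plugin choices depend on your setup):

# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

# cgroup.conf
ConstrainCores=yes
ConstrainRAMSpace=yes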
Relevant lines from the slurm.conf file:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
ReturnToService=1
CpuFreqGovernors=OnDemand,Performance,UserSpace
CpuFreqDef=Performance
Sid Young
Translational Research Institute
G'Day all,
Is there a tool that will extract the job counts in JSON format? Such as
#running, #pending, #onhold, etc.
I am trying to build some custom dashboards for our new cluster and
this would be a really useful set of metrics to gather and display.
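In the absence of a dedicated tool, a rough way to get those counts as JSON
from squeue (newer releases also have squeue --json if built with JSON
support):

squeue -h -o '%T' | sort | uniq -c |
  awk 'BEGIN{printf "{"} {printf "%s\"%s\": %d", (NR>1?", ":""), $2, $1} END{print "}"}'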
Sid Young
W: https://off
Hi all,
I'm interested in using slurmrestd, but it does not appear to be built
when you do an rpmbuild, and reading through the docs does not indicate a
switch needed to include it (unless I missed that)... any ideas on how the
RPM is built?
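If I remember right, slurmrestd sits behind a build conditional in
slurm.spec, so it has to be requested explicitly. Treat the exact flag and
dependencies as assumptions to verify against your version's spec file:

# http-parser-devel and json-c-devel (plus libjwt-devel for JWT auth) installed first
rpmbuild -ta slurm-*.tar.bz2 --with slurmrestd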
Sid Young
W: https://off-grid-engineering.
Yes, on reflection I should have said utilization rather than usage! I've
been researching which combination of metrics would best give me an overall
utilization figure for the HPC.
Sadly it's not as clear-cut as I would have hoped.
Does anyone have any ideas?
Sid Young
On Fri, May 14,
Hi All,
Is there a way to define an effective "usage rate" for an HPC cluster using
the data captured in the Slurm database?
Primarily I want to see if it can be helpful in presenting a case to the
business for buying more hardware for the HPC :)
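The closest I'm aware of is sreport against the accounting DB, e.g. overall
cluster utilization for a period plus the heaviest users (the dates and
counts are placeholders):

sreport cluster utilization start=2021-01-01 end=2021-04-01 -t percent
sreport user topusage start=2021-01-01 end=2021-04-01 TopCount=10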
Sid Young
You can push a new conf file and issue an "scontrol reconfigure" on the fly
as needed... I do it on our cluster: do the nodes first, then the login
nodes, then the slurm controller... you are making a huge issue of a very
basic task...
Sid
On Tue, 4 May 2021, 22:28 Tina Friedrich,
wrote:
>
Hi David,
I use SaltStack to push out the slurm.conf file to all nodes and do a
"scontrol reconfigure" of the slurmd; this makes management much easier
across the cluster. You can also do service restarts from one point, etc.
Avoid NFS mounts for the config: if the mount locks up, you're screwed.
htt
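A minimal version of that push-and-reconfigure workflow, using Salt's CLI
(the target pattern and paths are examples; a real setup would presumably
use proper Salt states):

# push the new config to every node, then ask slurmctld to re-read it everywhere
salt-cp '*' /etc/slurm/slurm.conf /etc/slurm/slurm.conf
scontrol reconfigure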