[slurm-users] Re: slurm releases

2025-04-05 Thread Ryan Novosielski via slurm-users
Multiple releases are supported at any given time. Major releases come out every 6 months, and at present you can upgrade directly from any of the last 3 of them. Major releases are more disruptive, so previous versions remain supported to provide continuity. https://slurm.schedmd.com/upgrades.html
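As a rough illustration of the upgrade window described above, the sketch below checks whether a direct upgrade between two major releases is possible. The release list, the helper name, and the `window` parameter are assumptions made for this example; the authoritative rules are on the upgrades page linked above.

```python
# Illustrative only: the release list and the window size are assumptions
# for this sketch, not values read from the Slurm documentation.
RELEASES = ["22.05", "23.02", "23.11", "24.05", "24.11"]


def can_upgrade_directly(current, target, window=3, releases=RELEASES):
    """Return True if `target` is newer than `current` by at most `window` major releases."""
    gap = releases.index(target) - releases.index(current)
    return 0 < gap <= window


print(can_upgrade_directly("23.02", "24.11"))  # three releases back
print(can_upgrade_directly("22.05", "24.11"))  # four releases back
```

Anything outside the window would need an intermediate upgrade step first.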

[slurm-users] Re: Cloud elastic help

2025-01-29 Thread Ryan Novosielski via slurm-users
> On Jan 29, 2025, at 16:49, mark.w.moorcroft--- via slurm-users wrote:
>
> It helps to unblock port 6818 on the node image. #eyeroll

Bear in mind there are also port requirements on the login node if you plan to run interactive jobs (they will otherwise hang when executed). -- #BlackLi
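One quick way to confirm that a port such as 6818 is actually reachable is a plain TCP connection test. This is a minimal sketch using only the Python standard library; the helper name `port_open` is hypothetical and not part of Slurm, and a successful connection only shows the port is unblocked, not that the daemon behind it is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_open("node001", 6818)` would test the port named in the thread, where `node001` is a placeholder node name.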

[slurm-users] Re: sinfo not listing any partitions

2024-11-27 Thread Ryan Novosielski via slurm-users
At this point, I’d probably crank up the logging some and see what it’s saying in slurmctld.log. -- #BlackLivesMatter || \\UTGERS, |---*O*--- ||_// the State | Ryan Novosielski - novos...@rutgers.edu || \\ University | Sr. Technolo

[slurm-users] Re: sinfo not listing any partitions

2024-11-27 Thread Ryan Novosielski via slurm-users
If you’re sure you’ve restarted everything after the config change, are you also sure that you don’t have that stuff hidden from your current user? You can try -a to rule that out. Or run as root.

[slurm-users] Re: A note on updating Slurm from 23.02 to 24.05 & multi-cluster

2024-09-26 Thread Ryan Novosielski via slurm-users
On Sep 26, 2024, at 15:03, Ward Poelmans via slurm-users wrote: Hi Bjørn-Helge, On 26/09/2024 09:50, Bjørn-Helge Mevik via slurm-users wrote: Ward Poelmans via slurm-users writes: We hit a snag when updating our clusters from Slurm 23.02 to 24.05. After updating the slurmdbd, our multi clust

[slurm-users] Re: SlurmDBD errors

2024-09-18 Thread Ryan Novosielski via slurm-users
I don’t think you should expect this from overlapping nodes in partitions, but rather when you’re allowing the hardware itself to be oversubscribed. Was your upgrade in this window? I would suggest looking for runaway jobs, which you’ve done; beyond that, I am not sure what else.

[slurm-users] Re: Unsupported RPC version by slurmctld 19.05.3 from client slurmd 22.05.11

2024-06-17 Thread Ryan Novosielski via slurm-users
The benefits are pretty limited if you don’t have the server upgraded anyway, unless you’re just saying it’s easier to install a current client.

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Ryan Novosielski via slurm-users
We do have bf_continue set. And also bf_max_job_user=50, because we discovered that one user can submit so many jobs that it will hit the limit of the number it’s going to consider and not run some jobs that it could otherwise run. On Jun 4, 2024, at 16:20, Robert Kudyba wrote: Thanks for the
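In slurm.conf, the two backfill settings mentioned above would sit on the `SchedulerParameters` line, for example `SchedulerParameters=bf_continue,bf_max_job_user=50`. The sketch below splits such a value into a dictionary for inspection; the helper name is hypothetical and not part of Slurm.

```python
def parse_scheduler_parameters(value):
    """Split a comma-separated SchedulerParameters value into a dict.

    Bare flags map to True; key=value entries keep the value as a string.
    """
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        key, sep, val = item.partition("=")
        params[key] = val if sep else True
    return params


params = parse_scheduler_parameters("bf_continue,bf_max_job_user=50")
print(params)  # {'bf_continue': True, 'bf_max_job_user': '50'}
```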

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Ryan Novosielski via slurm-users
This is relatively true of my system as well, and I believe it’s that the backfill scheduler is slower than the main scheduler.

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Ryan Novosielski via slurm-users
Are you looking at the log/what appears on the screen, and do you know for a fact that it is all the way up (should say “version started” at the end)? If that’s not it, you could have a permissions thing or something. I do not expect you’d need to extend the timeout for a normal run. I suspect

[slurm-users] Re: Jobs showing running but not running

2024-05-29 Thread Ryan Novosielski via slurm-users
One of the other states (down or fail, from memory) should cause it to completely drop the job.

[slurm-users] Re: Removing safely a node

2024-05-16 Thread Ryan Novosielski via slurm-users
If I’m not mistaken, the manual for slurm.conf or one of the others lists either what action is needed to change every option, or has a combined list of what requires what (I can never remember and would have to look it up anyway).

[slurm-users] Re: Recover Batch Script Error

2024-02-16 Thread Ryan Novosielski via slurm-users
Are you absolutely certain you’ve done it before for completed jobs? I would not expect that to work for completed jobs, with the possible exception of very recently completed jobs (or am I thinking of Torque?). Other replies mention the relatively new feature (21.08?) to store the job script i