There are multiple supported releases at any given time, and at present you can
upgrade from any of the last three releases, which come out every six months.
Major releases are the more disruptive ones, and the overlapping support for
previous versions is what provides continuity across them.
https://slurm.schedmd.com/upgrades.html
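The usual order described on that page is database daemon first, then the
controller, then the compute nodes. A rough sketch, assuming standard systemd
unit names and that you take a database backup before touching anything:

    systemctl stop slurmdbd        # on the database host; back up the DB, install the new packages
    systemctl start slurmdbd       # let it finish converting the database before moving on
    systemctl restart slurmctld    # on the controller, after upgrading its packages
    systemctl restart slurmd       # on each compute node, last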
> On Jan 29, 2025, at 16:49, mark.w.moorcroft--- via slurm-users
> wrote:
>
> It helps to unblock port 6818 on the node image. #eyeroll
Bear in mind there are port requirements on the login node too if you plan to
run interactive jobs (they will otherwise hang when launched).
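For what it's worth, the srun listening ports can be pinned to a fixed range
with SrunPortRange in slurm.conf and then opened on the login node. A rough
sketch, assuming firewalld and a range of your own choosing:

    # slurm.conf (pick a range that suits your site)
    SrunPortRange=60001-63000

    # on the login node (and 6818/tcp on compute nodes, per the message above)
    firewall-cmd --permanent --add-port=60001-63000/tcp
    firewall-cmd --reload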
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
At this point, I’d probably crank up the logging some and see what it’s saying
in slurmctld.log.
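If it's useful, the verbosity can be raised on the fly and put back afterwards;
a sketch, assuming the log lives wherever your SlurmctldLogFile points (the path
below is just an example):

    scontrol setdebug debug2
    tail -f /var/log/slurm/slurmctld.log
    scontrol setdebug info     # restore the normal level afterwards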
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
If you’re sure you’ve restarted everything after the config change, are you
also sure that none of it is hidden from your current user? You can try the -a
(all) flag to rule that out, or run as root.
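For example (assuming the question is about squeue/sacct output), something
along these lines shows everything rather than just your own jobs and
partitions:

    squeue -a                  # all partitions, including hidden ones
    sacct -a -S now-1day       # accounting records for all users, last 24 hours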
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
On Sep 26, 2024, at 15:03, Ward Poelmans via slurm-users
wrote:
Hi Bjørn-Helge,
On 26/09/2024 09:50, Bjørn-Helge Mevik via slurm-users wrote:
Ward Poelmans via slurm-users writes:
We hit a snag when updating our clusters from Slurm 23.02 to
24.05. After updating the slurmdbd, our multi clust
I don’t think you should expect this from overlapping nodes in partitions, but
rather when you’re allowing the hardware itself to be oversubscribed.
Was your upgrade in this window?
I would suggest looking for runaway jobs, but you’ve already done that, and I’m
not sure what else to check.
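In case it helps anyone else following along, the usual check is:

    sacctmgr show runawayjobs    # lists orphaned job records and offers to fix them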
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
The benefits are pretty limited if you haven’t upgraded the server anyway,
unless you’re just saying it’s easier to install a current client.
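If it's useful, a quick way to compare the two sides (the commands are standard;
the grep pattern is just what I'd try):

    srun --version                               # what the client machine has installed
    scontrol show config | grep SLURM_VERSION    # what the controller is running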
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
We do have bf_continue set, and also bf_max_job_user=50, because we discovered
that one user can submit so many jobs that the backfill scheduler hits its limit
on the number of jobs it will consider and skips jobs it could otherwise have
started.
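For reference, the parameters mentioned above live on the SchedulerParameters
line in slurm.conf; a fragment like this (your line will likely carry other
options too):

    SchedulerParameters=bf_continue,bf_max_job_user=50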
On Jun 4, 2024, at 16:20, Robert Kudyba wrote:
Thanks for the
This is true of my system as well, to a degree, and I believe it’s because the
backfill scheduler runs more slowly than the main scheduler.
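One way to see that, if you haven't already: sdiag reports cycle times and depth
for both schedulers (the section names may vary slightly by version):

    sdiag    # compare the main scheduler stats with the "Backfilling stats" section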
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
Are you looking at the log, or at what appears on the screen, and do you know
for a fact that it came all the way up (it should say "version ... started" at
the end)?
If that’s not it, it could be a permissions issue or something similar.
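A quick sanity check, assuming the daemon logs wherever your
SlurmctldLogFile/SlurmdLogFile points (the path below is just an example):

    grep -i started /var/log/slurm/slurmctld.log | tail -1
    scontrol ping    # confirms the controller is answering at all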
I do not expect you’d need to extend the timeout for a normal run. I suspect
One of the other states (down or fail, from memory) should cause it to drop the
job completely.
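If you want to inspect or flip the state by hand, something along these lines
(the node name is a placeholder):

    scontrol show node node001 | grep -E 'State|Reason'
    scontrol update NodeName=node001 State=resume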
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
If I’m not mistaken, the man page for slurm.conf, or one of the others, either
lists the action needed for changing each option or has a combined list of what
requires what (I can never remember and would have to look it up anyway).
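In practice it usually comes down to one of these two, with the man page saying
which options need the heavier one:

    scontrol reconfigure           # enough for most option changes
    systemctl restart slurmctld    # required for the handful that can't be picked up live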
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
Are you absolutely certain you’ve done it before for completed jobs? I would
not expect that to work for completed jobs, with the possible exception of very
recently completed jobs (or am I thinking of Torque?).
Other replies mention the relatively new feature (21.08?) to store the job
script in the accounting database.
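If that's the feature in question, my understanding (worth verifying against
your version) is that it needs the flag enabled in slurm.conf and is then
queried through sacct:

    # slurm.conf
    AccountingStoreFlags=job_script

    # afterwards, for jobs submitted with the flag in place:
    sacct -j <jobid> --batch-script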