[slurm-users] Re: Recommended Stable Slurm Version for >100P Scale Clusters

John Hearns via slurm-users Sun, 16 Nov 2025 07:34:45 -0800

I would take a step back and ask how you intend to install and manage this
cluster.


CPU only or GPUs?
OS?
Interconnect fabric?
Storage?

Power per rack? Cooling?
Monitoring?

On Sun, Nov 16, 2025, 2:39 PM KK via slurm-users <
[email protected]> wrote:

> We are currently planning to deploy a new HPC system with a total compute
> capacity exceeding 100 PF. As part of our preparation, we would like to
> understand which Slurm versions are considered stable and widely used at
> this scale.
>
> Could you please share your recommendations or experience regarding:
>
> 1. Which Slurm version is currently running reliably on very large-scale
> clusters (>100 PF or >10k nodes)?
>
> 2. Whether there are any versions we should avoid due to known issues at
> large scale.
>
> 3. Any best practices or configuration considerations for Slurm
> deployments of this size.
>
>
> --
> slurm-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>

