I would take a step back and ask how you intend to install and manage this cluster.
CPU only or GPUs? OS? Interconnect fabric? Storage? Power per rack? Cooling? Monitoring?

On Sun, Nov 16, 2025, 2:39 PM KK via slurm-users <[email protected]> wrote:

> We are currently planning to deploy a new HPC system with a total compute
> capacity exceeding 100 PF. As part of our preparation, we would like to
> understand which Slurm versions are considered stable and widely used at
> this scale.
>
> Could you please share your recommendations or experience regarding:
>
> 1. Which Slurm version is currently running reliably on very large-scale
> clusters (>100 PF or >10k nodes)?
>
> 2. Whether there are any versions we should avoid due to known issues at
> large scale.
>
> 3. Any best practices or configuration considerations for Slurm
> deployments of this size.
>
>
> --
> slurm-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
