FWIW, I have used NFS/Gluster/Lustre for a StateSaveLocation at various
times on various clusters.
I have never had an issue with any of them, and I have run clusters of
up to 1000+ nodes. I have even used the same share to symlink all the
nodes' slurm.conf with no issue.
Of course, YMMV, bu
HA for slurmctld is not multidatacenter HA but rather a traditional HA
setup where you have two server heads off of one storage brick
(connected by SAS cables or other fast interconnect). Multidatacenter
HA has issues with keeping things in sync due to latency and IOPS (as
noted below).
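
A minimal sketch of what such a two-head setup might look like in
slurm.conf. The hostnames ctl1/ctl2 and the path are placeholders
chosen for illustration, not values from any cluster in this thread:

```
# slurm.conf fragment (hypothetical hostnames and path)
# The first SlurmctldHost entry is the primary controller.
SlurmctldHost=ctl1
# The second entry is the backup controller.
SlurmctldHost=ctl2
# Both heads must be able to read and write this directory.
StateSaveLocation=/shared/slurm/state
```
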
So t
On 10/24/22 09:57, Diego Zuccato wrote:
On 24/10/2022 09:32, Ole Holm Nielsen wrote:
> It is definitely a BAD idea to store Slurm StateSaveLocation on a slow
> NFS directory! SchedMD recommends using local NVMe or SSD disks
> because there will be many IOPS to this file system!
IIUC i
On 24/10/2022 09:32, Ole Holm Nielsen wrote:
On 10/24/22 06:12, Richard Chang wrote:
I have a two-node slurmctld setup and both will mount an NFS-exported directory
as the state save location.
It is definitely a BAD idea to store Slurm StateSaveLocation on a slow NFS
directory! SchedMD reco
On 24/10/2022 09:32, Ole Holm Nielsen wrote:
> It is definitely a BAD idea to store Slurm StateSaveLocation on a slow
> NFS directory! SchedMD recommends using local NVMe or SSD disks
> because there will be many IOPS to this file system!
IIUC it does have to be shared between controllers
On 10/24/22 06:12, Richard Chang wrote:
Is there a rule of thumb for the size of the NFS-exported directory
to be used as StateSaveLocation?
I have a two-node slurmctld setup and both will mount an NFS-exported
directory as the state save location.
It is definitely a BAD idea to st
Hi,
Is there a rule of thumb for the size of the NFS-exported directory
to be used as StateSaveLocation?
I have a two-node slurmctld setup and both will mount an NFS-exported
directory as the state save location.
Let me know your thoughts.
Thanks & regards,
RC