Hi!
I manage a small CentOS 8 cluster using Slurm (slurm-20.11.7-1) and
OpenMPI built from source.
- I know this OS is no longer maintained and I need to negotiate
downtime to reinstall it
- I know Slurm 20.11.7 has security issues (I built it from source
some years ago with rpmbuild -ta --w
Hi Josef,
On a cluster using PXE boot and automatic (re)installation of nodes, I
do not think you can do this with IPoIB on an InfiniBand interface.
On my cluster nodes I have:
- 1 Gb Ethernet network for OOB
- 10 or 25 Gb Ethernet for sessions, automatic deployment and management
- IB HDR100 fo
Hi,
I'm using Slurm on a small 8-node cluster. I've recently added one GPU
node with two Nvidia A100s, one with 40 GB of memory and one with 80 GB.
As usage of this GPU resource increases, I would like to manage it
with GRES to avoid usage conflicts. But at this time my setup does not
work as
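To make the two cards distinguishable, one option is to give each A100 its own GRES type. The sketch below is a hypothetical helper (the function name, the type names `a100_40g`/`a100_80g` and the 61440 MiB threshold are assumptions, not Slurm defaults) that turns the output of `nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits` into gres.conf lines. It also assumes that GPU index N maps to `/dev/nvidiaN`, which is worth checking on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert "index, memory.total" CSV lines (MiB, no units)
# into gres.conf lines, one GRES type per card size.
# ASSUMPTIONS (verify on the GPU node):
#   - nvidia-smi index N corresponds to /dev/nvidiaN
#   - a card reporting more than 61440 MiB is the 80 GB A100, otherwise the 40 GB one
#   - the Type strings a100_40g / a100_80g are illustrative names
gres_conf_from_csv() {
  local idx mem type
  # IFS=', ' splits "0, 40960" into idx=0 and mem=40960;
  # the "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS=', ' read -r idx mem || [ -n "$idx" ]; do
    [ -z "$idx" ] && continue
    if [ "$mem" -gt 61440 ]; then
      type=a100_80g
    else
      type=a100_40g
    fi
    printf 'Name=gpu Type=%s File=/dev/nvidia%s\n' "$type" "$idx"
  done
}

# Typical use on the GPU node (output to be reviewed before copying into gres.conf):
#   nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits | gres_conf_from_csv
```

The matching slurm.conf side would then be `GresTypes=gpu` plus `Gres=gpu:a100_40g:1,gpu:a100_80g:1` on the GPU node's NodeName line, and a job would ask for the larger card with `--gres=gpu:a100_80g:1`. Isolation between jobs sharing the node additionally relies on `ConstrainDevices=yes` in cgroup.conf. If Slurm was built with NVML support, `AutoDetect=nvml` in gres.conf is an alternative to writing the File= lines by hand.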
On 13/11/2024 at 15:45, Roberto Polverelli Monti via slurm-users wrote:
Hello Patrick,
On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote:
As usage of this GPU resource increases, I would like to manage this
resource with GRES to avoid usage conflicts. But at this time my setup
does not
0-80:1
*
Ben
On 13/11/2024 16:00, Patrick Begou via slurm-users wrote:
On 13/11/2024 at 15:45, Roberto Polverell
Hi Kent,
On your management node, could you run:
systemctl status slurmctld
and check your 'Nodename=' and 'PartitionName=...' lines in
/etc/slurm.conf? In my slurm.conf I have a more detailed description,
and the Nodename keyword starts with an upper-case letter (I don't know if
slurm.conf is case sensit
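A quick way to compare what slurm.conf declares with what the controller reports is to pull out only the node and partition definitions. The helper below is hypothetical; it defaults to the /etc/slurm.conf path mentioned above (many installs use /etc/slurm/slurm.conf instead), skips commented lines, and matches the keywords case-insensitively so that both `Nodename=` and `NodeName=` are found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print NodeName= and PartitionName= definitions from a slurm.conf.
# - the path argument is optional; the default follows the message above (an assumption)
# - lines starting with '#' are not matched because '#' is not whitespace
# - grep -i makes the keyword match case-insensitive
show_node_partition_lines() {
  local conf="${1:-/etc/slurm.conf}"
  grep -iE '^[[:space:]]*(NodeName|PartitionName)=' "$conf"
}

# Typical use on the management node:
#   systemctl status slurmctld
#   show_node_partition_lines /etc/slurm.conf
```

The result can then be compared with `scontrol show node` and `scontrol show partition`, which report the configuration slurmctld actually loaded.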
Hi Slurm team,
I would like some clarification about Slurm releases. Why are two
versions of Slurm available?
I mean 24.05.7 versus 24.11.3 on
https://www.schedmd.com/slurm-support/release-announcements and in the
announcements made on this list.
I'm managing small clusters in a French public r
he cluster's limit. To clear a previously set value use the modify
command with a new value of -1 for each TRES id.
- sacctmgr(1)
The "MaxCPUs" is a limit on the number of CPUs the association can use.
-- Michael
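Following the sacctmgr(1) excerpt above, clearing limits means passing -1 for each TRES id. As a minimal sketch, the hypothetical helper below only builds and prints the command (it does not run sacctmgr); the user name and the choice of the MaxTRES field are illustrative assumptions, since the original limit could equally have been set as GrpTRES.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (but do not execute) a sacctmgr command that clears
# a per-user MaxTRES limit by setting each listed TRES id to -1.
# Usage: clear_tres_cmd USER TRES_ID [TRES_ID ...]
clear_tres_cmd() {
  local user="$1"; shift
  local spec="" t
  for t in "$@"; do
    # join entries with commas: cpu=-1,node=-1,...
    spec="${spec:+$spec,}${t}=-1"
  done
  # -i (--immediate) commits without the interactive confirmation prompt
  printf 'sacctmgr -i modify user where name=%s set MaxTRES=%s\n' "$user" "$spec"
}

# Example: clear_tres_cmd alice cpu node
```

Printing rather than running the command keeps the sketch safe to try; the output can be reviewed and then pasted on the management node.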
On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via slurm-us
Hi all,
I'm trying to set up a QoS on a small 5-node cluster running Slurm
24.05.7. My goal is to limit resources with a (time x number of cores)
strategy, to prevent one large job from requesting all the resources for
too long. I've read https://slurm.schedmd.com/qos.html and some
discus
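A (time x number of cores) budget maps naturally onto TRES-minutes. As a sketch under stated assumptions (the QoS name `normal` and a budget of 64 cores for 48 hours are purely illustrative), the hypothetical helper below converts cores x hours into CPU-minutes and prints, without executing, a sacctmgr command setting MaxTRESMinsPerJob on the QoS. Such limits are only enforced when AccountingStorageEnforce in slurm.conf includes `limits` and `qos`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a "cores x hours" budget into CPU-minutes and print
# (not run) the sacctmgr command that would set it as a per-job QoS limit.
# Usage: qos_tresmins_cmd QOS_NAME CORES HOURS
qos_tresmins_cmd() {
  local qos="$1" cores="$2" hours="$3"
  # TRES-minutes: cores * hours * 60 minutes per hour
  local minutes=$(( cores * hours * 60 ))
  printf 'sacctmgr -i modify qos where name=%s set MaxTRESMinsPerJob=cpu=%d\n' "$qos" "$minutes"
}

# Example: qos_tresmins_cmd normal 64 48
```

For 64 cores x 48 h this prints `MaxTRESMinsPerJob=cpu=184320`, so a job may trade cores against wall time as long as the product stays under that budget. `GrpTRESMins` is the corresponding limit on the total consumed by all jobs in the QoS, rather than per job.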