Hi David,
The topology.conf file groups nodes into sets such that parallel jobs
will not be scheduled by Slurm across disjoint sets. Even though the
topology.conf man-page refers to network switches, it's really about
topology rather than network.
You may use a fake (non-existent) switch name
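For example, a minimal topology.conf in this spirit might look like the following (switch and node names invented for illustration; assumes TopologyPlugin=topology/tree is set in slurm.conf). With no higher-level switch joining the two entries, a parallel job cannot span both node sets:

```
# topology.conf -- the "switches" here are just labels, not real hardware
SwitchName=fakesw1 Nodes=node[01-16]
SwitchName=fakesw2 Nodes=node[17-32]
```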
Hi,
I'm seeing something similar.
slurmdbd version is 21.08.4
All the slurmd and slurmctld daemons are version 20.11.8
This is what is in the slurmdbd.log
[2021-12-07T17:16:50.001] error: unpack_header: protocol_version 8704 not supported
[2021-12-07T17:16:50.001] error: unpacking header
[2021-12-0
You can schedule jobs across the two racks, with any given job only using one
rack, by specifying
#SBATCH --partition rack1,rack2
It will use only one partition, chosen in order of priority (not liti
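As a sketch, a batch script using that directive could look like this (script contents and application name are made up):

```
#!/bin/bash
#SBATCH --partition=rack1,rack2   # Slurm will pick exactly one of these
#SBATCH --nodes=2                 # all nodes come from the chosen partition
srun ./my_mpi_app
```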
I never found a way for topology to do that - all I could get it to do is prefer to keep things within a
This should be fine assuming you don't mind the mismatch in CPU speeds.
Unless the codes are super sensitive to topology, things should be okay,
as modern IB is wicked fast.
In our environment here we have a variety of different hardware types
all networked together on the same IB fabric. Tha
Hello,
We have now enabled topology-aware scheduling on our Slurm cluster.
One part of the cluster consists of two racks of AMD compute nodes. These racks
are now treated as separate entities by Slurm. Soon, we may add another set
of AMD nodes with slightly different CPU specs to
Dear Thekla,
Thekla Loizou writes:
> Dear Loris,
>
> There is no specific node required for this array. I can verify that from
> "scontrol show job 124841" since the requested node list is empty:
> ReqNodeList=(null)
>
> Also, all 17 nodes of the cluster are identical so all nodes fulfill the jo
Dear Loris,
There is no specific node required for this array. I can verify that
from "scontrol show job 124841" since the requested node list is empty:
ReqNodeList=(null)
Also, all 17 nodes of the cluster are identical so all nodes fulfill the
job requirements, not only node cn06.
By "sav
Hi,
I'm unable to run an X11 application with a remote SlurmctldHost. Let us
call myfrontalnode the node from which the user runs the Slurm commands;
it is different from the SlurmctldHost.
What fails is the following:
ssh -X myfrontalnode
srun --x11 xclock
which
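One thing worth checking (an assumption on my part, not something stated above): Slurm's built-in srun --x11 support has to be enabled on the cluster side, e.g. with a slurm.conf fragment like:

```
# slurm.conf -- enables srun's native X11 forwarding (Slurm >= 17.11)
PrologFlags=x11
```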
Hi Thekla,
Thekla Loizou writes:
> Dear all,
>
> I have noticed that SLURM schedules several jobs from a job array on the same
> node with the same start time and end time.
>
> Each of these jobs requires the full node. You can see the squeue output
> below:
>
> JOBID PARTITION S
Dear all,
I have noticed that SLURM schedules several jobs from a job array on the
same node with the same start time and end time.
Each of these jobs requires the full node. You can see the squeue output
below:
JOBID PARTITION ST START_TIME NODES SCHEDNODES NOD
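For reference, squeue output with columns like the above can be requested explicitly; assuming the standard --Format field names from the squeue man page, something like:

```
squeue --Format=jobid,partition,statecompact,starttime,numnodes,schednodes,nodelist
```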