Re: [slurm-users] A Slurm topological scheduling question

2021-12-07 Thread Ole Holm Nielsen
Hi David, The topology.conf file groups nodes into sets such that parallel jobs will not be scheduled by Slurm across disjoint sets. Even though the topology.conf man-page refers to network switches, it's really about topology rather than network. You may use fake (non-existing) switch name

Re: [slurm-users] Is this a known error?

2021-12-07 Thread Sean McGrath
Hi, I'm seeing something similar. slurmdbd version is 21.08.4. All the slurmd's & slurmctld's are version 20.11.8. This is what is in the slurmdbd.log: [2021-12-07T17:16:50.001] error: unpack_header: protocol_version 8704 not supported [2021-12-07T17:16:50.001] error: unpacking header [2021-12-0

Re: [slurm-users] A Slurm topological scheduling question

2021-12-07 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
You can schedule jobs across the two racks, with any given job only using one rack, by specifying #SBATCH --partition rack1,rack2 It'll only use 1 partition, in order of priority (not liti I never found a way for topology to do that - all I could get it to do is to prefer to keep things within a

Re: [slurm-users] A Slurm topological scheduling question

2021-12-07 Thread Paul Edmon
This should be fine assuming you don't mind the mismatch in CPU speeds.  Unless the codes are super sensitive to topology things should be okay as modern IB is wicked fast. In our environment here we have a variety of different hardware types all networked together on the same IB fabric.  Tha

[slurm-users] A Slurm topological scheduling question

2021-12-07 Thread David Baker
Hello, We have recently enabled topology-aware scheduling on our Slurm cluster. One part of the cluster consists of two racks of AMD compute nodes. These racks are now treated as separate entities by Slurm. Soon, we may add another set of AMD nodes with slightly different CPU specs to

Re: [slurm-users] Job array start time and SchedNodes

2021-12-07 Thread Loris Bennett
Dear Thekla, Thekla Loizou writes: > Dear Loris, > > There is no specific node required for this array. I can verify that from > "scontrol show job 124841" since the requested node list is empty: > ReqNodeList=(null) > > Also, all 17 nodes of the cluster are identical so all nodes fulfill the jo

Re: [slurm-users] Job array start time and SchedNodes

2021-12-07 Thread Thekla Loizou
Dear Loris, There is no specific node required for this array. I can verify that from "scontrol show job 124841" since the requested node list is empty: ReqNodeList=(null) Also, all 17 nodes of the cluster are identical so all nodes fulfill the job requirements, not only node cn06. By "sav

[slurm-users] Failed to forward X11 with a remote scheduler

2021-12-07 Thread Jeremy Fix
Hi, I'm unsuccessful in running an X11 application with a remote SlurmctldHost. Let us call myfrontalnode the node from which the user runs the Slurm commands; it is different from the SlurmctldHost. What fails is the following: ssh -X myfrontalnode srun --x11 xclock which

Re: [slurm-users] Job array start time and SchedNodes

2021-12-07 Thread Loris Bennett
Hi Thekla, Thekla Loizou writes: > Dear all, > > I have noticed that SLURM schedules several jobs from a job array on the same > node with the same start time and end time. > > Each of these jobs requires the full node. You can see the squeue output > below: > >           JOBID     PARTITION  S

[slurm-users] Job array start time and SchedNodes

2021-12-07 Thread Thekla Loizou
Dear all, I have noticed that SLURM schedules several jobs from a job array on the same node with the same start time and end time. Each of these jobs requires the full node. You can see the squeue output below:           JOBID     PARTITION  ST   START_TIME  NODES SCHEDNODES   NOD