Re: [slurm-users] 4 sockets but "

2021-07-20 Thread Diego Zuccato
On 20/07/2021 18:02, mercan wrote: Hi Ahmet. Did you check the slurmctld log for a complaint about the host line? If slurmctld cannot recognize a parameter, maybe it gives up processing the whole host line. Yup. Nothing there :( [2021-07-21T08:13:14.984] slurmctld version 18.08.5-2 started on

Re: [slurm-users] 4 sockets but "

2021-07-20 Thread Diego Zuccato
On 20/07/2021 20:30, Ole Holm Nielsen wrote: Hi Ole. OK. Performance may be a bit higher with SNC enabled. I'll try to tune performance once it starts working :) BTW thanks for the hint. Uh, that's an old Slurm which will have many bugs that are fixed in later releases. It seems that a n

Re: [slurm-users] problem with "configless" slurm.conf

2021-07-20 Thread Durai Arasan
I figured it out. slurmd doesn't run on login nodes. So you need an updated copy of slurm.conf on the login nodes. Best, Durai On Tue, Jul 20, 2021, 16:32 Ward Poelmans wrote: > Hi, > > On 20/07/2021 16:01, Durai Arasan wrote: > > > > This is limited to this one node only. Do you know how to fix

Re: [slurm-users] 4 sockets but "

2021-07-20 Thread Ole Holm Nielsen
Hi Diego, 2. Did you define a Sub NUMA Cluster (SNC) BIOS setting?  Then each physical socket would show up as two sockets (memory controllers), for a total of 8 "sockets" in your 4-socket system. I don't think so. Unless that's the default, I didn't change anything in the BIOS. Just checked t
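The socket arithmetic behind the SNC question can be checked with a short sketch. It assumes, as the message describes, that SNC makes each physical socket appear as two sockets, and that the 24 cores per socket quoted elsewhere in this thread are split evenly between them. The function name is hypothetical and is not part of Slurm.

```python
def apparent_topology(physical_sockets, cores_per_socket, domains_per_socket=2):
    """Return (sockets, cores_per_socket) as they would appear with SNC on.

    Assumption: every physical socket is split into `domains_per_socket`
    equal parts, each reported as its own "socket".
    """
    if cores_per_socket % domains_per_socket != 0:
        raise ValueError("cores cannot be split evenly across SNC domains")
    return (
        physical_sockets * domains_per_socket,
        cores_per_socket // domains_per_socket,
    )


sockets, cores = apparent_topology(4, 24)
print(sockets, cores, sockets * cores)  # 8 12 96
```

The total core count is the same either way; only the socket/core split differs.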

Re: [slurm-users] 4 sockets but "

2021-07-20 Thread mercan
Hi; Did you check the slurmctld log for a complaint about the host line? If slurmctld cannot recognize a parameter, maybe it gives up processing the whole host line. Ahmet M. On 20.07.2021 13:49, Diego Zuccato wrote: Hello all. I've been facing this issue since yesterday. I'm co

Re: [slurm-users] problem with "configless" slurm.conf

2021-07-20 Thread Ward Poelmans
Hi, On 20/07/2021 16:01, Durai Arasan wrote: > > This is limited to this one node only. Do you know how to fix this? I already > tried restarting the slurmd service on this node. Is the node properly defined in the slurm.conf and does the DNS hostname work? scontrol show node slurm-bm-70 Ward
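A minimal sketch of the DNS half of that check, using only Python's standard library. It tests name resolution from the machine where it runs, so it says nothing about what slurmctld or other nodes see. The helper name and the injectable `resolver` parameter are illustrative assumptions.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for `hostname`, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    # Node name taken from the message above.
    address = resolves("slurm-bm-70")
    print(address if address else "lookup failed")
```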

Re: [slurm-users] 4 sockets but "

2021-07-20 Thread Diego Zuccato
On 20/07/2021 13:23, Ole Holm Nielsen wrote: Hello Ole. The Xeon Platinum 8268 is a 24-core CPU: https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html Yup. 1. So you have 4 physical sockets in each node? Correct. 2.

[slurm-users] problem with "configless" slurm.conf

2021-07-20 Thread Durai Arasan
Hello, We have set up "configless slurm" by passing a "conf-server" argument to slurmd on all nodes. More details here: https://slurm.schedmd.com/configless_slurm.html One of the nodes is not able to pick up the configuration: *>srun -w slurm-bm-70 --pty bash* *srun: error: fwd_tree_thread:

Re: [slurm-users] 4 sockets but "

2021-07-20 Thread Ole Holm Nielsen
Hi Diego, The Xeon Platinum 8268 is a 24-core CPU: https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html Questions: 1. So you have 4 physical sockets in each node? 2. Did you define a Sub NUMA Cluster (SNC) BIOS setting? Then

[slurm-users] 4 sockets but "

2021-07-20 Thread Diego Zuccato
Hello all. I've been facing this issue since yesterday. I'm configuring 3 new quad-socket nodes defined as: NodeName=str957-mtx-[20-22] Sockets=4 CoresPerSocket=24 \ RealMemory=1160347 Weight=8 Feature=ib,matrix,intel,avx But scontrol show node str957-mtx-20 reports: NodeName=str957-
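As a quick sanity check on the node definition quoted in this message, the sketch below splits the line into key/value pairs and multiplies out the core count it implies. It is a simplified illustration, not Slurm's own parser: the helper names are hypothetical, and it assumes ThreadsPerCore is 1 when the line does not set it.

```python
def parse_node_line(line):
    """Split a slurm.conf-style 'Key=Value ...' line into a dict.

    Simplification: backslash continuations are treated as whitespace and
    tokens without '=' are ignored.
    """
    fields = {}
    for token in line.replace("\\", " ").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def expected_cpus(fields):
    """Sockets * CoresPerSocket * ThreadsPerCore, each assumed 1 if absent."""
    sockets = int(fields.get("Sockets", 1))
    cores = int(fields.get("CoresPerSocket", 1))
    threads = int(fields.get("ThreadsPerCore", 1))
    return sockets * cores * threads


line = (
    "NodeName=str957-mtx-[20-22] Sockets=4 CoresPerSocket=24 \\ "
    "RealMemory=1160347 Weight=8 Feature=ib,matrix,intel,avx"
)
fields = parse_node_line(line)
print(fields["Sockets"], fields["CoresPerSocket"], expected_cpus(fields))  # 4 24 96
```

Comparing this figure with the values in the `scontrol show node` output is one way to see whether the whole definition line was applied.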

Re: [slurm-users] restart user login ONLY

2021-07-20 Thread Durai Arasan
Hello Brian, I apologize if this was more a general Linux question. But your recommendations on managing login nodes were useful. Thanks, Durai On Mon, Jul 19, 2021 at 7:27 PM Brian Andrus wrote: > Not really a slurm question, but here's my 2 cents: > > FWIW, if they are true zombies (PPID 1 a