On 20/07/2021 18:02, mercan wrote:
Hi Ahmet.
Did you check the slurmctld log for a complaint about the host line? If
slurmctld cannot recognize a parameter, it may give up processing the
whole host line.
Yup. Nothing there :(
[2021-07-21T08:13:14.984] slurmctld version 18.08.5-2 started on
On 20/07/2021 20:30, Ole Holm Nielsen wrote:
Hi Ole.
OK. Performance may be a bit higher with SNC enabled.
I'll try to tune performance once it starts working :) BTW, thanks for the hint.
Uh, that's an old Slurm release with many bugs that have been fixed in
later releases. It seems that a n
I figured it out. slurmd doesn't run on login nodes. So you need an updated
copy of slurm.conf on the login nodes.
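For completeness, an alternative to copying slurm.conf around is to let client commands on the login nodes locate the controller through a DNS SRV record, as described on the configless page linked above. A minimal sketch (the zone layout, controller name, and port are assumptions for illustration, not from this thread):

```
; Hypothetical BIND zone fragment: client commands and slurmd look up
; this record to find the controller and fetch the configuration from it.
_slurmctld._tcp 3600 IN SRV 10 0 6817 slurm-controller
```

Either way, the login nodes need some path to a current configuration; a stale local copy produces exactly this kind of mismatch.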
Best,
Durai
On Tue, Jul 20, 2021, 16:32 Ward Poelmans wrote:
> Hi,
>
> On 20/07/2021 16:01, Durai Arasan wrote:
> >
> > This is limited to this one node only. Do you know how to fix
Hi Diego,
2. Did you define a Sub NUMA Cluster (SNC) BIOS setting? Then each
physical socket would show up as two sockets (memory controllers), for
a total of 8 "sockets" in your 4-socket system.
I don't think so. Unless that's the default, I didn't change anything in
the BIOS. Just checked t
Hi;
Did you check the slurmctld log for a complaint about the host line? If
slurmctld cannot recognize a parameter, it may give up processing the
whole host line.
Ahmet M.
On 20.07.2021 13:49, Diego Zuccato wrote:
Hello all.
I've been facing this issue since yesterday.
I'm co
Hi,
On 20/07/2021 16:01, Durai Arasan wrote:
>
> This is limited to this one node only. Do you know how to fix this? I already
> tried restarting the slurmd service on this node.
Is the node properly defined in slurm.conf, and does the DNS hostname work?
scontrol show node slurm-bm-70
Ward
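A quick sanity check along these lines might look like the following (a sketch; the sample config below is invented for illustration, and the DNS/scontrol steps are left as comments since they only make sense on the cluster itself):

```shell
# Invented sample standing in for /etc/slurm/slurm.conf.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
NodeName=slurm-bm-[60-70] Sockets=2 CoresPerSocket=24 RealMemory=192000
EOF

# 1. Is there a NodeName line that could cover slurm-bm-70?
grep '^NodeName=' "$CONF"

# 2. On the real cluster, also confirm DNS and the controller's view:
#    getent hosts slurm-bm-70
#    scontrol show node slurm-bm-70

rm -f "$CONF"
```

If the grep finds nothing, or getent cannot resolve the name, slurmd on that node has no chance of registering correctly.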
On 20/07/2021 13:23, Ole Holm Nielsen wrote:
Hello Ole.
The Xeon Platinum 8268 is a 24-core CPU:
https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html
Yup.
1. So you have 4 physical sockets in each node?
Correct.
2.
Hello,
We have set up "configless slurm" by passing a "conf-server" argument to
slurmd on all nodes. More details here:
https://slurm.schedmd.com/configless_slurm.html
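As a sketch of that setup (the file path and controller name are assumptions, not from this thread): each node's slurmd is given the controller address via --conf-server, typically through a systemd drop-in rather than an edited unit file:

```
# /etc/systemd/system/slurmd.service.d/conf-server.conf  (hypothetical path)
[Service]
ExecStart=
ExecStart=/usr/sbin/slurmd --conf-server slurm-controller:6817
```

After `systemctl daemon-reload` and a slurmd restart, the node fetches its configuration from the controller instead of a local slurm.conf.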
one of the nodes is not able to pick up the configuration:
>srun -w slurm-bm-70 --pty bash
srun: error: fwd_tree_thread:
Hi Diego,
The Xeon Platinum 8268 is a 24-core CPU:
https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html
Questions:
1. So you have 4 physical sockets in each node?
2. Did you define a Sub NUMA Cluster (SNC) BIOS setting? Then
Hello all.
I've been facing this issue since yesterday.
I'm configuring 3 new quad-socket nodes defined as:
NodeName=str957-mtx-[20-22] Sockets=4 CoresPerSocket=24 \
RealMemory=1160347 Weight=8 Feature=ib,matrix,intel,avx
But scontrol show node str957-mtx-20 reports:
NodeName=str957-
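When scontrol disagrees with the slurm.conf definition like this, one way to narrow it down (a sketch, not something suggested in the thread) is to compare the topology the kernel reports with what slurmd itself detects; `slurmd -C` prints the NodeName line slurmd would generate for the host it runs on:

```shell
# Topology as Linux sees it: total CPUs, threads per core,
# cores per socket, and socket count.
lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'

# On a compute node with Slurm installed, compare the above against:
#   slurmd -C
```

If `slurmd -C` reports 8 sockets while the BIOS nominally has 4, that points at a NUMA/SNC presentation issue rather than a slurm.conf typo.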
Hello Brian,
I apologize if this was more a general Linux question. But your
recommendations on managing login nodes were useful.
Thanks,
Durai
On Mon, Jul 19, 2021 at 7:27 PM Brian Andrus wrote:
> Not really a slurm question, but here's my 2 cents:
>
> FWIW, if they are true zombies (PPID 1 a