I concur with this. Make sure your nodes are listed in /etc/hosts on the SMS. Also, if you name them with a common base plus a numeric sequence, you can configure them all with a single line in Slurm, as in this example:

NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2

On 05/04/2018 12:05 AM, Raymond Wan wrote:
Hi Eric,


On Fri, May 4, 2018 at 6:04 AM, Eric F. Alemany <ealem...@stanford.edu> wrote:
# COMPUTE NODES
NodeName=radonc[01-04] NodeAddr=10.112.0.5 10.112.0.6 10.112.0.14 10.112.0.16 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP


I don't know what the problem is, but here is my *guess*, based on comparing your file with our own. In ours, we have one node per line under "NodeName". We also don't use NodeAddr, but maybe that's OK; instead, the IP addresses of the nodes in our cluster are hard-coded in /etc/hosts. We also don't specify State.

So, if I reformatted yours to look like ours, it would look something like this:

NodeName=radonc01 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeName=radonc02 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeName=radonc03 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeName=radonc04 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP

Maybe the problem is with NodeAddr: you might have to separate the values with commas instead of spaces. With spaces, Slurm might have trouble parsing the line.
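If the comma guess is right, the original single-line definition might become something like the following. This is a sketch, not a tested config: it assumes the four addresses belong to radonc01 through radonc04 in that order, since the slurm.conf man page describes a NodeAddr list given alongside a NodeName range as having to match the NodeName entries one-to-one.

```
# COMPUTE NODES
# Sketch: NodeAddr values comma-separated with no spaces, one entry per node,
# assumed to map in order to radonc01, radonc02, radonc03, radonc04
NodeName=radonc[01-04] NodeAddr=10.112.0.5,10.112.0.6,10.112.0.14,10.112.0.16 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP
```

Either way, it is safest to keep each definition on a single line. If the names already resolve through /etc/hosts, NodeAddr can simply be left out, as in the other examples in this thread. Once the daemons have picked up the change, `scontrol show node radonc01` should report NodeAddr=10.112.0.5 if the mapping is right.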

That's my guess...

Ray

