Hi all,

Out of curiosity, what causes that? It'd be good to know for the future -- I ran into the same issue and just edited the memory down, and it works fine now, but I'd like to know why/what causes that error. I'm assuming low resources, i.e., memory or CPU. Mind clarifying?
On Wed, May 2, 2018, 7:11 PM John Kelly <john.ke...@broadcom.com> wrote:

> Hi matt
>
> scontrol update nodename=odin state=resume
> scontrol update nodename=odin state=idle
>
> -jfk
>
>
> On Wed, May 2, 2018 at 5:28 PM, Matt Hohmeister <hohmeis...@psy.fsu.edu> wrote:
>
>> I have a two-node cluster: the server/compute node is a Dell PowerEdge
>> R730; the compute node, a Dell PowerEdge R630. On both of these nodes,
>> slurmd -C gives me the exact same line:
>>
>> [me@odin slurm]$ slurmd -C
>> NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>>
>> [me@thor slurm]$ slurmd -C
>> NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>>
>> So I edited my slurm.conf appropriately:
>>
>> NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>> NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>>
>> …and it looks good, except for the drain on my server/compute node:
>>
>> [me@odin slurm]$ sinfo
>> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>> debug*       up   infinite      1  drain odin
>> debug*       up   infinite      1   idle thor
>>
>> …for the following reason:
>>
>> [me@odin slurm]$ sinfo -R
>> REASON               USER      TIMESTAMP           NODELIST
>> Low socket*core*thre slurm     2018-05-02T11:55:38 odin
>>
>> Any ideas?
>>
>> Thanks!
>>
>> Matt Hohmeister
>> Systems and Network Administrator
>> Department of Psychology
>> Florida State University
>> PO Box 3064301
>> Tallahassee, FL 32306-4301
>> Phone: +1 850 645 1902
>> Fax: +1 850 644 7739
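For the archives, here is a minimal sketch of the check-and-clear sequence described in the quoted messages, run on the affected node. The node name (odin) and the commands come from this thread; the slurm.conf path is an assumption and may differ per install. Treat it as an illustration, not a verified recipe.

    # Compare what slurmd detects against what slurm.conf declares
    slurmd -C
    grep '^NodeName' /etc/slurm/slurm.conf   # path is an assumption

    # Check why the node is drained
    sinfo -R

    # Once slurm.conf matches the detected hardware on every node,
    # clear the drain (the command from John's reply above)
    scontrol update nodename=odin state=resume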