Hi Matt,

Try:

scontrol update nodename=odin state=resume
scontrol update nodename=odin state=idle
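A quick way to confirm the drain has cleared (scontrol show node and sinfo are standard Slurm commands; adjust the node name if your config differs):

  scontrol show node odin    # State should no longer include DRAIN; the low socket*core*thread Reason should be gone
  sinfo -N -l                # both odin and thor should now report idle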
-jfk

On Wed, May 2, 2018 at 5:28 PM, Matt Hohmeister <hohmeis...@psy.fsu.edu> wrote:
> I have a two-node cluster: the server/compute node is a Dell PowerEdge
> R730; the compute node, a Dell PowerEdge R630. On both of these nodes,
> slurmd -C gives me the exact same line:
>
> [me@odin slurm]$ slurmd -C
> NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>
> [me@thor slurm]$ slurmd -C
> NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>
> So I edited my slurm.conf appropriately:
>
> NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
> NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
>
> …and it looks good, except for the drain on my server/compute node:
>
> [me@odin slurm]$ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> debug*       up   infinite      1  drain odin
> debug*       up   infinite      1   idle thor
>
> …for the following reason:
>
> [me@odin slurm]$ sinfo -R
> REASON               USER      TIMESTAMP           NODELIST
> Low socket*core*thre slurm     2018-05-02T11:55:38 odin
>
> Any ideas?
>
> Thanks!
>
> Matt Hohmeister
> Systems and Network Administrator
> Department of Psychology
> Florida State University
> PO Box 3064301
> Tallahassee, FL 32306-4301
> Phone: +1 850 645 1902
> Fax: +1 850 644 7739