Hi Rafal,
How do you restart the nodes? If you don't use scontrol reboot, Slurm
doesn't expect the nodes to reboot, which is why you see that reason in those cases.
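For example, something along these lines (just a sketch; I'm assuming the compute nodes are named pi-4-n[1-4], so adjust the names to your slurm.conf):
# reboot through Slurm so it expects the nodes to go down and come back
scontrol reboot pi-4-n[1-4]
# if they were already rebooted outside of Slurm and are now down, put them back manually
scontrol update NodeName=pi-4-n[1-4] State=RESUME
Setting ReturnToService=2 in slurm.conf is another option if you want healthy nodes to return to service automatically after an unexpected reboot.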
Best
Andreas
On 27.09.2019 at 07:53, Rafał Kędziorski
<rafal.kedzior...@gmail.com> wrote:
Hi,
I'm working with slurm-wlm 18.08.5-2 on Raspberry Pi Cluster:
- 1 Pi 4 as manager
- 4 Pi 4 nodes
This works fine. But after every restart of the nodes I get this:
cluster@pi-manager:~ $ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
devcluster* up infinite 4 down pi-4-n
Matt,
Depending on other parameters for the job, your '--ntasks=30' is likely having
the effect of requesting 30 (or more) cores for that individual job, which
likely does not "fit" on an individual node (oversubscribe allows multiple
jobs to share a resource, but doesn't change the job's resource request).
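To illustrate (a sketch only, assuming 24-core nodes; the numbers and the srun step are made up):
#!/bin/bash
#SBATCH --ntasks=30   # asks for 30 CPUs in total, more than a single 24-core node offers
#SBATCH --nodes=2     # allows Slurm to spread those tasks over both nodes
srun hostname
Running "scontrol show node <nodename>" and looking at CPUTot is a quick way to see how many tasks can actually fit on one node.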
I just did that...beautiful...thanks! The "default" let me run 48 jobs
concurrently across two nodes.
I've noticed that when I have "#SBATCH --ntasks=30" in my .sbatch file,
the job still refuses to run, and I'm back at the below. Should I just ask my
users not to use --ntasks in their .sbatch files?
Hi Matt,
Check out the "OverSubscribe" partition parameter. Try setting your partition
to "OverSubscribe=YES" and then submitting the jobs with the "-oversubscibe"
option (or OverSubscribe=FORCE if you want this to happen for all jobs
submitted to the partition). Either oversubscribe option
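A minimal sketch of both pieces (the partition name, node names and script name are placeholders):
# slurm.conf: allow jobs to share the nodes in this partition
PartitionName=compute Nodes=node[01-02] Default=YES OverSubscribe=YES State=UP
# job submission: request oversubscription explicitly (not needed with OverSubscribe=FORCE)
sbatch --oversubscribe job.sbatch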
I have a two-node cluster running Slurm, and I'm being asked about allowing
multiple jobs (hundreds of jobs) to run simultaneously. Below is the scheduling
part of my slurm.conf, which I changed to allow multiple jobs to run on
each node:
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCP
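For comparison, a sketch of a scheduling section that lets many small jobs share nodes via consumable resources (the values are illustrative, not a recommendation for your hardware):
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# With select/linear, whole nodes are allocated, so only one job runs per node
# unless the partition uses OverSubscribe.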
Dear Jurgen,
Thank you for that. That does the expected job. It looks like the weirdness
that I saw in the serial partition has now gone away and so that is good.
Best regards,
David
From: slurm-users on behalf of Juergen Salk
Sent: 26 September 2019 16:18
To:
I second that question - I'm using the same combination :)
I know there are some efforts - see
https://slurm.schedmd.com/SLUG16/monitoring_influxdb_slug.pdf - but I
don't know exactly what the state of that is at the moment.
(I resorted to telegraf's 'execute script' plugin to pump some
information.)
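In case it helps, a rough sketch of that exec-based approach (the script path, measurement name and fields are made up):
# telegraf.conf: run a script that prints InfluxDB line protocol
[[inputs.exec]]
  commands = ["/usr/local/bin/slurm_metrics.sh"]
  data_format = "influx"
# /usr/local/bin/slurm_metrics.sh
#!/bin/bash
# count pending and running jobs and emit one line of metrics
pending=$(squeue -h -t PENDING | wc -l)
running=$(squeue -h -t RUNNING | wc -l)
echo "slurm_jobs pending=${pending}i,running=${running}i"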
* David Baker [190926 14:12]:
>
> Currently my normal QOS specifies MaxTRESPU=cpu=1280,nodes=32. I've
> tried a number of edits, however I haven't yet found a way of
> redefining the MaxTRESPU to be "cpu=1280". In the past I have
> resorted to deleting a QOS completely and redefining the whole
>
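A sketch of one way to do this with sacctmgr (the QOS name is taken from the message above; setting a limit to -1 clears it, and the TRES is called "node" rather than "nodes" here):
# drop the node limit and leave cpu=1280 in place
sacctmgr modify qos where name=normal set MaxTRESPerUser=node=-1
# check the result
sacctmgr show qos where name=normal format=Name,MaxTRESPU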
Hey everyone,
I am using Telegraf and InfluxDB to monitor our hardware and I'd like to
include some Slurm metrics in this. Is there already a Telegraf plugin
for monitoring Slurm that I don't know about, or do I have to start from
scratch?
Best,
Marcus
--
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe e
Hello,
Currently my normal QOS specifies MaxTRESPU=cpu=1280,nodes=32. I've tried a
number of edits, however I haven't yet found a way of redefining the MaxTRESPU
to be "cpu=1280". In the past I have resorted to deleting a QOS completely and
redefining the whole thing, but in this case I'm not s