Hello everyone,
Sorry for what might be a trivial question for most of you.
I am trying to understand cpu allocation in slurm.
The goal is to launch a batch job on one node, while the batch script
itself runs several jobs in parallel, each allocated a subset of
the CPUs.
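For context, the kind of batch script I have in mind looks roughly like this (the program name and CPU counts are just placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

# Launch four steps in parallel, each restricted to its own 4-CPU slice;
# --exclusive on the step keeps the steps from landing on the same CPUs.
for i in 1 2 3 4; do
    srun --exclusive --ntasks=1 --cpus-per-task=4 ./my_app "$i" &
done
wait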
I see this in the debug logs:
"memory per node set to 1M in partition bdwall"
It seems I can alleviate this by setting RealMemory=foo in the Node definitions,
but that feels like something that shouldn't be necessary.
Did this become a required field after 16.05??
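For reference, this is the sort of node line I mean (node names and sizes here are made up):

NodeName=bdw[001-064] CPUs=36 RealMemory=128000 State=UNKNOWN

My understanding is that if RealMemory is left unset, Slurm assumes 1 MB per node, which would explain the "set to 1M" message, but I could be wrong about that.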
Thanks!
John
On 6/11/18,
As far as I can tell, none of this is incorrect:
DefMemPerNode = UNLIMITED
MaxMemPerNode = UNLIMITED
MemLimitEnforce = Yes
PropagateResourceLimitsExcept = MEMLOCK
CPU vars aren't set and never were.
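In case it helps, here's roughly how I've been checking what the controller actually sees (the node name is just an example):

scontrol show config | grep -i mem
scontrol show node bdw001 | grep -i RealMemory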
Thanks!
John
On 6/11/18, 4:09 PM, "slurm-users on behalf of Renfro, Michael"
wrote:
Anything in particular set for DefMemPerCPU in your slurm.conf?
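If it is set, it would be a line along these lines in slurm.conf (the value here is only an example, not a recommendation):

DefMemPerCPU=2048

That's the default MB of memory per allocated CPU, which is where a per-job memory cap can sneak in even when the node and partition limits look unlimited.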
> On Jun 11, 2018, at 3:50 PM, Roberts, John E. wrote:
>
> Hi,
>
> Seeing this after an upgrade today. I now can't get any jobs to run.
> Things were fine before the upgrade. Any ideas?
>
> slurmstepd: error: Job 535721 exceeded memory limit (1160 > 1024), being killed
Hi,
Seeing this after an upgrade today. I now can't get any jobs to run. Things
were fine before the upgrade. Any ideas?
slurmstepd: error: Job 535721 exceeded memory limit (1160 > 1024), being
killed
slurmstepd: error: Exceeded job memory limit
ulimit shows:
$ u
Yes. X11 also worked for us outside of Slurm. Well, good luck finding
your issue.
On Tue, Jun 12, 2018, 1:09 AM Christopher Benjamin Coffey <
chris.cof...@nau.edu> wrote:
> Hi Hadrian,
>
> Thank you, unfortunately that is not the issue. We can connect to the
> nodes outside of slurm and have
Hi Hadrian,
Thank you; unfortunately, that is not the issue. We can connect to the nodes
outside of Slurm and the X11 forwarding works properly.
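For what it's worth, the basic sanity check we run looks roughly like this (this assumes native X11 forwarding is enabled; xclock is just a convenient client):

srun --x11 --pty /bin/bash
# then, from the shell on the compute node:
echo $DISPLAY
xclock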
Best,
Chris
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 6/7/18, 6:49 PM, "slurm-users on behalf of H
We're seeing some pretty bad performance with around 3000 jobs in queue.
We're using sched/backfill, and I've been tweaking the bf_ parameters
to try and improve some things, with limited results.
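(For reference, the bf_ knobs I mean live on the SchedulerParameters line in slurm.conf; the values below are placeholders rather than what we actually run:

SchedulerType=sched/backfill
SchedulerParameters=bf_interval=60,bf_max_job_test=1000,bf_continue,default_queue_depth=500

sdiag reports mean cycle times for both the main scheduler and the backfill loop, which is handy for seeing whether a change actually helps.)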
But even before the backfill process starts, the main scheduling loop
is taking so long per job that i