Hey everyone,
Perhaps I am asking a basic question, but I really don't understand
how preemption works.
The scenario(simplified for the example) is like this:
Nodes:
NodeName=A1 CPUS=2 RealMemory=128906 TmpDisk=117172
NodeName=A2 CPUS=30 RealMemory=128
We have
* a high-priority QOS for short jobs. The QOS is set at submission time by
the user or by the lua script (see the sketch below).
* partitions for jobs of certain length or other requirements. Sometimes
several partitions overlap.
* a script that adjusts priorities according to our policies every 5
minutes.
By comb
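As a rough sketch of the QOS piece (the name and limits here are placeholders,
not our real settings), it could be created and requested like this:
  # define a high-priority, short-wallclock QOS (then add it to the relevant associations)
  sacctmgr add qos short_high
  sacctmgr modify qos short_high set Priority=10000 MaxWall=01:00:00
  # users (or the lua script) request it at submission time
  sbatch --qos=short_high --time=00:30:00 job.sh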
hi sam,
thanks for that pointer. we already have:
PriorityFavorSmall=YES
PriorityFlags=SMALL_RELATIVE_TO_TIME
but small jobs still seem to get held up. that's because cores matter more
than nodes in our usage scenarios. 99% of jobs request one node,
so ideally we wanted to assign negative we
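as far as i understand, the PriorityWeight* values are unsigned, so true
negative weights aren't possible; the closest sketch i can think of (the
numbers below are made up, only the relative sizes matter) is to let the
inverted job-size factor dominate the other factors:
  PriorityType=priority/multifactor
  PriorityFavorSmall=YES
  PriorityFlags=SMALL_RELATIVE_TO_TIME
  # job-size weight much larger than the rest, so small jobs float to the top
  PriorityWeightJobSize=100000
  PriorityWeightAge=1000
  PriorityWeightFairshare=1000
  PriorityWeightQOS=1000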
Hi Satra,
Have a look at PriorityFavorSmall (in slurm.conf). It may fit your needs. I've
not used it myself, so I can't say if it'll do exactly what you're after.
---
Samuel Gallop
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Satrajit Ghosh
Sent: 22 Novembe
slurm has a way of giving larger jobs more priority. is it possible to do
the reverse?
i.e., is there a way to configure priority so that smaller jobs (ones that
use fewer resources) get higher priority than bigger ones?
cheers,
satra
resources: can be a weighted combination depending on system resources
available
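for checking how any configured weights actually play out per job, something
like this (assuming the multifactor priority plugin) is handy:
  # show the weights currently configured for each priority factor
  sprio -w
  # show the weighted factor breakdown for every pending job
  sprio -l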
Hi,
In the documentation on job arrays
https://slurm.schedmd.com/job_array.html
it says
Be mindful about the value of MaxArraySize as job arrays offer an easy
way for users to submit large numbers of jobs very quickly.
How much do I have to worry about this if I am using fairshare
scheduling?
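For reference, a sketch of the knobs involved as I understand them (the
values and the user name are examples only, not recommendations):
  # slurm.conf: largest usable array index is MaxArraySize-1 (default 1001)
  MaxArraySize=1001
  # optional per-user guard in the accounting database
  sacctmgr modify user where name=alice set MaxSubmitJobs=5000
  # each array task counts as one job against such limits
  sbatch --array=0-999 wrapper.sh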
Matt,
I saw a similar situation with a PBS job recently.
A process that is blocked writing to disk cannot be killed (it is in D state,
i.e. uninterruptible sleep). So the job ended, but PBS logged that it could
not kill the process.
I would look in detail at the slurm logs at the point where that job is
being killed, and you might g
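A quick way to check for that while the job is being cleaned up is to look for
processes stuck in uninterruptible sleep, e.g. (plain ps, nothing Slurm-specific):
  # list processes in D state and the kernel function they are waiting in
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'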
Hi All,
I'm wondering if you've seen this issue before; I can't seem to find
anything on it:
We have an NVIDIA DGX-1 that we run SLURM on in order to queue up jobs
on the GPUs there, but we're running into an issue:
1) launch a SLURM job (assume job id = 12345)
2) start a program