On 19/06/2019 22.30, Fulcomer, Samuel wrote:
>
> (...and yes, the name is inspired by a certain OEM's software licensing
> schemes...)
>
> At Brown we run a ~400 node cluster containing nodes of multiple
> architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased
> in some cases by University funds and in others by investigator funding ...
Hi Slurm experts,
I'm new to SLURM and could really use some help getting preemption
working.
The limiting factor in our cluster is licenses, and I want to have high- and
low-priority jobs where submitting a high-priority job will preempt
(suspend) a low-priority job if all the licenses are allocated.
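As a rough sketch (not from this thread; the QoS names, license name, and counts are placeholders), QoS-based preemption plus a site license count could look like this:

  # slurm.conf -- enable QoS-based preemption and define the license pool
  PreemptType=preempt/qos
  PreemptMode=SUSPEND,GANG
  Licenses=mylicense:10

  # Create the two QoS levels and let "high" suspend jobs running under "low"
  sacctmgr add qos low
  sacctmgr add qos high
  sacctmgr modify qos high set Preempt=low Priority=1000

  # Jobs then request a QoS and the license they need
  sbatch --qos=high --licenses=mylicense:1 job.sh

This only covers the preemption mechanics; it is worth checking on your version whether a suspended job actually returns its license to the pool.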
Hi Paul,
Thanks. Your setup is interesting. I see that you have your processor types
segregated in their own partitions (with the exception of the requeue
partition), and that's how you get at the weighting mechanism. Do you have
your users explicitly specify multiple partitions in the batch command?
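For reference (not necessarily how Paul's site does it), sbatch does accept a comma-separated partition list, and the job starts in whichever listed partition can run it first; the partition names here are placeholders:

  # Submit a job that may run in any of three architecture-specific partitions
  sbatch --partition=sandy,haswell,skylake --ntasks=16 job.sh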
Hi Alex,
Thanks. The issue is that we don't know where they'll end up running in the
heterogeneous environment. In addition, because the limit is applied by
GrpTRES=cpu=N, someone buying 100 cores today shouldn't get access to 130
of today's cores.
Regards,
Sam
On Wed, Jun 19, 2019 at 3:41 PM Alex ...
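One possible workaround for the "we don't know where it will run" problem (an assumption on my part, not something from Sam's setup) is to limit on the billing TRES rather than raw CPUs, with per-partition weights that encode each generation's relative performance; the weights, node lists, and account name below are only illustrative:

  # slurm.conf -- weight each partition's CPUs by relative performance
  PartitionName=sandy   Nodes=sandy[001-100] TRESBillingWeights="CPU=0.7"
  PartitionName=haswell Nodes=has[001-100]   TRESBillingWeights="CPU=1.0"
  PartitionName=sky     Nodes=sky[001-100]   TRESBillingWeights="CPU=1.3"

  # Cap accounts on billing units instead of a raw CPU count
  sacctmgr modify account inv_group set GrpTRES=billing=100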
We do a similar thing here at Harvard:
https://www.rc.fas.harvard.edu/fairshare/
We simply weight all the partitions based on their core type, and then we
allocate Shares for each account based on what they have purchased. We
don't use QoS at all; we just rely purely on fairshare weighting.
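As a rough illustration of that sort of setup (the account names and numbers here are invented), the purchased capacity would be expressed as Shares on each account:

  # Give each purchasing group Shares proportional to what they bought
  sacctmgr modify account physics_lab set fairshare=100
  sacctmgr modify account chem_lab set fairshare=250

  # Inspect the resulting share tree and usage
  sshare -a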
Hey Samuel,
Can't you just adjust the existing "cpu" limit numbers using those same
multipliers? Someone who bought 100 CPUs 5 years ago now has ~70 CPUs.
Or, vice versa, someone who buys 100 CPUs today gets a setting of 130 CPUs,
because the CPUs are normalized to the old performance. Since it ...
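In concrete terms that suggestion is just a periodic edit of the existing limit (the account name and numbers are placeholders):

  # 100 CPUs purchased today, expressed in "old CPU" equivalents
  sacctmgr modify account inv_group set GrpTRES=cpu=130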
(...and yes, the name is inspired by a certain OEM's software licensing
schemes...)
At Brown we run a ~400 node cluster containing nodes of multiple
architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased in
some cases by University funds and in others by investigator funding
(~50: ...
On 6/18/19 11:29 PM, nathan norton wrote:
Without knowing the internals of Slurm, it feels like nodes that are
powered off and in CLOUD state don't exist in the system until they are on?
Not quite; they exist internally but are not exposed until they are in use:
https://slurm.schedmd.com/elastic_computing.html
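For context, a minimal cloud-node definition looks roughly like this (the node names, scripts, and timers are placeholders); note that powered-down CLOUD nodes are hidden from sinfo unless PrivateData=cloud is set:

  # slurm.conf -- nodes created and destroyed on demand
  NodeName=cloud[01-10] CPUs=16 RealMemory=64000 State=CLOUD
  PartitionName=cloud Nodes=cloud[01-10] MaxTime=24:00:00 State=UP

  ResumeProgram=/usr/local/sbin/slurm_resume.sh
  SuspendProgram=/usr/local/sbin/slurm_suspend.sh
  SuspendTime=600
  ResumeTimeout=300

  # Make powered-down cloud nodes visible to sinfo/scontrol
  PrivateData=cloud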
Using slurm 19.05.0-1
MinJobAge is set to 300
MaxJobCount is set to 1
There are only about 30 jobs running. However, when a job completes, it
vanishes immediately from the output of 'squeue'.
Shouldn't it stay there for 5 minutes?
Brian Andrus
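One generic thing worth checking (not from the original post) is whether the running slurmctld actually picked those values up:

  # Show the values slurmctld is currently using
  scontrol show config | grep -E 'MinJobAge|MaxJobCount'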
Can you give the exact command/output you have from this?
I suspect a typo in your slurm.conf for nodenames or what you are typing.
Brian Andrus
On 6/18/2019 11:29 PM, nathan norton wrote:
Hi,
It just shows
"Node $NODE not found"
Whereas the others all work as expected (i.e., they are running).
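A couple of generic checks that might narrow this down (again, not from the original thread):

  # List every node slurmctld knows about, with its current state
  sinfo -N -o "%N %T"

  # Query one node directly; a powered-down CLOUD node may report
  # "not found" here unless PrivateData=cloud is configured
  scontrol show node <nodename>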
Hi,
Yes, modifying the database directly seems to be the only way.
Part of the story is, I think, that the account name is used as the
primary key instead of some account ID... which would at least have made it
possible to rename an account.
Associations are, however, referenced by an ID, so ...
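Purely as an illustration of what "directly" would mean, and with the caveat that the database name and table names below are assumptions about the stock slurmdbd MySQL schema (verify them first, stop slurmdbd, and take a backup before touching anything):

  # Rename the account and the per-cluster associations that reference it by name
  mysql slurm_acct_db -e "UPDATE acct_table SET name='new_acct' WHERE name='old_acct'"
  mysql slurm_acct_db -e "UPDATE mycluster_assoc_table SET acct='new_acct' WHERE acct='old_acct'"
  # Other tables (e.g. the per-cluster job table) may also store the account
  # name and would need the same treatment.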
Hello,
Every day we see several deadlocks in our slurmdbd log file. Together with the
deadlock we always see a failed "roll up" operation. Please see below for an
example.
We are running Slurm 18.08.0 on our cluster. As far as we know, these deadlocks
are not adversely affecting the operation ...
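Not a diagnosis of this particular deadlock, but the Slurm accounting documentation suggests raising a few InnoDB settings for slurmdbd, and undersized defaults are a common source of rollup trouble; the values below are the usual starting points and should be sized to the site:

  # /etc/my.cnf (or a conf.d snippet) on the slurmdbd database host
  [mysqld]
  innodb_buffer_pool_size=1024M
  innodb_log_file_size=64M
  innodb_lock_wait_timeout=900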