"Otherwise a user can have a sing le job that takes the entire cluster, and insidesplit it up the way he wants to." Yair, I agree. That is what I was referring to regardign interactive jobs. Perhaps not a user reserving the entire cluster, but a use reserving a lot of compute nodes and not making sure they are utilised fully.
On 8 May 2018 at 09:37, Yair Yarom <ir...@cs.huji.ac.il> wrote:
> Hi,
>
> This is what we did, not sure those are the best solutions :)
>
> ## Queue stuffing
>
> We have set PriorityWeightAge several magnitudes lower than
> PriorityWeightFairshare, and we also have PriorityMaxAge set to cap of
> older jobs. As I see it, the fairshare is far more important than age.
>
> Besides the MaxJobs that was suggested, we are considering setting up
> maximum allowed TRES resources, and not number of jobs. Otherwise a
> user can have a single job that takes the entire cluster, and inside
> split it up the way he wants to. As mentioned earlier, It will create
> an issue where jobs are pending and there are idle resources, but for
> that we have a special preempt-able "requeue" account/qos which users
> can use but the jobs there will be killed when "real" jobs arrive.
>
> ## Interactive job availability
>
> We have two partitions: short and long. They are indeed fixed where
> the short is on 100% of the cluster and the long is about 50%-80% of
> the cluster (depending on the cluster).
>
>
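For anyone reading the archive and wanting to try the kind of setup Yair describes, a rough slurm.conf sketch of the two ideas (fairshare dominating age, and overlapping short/long partitions). The node names, weights and time limits below are placeholders I made up, not his actual values:

    # Priority: make fairshare dominate age, and cap how much priority
    # a pending job can accumulate from age alone.
    PriorityType=priority/multifactor
    PriorityWeightFairshare=100000
    PriorityWeightAge=1000
    PriorityMaxAge=7-0

    # Two overlapping partitions: "short" spans all nodes, "long" only a
    # subset, so some capacity always remains available for short jobs.
    PartitionName=short Nodes=node[01-20] MaxTime=04:00:00 Default=YES
    PartitionName=long  Nodes=node[01-15] MaxTime=14-00:00:00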