(originally posted at https://bugs.schedmd.com/show_bug.cgi?id=10322)

There are some great tools for assigning discounts or penalties to jobs before 
they are allocated resources (QOS.UsageFactor, Partition.TRESBillingWeights, 
etc.).

But what if I want to change the cost of a job after the fact? I might want to 
avoid penalizing users who spent their allocated resources on jobs which failed 
due to reasons outside their control (hardware failure, parallel FS glitch, 
etc.). Or I might want to charge extra for jobs which require node reboots to
clean up afterwards. Either way, I want to be able to adjust how a finished job
affects the user's current fairshare priority for their queued jobs.

Are there any existing solutions for this?

The only solutions I've found so far are:

  1.  'sacctmgr modify ... set RawUsage=0' - obviously this is too big of a
hammer, since it wipes all of the association's accumulated usage. I only want
to adjust a single job, and I might want to *increase* its usage rather than
decrease it.
  2.  For clusters using "banking" (limits on TRESMins and 
PriorityDecayHalfLife=0), you can essentially accomplish this by editing the 
limit after the fact (increasing the limit for a refund, decreasing it for a 
penalty). See 
https://github.com/jcftang/slurm-bank/blob/master/src/sbank-refund, for 
example. But we don't use that accounting strategy at our site. And that seems 
a little sketchy anyway since you'd need to remember to reset the limits back 
to their intended values at each usage reset.
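For reference, the banking-style refund in item 2 boils down to simple
arithmetic: the remaining balance is the limit minus the usage, so raising the
limit by a job's cost refunds it and lowering the limit penalizes it. Below is
a minimal Python sketch of that calculation. It assumes the balance is
enforced through a GrpTRESMins limit on the billing TRES, and that a job's
charge is its billing TRES times its elapsed minutes, scaled by its QOS
UsageFactor. JobCharge, adjusted_limit, sacctmgr_command and the account name
are hypothetical illustrations, and the code only builds the sacctmgr command
rather than running it.

```python
import math
import shlex
from dataclasses import dataclass


@dataclass
class JobCharge:
    """Cost of a finished job (hypothetical helper; values as read from sacct)."""
    billing: float             # job's billing TRES (after TRESBillingWeights)
    elapsed_seconds: int       # wall-clock run time in seconds
    usage_factor: float = 1.0  # QOS UsageFactor, assumed to scale the charge linearly

    def tres_minutes(self) -> int:
        # Charge in billing-TRES-minutes, rounded up to whole minutes.
        return math.ceil(self.billing * self.elapsed_seconds * self.usage_factor / 60)


def adjusted_limit(current_limit: int, job: JobCharge, multiplier: float) -> int:
    """Return a new GrpTRESMins value for the account (hypothetical helper).

    multiplier=1.0 refunds the whole job by raising the limit by its cost;
    a negative multiplier (e.g. -0.5) penalizes by lowering the limit.
    The result is clamped at zero.
    """
    delta = round(job.tres_minutes() * multiplier)
    return max(0, current_limit + delta)


def sacctmgr_command(account: str, new_limit: int) -> list[str]:
    # Built as an argument list and NOT executed; review it before running.
    # Assumes the limit lives on the account association's billing TRES.
    return ["sacctmgr", "-i", "modify", "account", "where", f"name={account}",
            "set", f"GrpTRESMins=billing={new_limit}"]


if __name__ == "__main__":
    job = JobCharge(billing=64, elapsed_seconds=3 * 3600)  # 11520 TRES-minutes
    refunded = adjusted_limit(100_000, job, multiplier=1.0)
    print(shlex.join(sacctmgr_command("hypothetical_acct", refunded)))
```

The same caveat from item 2 applies: whatever adjustment you make has to be
undone or folded in when the limits are reset for the next allocation period.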

The official answer I got on the bug is "I don't think what you are looking for
is possible with Slurm at the moment." I'm posting here in the hope that someone
else has come up with a creative solution. How do y'all handle this?

Thanks!
Luke

Search keywords: priority bump refund penalty accounting
