Esteemed Slurm users,

I am trying to mitigate a use case where jobs requesting both the maximum number of nodes allowed and the maximum time can slip in while the queue is briefly empty. The general idea is that users are allowed to use up to half of the nodes in the QoS, and jobs in that QoS are allowed to run for up to 6 hours, but we've seen cases where users repeatedly request both at once.
What I currently do is this:
- allow users to use up to X nodes simultaneously
- set the MaxWall to Y time

And my dream is to add:
- limit users who request the full X nodes to Y/n time

This would allow users to run for the MaxWall time where needed, or to run on the maximum number of nodes, but it would dissuade them from doing both at once, because requesting MaxWall would also limit them to a smaller subset of nodes. The current setup is sketched below.
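For reference, the current limits are set roughly like this (a sketch only; the QoS name "short" and the X/Y values are placeholders, not our actual configuration):

    # cap each user at X nodes at once in this QoS
    sacctmgr modify qos short set MaxTRESPerUser=node=X
    # cap each job in this QoS at 6 hours
    sacctmgr modify qos short set MaxWall=06:00:00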
A solution that I feel might work well would be to limit the maximum run minutes each user can have submitted to the QoS at a time. I attempted to implement and test this without success this morning. I looked at the code to try to determine what I was doing wrong and came across this:
/* MaxTRESRunMinsPU doesn't do anything yet, if/when it does
 * change the last param in the print_tres_line to 0. */

I did test setting GrpTRESRunMins=cpu=N for each user + account association, and that does appear to work. Does anyone know of any other solutions to this issue?
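In case it helps anyone trying the same thing, the per-association setting was roughly this (a sketch; the user name, account, and cpu-minute value are placeholders):

    # limit the total allocated CPU-minutes of running jobs for this association
    sacctmgr modify user where name=alice account=research set GrpTRESRunMins=cpu=100000

As I understand it, this caps the sum of (allocated CPUs x remaining time limit) across the association's running jobs, so a user can hold many CPUs for a short time or a few CPUs for a long time, but not both.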
(Of course, we will talk to the users as well - but a reasonable technical solution is a nice backstop).
Thanks, Jesse Stroik