Hi All,

I'm trying out GrpTRESRunMins to prevent users from opportunistically flooding an empty partition with long jobs. We have a partition set up for each CPU type, and give each association (account/user/partition) a separate limit based on that account's share of the partition.
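To make the intent concrete, here is a rough Python sketch of the bookkeeping we are relying on. It is only a model of my understanding of GrpTRESRunMins (the sum of CPUs times remaining time limit over an association's running jobs), not Slurm's actual code. The names `RunningJob`, `limit_from_share`, and `job_fits`, along with the partition size, share, and one-day window, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    cpus: int
    remaining_minutes: int  # time limit minus elapsed time


def run_minutes_in_use(jobs):
    """CPU-minutes still committed by an association's running jobs."""
    return sum(j.cpus * j.remaining_minutes for j in jobs)


def limit_from_share(partition_cpus, share, window_minutes):
    """Hypothetical rule: the account's share of the partition for one window."""
    return int(partition_cpus * share * window_minutes)


def job_fits(jobs, new_cpus, new_time_limit_minutes, limit):
    """True if starting the new job keeps the association at or under the limit."""
    requested = new_cpus * new_time_limit_minutes
    return run_minutes_in_use(jobs) + requested <= limit


# Assumed numbers: a 400-CPU partition, a 25% share, a one-day window.
limit = limit_from_share(400, 0.25, 1440)
running = [RunningJob(40, 1000), RunningJob(40, 1000)]

one_day_job_ok = job_fits(running, 40, 1440, limit)
two_day_job_ok = job_fits(running, 40, 2880, limit)
```

With these assumed numbers the limit comes to 144000 CPU-minutes, so a one-day 40-CPU job still fits on top of the two running jobs, while a two-day job of the same width would be held.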
It seems to work as expected, except when a job is submitted to multiple partitions. We had a few jobs blocked even though only one of the requested partitions was over the limit. The blocking partition was alphabetically first, so I'm guessing that the GrpTRESRunMins check doesn't attempt to look at the others after one fails.

This is with Slurm 17.11.4. I haven't dug around in the code, but I didn't see any relevant changes in the changelog for 17.11.6.

It's being used as a secondary backstop against abuse, so the limits aren't hit often, but suggestions for a fix or workaround would be welcome!

Thanks,
Nate

-- 
Dr. Nathan Crawford
nathan.crawf...@uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II
Office: 2101 Natural Sciences II
University of California, Irvine
Phone: 949-824-4508
Irvine, CA 92697-2025, USA