Hi all,

Being somewhat of a noob with Beowulf-type clusters, I must ask: what do you use to manage user quotas for job queueing with Torque and Maui? Gold Allocation Manager? Or does SGE do something like this? I've been browsing the web but couldn't find much.

Our current cluster uses just Maui + Torque.
The cluster currently accepts jobs on a FCFS (first come, first served) basis, but this behaviour is far from ideal. I would like jobs to keep running for as long as the user wishes, with that usage "charged" to his account. The balance would then be used to decide whose waiting job gets to run next when a node becomes available.

Ideally, quotas would be defined per group, with each group containing several users. Each group would be given a specific number of nodes, such that the groups' node quotas add up to the total number of nodes in the cluster.

Say we have 8 nodes and three groups; the node quotas would then be like this:

 - group1 would have 2 nodes
 - group2 would have 2 nodes
 - group3 would have 4 nodes

So if the cluster is always full and usage stays in proportion to the quotas, group1 would be using 2 nodes, group2 two other nodes, etc. Now, say that group1 has used twice as much time as group2, and group3 has used triple what group2 used, which is exactly its quota:

 - group 1 would have used 8 days
 - group 2 would have used 4 days
 - group 3 would have used 12 days (I'm oversimplifying and probably will screw up the math somewhere)

Group 3 has used 50% of the total time, so its quota is fine, but group2 is way behind group1. So the allocation system should keep group1's jobs from being allocated to nodes until group2's usage has caught up with group1's. Assuming that group3's usage stays on quota and that all the nodes are booked:

 - group1 would remain at 8 days
 - group2 would reach 8 days of usage
 - group3 would now be at 16 days of usage
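
To make the arithmetic above concrete, here is a rough Python sketch of the check I have in mind. It is only an illustration of the example numbers: the function names and the "share of past usage vs. share of nodes" rule are my own assumptions, not something taken from Maui, Torque or Gold.

```python
# Hypothetical illustration of the 2-2-4 example; not tied to any real scheduler.
NODE_QUOTA = {"group1": 2, "group2": 2, "group3": 4}  # nodes per group, 8 in total


def shares(amounts):
    """Turn absolute amounts (nodes or days) into fractions of the total."""
    total = sum(amounts.values())
    if total == 0:
        return {name: 0.0 for name in amounts}
    return {name: value / total for name, value in amounts.items()}


def over_quota(used_days, node_quota):
    """Return the groups whose share of past usage exceeds their node share."""
    target = shares(node_quota)
    actual = shares(used_days)
    return sorted(g for g in node_quota if actual[g] > target[g])


before = {"group1": 8, "group2": 4, "group3": 12}
after = {"group1": 8, "group2": 8, "group3": 16}

print(over_quota(before, NODE_QUOTA))  # ['group1'] -> group1 has to wait
print(over_quota(after, NODE_QUOTA))   # []         -> back to the 2-2-4 balance
```

With the "before" numbers group1 holds 8/24 of the usage against a 2/8 entitlement, so it is the only group held back; with the "after" numbers every group sits exactly on its share.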

The normal 2-2-4 quotas would then be in place again. Ideally, though, this would be smoothed out over time, say with a 1-3-4 split, so that nobody is kept from running calculations for a long time just because they used the cluster a lot while another group hadn't used it for months.
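
One way I could imagine the "smoothed out over time" part working is to let old usage count for less than recent usage. The sketch below is just that idea in Python; the per-period history, the `decay` factor of 0.5 and the function name are all made up for illustration and are not taken from any scheduler's documentation.

```python
# Hypothetical sketch: older usage is discounted so that heavy use long ago
# does not block a group forever. The decay factor is an arbitrary assumption.
def decayed_usage(per_period_days, decay=0.5):
    """Weighted usage; per_period_days[0] is the most recent period."""
    return sum(days * decay ** age for age, days in enumerate(per_period_days))


# group1 used 8 days three periods ago; group2 used 4 days in the latest period.
group1_history = [0, 0, 0, 8]
group2_history = [4, 0, 0, 0]

print(decayed_usage(group1_history))  # 1.0
print(decayed_usage(group2_history))  # 4.0
```

Here group1's old 8 days end up weighing less than group2's recent 4 days, so group1 would not be penalised indefinitely for having used the cluster while it was otherwise idle.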

The problem here is that we have different groups with different numbers of researchers, and some groups have put more research grant money into the cluster than others, so they should be entitled to a correspondingly fair share of its usage. This is likely to remain the case for a good while, so automating the quotas would be ideal.

Is there any kind of solution that provides this sort of behaviour, even if only for users and not groups?

Best regards,
Tiago Marques
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
