All of the very large sites that I've been at (SGE users for the most part) that really needed reservation and backfill capabilities had pretty much all invested effort in writing local job submission wrappers and front ends that programmatically wrote the job scripts and handled job submission. Since the front end system was writing the job script and handling the "qsub" arguments, it was easy to make all of the specific resource reservation and attribute requests that were necessary.

Mind you, they did not invest all this effort into wrapping SGE just to "hide complexity" from the users or even just to get backfill working efficiently. By rigidly controlling the syntax of the job submission commands they were able to squeeze a lot of value out of their workflows -- simple things like having a consistent and 100% uniform job naming scheme made processing the accounting logs, debugging and troubleshooting far more efficient.
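
To make the wrapper idea concrete, here is a minimal sketch in Python of a front end that builds the "qsub" argument list itself. It is only an illustration, not what any particular site ran: the function names, the naming scheme and the "io_units" resource are all made up for this example. The "-N", "-R y" and "-l h_rt=..." options are standard qsub options.

```python
import re


def job_name(user: str, project: str, step: str) -> str:
    """Build a uniform job name of the form <project>_<step>_<user>.

    The naming scheme is hypothetical; the point is only that every
    job gets a name with the same, machine-parseable layout.
    """
    parts = [project, step, user]
    # Collapse anything that is not a letter or digit into a single dash.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "-", p).strip("-") for p in parts]
    return "_".join(cleaned)


def build_qsub_command(user, project, step, script, runtime, io_units):
    """Return the qsub argument list the wrapper would execute.

    'io_units' is an assumed site-defined consumable, not a built-in
    SGE resource. 'runtime' is an h_rt value such as '02:00:00'.
    """
    return [
        "qsub",
        "-N", job_name(user, project, step),  # consistent job name
        "-R", "y",                            # ask for a resource reservation
        "-l", f"h_rt={runtime}",              # hard runtime limit, used for backfill
        "-l", f"io_units={io_units}",         # hypothetical IO consumable
        script,
    ]


if __name__ == "__main__":
    cmd = build_qsub_command("alice", "proj 42", "render",
                             "render.sh", "02:00:00", 10)
    print(" ".join(cmd))
```

Because every job goes through one function, the accounting logs end up with names that can be split on "_" without special cases.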

The only site doing what I mentioned above that I know I can talk about is described a bit more fully here:
http://gridengine.info/pages/profile-DNA-Productions

For sites where wrapping applications and workflows is out of the question, here are some Grid Engine (SGE) specific bits that would be involved in a system where users were expected to request IO, memory or runtime resources ...

Option A
-----------
Create and define a user-requestable, consumable resource that is appropriate to what you want to meter or make scheduling decisions on
Then associate that resource with a specific queue or the global execution host context
Then, edit the SGE complex to make your custom attribute be of type "FORCED"

The "FORCED" type is the key in this scheme. Users who do not request this resource are not allowed to run a job either globally, per-host or per-queue (depends where you stick the attribute value). So if your users do not characterize their IO needs or runtime needs or whatever under this scheme they will either not be allowed to submit a job at all or (in much more common cases) they will only be allowed to submit to some default queue and won't be allowed access to the higher priority queues that may be offering reservation and backfill.
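
As a rough sketch of the "FORCED" scheme described above, the Python below models one row of the SGE complex and the admission rule it implies. The column order follows what "qconf -sc" prints (name, shortcut, type, relop, requestable, consumable, default, urgency), and "FORCED" goes in the requestable column. The attribute name "io_units", its shortcut and its values are assumptions made up for the example, and may_run() is a toy model of the rule, not SGE's scheduler code.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ComplexEntry:
    """One row of the SGE complex, in the column order of `qconf -sc`."""
    name: str
    shortcut: str
    type: str
    relop: str
    requestable: str  # YES, NO or FORCED
    consumable: str   # YES or NO
    default: str
    urgency: str

    def as_line(self) -> str:
        """Render the row as a whitespace-separated complex line."""
        return " ".join(astuple(self))


# Hypothetical site-defined attribute; only the FORCED setting matters here.
IO_UNITS = ComplexEntry("io_units", "io", "INT", "<=",
                        "FORCED", "YES", "0", "0")


def may_run(requested: set, forced_here: set) -> bool:
    """Toy admission rule: a job may run on a queue or host only if it
    requests every FORCED attribute attached to that queue or host."""
    return forced_here <= requested


if __name__ == "__main__":
    print(IO_UNITS.as_line())
    # A job that skipped the request is shut out of the queue carrying io_units
    # but can still land in a default queue with no FORCED attributes.
    print(may_run(set(), {"io_units"}))
    print(may_run(set(), set()))
    print(may_run({"io_units", "h_rt"}, {"io_units"}))
```

The capacity itself would be attached through the "complex_values" setting of the queue or execution host, which is what decides whether the rule applies globally, per-host or per-queue.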

Option B
-----------
Create the same user requestable resource as mentioned above
Then, create a default value for that resource that is very "high" or "expensive"

The idea here in option B is that you have a metered value and you are applying a really "expensive" default value that applies to any user or job who does not bother to actually request the specific resource via the job script or the command line. The end result is that users who do not characterize their needs end up getting penalized in the backfill/reservation/whatever scheduling scheme because they get socked with the high default value. They can override the default value by making the appropriately sized request at job submission time.
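
A small Python sketch of the option B arithmetic, under made-up numbers: a host offers 100 units of a hypothetical "io_units" consumable and the complex default is set to the full 100. This is a toy model of how a default gets charged to jobs that make no request, not scheduler code.

```python
def effective_requests(explicit: dict, defaults: dict) -> dict:
    """Merge a job's explicit requests over the complex defaults.

    Anything the job did not ask for is charged at the default value.
    """
    return {**defaults, **explicit}


def jobs_per_host(capacity: int, per_job: int) -> int:
    """How many identical jobs fit on a host for one consumable."""
    return capacity // per_job


# Assumed values for illustration only.
HOST_CAPACITY = 100
DEFAULTS = {"io_units": 100}  # deliberately "expensive" default

if __name__ == "__main__":
    lazy = effective_requests({}, DEFAULTS)
    careful = effective_requests({"io_units": 10}, DEFAULTS)
    # The job with no request eats the whole host; sized jobs pack ten deep.
    print(lazy["io_units"], jobs_per_host(HOST_CAPACITY, lazy["io_units"]))
    print(careful["io_units"], jobs_per_host(HOST_CAPACITY, careful["io_units"]))
```

With these numbers a job that makes no request occupies the whole host for that resource, while jobs that request 10 units can run ten to a host.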


Implementing this stuff tends to be site specific or workflow specific. There is no easy one-size-fits-all solution. It depends on your apps, your execution host OS and your scheduling system (and many other factors).

People have all sorts of pie in the sky impressions as to how this stuff "should" work but their ideas tend to smash against the hard reality that very few applications can currently be seamlessly checkpointed, suspended, restarted and migrated without error. If you can't easily freeze an application and transparently move it to another node then all the fancy academic ideas about advanced reservation, backfill etc. all get real inefficient real fast in production computing environments.

My $.02 of course!

Regards,
Chris





On Apr 25, 2007, at 3:42 AM, Toon Knapen wrote:


Interesting. However this approach requires that the IO profile of the application is known. Additionally it requires the users of the application (which are generally not IT guys) to know and understand this info and pass it on to the scheduler when they launch their app. In your experience, do you manage to convince real-life users to provide this info?

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
