On Wed, 25 Apr 2007, Toon Knapen wrote:

Joe Landman wrote:

If we can assign a priority to the jobs, so that "short" jobs get a higher priority than longer jobs, and job priority decreases monotonically with run length, and we can safely checkpoint them, and migrate them (via a virtual container) to another node, or restart them on one node ... then we have something nice from a throughput viewpoint.

Right on. This is also exactly what the scheduler in the OS is doing. This approach thus just needs to be extrapolated to a whole cluster.

Does anyone know of any projects underway that are trying to accomplish exactly this?

I believe that Condor does all or part of it.  It certainly does the
checkpointing and migration (subject to the code being instrumented and
compiled with their checkpointing library).  Outside of that it has a
dazzling array of policy options -- I expect that you can do what is
described above or something even better.
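
As a rough illustration of the policy described above (priority that falls
monotonically with accumulated run time, with jobs paused and requeued at
the end of each time slice), here is a minimal single-node simulation in
Python.  It is only a sketch and does not use Condor or any real
checkpointing library: the names Job, schedule and quantum are hypothetical,
and the "checkpoint" is simply the job's state being pushed back onto the
queue.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Jobs are ordered by elapsed run time, so the job that has run the
    # least has the highest priority. seq breaks ties by submission order.
    elapsed: float
    seq: int
    name: str = field(compare=False)
    remaining: float = field(compare=False)


def schedule(jobs, quantum=1.0):
    """Simulate one node; return (name, finish_time) in completion order.

    jobs is a list of (name, run_length) pairs. Assumption: pausing and
    resuming a job is free, which a real checkpoint/migrate step is not.
    """
    queue = [Job(0.0, i, name, length) for i, (name, length) in enumerate(jobs)]
    heapq.heapify(queue)
    finished = []
    clock = 0.0
    while queue:
        job = heapq.heappop(queue)           # least-run job goes first
        run_for = min(quantum, job.remaining)
        clock += run_for
        job.elapsed += run_for               # priority drops as it runs
        job.remaining -= run_for
        if job.remaining <= 0:
            finished.append((job.name, clock))
        else:
            heapq.heappush(queue, job)       # stand-in for checkpoint + requeue
    return finished


if __name__ == "__main__":
    for name, done_at in schedule([("long", 5), ("short", 1), ("medium", 2)]):
        print(name, done_at)
```

With these three jobs the short one completes first even though it was
submitted after the long one, which is the throughput benefit being
discussed.  On a real cluster the requeue step would be replaced by an
actual checkpoint and a possible restart on a different node, and its cost
would have to be weighed against the length of the time slice.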

   rgb


thanks,

toon

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]


