Joe Landman wrote:
If we can assign a priority to the jobs, so that "short" jobs get a higher priority than longer jobs, and job priority decreases monotonically with run length, and we can safely checkpoint them and migrate them (via a virtual container) to another node, or restart them on one node ... then we have something nice from a throughput viewpoint.
Right on. This is also essentially what the scheduler in the OS does for processes. The approach thus just needs to be extended to a whole cluster.
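To make the idea concrete, here is a rough toy sketch of such a policy in Python. It is only an illustration under my own assumptions: the names (`Job`, `simulate`, `nodes`, `quantum`) are made up, checkpointing and migration are modelled as simply putting the job back in the queue at no cost, and the rule used (always run the jobs that have consumed the least time so far) is the textbook "least attained service" policy, not something taken from an existing cluster scheduler.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Ordered by time already consumed (then by name to break ties), so the
    # heap always yields the job that has run the least: priority falls
    # monotonically as a job accumulates run time.
    used: float
    name: str
    total: float = field(compare=False)  # total work needed (hypothetical units)


def simulate(jobs, nodes, quantum):
    """Toy cluster-wide scheduler; returns job names in completion order.

    jobs    -- list of (name, total_work) pairs
    nodes   -- number of jobs that can run at the same time
    quantum -- run time granted before a job is preempted
    """
    heap = [Job(0.0, name, total) for name, total in jobs]
    heapq.heapify(heap)
    finished = []
    while heap:
        # Pick up to `nodes` jobs with the least accumulated run time.
        running = [heapq.heappop(heap) for _ in range(min(nodes, len(heap)))]
        for job in running:
            job.used += min(quantum, job.total - job.used)
            if job.used >= job.total:
                finished.append(job.name)
            else:
                # Stand-in for "checkpoint and requeue": the job may resume
                # on any node in a later round (assumed to cost nothing).
                heapq.heappush(heap, job)
    return finished


if __name__ == "__main__":
    print(simulate([("long", 10), ("short", 2), ("medium", 5)], nodes=1, quantum=1))
```

With one node, the short job completes first even though it was submitted alongside much longer ones, which is the throughput effect described above. A real system would of course have to account for the cost of checkpointing and moving a job, which this sketch ignores.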
Does anyone know of any projects underway that are trying to accomplish exactly this?
thanks, toon