> Preferably the state of the first job should be frozen, and saved to
> disk, so that it can be restarted again when the higher priority job has
> finished.

well, maybe. that process (checkpoint/restore) really makes sense only
if the preemptor is giant and/or long-running. otherwise SIGSTOP is a
much better solution (it implies that you should have swap, but you
should have swap anyway.)

> Is this at all possible (we are using torque/maui, and I couldn't find
> this feature there)?

this code (even moab) has all sorts of problems keeping track of
suspension. the weak spot is usually that when you suspend a parallel
job and the preemptor doesn't use all the cpus, you can't go starting
random other jobs on these pseudo-free cpus. LSF wasn't all that great
about this little detail either, at least back in the 6.x versions.

it's kind of amazing how poor all the schedulers are, really. a classic
example of how projects get sclerotic by adding features...

regards, mark hahn
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
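
For reference, the SIGSTOP approach mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how torque/maui or moab do it: the helper names (`suspend`, `resume`, `run_preemptor_demo`) are hypothetical, and the "low-priority job" is just a sleeping child process on a single node.

```python
import os
import signal
import subprocess
import sys


def suspend(pid):
    """Freeze a process in place. SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a previously stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def run_preemptor_demo():
    # Hypothetical stand-in for the low-priority job: a child that sleeps.
    job = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        suspend(job.pid)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(job.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # ... the high-priority job would run here ...

        resume(job.pid)
        # WCONTINUED makes waitpid report that the child was resumed.
        _, status = os.waitpid(job.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        # Clean up the stand-in job so the demo leaves nothing behind.
        job.kill()
        job.wait()


if __name__ == "__main__":
    print(run_preemptor_demo())
```

A stopped process keeps its memory, which is why swap matters here: the kernel can page the frozen job out if the preemptor needs the RAM. For a parallel job, every process on every node would have to be signalled, and the scheduler still has to remember that those cpus are not really free.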