Hi Stephen,

On 19.08.2008 at 11:20, stephen mulcahy wrote:

Up to now we've been working with a 20-node cluster where we've had the luxury of working without any scheduling or queuing software. The cluster is pretty much dedicated to running a single job, which is invoked manually with mpirun.

We're moving to a much larger cluster in the near future and are keen to keep the utilisation as high as possible. On the new cluster we have to run 2 distinct jobs: one is a long-running job (weeks or possibly months) and the other is a regular short-running job (a few hours) which has to run at a specific time each day.

We're currently looking at using SLURM for queuing up jobs on the system, but I'm not sure if it will meet all of our needs here. Ideally, we'd have some system that would allow us to queue up the long-running job and a series of short-running jobs. The system would then automatically suspend the long-running job when a short-running job is due to start, run the short-running job and then resume the long-running job.

I expect we're not the only ones in this situation. Is SLURM the right tool for this job? If not, can anyone recommend other tools out there, preferably open source?

Normally I would refuse to answer as I'm biased, but as there are no replies at all: I would suggest looking into SGE: http://gridengine.sunsource.net/ The requested automatic suspend feature is supported by making the long-queue, which holds the long-running jobs, a subordinate queue of the short-queue. To start the short-running jobs every day at a fixed time, you could attach a calendar to the short-queue, so that the queue is enabled for a few hours every day and then drains again while the jobs finish.
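
As a rough sketch only, the two pieces could look something like the fragments below. The names short.q, long.q and daily_window are made up for illustration, and the 02:00 to 06:00 window is just an assumed example. The "week" line in particular is my best recollection of the calendar syntax, so please check everything against queue_conf(5) and calendar_conf(5) before using it.

Relevant attributes of the short queue (edited with "qconf -mq short.q"):

    qname             short.q
    subordinate_list  long.q=1
    calendar          daily_window

The calendar itself (added with "qconf -acal daily_window"):

    calendar_name     daily_window
    year              NONE
    week              mon-sun=0-2=off mon-sun=6-24=off

With "long.q=1", the long.q instance on a host should be suspended as soon as one slot of short.q on that host is in use, and resumed once short.q is empty again. Note that subordination works per host, between queue instances on the same machine.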

-- Reuti
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
