Hi Stephen,
On 19.08.2008 at 11:20, stephen mulcahy wrote:
Up to now we've been working with a 20 node cluster where we'd have
the luxury of working without any scheduling or queuing software -
the cluster is pretty much dedicated to running a single job and is
manually invoked with mpirun.
We're moving to a much larger cluster in the near future and are
keen to keep the utilisation as high as possible. On the new
cluster we have to run 2 distinct jobs - one is a long-running
(weeks or possibly months) job and the other is a regular short
running job (running in a few hours) which has to run at a specific
time each day.
We're currently looking at using SLURM for queuing up jobs on the
system but I'm not sure if it will meet all of our needs here.
Ideally, we'd have some system that would allow us to queue up the
long-running job and a series of short-running jobs and the system
would automatically suspend the long-running job when the short-
running job is due to start, run the short-run job and then restart
the long-running job.
I expect we're not the only ones in this situation. Is SLURM the
right tool for this job? If not, can anyone recommend other tools
out there, preferably open source?
Normally I would refuse to answer as I'm biased, but as there are no
replies at all: I would suggest looking into SGE:
http://gridengine.sunsource.net/ The automatic suspend you ask for is
supported by making the long-queue, which holds the long-running jobs,
a subordinated queue of the short-queue: while jobs are running in the
short-queue, the jobs in the long-queue are suspended. To start the
short-running jobs every day at a fixed time, you could attach a
calendar to the short-queue, so that it is enabled for a few hours
every day and then drains again while the jobs finish.
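
To make the interplay of the two mechanisms concrete, here is a toy
model in Python. It is only a sketch of the logic, not SGE
configuration or any SGE API: the function names are made up for
illustration, and the 02:00 to 06:00 window is an assumed example, not
something taken from your setup.

```python
from datetime import time

# Assumed example window in which the short-queue is enabled.
SHORT_QUEUE_OPEN = time(2, 0)
SHORT_QUEUE_CLOSE = time(6, 0)


def short_queue_enabled(now):
    """Calendar: the short-queue starts new jobs only inside the window."""
    return SHORT_QUEUE_OPEN <= now < SHORT_QUEUE_CLOSE


def long_queue_state(short_jobs_running):
    """Subordination: the long-queue is suspended while the short-queue is busy."""
    return "suspended" if short_jobs_running > 0 else "running"
```

Note that the two conditions are independent: a short job that is still
running after the window has closed keeps the long-queue suspended
until it finishes, which is the "drain" phase mentioned above.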
-- Reuti
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf