Sean,
For what it's worth, Grid Engine (SGE) has a utility binary called
"qevent" that is not part of the official binary distribution but can
be built from the source distribution
(http://gridengine.sunsource.net). Do a Google search for "sge + qevent" and
you'll at least hit a few SGE mailing list messages that cover what
it does.
You might also want to check out the DRMAA stuff
(http://drmaa.org/wiki/) -- it is supposed to be a DRM-neutral way of
submitting jobs to a queuing system. I'm not very familiar with DRMAA so
I can't tell you offhand whether the current spec includes notification
of job completion or not.
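To make the DRMAA route a bit more concrete, here is a rough sketch that assumes the third-party drmaa-python binding and a libdrmaa library from your DRM are installed. As far as I know, DRMAA 1.0 gives you a blocking wait call rather than a pushed notification, so the sketch submits a job, waits for it, and reduces the result to a small status dict. The function names run_and_wait and describe_outcome are made up for illustration.

```python
def describe_outcome(info):
    """Reduce a DRMAA JobInfo-like object to a small status dict."""
    if info.wasAborted:
        state = "aborted"        # the job never ran
    elif info.hasSignal:
        state = "killed"         # terminated by a signal
    elif info.hasExited and info.exitStatus == 0:
        state = "completed"
    else:
        state = "failed"
    return {"job_id": info.jobId, "state": state}


def run_and_wait(command, args):
    """Submit one job through DRMAA and block until it leaves the system."""
    # Imported lazily so this module still loads where DRMAA is not installed.
    import drmaa  # assumption: the drmaa-python package

    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        jt.remoteCommand = command
        jt.args = list(args)
        try:
            job_id = session.runJob(jt)
            info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        finally:
            session.deleteJobTemplate(jt)
    return describe_outcome(info)
```

A web service would probably call run_and_wait from a background thread or worker process, since the wait blocks until the job is done.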
Another option that would work with SGE would be the use of
queue-level epilog scripts that execute each time a job leaves the system
for whatever reason. You can put a heck of a lot of logic and
programmable activities/notifications into a custom epilog script.
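As an illustration of the epilog idea, here is a minimal sketch of an epilog written in Python. It assumes the epilog runs with the job's environment, so JOB_ID and JOB_NAME are available, and it drops one JSON file per finished job into a directory that the web service can poll. The NOTIFY_DIR variable, the default directory and the record layout are all invented for this example. It does not report an exit status, because I am not sure where SGE exposes that to the epilog.

```python
#!/usr/bin/env python3
import json
import os
import tempfile
import time


def build_record(env):
    """Collect the job details we care about from the epilog environment."""
    return {
        "job_id": env.get("JOB_ID", "unknown"),
        "job_name": env.get("JOB_NAME", "unknown"),
        "host": env.get("HOSTNAME", "unknown"),
        "finished_at": int(time.time()),
    }


def write_record(record, outdir):
    """Write the record as <job_id>.json, using a rename so readers never see a partial file."""
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, f"{record['job_id']}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(record, fh)
    os.replace(tmp_path, path)
    return path


if __name__ == "__main__":
    # NOTIFY_DIR is a hypothetical setting; on a cluster it would need to be
    # a shared directory that the web service host can also read.
    default_dir = os.path.join(tempfile.gettempdir(), "sge-notifications")
    write_record(build_record(os.environ), os.environ.get("NOTIFY_DIR", default_dir))
```

The script would be set as the queue's epilog attribute (for example via qconf -mq on the queue in question). Instead of writing a file, the same hook could just as well call back into the web service.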
A third option is the use of job dependency syntax within Grid
Engine. For each of your web service initiated tasks you would submit
2 jobs -- the first job is your "worker" job. The second job is your
"notifier" job and it is submitted to SGE with a flag that says "this
job is dependent on the worker job". Once your notifier job is fired
up it can do whatever sort of results checking and notification would
be required.
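The worker/notifier pairing could be sketched roughly like this, using qsub's -N flag to name the jobs and -hold_jid to make the notifier wait for the worker. The job naming scheme and the script names are placeholders, and passing the worker's name to the notifier as an argument is just one way of telling it what to check.

```python
import subprocess


def build_submit_commands(task_id, worker_script, notifier_script):
    """Return the two qsub command lines for one web service task."""
    # Names start with a letter because SGE rejects job names that begin with a digit.
    worker_name = f"worker_{task_id}"
    notifier_name = f"notify_{task_id}"
    worker_cmd = ["qsub", "-N", worker_name, worker_script]
    # -hold_jid keeps the notifier pending until the named worker job has left the system.
    notifier_cmd = [
        "qsub", "-N", notifier_name,
        "-hold_jid", worker_name,
        notifier_script, worker_name,
    ]
    return worker_cmd, notifier_cmd


def submit_task(task_id, worker_script, notifier_script):
    """Submit the worker first, then the notifier that depends on it."""
    worker_cmd, notifier_cmd = build_submit_commands(task_id, worker_script, notifier_script)
    subprocess.run(worker_cmd, check=True)
    subprocess.run(notifier_cmd, check=True)
```

The hold is normally released when the worker finishes whether or not it succeeded, so the notifier script still has to inspect the worker's output before reporting success or failure.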
Regards,
Chris
On Oct 16, 2007, at 10:08 AM, Sean Ward wrote:
I've started work on a web service which contains several
potentially long running processing steps (molecular dynamics),
which are perfect to farm out to the fairly large (90 node) Beowulf
I have access to. The primary issue is translating requests from
the event driven web service, to job queues, and back again upon
completion. Specifically, the major queuing systems I have
immediate access to (Sun Grid Engine and Condor) only support
e-mail based notification of job completion. Starting jobs isn't an
issue, as my service can simply ssh over and execute shell scripts
as needed to start things up; the problem is reliably being
informed when the jobs fail or complete, via any programmatic
method (such as executing a shell script, calling a web service via
SOAP/etc, or an asynchronous message library). My other problem,
ensuring that these web service requests don't starve in-house jobs
on the Beowulf, is easily handled via the priority levels built into
all the various job managers, although being able to checkpoint a
long running job would be a plus (such as is supported by Condor).
I am currently investigating modifications to either Condor (more
complex to update, but checkpoint is useful) or Ruby Queue (very
easy to update for reliable notification) to solve this issue, but
wanted to be sure I wasn't overlooking any existing solutions to
programmatic based queuing and receiving notifications on jobs in a
Beowulf environment...
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf