Hi Sean:

On 10/16/07, Sean Ward <[EMAIL PROTECTED]> wrote:
> I've started work on a web service which contains several potentially
> long running processing steps (molecular dynamics), which are perfect to
> farm out to the fairly large (90 node) Beowulf I have access to. The
> primary issue is translating requests from the event driven web service,
> to job queues, and back again upon completion. Specifically, the major
> queuing systems I have immediate access to (Sun Grid Engine and Condor)
> only support e-mail based notification of job completion. Starting jobs
> isn't an issue, as my service can simply ssh over and execute shell
> scripts as needed to start things up, the problem is reliably being
> informed when the jobs fail or complete, via any programmatic method
> (such as executing a shell script, calling a web service via SOAP/etc,
> or an asynchronous message library). My other problem, ensuring that
> these web service requests don't starve in house jobs on the Beowulf is
> easily handled via the priority levels built into all the various job
> managers, although being able to checkpoint a long running job would be
> a plus (such as is supported by Condor).
>
> I am currently investigating modifications to either Condor (more
> complex to update, but checkpoint is useful) or Ruby Queue (very easy to
> update for reliable notification) to solve this issue, but wanted to be
> sure I wasn't overlooking any existing solutions to programmatic based
> queuing and receiving notifications on jobs in a Beowulf environment...

If you plan to stay with the SGE/Condor route, you should take a look at
DRMAA:

http://drmaa.org/wiki/

Perhaps you will find something useful there.

Cheers,

Bernard
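
P.S. In case it helps, here is a rough Python sketch of what blocking on job completion through DRMAA might look like. It assumes the drmaa-python binding and a scheduler-provided DRMAA library are installed on the submit host, which I have not checked for your cluster. The helper names (`summarize`, `run_and_notify`) and the `on_done` callback are hypothetical, just for illustration.

```python
def summarize(info):
    """Map a DRMAA JobInfo-like object to a (state, detail) pair."""
    if info.wasAborted:
        # The job was cancelled or rejected and never ran.
        return ("failed", "aborted before it ran")
    if info.hasSignal:
        return ("failed", "killed by signal %s" % info.terminatedSignal)
    if info.hasExited:
        state = "completed" if info.exitStatus == 0 else "failed"
        return (state, "exit status %d" % info.exitStatus)
    return ("failed", "terminated abnormally")


def run_and_notify(command, args, on_done):
    """Submit one job, block until it finishes, then call on_done.

    on_done is a hypothetical callback taking (job_id, state, detail);
    it could post to your web service or enqueue a message.
    """
    # Imported lazily so this file still loads on hosts without libdrmaa.
    import drmaa

    with drmaa.Session() as session:
        template = session.createJobTemplate()
        template.remoteCommand = command
        template.args = list(args)
        try:
            job_id = session.runJob(template)
            # Blocks until the scheduler reports the job as finished.
            info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        finally:
            session.deleteJobTemplate(template)

    state, detail = summarize(info)
    on_done(job_id, state, detail)
```

Something like `run_and_notify("/path/to/md_step.sh", ["input.pdb"], my_callback)` (path and callback made up) could run in a worker thread so the event-driven service itself never blocks. Note this only covers notification; it does nothing for checkpointing.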