On Sat, May 15, 2010 at 10:44 AM, Andrew Piskorski <a...@piskorski.com> wrote:
> Yes, that's what I did with SGE, that part works fine. SGE's other
> behaviors often leave much to be desired.

Just because the default settings of SGE do not fit your workflow does not mean that "SGE's other behaviors often leave much to be desired." There are SGE users who specifically do not want SGE to automatically re-run jobs when nodes become unreachable -- for example, a network failure can partition a single SGE cluster into 2 sub-clusters, and then every job could be run twice if the default were to re-run whenever nodes are not reachable.

The SGE "users" mailing list is always responsive (thanks to Reuti and others who contribute), so for anything you don't like or understand in SGE, you should:

1) Google (very important)
2) Check the SGE manpage, HOWTO, admin guide
3) Ask on the list http://gridengine.sunsource.net/maillist.html

>> > 4. I really, really want a good API for programmably interacting with
>> > the cluster scheduler and ALL of its features. I don't care too much
>
>> I haven't looked at it much, but I think DRMAA will work for that in SGE.

DRMAA is for job submission and some job monitoring. If you want to interact with your scheduler at a deeper level, like changing the scheduling algorithms, then I don't think that can be done easily with anything available in the free/open-source world or the commercial market.

Rayson

>
> Not as far as I could tell from reading the SGE docs a while back, no.
> It looked as if DRMAA only covers a very limited subset of SGE's
> functionality, not enough to cover the features I need.
>
> I did not (yet) check the source to see how SGE's DRMAA support is
> implemented, but the docs made it sound as if they were rolling it
> from scratch rather than simply building on top of some clear
> pre-existing SGE API.
>
>> > 8. Of course the scheduler must have a good way to track all the basic
>> > information about my nodes: CPU sockets and cores, RAM, etc. Ideally
>> > it'd also be straightforward for me to extend the database of node
>
>> SGE does this and can make it available as XML.
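
For the XML route mentioned above, here is a minimal sketch of reading per-node attributes in Python. It assumes the XML looks roughly like what `qhost -xml` prints; the sample document below is hand-written for illustration, and the element and attribute names (`host`, `hostvalue`, `num_proc`, `mem_total`) as well as the helper `parse_qhost` are assumptions that may not match every SGE version.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on `qhost -xml` output. The element and
# attribute names are an assumption and may differ between SGE versions.
SAMPLE_QHOST_XML = """<?xml version='1.0'?>
<qhost>
 <host name='global'>
   <hostvalue name='num_proc'>-</hostvalue>
   <hostvalue name='mem_total'>-</hostvalue>
 </host>
 <host name='node01'>
   <hostvalue name='arch_string'>lx24-amd64</hostvalue>
   <hostvalue name='num_proc'>8</hostvalue>
   <hostvalue name='mem_total'>15.7G</hostvalue>
 </host>
 <host name='node02'>
   <hostvalue name='arch_string'>lx24-amd64</hostvalue>
   <hostvalue name='num_proc'>4</hostvalue>
   <hostvalue name='mem_total'>7.8G</hostvalue>
 </host>
</qhost>
"""


def parse_qhost(xml_text):
    """Return {hostname: {attribute: value}} for every real host.

    Hypothetical helper, not part of SGE or DRMAA.
    """
    root = ET.fromstring(xml_text)
    hosts = {}
    for host in root.iter("host"):
        name = host.get("name")
        if name == "global":
            # Skip the pseudo-host that carries cluster-wide values.
            continue
        hosts[name] = {
            hv.get("name"): (hv.text or "").strip()
            for hv in host.iter("hostvalue")
        }
    return hosts


if __name__ == "__main__":
    for name, attrs in sorted(parse_qhost(SAMPLE_QHOST_XML).items()):
        print(name, attrs.get("num_proc"), attrs.get("mem_total"))
```

On a real cluster the sample string would presumably be replaced by the captured output of `qhost -xml`; values are kept as strings here because fields such as memory carry unit suffixes.
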
>
> Which reminds me, I need to look harder to figure out WHERE exactly
> SGE stores its node configuration data, and how I can perhaps extend
> it with additional information, like the network topology between my
> nodes. This is probably simple but it wasn't obvious from the
> (voluminous) SGE docs.
>
> --
> Andrew Piskorski <a...@piskorski.com>
> http://www.piskorski.com/
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf