FYI (Just to have it posted, in case anybody else ever runs into this.)

A little while back I moved some names around in the cluster. To do so, a bunch of queues and some hosts were removed from SGE and then added back. There was much trial and error in doing so, and I make no claim that the right commands were issued in the proper order. However, in the end all the queues were as desired and they all stayed up and running, until the node was rebooted, at which point SGE came back up with only two queues present. After much poking around the problem was finally located: some of the old host names and old queues were still present in files under:

    $SGEROOT/default/spool/qmaster/qinstances

and as soon as SGE hit one of those during startup, it would stop creating all further queues. The error message that resulted when that happened was of this form:

    09/21/2011 12:22:56|qmaster|safserver|E|cannot recreate queue all.q from disk because of unknown host mendel

and appeared in:

    $SGEROOT/default/spool/qmaster/messages

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Beowulf mailing list, Beowulf@beowulf.org
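
For anyone wanting to check for the same symptom, here is a minimal sketch that scans lines from the qmaster messages file for the error quoted above and reports which queues were blocked by which unknown hosts. It assumes the message text matches the quoted form exactly; the function name find_stale_entries is made up for illustration and is not part of SGE.

```python
import re
from collections import defaultdict

# Pattern copied from the error line quoted in the post; other SGE
# versions may word the message differently (assumption).
PATTERN = re.compile(
    r"cannot recreate queue (\S+) from disk because of unknown host (\S+)"
)


def find_stale_entries(lines):
    """Map each unknown host to the sorted list of queues it blocked."""
    stale = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            queue, host = match.groups()
            stale[host].add(queue)
    return {host: sorted(queues) for host, queues in stale.items()}


# Demonstration on the single line quoted in the post.
sample = [
    "09/21/2011 12:22:56|qmaster|safserver|E|cannot recreate queue "
    "all.q from disk because of unknown host mendel"
]
print(find_stale_entries(sample))
```

To run it against a real installation, pass an open file handle for the messages file in place of the sample list. The host names it reports are the ones to look for under the qinstances directory.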