This is an odd one, and I hope one of you has seen it and fixed it, because the only way I have been able to trigger the bug is through a reboot.
I updated one node from Mandriva 2007.1 to 2008.1. Both use 2.6.x kernels and, as you might guess, are about a year apart. Both use the exact same SGE distribution, which is NFS mounted on /usr/SGE6.

On a reboot of the newer system, /etc/rc.d/init.d/sgeexecd fails. It is the last thing to start in runlevel 3 (except for S99local, which doesn't do anything except "touch /var/lock/subsys/local"). First it spews a bunch of lines which look like a script did "set", which as a side effect pushes all the other text lines off the console, and then it emits

    can't determine path to Grid Engine binaries

without starting sge_execd. On the older system the exact same script starts up with none of this drama, leaving sge_execd running.

However, once I log on as root at the console on the newer system, it happily starts up with:

    /etc/rc.d/init.d/sgeexecd start

There are no SGE variables defined in .bashrc etc. The init script has these prerequisites, as on the older system:

    # Provides: sgeexecd
    # Required-Start: $network $remote_fs

Ring any bells?

I think maybe the NFS mounting is different, so that the remote_fs prerequisite isn't really satisfied, even though the associated script has run. The sgeexecd script does include a test:

    while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
       count=`expr $count + 1`
       sleep 1
    done

but since SGE_ROOT is the mount point, that directory exists whether or not the NFS mount has completed, so the loop exits immediately and never actually waits. Maybe I'll change that to $SGE_ROOT/bin and see if it helps.

Thanks,

David Mathog
[EMAIL PROTECTED]
Manager, Sequence Analysis Facility, Biology Division, Caltech

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
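
For reference, the change floated above (waiting on a directory inside the NFS export rather than on the mount point itself) might look roughly like the sketch below. It is not taken from the shipped sgeexecd script: the helper name wait_for_dir is hypothetical, and it assumes that $SGE_ROOT/bin only becomes visible once the NFS mount has completed.

```shell
#!/bin/sh
# Sketch only: wait_for_dir is a hypothetical helper, not part of sgeexecd.
# It polls once per second until DIR exists or MAX_SECONDS have elapsed.
#
# usage: wait_for_dir DIR MAX_SECONDS
# returns 0 if DIR exists when the loop ends, 1 otherwise
wait_for_dir() {
    dir=$1
    max=$2
    count=0
    while [ ! -d "$dir" ] && [ "$count" -lt "$max" ]; do
        count=$((count + 1))
        sleep 1
    done
    # Final check, so the caller can tell a timeout from success.
    [ -d "$dir" ]
}

# Possible use in the init script (assumption: bin/ lives inside the export,
# so it is absent from the bare mount point until NFS is mounted):
#
#   wait_for_dir "$SGE_ROOT/bin" 120 || echo "SGE_ROOT not mounted after 120s"
```

Unlike the original loop, this tests something that should be absent from the empty local mount point, and it reports a timeout to the caller instead of falling through silently.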