[Beowulf] Re: after update sgeexecd not starting correctly on reboot

2008-11-25 Thread David Mathog
> I think maybe the NFS mounting is different, so that the remote_fs
> prerequisite isn't really satisfied, even though the associated script
> has run. The sgeexecd script does include a test:
>
>   while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
>     count=`expr $count + 1`
>     sleep 1
>   done
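The quoted wait loop can be sketched as a small reusable function. This is only an illustration, not the actual sgeexecd code: the function name `wait_for_dir` and its arguments are hypothetical, and the 120-second default simply mirrors the limit in the quoted test. Like the original, it checks only that the directory exists, which by itself does not prove that a remote filesystem is mounted there.

```shell
#!/bin/sh
# Hypothetical helper (not part of sgeexecd): wait until a directory
# appears, giving up after a timeout.
# Usage: wait_for_dir DIR [TIMEOUT_SECONDS]
# Returns 0 if DIR exists when the loop ends, non-zero otherwise.
wait_for_dir() {
    dir=$1
    timeout=${2:-120}   # default mirrors the 120 in the quoted loop
    count=0
    # Poll once per second until the directory shows up or time runs out.
    while [ ! -d "$dir" ] && [ "$count" -lt "$timeout" ]; do
        count=$((count + 1))
        sleep 1
    done
    [ -d "$dir" ]
}

# Example call (assumes SGE_ROOT is set by the caller):
#   wait_for_dir "$SGE_ROOT" 120 || echo "SGE_ROOT not available" >&2
```

Returning a status lets the caller decide what to do on a timeout, instead of falling through silently as the bare loop does.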

[Beowulf] after update sgeexecd not starting correctly on reboot

2008-11-25 Thread David Mathog
This is an odd one, and I hope one of you has seen it and fixed it, because the only way I have been able to trigger the bug is through a reboot. I updated one node from Mandriva 2007.1 to 2008.1. Those are both 2.6.x kernels, and are as you might guess about a year apart. Both use the exact s

Re: [Beowulf] tools for cluster event logging?

2008-11-25 Thread David Mathog
Here is a little init.d script "cluster_notify" I put together that uses syslog to log init state changes that take place on the computer nodes on the master node. For instance here is what comes out when one node is rebooted:

  Nov 25 13:23:21 monkey01.cluster logger: init change: 3 6
  Nov 25 13:2

Re: [Beowulf] tools for cluster event logging?

2008-11-25 Thread Prentice Bisbal
David Mathog wrote:
> Figured out that there is no "news" on any of my cluster machines, so I
> usurped that facility for these messages. (It is sort of "news",
> right?). On the clients added to /etc/syslog.conf
>
>   news.* @safserver.cluster
>   news.*

Re: [Beowulf] tools for cluster event logging?

2008-11-25 Thread Huw Lynes
On Tue, 2008-11-25 at 10:19 -0800, David Mathog wrote:
> What would be a good tool for logging cluster specific messages, and
> nothing else, on a single server? The purpose of this is to let
> computer nodes send messages like "node XXX hardware failure, shutting
> down", or "node xxx, boot seque

Re: [Beowulf] tools for cluster event logging?

2008-11-25 Thread David Mathog
Figured out that there is no "news" on any of my cluster machines, so I usurped that facility for these messages. (It is sort of "news", right?). On the clients added to /etc/syslog.conf:

  news.* @safserver.cluster
  news.*
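A node-side wrapper for sending such messages might look roughly like the sketch below. It assumes, as the post describes, that the "news" facility is otherwise unused and is forwarded to the central server by syslog. The function names `cluster_message` and `cluster_notify`, the `cluster` tag, and the message layout are all made up for illustration; only the `logger` options `-p` (facility.priority) and `-t` (tag) are standard.

```shell
#!/bin/sh
# Assumption: "news" is repurposed for cluster messages, per the post.
CLUSTER_FACILITY=${CLUSTER_FACILITY:-news}

# Build the message text. Kept separate from the logger call so the
# format can be checked without touching syslog. (Hypothetical format.)
cluster_message() {
    printf 'node %s: %s' "$1" "$2"
}

# Send one message to syslog under the chosen facility at "info" priority.
# Usage: cluster_notify NODE TEXT
cluster_notify() {
    logger -p "${CLUSTER_FACILITY}.info" -t cluster \
        "$(cluster_message "$1" "$2")"
}

# Example call:
#   cluster_notify "$(hostname)" "boot sequence completed"
```

Keeping the facility in one variable makes it easy to switch to, say, a `local0` to `local7` facility if "news" turns out to be in use on some machine.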

Re: [Beowulf] tools for cluster event logging?

2008-11-25 Thread John Hearns
2008/11/25 David Mathog <[EMAIL PROTECTED]>
> What would be a good tool for logging cluster specific messages, and
> nothing else, on a single server?

How about an SNMP trap, and have a specific cluster MIB? I know this idea sounds daft, but SNMP is well known and all the packages are readily av

Re: [Beowulf] tools for cluster event logging?

2008-11-25 Thread John Hearns
2008/11/25 David Mathog <[EMAIL PROTECTED]>
>
> I suppose syslog could be used for this, but the trick would be to
> choose a facility/priority for it such that nothing other than the
> desired cluster messages was ever sent. In other words, something
> like:
>
>   logger -p cluster.info "this is a

[Beowulf] tools for cluster event logging?

2008-11-25 Thread David Mathog
What would be a good tool for logging cluster specific messages, and nothing else, on a single server? The purpose of this is to let computer nodes send messages like "node XXX hardware failure, shutting down", or "node xxx, boot sequence completed" messages to a central repository. But I do not

[Beowulf] QsNet-1 parts, last call

2008-11-25 Thread Michael Brown
Hello all, As you may remember, I've ended up with a 128-way QsNet1 setup. Unfortunately, the house I've been renting has been sold, and the new place doesn't have the space to store it all. As a result, I'm going to be scrap-metalling the setup on about the 10th of December, give or take a fe