This is an odd one. I just realized that 9 of 20 nodes rebooted on Apr 4. (Since they all rebooted successfully everything was working and there was no reason to think that this had taken place.) This appears to be related to the daylights savings time change two days before. The reason I think that is that the nodes that rebooted have /var/log/messages files like:
Apr 4 08:01:00 nodename CROND ... /cron/hourly Apr 4 09:01:00 nodename CROND ... /cron/hourly Apr 4 08:24:33 nodename syslogd 1.4.1; restart Notice the time shift backwards between the last normal record and the first reboot record. As if it finally caught on that the clock had changed and that somehow triggered a reboot. Unfortunately none of the log files contain a message that indicated exactly what it was that ordered the reboot. Unclear to me what piece of software could have triggered this. Presumably something that had it's own clock stuck one hour off on the previous time standard and also has the ability to restart the system. ntpd? Ganglia? They were both running. Regards, David Mathog [EMAIL PROTECTED] Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf