This is an odd one.  I just realized that 9 of 20 nodes
rebooted on Apr 4.  (Since they all rebooted successfully everything
was working and there was no reason to think that this had
taken place.)  This appears to be related to the daylights
savings time change two days before.  The reason I think that is
that the nodes that rebooted have /var/log/messages files like:

Apr   4 08:01:00 nodename CROND ... /cron/hourly
Apr   4 09:01:00 nodename CROND ... /cron/hourly
Apr   4 08:24:33 nodename syslogd 1.4.1; restart

Notice the time shift backwards between the last normal 
record and the first reboot record.

As if it finally caught on that the clock had changed and that
somehow triggered a reboot.  Unfortunately none of the log files
contain a message that indicated exactly what it was that ordered
the reboot.

Unclear to me what piece of software could have triggered this. 
Presumably something that had it's own clock stuck one hour off
on the previous time standard and also has the ability to restart
the system.  ntpd?  Ganglia? They were both running.

Regards,

David Mathog
[EMAIL PROTECTED]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

Reply via email to