On 11/12/06, Tim Moore <[EMAIL PROTECTED]> wrote:
Hello All -
I have a compute node that has started dropping off. When I say drop
off, I mean the node (while running a job) will lose all connectivity
and the machine does not respond. I have viewed the logs and can find
no reason for the node to cease functioning. Let me state that this
behavior did not occur until after a processor upgrade, a BIOS upgrade, and
an OS upgrade. I went into the BIOS and made a few changes that seemed to
lengthen the time between failures, although they still occur mostly at
random. If I leave the node idle, it will run for days.
Has anyone ever seen such behavior?

I've seen that with faulty hardware, but then you've changed a few things.
If you're sure it's not the code or the OS, take a spare node and apply
the things you've changed (processor, BIOS, memory?) one at a time,
testing under load after each step.
--
Gerald Davies
---------------------------------------------
w: http://www.geralddavies.com