On 12/2/09 10:21 AM, beowulf-requ...@beowulf.org wrote:
> ------------------------------
>
> Message: 8
> Date: Tue, 1 Dec 2009 12:45:52 -0800
> From: Art Poon <artp...@gmail.com>
> Subject: [Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK
> To: beowulf@beowulf.org
> Message-ID: <825eeab3-c58f-46b8-a9c4-a806c5b68...@gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> Dear colleagues,
>
> [snip]
>
> What's got me and the IT guys stumped is that while the compute nodes
> boot via PXE from the head node without trouble on the NetGear, they
> barf with the SMC. To be specific, after the initial boot with a
> minimal Linux kernel, there is a "fatal error" with "timeout waiting
> for getfile" when the compute node attempts to download the
> provisioning image from the head node. However, when they were
> running Rocks before I arrived, the cluster worked fine with the SMC
> switch.
>
> I've tried resetting the SMC switch to factory defaults (with
> auto-negotiate on). I've checked /etc/beowulf/modprobe.conf and it
> doesn't seem to be demanding anything exotic. We've also tried
> swapping in another SMC switch, but that didn't change anything.
>
> I'd be grateful if you could weigh in with your expertise.

I don't know if my $.02 here could be classified as 'expertise'. With
that disclaimer out of the way, I can say that SMC switches tend to
carry very old firmware when they are stocked in warehouses, and that
firmware is not often updated. Their update process is a PITA compared
to other switches out there. I have seen cases where their old firmware
and STP (spanning tree protocol) cause enough delay when a port comes
up on the switch for the first time in a PXE/DHCP operation that the
process times out while the switch is still trying to figure out
whether there are network loops. The firmware update can be obtained
from www.smc.com and is currently at v2.3.0.0, updated in March. Check
which version your switch is running now.

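If you want to script the "check which version you're on" step across
several switches, here is a minimal Python sketch. It assumes you have
already read each switch's firmware version string (from its web
interface or console; how to do that depends on the model and isn't
covered here) and that versions are plain dotted numbers like the
v2.3.0.0 mentioned above. `parse_version` and `needs_update` are
hypothetical helpers, not part of any SMC tool.

```python
from typing import Tuple

# Firmware version cited in this thread as the current SMC release.
LATEST_MENTIONED = "2.3.0.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as 'v2.3.0.0' into (2, 3, 0, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def needs_update(installed: str, latest: str = LATEST_MENTIONED) -> bool:
    """True if the installed firmware is older than `latest`.

    Missing trailing components count as zero, so '2.3' equals '2.3.0.0'.
    """
    a, b = parse_version(installed), parse_version(latest)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad the shorter version with zeros
    b += (0,) * (width - len(b))
    return a < b  # tuples compare component by component


if __name__ == "__main__":
    # Hypothetical installed versions; read yours from the switch itself.
    print(needs_update("v2.2.5.0"))  # True
    print(needs_update("2.3.0.0"))   # False
```
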
The Netgear switches are layer-2 and too dumb to cause problems.
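To put rough numbers on the spanning-tree explanation above, here is a
minimal Python sketch. It assumes classic IEEE 802.1D timers (a default
forward delay of 15 s, spent once in listening and once in learning, so
roughly 30 s before a newly linked port forwards frames). It also assumes
the client starts sending the moment the link comes up, and it uses
made-up retry schedules: the real timeouts of the PXE ROM or of the
provisioning tool's getfile step are not given in this thread.
`seconds_until_forwarding`, `send_times` and `client_gives_up` are
hypothetical helpers.

```python
from typing import List, Sequence

# Classic IEEE 802.1D default: a port spends one forward delay in
# "listening" and another in "learning" before it forwards frames.
DEFAULT_FORWARD_DELAY_S = 15.0


def seconds_until_forwarding(forward_delay_s: float = DEFAULT_FORWARD_DELAY_S) -> float:
    """Approximate time from link-up until the port forwards (listening + learning)."""
    return 2 * forward_delay_s


def send_times(retry_timeouts_s: Sequence[float]) -> List[float]:
    """Times (s after link-up) at which each attempt is sent.

    Attempt k is sent once attempts 0..k-1 have each waited out their timeout.
    """
    times: List[float] = []
    t = 0.0
    for timeout in retry_timeouts_s:
        times.append(t)
        t += timeout
    return times


def client_gives_up(retry_timeouts_s: Sequence[float],
                    forward_delay_s: float = DEFAULT_FORWARD_DELAY_S) -> bool:
    """True if every attempt is sent while the port is still discarding traffic."""
    ready_at = seconds_until_forwarding(forward_delay_s)
    return all(t < ready_at for t in send_times(retry_timeouts_s))


if __name__ == "__main__":
    # Hypothetical retry schedules, not taken from any real PXE or provisioning tool.
    print(client_gives_up([5, 5, 5]))                     # True: all sent before 30 s
    print(client_gives_up([5] * 10))                      # False: later attempts land after 30 s
    print(client_gives_up([5, 5, 5], forward_delay_s=0))  # False: port forwards immediately
```

Setting `forward_delay_s=0` roughly models a dumb switch that doesn't
run STP at all, which fits why the 5-port Netgear boots the same nodes
without trouble.
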

> Thank you,
> - Art.
> ------------------------------
--
------------------------------
Jeff Johnson
Manager
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 f: 858-412-3845
4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing