On 12/2/09 10:21 AM, beowulf-requ...@beowulf.org wrote:
------------------------------

Message: 8
Date: Tue, 1 Dec 2009 12:45:52 -0800
From: Art Poon <artp...@gmail.com>
Subject: [Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK
To: beowulf@beowulf.org
Message-ID: <825eeab3-c58f-46b8-a9c4-a806c5b68...@gmail.com>
Content-Type: text/plain; charset=us-ascii

Dear colleagues,

[snip]

What's got me and the IT guys stumped is that while the compute nodes boot via PXE from the head 
node without trouble on the NetGear, they barf with the SMC.  To be specific, after the initial 
boot with a minimal Linux kernel, there is a "fatal error" with "timeout waiting for 
getfile" when the compute node attempts to download the provisioning image from head.  
However, when they were running Rocks before I arrived, the cluster worked fine with the SMC switch.

I've tried resetting the SMC switch to factory defaults (with auto-negotiate 
on).  I've checked the /etc/beowulf/modprobe.conf and it doesn't seem to be 
demanding anything exotic.  We've tried swapping out to another SMC switch but 
that didn't change anything.

I'm grateful if you could weigh in with your expertise.

I don't know if my $.02 here could be classified as 'expertise'. With that disclaimer out of the way, I can say that SMC switches tend to have very old firmware: they sit stocked in warehouses and are not often updated. Their update process is a PITA compared to other switches out there. I have seen cases where the old firmware and STP (Spanning Tree Protocol) cause enough delay when a port first comes up on the switch during a PXE/DHCP operation that the process times out while the switch is still trying to figure out whether there are network loops. The firmware update can be obtained from www.smc.com and is at v2.3.0.0, updated in March. Check your switch to see which version it is running now.
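
To put rough numbers on that race, here is a minimal sketch. It assumes classic 802.1D STP with its default 15-second forward delay, where a port spends one forward delay in listening and another in learning before it forwards traffic. Whether the SMC firmware behaves exactly this way is an assumption, the function names are made up for illustration, and the client timeout values are hypothetical since the actual getfile timeout is not stated in this thread.

```python
# Rough model of the STP-versus-boot timing race.
# Assumptions: classic 802.1D timers (default forward delay of 15 s);
# client timeout values passed in are hypothetical, not measured.

def stp_blocked_seconds(forward_delay_s: float = 15.0) -> float:
    """Time a freshly linked port spends in listening + learning."""
    return 2 * forward_delay_s


def download_survives(client_timeout_s: float,
                      runs_stp: bool = True,
                      forward_delay_s: float = 15.0) -> bool:
    """True if the node is still retrying when the port starts forwarding.

    runs_stp=False models a switch assumed not to run STP at all,
    so the port forwards as soon as the link is up.
    """
    blocked_for = stp_blocked_seconds(forward_delay_s) if runs_stp else 0.0
    return client_timeout_s > blocked_for


if __name__ == "__main__":
    # Hypothetical client timeouts, in seconds.
    for timeout in (10.0, 20.0, 45.0):
        print(timeout,
              download_survives(timeout),
              download_survives(timeout, runs_stp=False))
```

With these assumed defaults, any client that gives up in under 30 seconds fails behind an STP port but succeeds when no STP is in the path, which matches the pattern reported in the original message.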

The Netgear switches are layer-2 and too dumb to cause problems.

Thank you,
- Art.




------------------------------


--
------------------------------
Jeff Johnson
Manager
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
