> > What's got me and the IT guys stumped is that while the compute nodes
> > boot via PXE from the head node without trouble on the NetGear, they
> > barf with the SMC. To be specific, after the initial boot with a
> > minimal Linux kernel, there is a "fatal error" with "timeout waiting
> > for getfile" when the compute node attempts to download the
> > provisioning image from head. However, when they were running Rocks
> > before I arrived, the cluster worked fine with the SMC switch.
>
> Switches sometimes have broadcast storm suppression turned on, or worse,
> sometimes they have spanning tree turned on. You want the switch to be
> as dumb as you can possibly make it for most linux clusters. Fast, but
> dumb.

As some have already commented, I'm assuming you have tested each
service (DHCP, TFTP, etc.) individually.

My bet is on spanning tree, as mentioned above. Watch the Ethernet
link lights on the node while it boots and see whether the port comes
alive and stays stable before you get the timeout. I've seen this in
spades when "spanning tree portfast" isn't set on Cisco switches: the
port just takes too long to negotiate the GbE interface.

--- Cris

--
Cristopher J. Rhea
Mayo Clinic - Research Computing Facility
200 First St SW, Rochester, MN 55905
cr...@mayo.edu
(507) 284-0587
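
P.S. If you would rather put a number on it than watch the lights,
something like the rough sketch below may help. It is only an
illustration under assumptions of mine: you run it on an already-booted
Linux box plugged into the suspect switch port, right after re-plugging
the cable, and the head node answers ping. The address 10.1.1.1 and the
helper names are hypothetical, and the ping flags are the Linux
(iputils) ones. It simply polls until the head node is reachable and
reports how long that took.

```python
import subprocess
import time


def seconds_until_true(probe, timeout=60.0, interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds, or None if `timeout` seconds pass first.
    `clock` and `sleep` are injectable so the loop can be tested offline.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


def head_node_answers(host="10.1.1.1"):
    """True if one ping to `host` succeeds (hypothetical head-node address)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],  # one packet, 1 s wait
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    elapsed = seconds_until_true(head_node_answers, timeout=90.0)
    if elapsed is None:
        print("head node not reachable within 90 s")
    else:
        print(f"head node reachable after {elapsed:.1f} s")
```

If the delay comes out at around 30 seconds, that would be consistent
with classic spanning tree holding the port in the listening and
learning states (15 seconds each by default) before it forwards
traffic. A second or two would point the finger somewhere else.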