We've run some forensics with a real test set. It's not the HP ProCurve switch.

We've also seen good jumbo results with some of the managed Linksys 48-port gigabit switches.

In other words, it's not the switch. I tend to "think out loud" to expose all possible failure modes, a process I learned at NASA/Johnson when I worked on Space Station's Medical Operations. In manned spaceflight, you have one exercise where you sit around and try to determine everything that could possibly go wrong, and how such a failure would manifest itself. That tends to be useful in other operations, too.

Paulo Afonso Lopes wrote:
I wonder if the switch could be implicated.  We have seen some (cheap)
GbE switches not support (in practice) jumbo frames (irrespective of
literature).

I got the SMC 8624T because it advertised both jumbo frames and link
aggregation. Is this one of the "cheap" switches you have seen that does
not work with jumbo frames?

paulo


Nifty Tom Mitchell wrote:
On Sat, Jan 24, 2009 at 09:36:09AM -0600, Gerry Creager wrote:
Couple of follow-up notes.

MTU=4500:  Had one node fall over with the same overflow errors.
MTU=3000:  A WRF model is running, but single timesteps are executing
2.5x slower than MTU=1500

Segmentation offload?  Is TSO on or off?

        ethtool -k eth0

will tell you.  You might also have one very reluctant machine, in the
sense of being unwilling to switch its MTU.  Could you do an

        ifconfig eth0 | grep MTU

on each machine and verify that everyone is using the right MTU?
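
For a cluster of any size, that per-node check can be scripted. The following is only a sketch, assuming Linux nodes that expose the MTU in /sys/class/net/<iface>/mtu and passwordless ssh from the head node. The helper names (collect_mtus, report_mtu_mismatches) and the node names in the usage line are hypothetical.

```shell
#!/bin/sh
# Print a line for every "host mtu" pair on stdin whose MTU differs
# from the expected value given as the first argument.
# Usage: report_mtu_mismatches EXPECTED_MTU < host_mtu_lines
report_mtu_mismatches() {
    awk -v want="$1" '$2 != want { print $1 " has MTU " $2 ", expected " want }'
}

# Emit one "host mtu" pair per host by reading sysfs over ssh.
# Assumes passwordless ssh; unreachable hosts are reported as such.
# Usage: collect_mtus IFACE host1 host2 ...
collect_mtus() {
    iface="$1"; shift
    for host in "$@"; do
        mtu=$(ssh -n "$host" cat "/sys/class/net/$iface/mtu" 2>/dev/null) || mtu="unreachable"
        echo "$host $mtu"
    done
}
```

Something like `collect_mtus eth0 node01 node02 node03 | report_mtu_mismatches 4500` would then list only the nodes that did not pick up the new setting.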


I'll go snag the new driver and compile it.  After all: What can it
hurt!

Thanks, Guy!

Regards, Gerry

Guy Coates wrote:
Hi,

We have also seen problems with the bnx2 drivers.

I got a more recent set of bnx2 drivers from Broadcom:

......

Have the packets been snooped to see whether everything
is as expected?

If you are seeing a natural MTU running faster than a jumbo MTU
then something is fragmenting or causing fragmentation of the data.

If MTU=4500 causes overflow errors, it might be related to
fragmentation.
Both the sender and receiver have to keep all the bits of a reliable
transfer until the data has been acknowledged.  At one time,
fragmentation could only be done once, to a minimum MTU, in the life
of a packet.
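
To put rough numbers on the fragmentation concern: if an IPv4 datagram built for a jumbo MTU has to cross a hop that only passes 1500 bytes, it is split into fragments whose data portions are multiples of 8 bytes. The sketch below just does that arithmetic, assuming a 20-byte IPv4 header with no options. The function name frag_count is made up for illustration, and nothing here shows that fragmentation is what is actually happening on this cluster.

```shell
#!/bin/sh
# Number of IPv4 fragments needed to carry a full-size datagram built
# for SRC_MTU across a hop limited to LINK_MTU.
# Assumes a 20-byte IPv4 header (no options) on every packet.
# Usage: frag_count SRC_MTU LINK_MTU
frag_count() {
    payload=$(( $1 - 20 ))               # data bytes in the original datagram
    per_frag=$(( ($2 - 20) / 8 * 8 ))    # fragment data must be a multiple of 8
    echo $(( (payload + per_frag - 1) / per_frag ))   # round up
}
```

For example, `frag_count 4500 1500` gives 4 and `frag_count 3000 1500` gives 3, so every full-size jumbo packet would turn into several packets that each have to arrive before the receiver can reassemble the original.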

In addition to snooping packets try "tracepath" to and from all
the involved boxes to discover what is going on.
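
A small helper can pull the discovered path MTU out of tracepath's output so it can be compared against the configured value. This is a sketch that assumes the iputils tracepath summary line of the form "Resume: pmtu N hops H back B"; other versions may format it differently. The name extract_pmtu and the host name in the usage line are hypothetical.

```shell
#!/bin/sh
# Read tracepath output on stdin and print the path MTU taken from
# the final "Resume: pmtu N ..." summary line (assumed format).
# Usage: tracepath HOST | extract_pmtu
extract_pmtu() {
    awk '/Resume:/ {
        for (i = 1; i < NF; i++)
            if ($i == "pmtu") { print $(i + 1); exit }
    }'
}
```

Running `tracepath node02 | extract_pmtu` from each box toward every other, in both directions, should print the configured MTU everywhere. A smaller number points at the hop that is not passing jumbo frames.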



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf




--
Gerry Creager -- gerry.crea...@tamu.edu
Texas Mesonet -- AATLT, Texas A&M University        
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843