First, thanks to all who've responded. I've been looking a bit this morning and am trying to grok the results.

Joe Landman wrote:
Hi Gerry

Gerry Creager wrote:
History/background/description of the cluster
* 126 node Dell 1950 cluster with dual-quad core Xeons
* HP 5412zl switch for gigabit cluster backplane and 10GbE interconnect to selected services (file server, etc.)
* Gigabit interconnect
* Hand compiled 2.6.26 kernel
* bnx2 module loaded for the Broadcom onboard NICs
* Switch, compute nodes, head node set to 9000 byte MTU

We have had *lots* of problems with Broadcom NICs and jumbo frames, from the 2.6.9 timeframe onwards.

Marvelous.  I'd prefer to not have to back-rev if I can avoid it...


We're seeing the following error in WRF compiled with Open MPI and the PGI 7.2 compiler:
mca_btl_tcp_frag_send:writev failed with errno=104
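
For reference, errno 104 on Linux is ECONNRESET ("Connection reset by peer"), which suggests the remote end of the TCP connection went away rather than the local write failing by itself. A minimal Python sketch for looking up an errno value on the machine where the error occurred; the helper name `describe_errno` is made up for illustration, and errno numbers are platform-specific, so run it on a Linux node:

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 is the value reported by mca_btl_tcp_frag_send above.
    print(describe_errno(104))
```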

While all nodes were accessible prior to the run and returned appropriate "stuff" when queried with, e.g., ssh and a command, two nodes now return something like this:
[ge...@brazos SCOOP12km]$ ssh c0522
Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.
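
The odd-looking length in that message is worth decoding. SSH packets begin with a 4-byte big-endian length field, and 808464432 is 0x30303030, i.e. the four ASCII characters "0000". One possible reading is that the client received text-like or fill bytes where it expected a packet header, which would fit a corrupted data stream; the number alone does not prove where the bytes came from. A small sketch (the function name is hypothetical):

```python
def decode_length_field(value):
    """Show a 32-bit packet-length field as hex and as its raw big-endian bytes."""
    raw = value.to_bytes(4, "big")
    return hex(value), raw


if __name__ == "__main__":
    print(decode_length_field(808464432))  # ('0x30303030', b'0000')
```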

Hmmm... sounds like a link tried re-negotiating. Can you get on via serial/console and

My guess is that the driver wandered across memory boundaries. This stinks of a buffer problem to me. Typically, after this happens, I can't log into the node via any interface, nor on console. It requires an IPMI or physical reboot.

r...@lightning:~# ethtool eth0

-bash-3.2# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

You might want to

    ethtool -s eth0 autoneg off

to force it not to renegotiate its speed.  Also, look at

-bash-3.2# ethtool -A eth1 autoneg off
autoneg unmodified, ignoring
no pause parameters changed, aborting

r...@lightning:~# ethtool -g eth0

-bash-3.2# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX:             1020
RX Mini:        0
RX Jumbo:       4080
TX:             255
Current hardware settings:
RX:             255
RX Mini:        0
RX Jumbo:       765
TX:             255
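
To compare the current ring sizes against the pre-set maximums across many nodes, the `ethtool -g` output can be parsed mechanically. The following is a rough sketch that assumes the output layout shown above; `parse_ring_params` and `headroom` are illustrative names, not part of any ethtool tooling:

```python
def parse_ring_params(text):
    """Split `ethtool -g` output into {'max': {...}, 'current': {...}}."""
    sections = {"max": {}, "current": {}}
    target = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            target = sections["max"]
        elif line.startswith("Current hardware settings"):
            target = sections["current"]
        elif target is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():
                target[key.strip()] = int(value)
    return sections


def headroom(sections):
    """Return how far each current ring size is below its maximum."""
    return {
        key: sections["max"][key] - current
        for key, current in sections["current"].items()
        if key in sections["max"]
    }


SAMPLE = """Ring parameters for eth1:
Pre-set maximums:
RX:             1020
RX Mini:        0
RX Jumbo:       4080
TX:             255
Current hardware settings:
RX:             255
RX Mini:        0
RX Jumbo:       765
TX:             255
"""

if __name__ == "__main__":
    print(headroom(parse_ring_params(SAMPLE)))
```

For the output above, the jumbo RX ring is already non-zero (765 of a possible 4080), so the "zero jumbo ring entries" case does not apply on this node.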

See if you can do something like

    ethtool  -G eth0 rx-jumbo 100

if you have zero jumbo ring rx entries.

Doesn't look like this requires much change.

Also, while I'm in the neighborhood, to respond to Mark's suggestions:

-bash-3.2# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off

Hmmm. Might be worth turning off TCP segmentation offload here.

-bash-3.2# ethtool -S eth1
NIC statistics:
     rx_bytes: 43454
     rx_error_bytes: 0
     tx_bytes: 51103
     tx_error_bytes: 0
     rx_ucast_packets: 231
     rx_mcast_packets: 0
     rx_bcast_packets: 329
     tx_ucast_packets: 250
     tx_mcast_packets: 0
     tx_bcast_packets: 4
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 365
     rx_65_to_127_byte_packets: 166
     rx_128_to_255_byte_packets: 20
     rx_256_to_511_byte_packets: 7
     rx_512_to_1023_byte_packets: 1
     rx_1024_to_1522_byte_packets: 1
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 42
     tx_65_to_127_byte_packets: 84
     tx_128_to_255_byte_packets: 31
     tx_256_to_511_byte_packets: 97
     tx_512_to_1023_byte_packets: 0
     tx_1024_to_1522_byte_packets: 0
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 60
     rx_discards: 0
     rx_fw_discards: 0
-bash-3.2# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:1E:C9:AC:27:FB
          inet addr:192.168.200.154  Bcast:192.168.203.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:574 errors:0 dropped:0 overruns:0 frame:0
          TX packets:265 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:44422 (43.3 KiB)  TX bytes:54606 (53.3 KiB)
          Interrupt:16 Memory:f4000000-f4012100
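
When checking `ethtool -S` counters like the ones above on more than a couple of nodes, it may help to filter them automatically. This is a sketch under the assumption that the counter names match the bnx2 output shown; the function names and the list of "problem" keywords are my own choices:

```python
def parse_nic_stats(text):
    """Turn `ethtool -S` output into a {counter_name: int} dictionary."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def summarize(stats):
    """Return (non-zero problem counters, jumbo-size bucket counters)."""
    problem_words = ("error", "discard", "collision", "jabber",
                     "fragment", "undersize", "oversize")
    problems = {
        name: count for name, count in stats.items()
        if count and any(word in name for word in problem_words)
    }
    jumbo = {name: count for name, count in stats.items()
             if "1523_to_9022" in name}
    return problems, jumbo


# Excerpt of the counters from the output above.
SAMPLE = """NIC statistics:
     rx_bytes: 43454
     rx_crc_errors: 0
     rx_oversize_packets: 0
     rx_1523_to_9022_byte_packets: 0
     tx_1523_to_9022_byte_packets: 0
     rx_discards: 0
"""

if __name__ == "__main__":
    print(summarize(parse_nic_stats(SAMPLE)))
```

On this node every error counter is zero, but so are both 1523-to-9022-byte buckets: no jumbo-sized frame has crossed the interface since the counters were last reset, so these numbers say little about how the NIC behaves under jumbo traffic.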



I'm stumped and looking for causes and solutions. And yes, WRF as compiled did run before the change to jumbo frames.

Do I reduce the size of the frames to something smaller, like 8800 bytes? 7500? 1500?

In the past I had heard that jumbo frames may work on Broadcom NICs around 6000 byte length. We haven't tried this in a while ... YMMV.
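
When trying smaller MTUs, keep in mind that the MTU is the payload size, not the on-wire frame size. A frame adds a 14-byte Ethernet header and a 4-byte FCS, plus 4 bytes if an 802.1Q VLAN tag is present, which is why the NIC counter buckets above end at 1522 and 9022 rather than 1500 and 9000. A quick sketch of that arithmetic for the candidate MTUs mentioned in this thread (the function name is illustrative):

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
FCS = 4          # frame check sequence (CRC)
VLAN_TAG = 4     # optional 802.1Q tag


def frame_size(mtu, vlan=False):
    """Return the Ethernet frame size in bytes for a given MTU."""
    return mtu + ETH_HEADER + FCS + (VLAN_TAG if vlan else 0)


if __name__ == "__main__":
    for mtu in (9000, 8800, 7500, 6000, 1500):
        print(mtu, frame_size(mtu), frame_size(mtu, vlan=True))
```

Whatever value is chosen, the switch ports and every host on the segment need to accept frames of at least that size.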


I'm not completely out of ideas, but I am stumped.

Thanks, gerry



--
Gerry Creager -- gerry.crea...@tamu.edu
Texas Mesonet -- AATLT, Texas A&M University        
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
