Andy Gospodarek wrote:

> Can you elaborate on what isn't going well with this driver/hardware?

I have a ppc64 blade running a customized 2.6.10. At init time, two of our gigE links (eth4 and eth5) are bonded together to form bond0. The bond has an MTU of 9000 and uses ARP monitoring. We're using an ethernet driver with a modified RX path for jumbo frames[1]. With the stock driver, it seems to work fine.

The problem is that eth5 bounces up and down every 15 seconds or so (see the attached log excerpt). Also, ifconfig shows that only 3 packets totalling 250 bytes have gone out eth5, even though the ARP monitoring code in the bonding layer should be sending 10 ARPs/sec out that link.


eth5      Link encap:Ethernet  HWaddr 00:03:CC:51:01:3E
          inet6 addr: fe80::203:ccff:fe51:13e/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:119325 errors:90283 dropped:90283 overruns:90283 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8978310 (8.5 MiB)  TX bytes:250 (250.0 b)
          Base address:0x3840 Memory:92220000-92240000


I had initially suspected that it might be due to the "u32 jiffies" stuff in bonding.h, but changing that doesn't seem to fix the issue.

If I boot the system and then log in and manually create the bond link (rather than it happening at init time) then I don't see the problem.

If it matters at all, normally the system boots from eth4. I'm going to try booting from eth6 and see if the problem still occurs.


Chris




[1] I'm not sure if I'm supposed to mention the specific driver, since it hasn't been officially released yet, so I'll keep this high-level. Normally, jumbo frames require allocating a large physically contiguous receive buffer. With the modified driver, rather than receiving into one contiguous buffer, the incoming packet is split across multiple pages, which are then reassembled into an sk_buff and passed up the stack.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: ARP monitoring set to 100 ms with 2 target(s): 172.24.136.0 172.24.137.0
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: ARP monitoring set to 100 ms with 2 target(s): 172.25.136.0 172.25.137.0
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: Warning: failed to get speed/duplex from eth4, speed forced to 100Mbps, duplex forced to Full.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: enslaving eth4 as an active interface with an up link.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: Warning: failed to get speed/duplex from eth5, speed forced to 100Mbps, duplex forced to Full.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: enslaving eth5 as an active interface with an up link.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth4 to be reset in 30000 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth4 is now down.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: now running without any active interface !
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: link status definitely up for interface eth5
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth4
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth4 is now up
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:54:09 base0-0-0-5-0-11-1 kernel: bonding: interface eth4 reset delay set to 600 msec.
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:45 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:45 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:46 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:55:46 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
