Andy Gospodarek wrote:
> Can you elaborate on what isn't going well with this driver/hardware?
I have a ppc64 blade running a customized 2.6.10. At init time, two of
our gigE links (eth4 and eth5) are bonded together to form bond0. The
bond has an MTU of 9000 and uses ARP monitoring. We're using an
Ethernet driver with a modified RX path for jumbo frames[1]. With the
stock (unmodified) driver, it seems to work fine.
The problem is that eth5 seems to bounce up and down every 15 seconds
or so (see the attached log excerpt). Also, "ifconfig" shows that only
3 packets totalling 250 bytes have gone out eth5, even though I know the
ARP monitoring code in the bonding layer is sending 10 ARPs/sec out that link.
eth5      Link encap:Ethernet  HWaddr 00:03:CC:51:01:3E
          inet6 addr: fe80::203:ccff:fe51:13e/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:119325 errors:90283 dropped:90283 overruns:90283 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8978310 (8.5 MiB)  TX bytes:250 (250.0 b)
          Base address:0x3840 Memory:92220000-92240000
I had initially suspected that it might be due to the "u32 jiffies"
handling in bonding.h, but changing that doesn't seem to fix the issue.
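
For context, here's a minimal sketch of the sort of comparison I was
worried about; the function and variable names are made up for
illustration and this is not the actual bonding code. On ppc64, jiffies
is a 64-bit unsigned long, so truncating timestamps to u32 is only safe
if every value in the comparison is truncated the same way; the
wrap-safe helpers in <linux/jiffies.h> sidestep the issue:

#include <linux/jiffies.h>

/*
 * Illustrative only -- not the real bonding code.  A link monitor
 * records when traffic was last seen on a slave and later decides
 * whether the link has gone stale.
 */
static int link_is_stale(unsigned long last_rx, int delta_in_ticks)
{
        /*
         * Suspect pattern: (u32)jiffies - (u32)last_rx > delta_in_ticks
         * On a 64-bit box this only works if both values are truncated
         * consistently everywhere they are stored and compared.
         */

        /* Wrap-safe version, independent of word size: */
        return time_after(jiffies, last_rx + (unsigned long)delta_in_ticks);
}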
If I boot the system, log in, and manually create the bond link
(rather than having it created at init time), then I don't see the problem.
If it matters at all, normally the system boots from eth4. I'm going to
try booting from eth6 and see if the problem still occurs.
Chris
[1] I'm not sure if I'm supposed to mention the specific driver, since it
hasn't been officially released yet, so I'll keep this high-level.
Normally, jumbo frames require allocating a large physically contiguous
receive buffer. With the modified driver, rather than receiving into one
contiguous buffer, the incoming packet is split across multiple pages,
which are then assembled into an sk_buff and passed up the stack.
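
To give a rough idea of the technique (this is just a sketch in the
style of 2.6-era skbuff code, not our driver's actual RX path, and the
names are made up), the received pages get attached to the skb as paged
fragments instead of being copied into one big linear buffer:

#include <linux/skbuff.h>
#include <linux/netdevice.h>

/*
 * Sketch only: attach nr_pages receive pages, each holding page_len
 * bytes of the frame, to an skb as paged fragments.  Header handling
 * and DMA details are omitted; field names match ~2.6.10 skbuff.h.
 */
static struct sk_buff *build_jumbo_skb(struct page **pages, int nr_pages,
                                       unsigned int page_len)
{
        struct sk_buff *skb;
        int i;

        /* Small linear area for the protocol headers. */
        skb = dev_alloc_skb(128);
        if (!skb)
                return NULL;

        for (i = 0; i < nr_pages; i++) {
                skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

                frag->page        = pages[i];
                frag->page_offset = 0;
                frag->size        = page_len;
                skb_shinfo(skb)->nr_frags++;

                skb->len      += page_len;
                skb->data_len += page_len;
                skb->truesize += PAGE_SIZE;
        }

        return skb;
}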
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: ARP monitoring set to 100 ms with 2 target(s): 172.24.136.0 172.24.137.0
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: ARP monitoring set to 100 ms with 2 target(s): 172.25.136.0 172.25.137.0
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: Warning: failed to get speed/duplex from eth4, speed forced to 100Mbps, duplex forced to Full.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: enslaving eth4 as an active interface with an up link.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: Warning: failed to get speed/duplex from eth5, speed forced to 100Mbps, duplex forced to Full.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: enslaving eth5 as an active interface with an up link.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth4 to be reset in 30000 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth4 is now down.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: now running without any active interface !
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: link status definitely up for interface eth5
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth4
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth4 is now up
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:54:09 base0-0-0-5-0-11-1 kernel: bonding: interface eth4 reset delay set to 600 msec.
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:45 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:45 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:46 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 30000 msec.
Mar 29 20:55:46 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.