Hello,

I'm having problems with my sky2 NIC hanging under heavy load. This appears to be an old problem, since I saw it with 2.6.17 as well, and upgrading the affected systems to 2.6.18 has not solved it. It's easy for me to reproduce, since I'm running application stress tests that saturate the link.

I've had a look at the recent traffic on linux-kernel, netdev and the relevant bugzilla (http://bugzilla.kernel.org/show_bug.cgi?id=6839) but it's not clear to me which patch I should try against a stock 2.6.18 kernel. If someone could confirm that the "TX pause fix" attached to the bugzilla is sufficient, that would be great.

The card in question is a:

Sep 22 12:17:27 dezo kernel: sky2 v1.5 addr 0xf3000000 irq 169 Yukon-XL (0xb3) rev 1

It's a SysKonnect SK-9E21 PCI-E Server Adapter and the driver is using PCI-MSI interrupts on my system. The chip on the card is a Marvell 88E8061.

The actual errors leading up to the latest hang are:

Sep 21 21:47:06 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 21:47:06 dezo kernel: sky2 eth1: tx timeout
Sep 21 21:47:06 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
Sep 21 21:47:06 dezo kernel: sky2 hardware hung? flushing
Sep 21 21:59:41 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 21:59:41 dezo kernel: sky2 eth1: tx timeout
Sep 21 21:59:41 dezo kernel: sky2 eth1: transmit ring 179 .. 138 report=220 done=220
Sep 21 21:59:41 dezo kernel: sky2 status report lost?
Sep 21 22:00:41 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 22:00:41 dezo kernel: sky2 eth1: tx timeout
Sep 21 22:00:41 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
Sep 21 22:00:41 dezo kernel: sky2 hardware hung? flushing
Sep 21 22:13:10 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 22:13:10 dezo kernel: sky2 eth1: tx timeout
Sep 21 22:13:10 dezo kernel: sky2 eth1: transmit ring 179 .. 138 report=220 done=220
Sep 21 22:13:10 dezo kernel: sky2 status report lost?
Sep 21 22:14:20 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 22:14:20 dezo kernel: sky2 eth1: tx timeout
Sep 21 22:14:20 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
Sep 21 22:14:20 dezo kernel: sky2 hardware hung? flushing
Sep 21 22:15:09 dezo kernel: sky2 eth1: disabling interface
Sep 21 22:15:09 dezo kernel: sky2 eth1: enabling interface
Sep 21 22:15:12 dezo kernel: sky2 eth1: Link is up at 1000 Mbps, full duplex, flow control both
Sep 21 22:15:20 dezo kernel: eth1: no IPv6 routers present

While the interface does appear to have been reset, it never actually started working again and the system was hung until I rebooted it this morning.

I'm also seeing a lot of these under high load:

Sep 21 21:34:24 dezo kernel: eth1: hw csum failure.
Sep 21 21:34:24 dezo kernel:
Sep 21 21:34:24 dezo kernel: Call Trace:
Sep 21 21:34:24 dezo kernel: [dump_stack+16/21] dump_stack+0x10/0x15
Sep 21 21:34:24 dezo kernel: [__skb_checksum_complete+85/121] __skb_checksum_complete+0x55/0x79
Sep 21 21:34:24 dezo kernel: [tcp_v4_rcv+218/2405] tcp_v4_rcv+0xda/0x965
Sep 21 21:34:24 dezo kernel: [ip_local_deliver+433/635] ip_local_deliver+0x1b1/0x27b
Sep 21 21:34:24 dezo kernel: [ip_rcv+1234/1311] ip_rcv+0x4d2/0x51f
Sep 21 21:34:24 dezo kernel: [netif_receive_skb+589/621] netif_receive_skb+0x24d/0x26d
Sep 21 21:34:24 dezo kernel: [__nosave_end+128712870/2129981440] :sky2:sky2_status_intr+0x23b/0x404
Sep 21 21:34:24 dezo kernel: [__nosave_end+128714646/2129981440] :sky2:sky2_poll+0x100/0x1a1
Sep 21 21:34:24 dezo kernel: [net_rx_action+132/268] net_rx_action+0x84/0x10c
Sep 21 21:34:24 dezo kernel: [__do_softirq+107/226] __do_softirq+0x6b/0xe2
Sep 21 21:34:24 dezo kernel: [call_softirq+28/40] call_softirq+0x1c/0x28
Sep 21 21:34:24 dezo kernel: [do_softirq+45/129] do_softirq+0x2d/0x81
Sep 21 21:34:24 dezo kernel: [do_IRQ+112/132] do_IRQ+0x70/0x84
Sep 21 21:34:24 dezo kernel: [ret_from_intr+0/11] ret_from_intr+0x0/0xb
Sep 21 21:34:24 dezo kernel: [mwait_idle+58/82] mwait_idle+0x3a/0x52
Sep 21 21:34:24 dezo kernel: [cpu_idle+105/140] cpu_idle+0x69/0x8c
Sep 21 21:34:24 dezo kernel: [start_kernel+483/488] start_kernel+0x1e3/0x1e8
Sep 21 21:34:24 dezo kernel: [x86_64_start_kernel+459/474] x86_64_start_kernel+0x1cb/0x1d

Am happy to help with tracking this down...

Thanks,

-mato