On Tue, 10 Jan 2006, Erik Mouw wrote:
I have lots of transmit timeouts with an Intel E1000 card during large
TCP transmissions (remotely viewing a 3000x2000 jpeg image using XV is
an excellent way to trigger it). This is what I get in linux-2.6.8.1:

Sorry to hear you're having a problem, and cool, thanks for the test; we'll have to try it here. We've classically had problems reproducing the Athlon-based hangs.

Jan 10 15:24:41 zurix kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 15:24:41 zurix kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
Jan 10 15:24:46 zurix kernel: nfs: server abra2 not responding, still trying
Jan 10 15:24:46 zurix kernel: nfs: server abra2 OK

And this is with linux-2.6.15:

Jan 10 06:53:27 zurix kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Jan 10 06:53:27 zurix kernel:   TDH                  <b0>
Jan 10 06:53:27 zurix kernel:   TDT                  <b0>
Jan 10 06:53:27 zurix kernel:   next_to_use          <b0>
Jan 10 06:53:27 zurix kernel:   next_to_clean        <c3>
Jan 10 06:53:27 zurix kernel: buffer_info[next_to_clean]
Jan 10 06:53:27 zurix kernel:   dma                  <e938a5e>
Jan 10 06:53:27 zurix kernel:   time_stamp           <872de93>
Jan 10 06:53:27 zurix kernel:   next_to_watch        <c3>
Jan 10 06:53:27 zurix kernel:   jiffies              <872e086>
Jan 10 06:53:27 zurix kernel:   next_to_watch.status <0>

Ugh, I don't get it. There is no way in the code that I know of that we would fail to update TDT when we enqueue a transmit.
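To make the register dump above easier to compare by eye, here is a minimal Python sketch that decodes the hex fields and derives two numbers from them: how many jiffies passed between time_stamp and the hang report, and how many descriptors the driver still considers outstanding. The helper names and the ring_size parameter are hypothetical; the ring size is not stated anywhere in this thread, so it has to be supplied by whoever runs it.

```python
import re

# Dump lines copied from the 2.6.15 log above (wrapped line rejoined).
DUMP = """\
Jan 10 06:53:27 zurix kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Jan 10 06:53:27 zurix kernel:   TDH                  <b0>
Jan 10 06:53:27 zurix kernel:   TDT                  <b0>
Jan 10 06:53:27 zurix kernel:   next_to_use          <b0>
Jan 10 06:53:27 zurix kernel:   next_to_clean        <c3>
Jan 10 06:53:27 zurix kernel: buffer_info[next_to_clean]
Jan 10 06:53:27 zurix kernel:   dma                  <e938a5e>
Jan 10 06:53:27 zurix kernel:   time_stamp           <872de93>
Jan 10 06:53:27 zurix kernel:   next_to_watch        <c3>
Jan 10 06:53:27 zurix kernel:   jiffies              <872e086>
Jan 10 06:53:27 zurix kernel:   next_to_watch.status <0>
"""

# Matches "kernel:   NAME   <hex>"; lines without a <hex> value are skipped.
FIELD_RE = re.compile(r"kernel:\s+(\S+)\s+<([0-9a-fA-F]+)>")


def parse_tx_hang(text):
    """Return {field_name: int} for every '<hex>' field in the dump."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def jiffies_since_timestamp(fields):
    """Jiffies elapsed between the buffer's time_stamp and the report."""
    return fields["jiffies"] - fields["time_stamp"]


def outstanding_descriptors(fields, ring_size):
    """Descriptors queued but not yet cleaned, allowing for ring wrap.

    ring_size is an assumption supplied by the caller; the thread does
    not say how large the Tx ring is on this machine.
    """
    return (fields["next_to_use"] - fields["next_to_clean"]) % ring_size


if __name__ == "__main__":
    f = parse_tx_hang(DUMP)
    print("TDH == TDT:", f["TDH"] == f["TDT"])
    print("jiffies since time_stamp:", jiffies_since_timestamp(f))
```

Converting the jiffies difference to seconds needs the kernel's HZ value, which is also not given in this thread.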

These problems (for us) seem to be related to TSO. Can you disable it and try your test again, using
ethtool -K eth0 tso off
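If the test is going to be repeated across reboots or on several machines, the toggle can be scripted. The following is a rough Python sketch, not part of the driver or of ethtool: disable_tso simply runs the command above, and tso_enabled reads the text printed by "ethtool -k <iface>" to confirm the setting took effect. Both function names are hypothetical, and the label of the TSO line is an assumption: it has been spelled both "tcp segmentation offload" and "tcp-segmentation-offload" depending on the ethtool version, so the parser accepts either.

```python
import subprocess


def disable_tso(iface):
    """Run `ethtool -K <iface> tso off`; needs root and the ethtool binary."""
    subprocess.run(["ethtool", "-K", iface, "tso", "off"], check=True)


def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO line in `ethtool -k` output.

    Returns None when no TSO line is present. The label spelling is an
    assumption (see the note above), so spaces and hyphens are treated
    as equivalent.
    """
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        name = key.strip().lower().replace(" ", "-")
        if name == "tcp-segmentation-offload":
            return value.strip().lower().startswith("on")
    return None


if __name__ == "__main__":
    iface = "eth0"  # interface name taken from the logs in this thread
    disable_tso(iface)
    shown = subprocess.run(
        ["ethtool", "-k", iface], check=True, capture_output=True, text=True
    ).stdout
    print("TSO enabled after change:", tso_enabled(shown))
```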

The system is an AMD Athlon XP 2000+ running at 1.666 GHz with a VIA
KT400 chipset (Asrock K7VT4APro).

Ah yes, this is the famous one that seems to get lots of problem reports. You are running the latest BIOS, right? It seems lame, but that has actually fixed problems here.

Here's the relevant output from lspci:

<snip>

So far I have replaced the NIC, the motherboard, the power supply, RAM,
network cable, and gigE switch, but to no avail. I've tried three
different kernels (2.6.8.1, 2.6.11-ac7, and 2.6.15) but the problem
remains. I've been stress testing the system by continuously compiling
kernels (over NFS), but after 288 runs there hasn't been a single error
so I guess the CPU and RAM are OK. The number of transmit timeouts is
lower with linux-2.6.8.1, so for the moment I keep running that version.

Wow, that's a lot of work. I'm almost at the point of a personal crusade against these timeout issues. The biggest obstacle to solving them is that we can't reproduce them locally.

We have about 15 other machines using the Intel E1000, but I haven't
seen this kind of problem on any of the other machines. I have run
out of ideas, so I hope somebody knows how to solve this. If you need
more information, just let me know.

Like I said, try disabling TSO and see if that helps. Please also try driver 6.3.9 from prdownloads.sf.net/e1000 and see if that changes anything.

Thanks,
 Jesse