On Mon, 9 Jan 2006, Robin Humble wrote:
Until we turned off TSO on our cluster with
  ethtool -K eth0 tso off
  ethtool -K eth1 tso off
certain sized runs of some codes were getting:
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
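For anyone reproducing this, the TSO state can be inspected as well as set with ethtool. A minimal sketch (interface names are from the report above; the `tso_state` parsing helper is hypothetical, and the exact `ethtool -k` label wording can vary between ethtool versions):

```shell
# Turn TSO off on both ports and list the offload settings (needs root
# and real hardware, so shown as comments only):
#   ethtool -K eth0 tso off
#   ethtool -K eth1 tso off
#   ethtool -k eth0

# tso_state: hypothetical helper that pulls the on/off state out of
# `ethtool -k` output read from stdin.
tso_state() {
    awk -F': ' '/tcp segmentation offload/ { print $2 }'
}

# Example against canned `ethtool -k`-style output:
printf 'rx checksumming: on\ntcp segmentation offload: off\n' | tso_state
# prints "off"
```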
Does anyone on netdev know why this would be relevant to TSO
enable/disable?
and a bunch of errors like this:
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH                  <26>
  TDT                  <13>
  next_to_use          <13>
  next_to_clean        <25>
buffer_info[next_to_clean]
  time_stamp           <79694e6>
  next_to_watch        <2a>
  jiffies              <79695b9>
  next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH                  <26>
  TDT                  <13>
  next_to_use          <13>
  next_to_clean        <25>
buffer_info[next_to_clean]
  time_stamp           <79694e6>
  next_to_watch        <2a>
  jiffies              <7969681>
  next_to_watch.status <0>
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
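For readers decoding that dump: TDH/TDT are the hardware's Tx ring head and tail registers, and next_to_clean/next_to_use are the driver's software indices. A rough sketch of what the numbers say, under assumptions the log itself doesn't state (the values are hex, the Tx ring has 256 descriptors, and HZ=1000):

```shell
# How long the watched descriptor has been stuck: jiffies - time_stamp
# (values copied from the first hang report above, interpreted as hex).
echo $(( 0x79695b9 - 0x79694e6 ))    # 211 jiffies, ~211 ms if HZ=1000

# Descriptors still outstanding between cleanup and the tail, assuming
# a 256-entry ring (wraparound handled with the +256 then modulo):
echo $(( (0x13 - 0x25 + 256) % 256 ))    # 238
```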
Those errors are pretty serious, as TCP messages were being lost and
codes were hanging and crashing. Usually when codes died we saw the
KERNEL assertion, but sometimes they ran into problems with just the
eth0 or eth1 NETDEV resets :-/
Machines are dual Xeon 2.4GHz on the E7500 chipset with 1GB of RAM and
built-in dual e1000 82546EBs, running Red Hat EL AS4.
I tried a range of 2.6 kernels from 2.6.12 up to 2.6.15, the latest
e1000 driver 6.3.9-NAPI as well as 6.2.15-NAPI, and the default
drivers in the kernels (e.g. 6.1.16-k2-NAPI). Apart from different
syntax in the error messages they all behaved the same (i.e. the codes
died unless we set tso=off). Various ITR settings didn't help; our
default is 15000.
Thanks for trying the latest driver and kernel, that really helps us get
started.
The major problems only happen for >32 CPU parallel runs; smaller runs
work fine. Unfortunately we haven't found a simple small MPI code that
triggers the TSO problems.
Do you know what packet size triggered the problem? It sounds like the
network traffic at the time of failure is lots and lots of outstanding
transmits over many concurrent connections, is that correct?
We'd like to use TSO, as it means 5 to 10% less CPU usage for large
message sizes (but, strangely, a few more microseconds of latency).
See attached pic.
That's what TSO is supposed to help with. The latency increase can be
experimented with or mitigated by changing tcp_tso_win_divisor in
/proc/.../ipv4.
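For reference, that knob limits how large a share of the congestion window a single TSO burst may consume (roughly cwnd/divisor, default 3). A sketch of tuning it; the full sysctl name below is an assumption based on the 2.6 kernels in this thread:

```shell
# Read and change the divisor at runtime (needs root, so shown as
# comments only; assumed full name: net.ipv4.tcp_tso_win_divisor):
#   sysctl net.ipv4.tcp_tso_win_divisor
#   sysctl -w net.ipv4.tcp_tso_win_divisor=8

# Rough effect on burst size: with a 64 KB congestion window, raising
# the divisor from 3 to 8 shrinks the largest single TSO burst:
echo $(( 65536 / 3 )) $(( 65536 / 8 ))    # 21845 8192
```

Smaller bursts mean the NIC holds fewer bytes per send, which is how a larger divisor trades a little throughput for lower added latency.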
So what's the best way I can help you debug this TCP segmentation
offload issue?
We can start by getting some transmit ring dumps at the time of
failure. I have code to do this but need to port it to 2.6.15; I'll
try to get that code to you in the next couple of days.
Jesse