On 18 Sep 2007, Urs Thuermann wrote:

> Bill Fink <[EMAIL PROTECTED]> writes:
>
> > It may also be a useful test to disable hardware TSO support
> > via "ethtool -K ethX tso off".
>
> All suggestions here on the list, i.e. checking for flow control,
> duplex, cable problems, etc. don't explain (at least to me) why LF
> sees file corruption.  How can a corrupted frame pass the TCP checksum
> check?  Does TCP use the hardware checksum of the NIC if available?
> AFAICS, this would be the only way for a corrupt frame to make it into
> the file.  But Bill already suggested this and LF reported that it
> didn't make a difference.
>
> A few months ago I had hardware problems with an embedded device, where
> transmission from the NIC via the PCI bus to the CPU had some bits
> flipped.  But tcpdump clearly showed the TCP checksum errors and also
> TCP recognized the errors and the connection was stalled.  And, BTW,
> we also observed an increasing percentage of corrupted frames with
> increasing traffic on that interface, i.e. increasing load on the PCI
> bus.
>
> So I would run tcpdump -s0 and watch for "incorrect checksum" messages.

I agree TSO is an unlikely candidate, since it should only affect
transmits and the problem, as I understand it, is with receives.
Still, one of the first things I try when dealing with weird
problems is disabling all hardware assists.

I also agree with you that network errors should normally be detected
by the TCP checksum (unless hardware checksumming was messed up), and
from what I recall no receive checksum errors were being seen.  That,
and the fact that the problem was seen with two different NICs, leads
me to believe that the problem is elsewhere in the system.

That leaves many possibilities.  It could be a memory problem, although
it was indicated that memory testing was performed successfully (but we
don't know how extensive the memory checking enabled via the BIOS is).
It could be the PCI bus writes back to the disk, or a problem with the
disk/controller/fs writes themselves (some kind of disk stress test
might be useful).

						-Bill
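
As a rough idea of what a disk stress test could look like, here is a
minimal Python sketch that writes random data to a file, reads it back,
and compares SHA-256 digests.  The function name write_and_verify and
its parameters are made up for illustration and are not from any
existing tool.  Note that with small files the read-back will probably
be served from the page cache, in which case it exercises memory rather
than the disk path; the total size would have to be well beyond RAM (or
the caches dropped between write and read) to really hit the
disk/controller.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, passes=4, chunk_size=1 << 20, chunks=16):
    """Write random data into `directory`, read it back, compare digests.

    Hypothetical helper: returns the number of passes whose read-back
    digest differed from the digest of the data that was written.
    """
    mismatches = 0
    for _ in range(passes):
        written = hashlib.sha256()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                for _ in range(chunks):
                    block = os.urandom(chunk_size)
                    written.update(block)   # digest of what we intended to write
                    f.write(block)
                f.flush()
                os.fsync(f.fileno())        # push the data out to the device

            read_back = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunk_size pieces until EOF (empty bytes object).
                for block in iter(lambda: f.read(chunk_size), b""):
                    read_back.update(block)

            if read_back.digest() != written.digest():
                mismatches += 1
        finally:
            os.unlink(path)                 # always clean up the test file
    return mismatches
```

Running it in a loop against the affected filesystem, e.g.
write_and_verify("/mnt/test", passes=100, chunks=4096) (the path is only
an example), and getting a non-zero result with no network involved
would point at the memory/PCI/disk side rather than the NIC.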