Jamie Webb wrote:

> 
> Well, so far so good. I'll let you know if it happens again, but it 
> looks like that's fixed it.
> 
> Further testing showed that I also had to disable rx checksumming,
> otherwise I was getting random kernel crashes. Presumably it was not
> only reading data from random memory locations, but also writing in
> the wrong place...
> 
> So, do I understand correctly that this is causing the CPU rather
> than the NIC to do the checksumming?

That's right.  See the discussion on a similar problem here:

<http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1059644&admit=-682735245+1158251313899+28353475>

In this other case, the symptoms were very similar to yours.
I suspected it was an IOMMU problem, and the problem went
away when the user upgraded to the latest kernel at that
time.
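
As a rough illustration of the work that moves to the CPU when
checksum offload is turned off, here is a minimal Python sketch of
the Internet checksum described in RFC 1071.  It is not the kernel's
implementation, the function name is made up for this example, and
a real TCP checksum also covers a pseudo-header that is left out
here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071.

    Illustrative only: the name and structure are assumptions, not
    taken from any driver or kernel source.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte

    total = 0
    for i in range(0, len(data), 2):
        # add each big-endian 16-bit word
        total += (data[i] << 8) | data[i + 1]

    # fold any carries back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # the checksum is the one's complement of the folded sum
    return ~total & 0xFFFF


# The byte sequence used as the worked example in RFC 1071.
payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)

# A receiver recomputes over data plus checksum and expects zero.
ok = internet_checksum(payload + csum.to_bytes(2, "big")) == 0
```

If a byte of the payload is altered in transit, the recomputed value
is normally non-zero and the segment is dropped, which is why a
checksum computed after the corruption has already happened would
hide the problem rather than catch it.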

> 
> Is this a reasonable permanent solution?
> 

I don't think so.  We need to get to the bottom of it.
Looking at your original email, the corruption also happened
in multiples of 16 bytes.  A hardware or driver bug usually
produces more random corruption than this, or else corrupts
MSS-sized chunks.  If you have other captured
corruption data, please send it to me.
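
If it helps when going through captures, the 16-byte pattern can be
checked mechanically.  The following is only a sketch in Python: it
assumes you have the bytes that were sent and the bytes that were
received as two buffers, and the function names and the 16-byte
block size parameter are my own choices for illustration.

```python
def corrupted_runs(expected: bytes, received: bytes):
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the overlapping part of the two buffers is compared.
    """
    runs = []
    start = None
    n = min(len(expected), len(received))
    for i in range(n):
        if expected[i] != received[i]:
            if start is None:
                start = i  # a new run of mismatches begins here
        elif start is not None:
            runs.append((start, i - start))  # the run ended at i - 1
            start = None
    if start is not None:
        runs.append((start, n - start))  # run extends to end of buffer
    return runs


def summarize(runs, block=16):
    """Tag each run with whether its length is a multiple of `block`."""
    return [(off, length, length % block == 0) for off, length in runs]


# Hypothetical data: 64 zero bytes with 32 bytes overwritten at offset 16.
sent = bytes(64)
got = bytearray(sent)
got[16:48] = b"\xff" * 32
report = summarize(corrupted_runs(sent, bytes(got)))
```

Note that a corrupted byte can match the expected byte by chance, so
the reported lengths are lower bounds and one damaged region may
show up as several shorter runs.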

Anyway, I'll see if our lab can set up a similar machine to
test this out.  Thanks.

> Crashing aside, I'm a little nervous about putting into production
> a box that might for example randomly decide to serve up its SSL
> private keys halfway through an email message...
> 
> Cheers
> 
> /J
> 
> 
> 

