Jamie Webb wrote:
>
> Well, so far so good. I'll let you know if it happens again, but it
> looks like that's fixed it.
>
> Further testing showed that I also had to disable rx checksumming,
> otherwise I was getting random kernel crashes. Presumably it was not
> only reading data from random memory locations, but also writing in the
> wrong place...
>
> So, do I understand correctly that this is causing the CPU rather than
> the NIC to do the checksumming?

That's right. See the discussion on a similar problem here:

<http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1059644&admit=-682735245+1158251313899+28353475>

In this other case, the symptoms were very similar to yours. I suspected
it was an IOMMU problem, and the problem went away when the user
upgraded to the latest kernel at that time.

>
> Is this a reasonable permanent solution?
>

I don't think so. We need to get to the bottom of it. Looking at your
original email, the corruption also happened in multiples of 16 bytes.
With a hardware or driver bug, the corruption is usually more random
than this, or whole MSS-sized chunks are corrupted. If you have other
captured corruption data, please send it to me.

Anyway, I'll see if our lab can set up a similar machine to test this
out. Thanks.

> Crashing aside, I'm a little nervous about putting into production a
> box that might for example randomly decide to serve up its SSL private
> keys halfway through an email message...
>
> Cheers
>
> /J
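
On the request above for more captured corruption data: a minimal sketch of how a capture could be checked for the 16-byte pattern, assuming you have both the bytes that were sent and the bytes that arrived. The function names `corrupted_runs` and `summarize` are made up for illustration and are not part of any driver or tool mentioned in this thread.

```python
def corrupted_runs(expected: bytes, received: bytes):
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the common prefix of the two buffers is compared. A corrupted
    byte that happens to equal the expected one will split or shorten
    a run, so treat the result as a lower bound on the damage.
    """
    runs = []
    start = None
    n = min(len(expected), len(received))
    for i in range(n):
        if expected[i] != received[i]:
            if start is None:
                start = i          # a new run of bad bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the buffer
        runs.append((start, n - start))
    return runs


def summarize(runs, block=16):
    """Flag each run whose offset and length are both multiples of `block`."""
    return [(off, length, off % block == 0 and length % block == 0)
            for off, length in runs]
```

Runs that are consistently aligned to 16 bytes would support the observation above; runs with arbitrary offsets and lengths would point more toward the "random" kind of corruption.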