On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
>
> That's unfortunate, of course. It could be a hypervisor or
> a guest kernel bug. ideas:
> - does host have mq capability? how many queues?
> - how about # of msix vectors?
> - after you send something on tx queues,
>   are interrupts arriving on rx queues?
> - is problem rx or tx?
>   set ip and arp manually and send a packet to known MAC,
>   does it get there?

Sorry, I don't know how to debug virtio-net.  Given that it's in a
cloud environment, I also can't set ip addresses manually, since ip
addresses are set automatically.  If you can send me a patch, I'm
happy to apply it and send you back results.

I can say that I've had _zero_ problems using pretty much any kernel
from 3.10 to 4.9 using Google Compute Engine.  The commit I referenced
caused things to stop working.  So this is definitely a regression,
and it's definitely caused by commit 449000102901.

Even if it is a hypervisor "bug", I'm pretty sure I know what Linus
will say if I ask him to revert it.  Linux kernels are expected to
work around hardware bugs, and breaking users just because hardware is
"broken" by some definition is generally not considered friendly,
especially when it has been working for years and years before some
commit "fixed" things.

I would very much like to work with you to fix it, but I will need
your help, since virtio-net doesn't seem to print any informational
messages during the boot sequence, and I don't know the best way to
debug it.

Cheers,

					- Ted