On 2016-12-13 11:12, Theodore Ts'o wrote:
On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
That's unfortunate, of course. It could be a hypervisor or
a guest kernel bug. Ideas:
- does host have mq capability? how many queues?
- how about # of msix vectors?
- after you send something on tx queues,
are interrupts arriving on rx queues?
- is problem rx or tx?
set ip and arp manually and send a packet to known MAC,
does it get there?
Sorry, I don't know how to debug virtio-net. Given that it's in a
cloud environment, I also can't set IP addresses manually, since IP
addresses are assigned automatically.
If you can send me a patch, I'm happy to apply it and send you back
results.
I can say that I've had _zero_ problems using pretty much any kernel
from 3.10 to 4.9 using Google Compute Engine. The commit I referenced
caused things to stop working. So in terms of regression, this is
definitely a regression, and it's definitely caused by commit
449000102901. Even if it is a hypervisor "bug", I'm pretty sure I
know what Linus will say if I ask him to revert it. Linux kernels are
expected to work around hardware bugs, and breaking users just because
hardware is "broken" by some definition is generally not considered
friendly, especially when it has been working for years and years before
some commit "fixed" things.
I would very much like to work with you to fix it, but I will need
your help, since virtio-net doesn't seem to print any informational
messages during the boot sequence, and I don't know the best way to
debug it.
Cheers,
- Ted
Thanks for reporting this issue. Looks like I blindly set the affinity
instead of queues during probe. Could you please try the following patch
to see if it works?
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b425fa1..fe9f772 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_unregister_netdev;
 	}
 
-	virtnet_set_affinity(vi);
+	rtnl_lock();
+	virtnet_set_queues(vi, vi->curr_queue_pairs);
+	rtnl_unlock();
 
 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
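
For anyone following the thread, here is a rough userspace sketch of the failure mode the patch appears to address. It rests on an assumption that is not confirmed anywhere in this thread: that the hypervisor only services the queue pairs it has been told are active, and that the request issued by virtnet_set_queues() is what tells it. All toy_* names are hypothetical and exist only for illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model (NOT kernel code): the device services only
 * the queue pairs that have been marked active. */
struct toy_device {
	unsigned int max_queue_pairs;    /* pairs the device offers */
	unsigned int active_queue_pairs; /* pairs the device was told to use */
};

struct toy_driver {
	struct toy_device *dev;
	unsigned int curr_queue_pairs;   /* pairs the driver intends to use */
};

/* Stand-in for the request virtnet_set_queues() sends to the device.
 * Returns false if the requested count is out of range. */
static bool toy_set_queues(struct toy_driver *drv, unsigned int pairs)
{
	assert(drv != NULL && drv->dev != NULL);
	if (pairs == 0 || pairs > drv->dev->max_queue_pairs)
		return false;
	drv->dev->active_queue_pairs = pairs;
	drv->curr_queue_pairs = pairs;
	return true;
}

/* True if the modeled device would process a packet placed on tx queue txq. */
static bool toy_xmit(const struct toy_driver *drv, unsigned int txq)
{
	assert(drv != NULL && drv->dev != NULL);
	return txq < drv->dev->active_queue_pairs;
}

/* Count how many of the driver's tx queues the modeled device services. */
static unsigned int toy_serviced_queues(const struct toy_driver *drv)
{
	unsigned int q, serviced = 0;

	for (q = 0; q < drv->curr_queue_pairs; q++)
		if (toy_xmit(drv, q))
			serviced++;
	return serviced;
}
```

In this model, a driver that starts using four queue pairs while the device still has one marked active gets only one queue serviced, and packets steered to the other three go nowhere. After toy_set_queues(&drv, 4) all four are serviced, which mirrors what the patch does by calling virtnet_set_queues() during probe rather than only setting affinity.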