Before going into discussions of how it "should" be, I added more debug code and gathered some good-case vs. bad-case data.

First of all, it is "ok" to have no more buffers.
I had a printk in a code path that only triggers when !more_used is true, and I've seen plenty of hits for all kinds of idx values.
On adding virtio traffic it triggers a few times as well.
After all, that is what the loop is for: to wait until there is a buffer that it can get.
So things aren't broken if this triggers at all, but of course they are if the state never changes.
IIRC: last_used_idx == vring.used->idx just means nothing happened since our last interaction (to be confirmed).

Good case:
Some !more_used hits might occur, but they are unrelated and not infinite.
[ 393.542550] __virtqueue_get_buf: No more buffers in vq ffff8801b74b3000 - vq->last_used_idx 303 == vq->vring.used->idx 303
[ 394.097117] __virtqueue_get_buf: No more buffers in vq ffff8801b74b3000 - vq->last_used_idx 304 == vq->vring.used->idx 304
[ 394.097413] __virtqueue_get_buf: No more buffers in vq ffff8801b74b4000 - vq->last_used_idx 125 == vq->vring.used->idx 125
[...]
[ 394.449672] __virtqueue_get_buf: Entry checks passed - vq ffff8800bbaef000 from _vq ffff8800bbaef000
[ 394.452734] __virtqueue_get_buf: Exit checks passed - ffff8801b74b5840 vq->data[i]
[ 394.455087] __virtqueue_get_buf: Returning ret ffff8801b74b5840
Done

Bad case (after DPDK ran):
Now both debug printks trigger.
I get a LOT of
[ 552.018862] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
followed by a sequence like this in between:
[ 554.157376] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.158916] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[ 554.160135] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.161583] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[ 554.162776] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.164189] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[...]
(infinite loop)

Current assumption:
DPDK disables something in the host part of the virtio device, so that the host no longer responds "correctly".
By unbinding/binding the driver we can reinitialize that, but if we don't, we run into this hang.
Remember: we only initialize DPDK with testpmd; no load whatsoever is driven by it.

We likely need two fixes:
1. Find what DPDK does "to" the device and avoid it.
2. The kernel should give up after some number of retries and return a failure (not good, but much better than hanging).

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1570195

Title: Net tools cause kernel soft lockup after DPDK touched VirtIO-pci devices

Status in dpdk package in Ubuntu: Confirmed
Status in linux package in Ubuntu: Confirmed

Bug description:

Guys,

I'm facing an issue here with both "ethtool" and "ip", while trying to manage PCI VirtIO devices black-listed by DPDK.

You'll need an Ubuntu Xenial KVM guest, with 4 VirtIO vNIC cards, to run those tests.

PCI device example from inside a Xenial guest:

---
# lspci | grep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
00:06.0 Ethernet controller: Red Hat, Inc Virtio network device
---

Where "ens3" is the first / default interface, attached to Libvirt's "default" network. The "ens4" is reserved for "ethtool / ip" tests (attached to another Libvirt network without IPs or DHCP), "ens5" will be "dpdk0" and "ens6" "dpdk1"...

---

*** How it works?

1- For example, try to enable multi-queue on DPDK's devices: boot your Xenial guest and run:

ethtool -L ens5 combined 4
ethtool -L ens6 combined 4

2- Install openvswitch-switch-dpdk, configure DPDK and OVS, and fire it up.

https://help.ubuntu.com/16.04/serverguide/DPDK.html

service openvswitch-switch stop
service dpdk stop

OVS DPDK Options (/etc/default/openvswitch-switch):

--
DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 1024 --pci-blacklist 0000:00:03.0,0000:00:04.0'
--

service dpdk start
service openvswitch-switch start

- Enable multi-queue on OVS+DPDK inside of the VM:

ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xff00

* Multi-queue apparently works! ovs-vswitchd consumes more than 100% of CPU, meaning that multi-queue is there...

*** Where it fails?

1- Reboot the VM and try to run ethtool again (or go straight to 2 below):

ethtool -L ens5 combined 4

2- Try to fire up ens4:

ip link set dev ens4 up

# FAIL! Both commands hang, consuming 100% of guest's CPU...

So, it looks like a Linux fault, because it is "allowing" the DPDK VirtIO App (a userland app) to interfere with kernel devices in a strange way...

Best,
Thiago

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-18-generic 4.4.0-18.34
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 14 00:35 seq
 crw-rw---- 1 root audio 116, 33 Apr 14 00:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CRDA: N/A
Date: Thu Apr 14 01:27:27 2016
HibernationDevice: RESUME=UUID=833e999c-e066-433c-b8a2-4324bb8d56de
InstallationDate: Installed on 2016-04-07 (7 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160406)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=9911604e-353b-491f-a0a9-804724350592 ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-18-generic N/A
 linux-backports-modules-4.4.0-18-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-wily
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-wily:cvnQEMU:ct1:cvrpc-i440fx-wily:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-wily
dmi.sys.vendor: QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1570195/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp