On Thu, 24 Oct 2019 14:45:28 +0000 Thanos Makatos <thanos.maka...@nutanix.com> wrote:
> I have an Ubuntu VM (4.15.0-48-generic) to which I pass through a PCI
> device, specifically a virtual NVMe controller. The problem I have is that
> only one I/O queue is initialized, while there should be more (e.g. four).
> I'm using upstream QEMU v4.1.0 configured without any additional options.
> Most likely there's something broken with my virtual device implementation,
> but I can't figure out exactly what; I was hoping to get some debugging
> directions.
>
> I run QEMU as follows:
>
> ~/src/qemu/x86_64-softmmu/qemu-system-x86_64 \
>     -kernel bionic-server-cloudimg-amd64-vmlinuz-generic \
>     -smp cores=2,sockets=2 \
>     -nographic \
>     -append "console=ttyS0 root=/dev/sda1 single nvme.sgl_threshold=0 nokaslr nvme.io_queue_depth=4" \
>     -initrd bionic-server-cloudimg-amd64-initrd-generic \
>     -hda bionic-server-cloudimg-amd64.raw \
>     -hdb data.raw \
>     -m 1G \
>     -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=mem,share=yes,size=1073741824 \
>     -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
>     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/00000000-0000-0000-0000-000000000000 \
>     -trace enable=vfio*,file=qemu.trace \
>     -net none \
>     -s
>
> This is what QEMU thinks of the devices:
>
> (qemu) info pci
>   Bus 0, device 0, function 0:
>     Host bridge: PCI device 8086:1237
>       PCI subsystem 1af4:1100
>       id ""
>   Bus 0, device 1, function 0:
>     ISA bridge: PCI device 8086:7000
>       PCI subsystem 1af4:1100
>       id ""
>   Bus 0, device 1, function 1:
>     IDE controller: PCI device 8086:7010
>       PCI subsystem 1af4:1100
>       BAR4: I/O at 0xc000 [0xc00f].
>       id ""
>   Bus 0, device 1, function 3:
>     Bridge: PCI device 8086:7113
>       PCI subsystem 1af4:1100
>       IRQ 9.
>       id ""
>   Bus 0, device 2, function 0:
>     VGA controller: PCI device 1234:1111
>       PCI subsystem 1af4:1100
>       BAR0: 32 bit prefetchable memory at 0xfd000000 [0xfdffffff].
>       BAR2: 32 bit memory at 0xfebf4000 [0xfebf4fff].
>       BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
>       id ""
>   Bus 0, device 3, function 0:
>     Class 0264: PCI device 4e58:0001
>       PCI subsystem 0000:0000
>       IRQ 11.
>       BAR0: 32 bit memory at 0xfebf0000 [0xfebf3fff].
>       id ""
>
> And this is what the guest thinks of the device in question:
>
> root@ubuntu:~# lspci -vvv -s 00:03.0
> 00:03.0 Non-Volatile memory controller: Device 4e58:0001 (prog-if 02 [NVM Express])
>         Physical Slot: 3
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 24
>         NUMA node: 0
>         Region 0: Memory at febf0000 (32-bit, non-prefetchable) [size=16K]
>         Capabilities: [40] Power Management version 0
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [48] MSI: Enable+ Count=1/4 Maskable- 64bit+
>                 Address: 00000000fee01004  Data: 4023
>         Capabilities: [60] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- AttnBtn+ AttnInd- PwrInd- RBE- FLReset- SlotPowerLimit 128.000W
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
>         Kernel driver in use: nvme
>         Kernel modules: nvme
>
> I tried debugging the guest kernel and I think the following if is taken in
> pci_msi_domain_check_cap():
>
>     if (pci_msi_desc_is_multi_msi(desc) &&
>         !(info->flags & MSI_FLAG_MULTI_PCI_MSI))
>         return 1;
>
> because flags is 0x3b (MSI_FLAG_MULTI_PCI_MSI is 0x4), which I think means
> that MSI_FLAG_MULTI_PCI_MSI is not set for that msi_domain_info.
>
> # grep -i msi qemu.trace
> 1327@1571926064.595365:vfio_msi_setup 00000000-0000-0000-0000-000000000000 PCI MSI CAP @0x48
> 1334@1571926073.489691:vfio_msi_enable (00000000-0000-0000-0000-000000000000) Enabled 1 MSI vectors
> 1334@1571926073.501741:vfio_msi_disable (00000000-0000-0000-0000-000000000000)
> 1334@1571926073.507127:vfio_msi_enable (00000000-0000-0000-0000-000000000000) Enabled 1 MSI vectors
> 1327@1571926073.520840:vfio_msi_interrupt (00000000-0000-0000-0000-000000000000) vector 0 0xfee01004/0x4023
> ... more vfio_msi_interrupt ...
>
> How can I further debug this?

The quick answer is to implement MSI-X in your device. MSI requires a
contiguous block of vectors, and there's limited support for actually making
use of more than a single vector; there is almost no real hardware that
doesn't implement MSI-X for multi-vector support.

Thanks,

Alex