On Wed, Mar 30, 2022 at 09:40:42AM +0000, Thanos Makatos wrote: > > -----Original Message----- > > From: Jag Raman <[email protected]> > > Sent: 29 March 2022 20:07 > > To: Stefan Hajnoczi <[email protected]> > > Cc: Alex Williamson <[email protected]>; qemu-devel <qemu- > > [email protected]>; Michael S. Tsirkin <[email protected]>; Philippe Mathieu- > > Daudé <[email protected]>; Paolo Bonzini <[email protected]>; Beraldo > > Leal <[email protected]>; Daniel P. Berrangé <[email protected]>; > > [email protected]; Marcel Apfelbaum <[email protected]>; > > Eric Blake <[email protected]>; Markus Armbruster <[email protected]>; > > Juan Quintela <[email protected]>; [email protected]; John Levon > > <[email protected]>; Thanos Makatos <[email protected]>; > > Elena Ufimtseva <[email protected]>; John Johnson > > <[email protected]>; Kanth Ghatraju <[email protected]> > > Subject: Re: [PATCH v6 15/19] vfio-user: handle device interrupts > > > > > > > > > On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <[email protected]> > > wrote: > > > > > > On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote: > > >> > > >> > > >>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <[email protected]> > > wrote: > > >>> > > >>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote: > > >>>> Forward remote device's interrupts to the guest > > >>>> > > >>>> Signed-off-by: Elena Ufimtseva <[email protected]> > > >>>> Signed-off-by: John G Johnson <[email protected]> > > >>>> Signed-off-by: Jagannathan Raman <[email protected]> > > >>>> --- > > >>>> include/hw/pci/pci.h | 6 ++ > > >>>> include/hw/remote/vfio-user-obj.h | 6 ++ > > >>>> hw/pci/msi.c | 13 +++- > > >>>> hw/pci/msix.c | 12 +++- > > >>>> hw/remote/machine.c | 11 +-- > > >>>> hw/remote/vfio-user-obj.c | 107 ++++++++++++++++++++++++++++++ > > >>>> stubs/vfio-user-obj.c | 6 ++ > > >>>> MAINTAINERS | 1 + > > >>>> hw/remote/trace-events | 1 + > > >>>> stubs/meson.build | 1 + > > >>>> 10 files changed, 158 insertions(+), 6 deletions(-) > > >>>> create mode 100644 include/hw/remote/vfio-user-obj.h > > >>>> create mode 100644 stubs/vfio-user-obj.c > > >>>> > > >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h > > >>>> index c3f3c90473..d42d526a48 100644 > > >>>> --- a/include/hw/pci/pci.h > > >>>> +++ b/include/hw/pci/pci.h > > >>>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice > > *pci_dev, > > >>>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num, > > >>>> pcibus_t addr, pcibus_t size, int type); > > >>>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev); > > >>>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector); > > >>>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector); > > >>>> > > >>>> typedef struct PCIIORegion { > > >>>> pcibus_t addr; /* current PCI mapping address. -1 means not mapped > > >>>> */ > > >>>> @@ -323,6 +325,10 @@ struct PCIDevice { > > >>>> /* Space to store MSIX table & pending bit array */ > > >>>> uint8_t *msix_table; > > >>>> uint8_t *msix_pba; > > >>>> + > > >>>> + PCIMSINotify *msi_notify; > > >>>> + PCIMSIxNotify *msix_notify; > > >>>> + > > >>>> /* MemoryRegion container for msix exclusive BAR setup */ > > >>>> MemoryRegion msix_exclusive_bar; > > >>>> /* Memory Regions for MSIX table and pending bit entries. */ > > >>>> diff --git a/include/hw/remote/vfio-user-obj.h > > >>>> b/include/hw/remote/vfio- > > user-obj.h > > >>>> new file mode 100644 > > >>>> index 0000000000..87ab78b875 > > >>>> --- /dev/null > > >>>> +++ b/include/hw/remote/vfio-user-obj.h > > >>>> @@ -0,0 +1,6 @@ > > >>>> +#ifndef VFIO_USER_OBJ_H > > >>>> +#define VFIO_USER_OBJ_H > > >>>> + > > >>>> +void vfu_object_set_bus_irq(PCIBus *pci_bus); > > >>>> + > > >>>> +#endif > > >>>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c > > >>>> index 47d2b0f33c..93f5e400cc 100644 > > >>>> --- a/hw/pci/msi.c > > >>>> +++ b/hw/pci/msi.c > > >>>> @@ -51,6 +51,8 @@ > > >>>> */ > > >>>> bool msi_nonbroken; > > >>>> > > >>>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector); > > >>>> + > > >>>> /* If we get rid of cap allocator, we won't need this. */ > > >>>> static inline uint8_t msi_cap_sizeof(uint16_t flags) > > >>>> { > > >>>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset, > > >>>> dev->msi_cap = config_offset; > > >>>> dev->cap_present |= QEMU_PCI_CAP_MSI; > > >>>> > > >>>> + dev->msi_notify = pci_msi_notify; > > >>> > > >>> Are you sure it's correct to skip the msi_is_masked() logic? I think the > > >>> callback function should only override the behavior of > > >>> msi_send_message(), not the entire msi_notify() function. > > >>> > > >>> The same applies to MSI-X. > > >> > > >> Hi Stefan, > > >> > > >> We noticed that the client is handling the masking and unmasking of MSIx > > >> interrupts. > > >> > > >> Concerning MSIx, vfio_msix_vector_use() handles unmasking and > > >> vfio_msix_vector_release() handles masking operations. The server > > >> triggers > > >> an MSIx interrupt by signaling the eventfd associated with the vector. > > >> If the > > vector > > >> is unmasked, the interrupt bypasses the client/QEMU and takes this > > >> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it > > >> lands on > > the > > >> client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt() > > >> suppresses > > the > > >> interrupt if the vector is masked. The use and release functions switch > > >> the > > server’s > > >> eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and > > >> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the > > >> VFIO_DEVICE_SET_IRQS message. > > >> > > >> Concerning MSI, the server should check if the vector is unmasked before > > >> triggering. The server is not doing it presently, will update it. For > > >> some > > reason, > > >> I had assumed that MSI handling is similar to MSIx in terms of masking - > > >> sorry > > >> about that. The masking and unmasking information for MSI is in the > > >> config > > space > > >> registers, so the server should have this information. > > >> > > >> You had previously suggested using callbacks for msi_get_message & > > >> msi_send_message, considering the masking issue. Given MSIx masking > > >> (including the MSIx table BAR) is handled at the client, the masking > > information > > >> doesn’t reach the server - so msix_notify will never invoke the > > >> msi_send_message callback - all the vectors remain masked at the server > > >> end (msix_init() -> msix_mask_all()). > > > > > > I was expecting vfio-user devices to be involved in MSI-X masking so > > libvfio-user can't be involved in the first place since QEMU emulates MSI/X: > https://lore.kernel.org/all/[email protected]/T/ > > > > they can implement the Pending bit semantics described in the spec: > > > > > > If a masked vector has its Pending bit Set, and the associated > > > underlying interrupt events are somehow satisfied (usually by software > > > though the exact manner is Function-specific), the Function must Clear > > > the Pending bit, to avoid sending a spurious interrupt message later > > > when software unmasks the vector. > > > > > > Does QEMU VFIO support this? > > > > QEMU VFIO doesn’t seem to support it - I couldn’t find a place where > > an assigned/passthru PCI device clears the pending bits in QEMU. > > > > > > > > What happens when a hw/net/e1000e_core.c vfio-user device uses > > > msix_clr_pending() and related APIs? > > > > > > Also, having the vfio-user daemon write to the eventfd while the vector > > > is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X > > > masking for poll mode operation and going via eventfd is suboptimal: > > > > > > Software is permitted to mask one or more vectors indefinitely, and > > > service their associated interrupt events strictly based on polling > > > their Pending bits. A Function must Set and Clear its Pending bits as > > > necessary to support this “pure polling” mode of operation. > > > > > > Maybe the answer is these issues don't matter in practice because MSI-X > > > masking is not used much? > > > > From what I can tell, “pure polling” is used by ivshmem and virtio-pci > > devices in > > QEMU. > > > > e1000e doesn’t use “pure polling”, but it does clear pending interrupts. > > > > John Johnson, John Levon & Thanos, > > > > Any thoughts? > > If QEMU stops emulating MSI/X then we libvfio-user would have to do it.
If Alex is happy with VFIO not supporting the exact Pending bits semantics in the PCIe spec, then vfio-user probably doesn't need to either. I just wanted to raise this design issue and maybe we can document somewhere that this is intentional. Stefan
signature.asc
Description: PGP signature
