> On Sep 14, 2021, at 6:06 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> On Mon, Sep 13, 2021 at 05:23:33PM +0000, John Johnson wrote:
>>>> On Sep 9, 2021, at 10:25 PM, John Johnson <john.g.john...@oracle.com>
>>>> wrote:
>>>>> On Sep 8, 2021, at 11:29 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>>> On Thu, Sep 09, 2021 at 05:11:49AM +0000, John Johnson wrote:
>>>>>> I did look at coroutines, but they seemed to work when the sender
>>>>>> is triggering the coroutine on send, not when request packets are arriving
>>>>>> asynchronously to the sends.
>>>>>
>>>>> This can be done with a receiver coroutine. Its job is to be the only
>>>>> thing that reads vfio-user messages from the socket. A receiver
>>>>> coroutine reads messages from the socket and wakes up the waiting
>>>>> coroutine that yielded from vfio_user_send_recv() or
>>>>> vfio_user_pci_process_req().
>>>>>
>>>>> (Although vfio_user_pci_process_req() could be called directly from the
>>>>> receiver coroutine, it seems safer to have a separate coroutine that
>>>>> processes requests so that the receiver isn't blocked in case
>>>>> vfio_user_pci_process_req() yields while processing a request.)
>>>>>
>>>>> Going back to what you mentioned above, the receiver coroutine does
>>>>> something like this:
>>>>>
>>>>>   if it's a reply
>>>>>       reply = find_reply(...)
>>>>>       qemu_coroutine_enter(reply->co) // instead of signalling reply->cv
>>>>>   else
>>>>>       QSIMPLEQ_INSERT_TAIL(&pending_reqs, request, next);
>>>>>       if (pending_reqs_was_empty) {
>>>>>           qemu_coroutine_enter(process_request_co);
>>>>>       }
>>>>>
>>>>> The pending_reqs queue holds incoming requests that the
>>>>> process_request_co coroutine processes.
>>>>>
>>>>
>>>>
>>>> How do coroutines work across threads?  There can be multiple vCPU
>>>> threads waiting for replies, and I think the receiver coroutine will be
>>>> running in the main loop thread.  Where would a vCPU block waiting for
>>>> a reply?  I think coroutine_yield() returns to its coroutine_enter() caller.
>>>
>>>
>>> A vCPU thread holding the BQL can iterate the event loop if it has
>>> reached a synchronous point that needs to wait for a reply before
>>> returning. I think we have this situation when a MemoryRegion is
>>> accessed on the proxy device.
>>>
>>> For example, block/block-backend.c:blk_prw() kicks off a coroutine and
>>> then runs the event loop until the coroutine finishes:
>>>
>>>   Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
>>>   bdrv_coroutine_enter(blk_bs(blk), co);
>>>   BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
>>>
>>> BDRV_POLL_WHILE() boils down to a loop like this:
>>>
>>>   while ((cond)) {
>>>       aio_poll(ctx, true);
>>>   }
>>>
>>
>> I think that would make vCPUs sending requests and the
>> receiver coroutine all poll on the same socket.  If the “wrong”
>> routine reads the message, I’d need a second level of synchronization
>> to pass the message to the “right” one.  e.g., if the vCPU coroutine
>> reads a request, it needs to pass it to the receiver; if the receiver
>> coroutine reads a reply, it needs to pass it to a vCPU.
>>
>> Avoiding this complexity is one of the reasons I went with
>> a separate thread that only reads the socket over the mp-qemu model,
>> which does have the sender poll, but doesn’t need to handle incoming
>> requests.
>
> Only one coroutine reads from the socket, the "receiver" coroutine. In a
> previous reply I sketched what the receiver does:
>
>   if it's a reply
>       reply = find_reply(...)
>       qemu_coroutine_enter(reply->co) // instead of signalling reply->cv
>   else
>       QSIMPLEQ_INSERT_TAIL(&pending_reqs, request, next);
>       if (pending_reqs_was_empty) {
>           qemu_coroutine_enter(process_request_co);
>       }
>
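(For concreteness, the dispatch sketched above might look roughly like the
following C code.  VFIOUserProxy, VFIOUserMsg, VFIOUserReply,
vfio_user_read_one_msg(), and VFIO_USER_REPLY are invented names for this
sketch; only qemu_coroutine_enter(), coroutine_fn, and the QSIMPLEQ macros
are real QEMU interfaces.)

    /* Hypothetical receiver coroutine: the only reader of the vfio-user
     * socket.  Assumes "qemu/osdep.h", "qemu/coroutine.h", "qemu/queue.h". */
    static void coroutine_fn vfio_user_recv_co(void *opaque)
    {
        VFIOUserProxy *proxy = opaque;                  /* invented type */

        for (;;) {
            /* assumed helper that reads exactly one message */
            VFIOUserMsg *msg = vfio_user_read_one_msg(proxy);

            if (msg->flags & VFIO_USER_REPLY) {
                /* Wake the coroutine that yielded in vfio_user_send_recv(). */
                VFIOUserReply *reply = find_reply(proxy, msg->id);

                reply->msg = msg;
                qemu_coroutine_enter(reply->co);
            } else {
                /* Queue incoming requests; kick the request-processing
                 * coroutine only on the empty -> non-empty transition. */
                bool was_empty = QSIMPLEQ_EMPTY(&proxy->pending_reqs);

                QSIMPLEQ_INSERT_TAIL(&proxy->pending_reqs, msg, next);
                if (was_empty) {
                    qemu_coroutine_enter(proxy->process_request_co);
                }
            }
        }
    }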
Sorry, I was assuming that when you said the coroutine would block with
aio_poll(), you implied it would also read messages from the socket.

> The qemu_coroutine_enter(reply->co) call re-enters the coroutine that
> was created by the vCPU thread. Is this the "second level of
> synchronization" that you described? It's very similar to signalling
> reply->cv in the existing patch.
>

Yes, the only difference is that it would be woken on each message, even
though it doesn’t read them, which is what I think you’re addressing below.

> Now I'm actually thinking about whether this can be improved by keeping
> the condvar so that the vCPU thread doesn't need to call aio_poll()
> (which is awkward because it doesn't drop the BQL and therefore blocks
> other vCPUs from making progress). That approach wouldn't require a
> dedicated thread for vfio-user.
>

Wouldn’t you need to acquire the BQL twice for every vCPU reply: once to
run the receiver coroutine, and once when the vCPU thread wakes up and wants
to return to the VFIO code?  The migration thread would also add a BQL
dependency, where it didn’t have one before.

Is your objection to using an iothread, or to using a separate thread?
I can change to using qemu_thread_create() and running aio_poll() from the
thread routine, instead of creating an iothread.

On a related subject:

On Aug 24, 2021, at 8:14 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:

>> +     ret = qio_channel_readv_full(proxy->ioc, &iov, 1, &fdp, &numfds,
>> +                                  &local_err);
>
> This is a blocking call. My understanding is that the IOThread is shared
> by all vfio-user devices, so other devices will have to wait if one of
> them is acting up (e.g. the device emulation process sent less than
> sizeof(msg) bytes).

This shouldn’t block if the emulation process sends less than
sizeof(msg) bytes.  qio_channel_readv() will eventually call recvmsg(), which
only blocks on a short read if MSG_WAITALL is set, and it’s not set in this
case.  recvmsg() will return the data available, and vfio_user_recv() will
treat a short read as an error.

							JJ
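(For reference, the short-read behaviour described above can be illustrated
with a minimal standalone sketch.  The vfio_user_hdr layout and the
read_msg_header() helper are invented for illustration and are not the
actual vfio-user protocol definitions or the code in the patches.)

    #include <errno.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    struct vfio_user_hdr {      /* illustrative layout only */
        uint16_t id;
        uint16_t command;
        uint32_t size;
        uint32_t flags;
        uint32_t error;
    };

    /* Without MSG_WAITALL, recvmsg() returns whatever data is available,
     * so a partial header shows up as a short count rather than blocking;
     * the caller can then treat it as a protocol error. */
    static int read_msg_header(int sockfd, struct vfio_user_hdr *hdr)
    {
        struct iovec iov = { .iov_base = hdr, .iov_len = sizeof(*hdr) };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = recvmsg(sockfd, &msg, 0);   /* note: no MSG_WAITALL */

        if (n < 0) {
            return -errno;
        }
        if (n != sizeof(*hdr)) {
            return -EINVAL;     /* short read: fail instead of waiting */
        }
        return 0;
    }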