On Mon, Apr 29, 2019 at 12:07:05PM +0800, lin li wrote: > Tiwei Bie <tiwei....@intel.com> 于2019年4月28日周日 下午7:18写道: > > On Fri, Apr 26, 2019 at 05:40:21AM -0400, Li Lin wrote: [...] > > > @@ -98,12 +102,26 @@ struct rte_vhost_memory { > > > struct rte_vhost_mem_region regions[]; > > > }; > > > > > > +typedef struct VhostUserInflightEntry { > > > + uint8_t inflight; > > > +} VhostUserInflightEntry; > > > + > > > +typedef struct VhostInflightInfo { > > > + uint16_t version; > > > + uint16_t last_inflight_io; > > > + uint16_t used_idx; > > > + VhostUserInflightEntry desc[0]; > > > +} VhostInflightInfo; > > > > Is there any details on above structure? Why does it not match > > QueueRegionSplit or QueueRegionPacked structures described in > > qemu/docs/interop/vhost-user.txt? > > Qemu have its vhost-user backend,
Do you mean contrib/libvhost-user in QEMU? > qemu did the submission of IO in it. What does this mean? > The implementation of dpdk is more general. It is just to mark inflight entry. Based on the discussion in below threads: https://patchwork.kernel.org/patch/10753893/ https://patchwork.kernel.org/patch/10817735/ IIUC, above structure is the extra buffer allocated by slave to store the information of inflight descriptors and share with master for persistent. All vhost-user backends' implementation should follow the vhost-user protocol (including the format of above structure) defined in qemu/docs/interop/vhost-user.txt. Right? > The submission of inflight entry is handle over to different backends. > They have their own ways to handle it, such as spdk. > So there are some differences in data structure. > > > [...] > > > +static int > > > +vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg, > > > + int main_fd __rte_unused) > > > +{ > > > + int fd, i; > > > + uint64_t mmap_size, mmap_offset; > > > + uint16_t num_queues, queue_size; > > > + uint32_t pervq_inflight_size; > > > + void *rc; > > > + struct vhost_virtqueue *vq; > > > + struct virtio_net *dev = *pdev; > > > + > > > + fd = msg->fds[0]; > > > + if (msg->size != sizeof(msg->payload.inflight) || fd < 0) { > > > + RTE_LOG(ERR, VHOST_CONFIG, "Invalid set_inflight_fd message > > > size is %d,fd is %d\n", > > > + msg->size, fd); > > > + return -1; > > > > Ditto. > > > > > + } > > > + > > > + mmap_size = msg->payload.inflight.mmap_size; > > > + mmap_offset = msg->payload.inflight.mmap_offset; > > > + num_queues = msg->payload.inflight.num_queues; > > > + queue_size = msg->payload.inflight.queue_size; > > > + pervq_inflight_size = get_pervq_shm_size(queue_size); > > > + > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd mmap_size: %lu\n", mmap_size); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd mmap_offset: %lu\n", mmap_offset); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd num_queues: %u\n", num_queues); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd queue_size: %u\n", queue_size); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd fd: %d\n", fd); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd pervq_inflight_size: %d\n", > > > + pervq_inflight_size); > > > + > > > + if (dev->inflight_info.addr) > > > + munmap(dev->inflight_info.addr, dev->inflight_info.size); > > > + > > > + rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, > > > + fd, mmap_offset); > > > > Why call it rc? Maybe addr is a better name? > > In some scenarios, shared memory is reallocated or resized by qemu, so > again mmap is needed. In which case the shared memory will be reallocated or resized by QEMU? > > > > > > + if (rc == MAP_FAILED) { > > > + RTE_LOG(ERR, VHOST_CONFIG, "failed to mmap share > > > memory.\n"); > > > + return -1; > > > > Should always return RTE_VHOST_MSG_RESULT_* in > > message handler. > > > > > + } > > > + > > > + if (dev->inflight_info.fd) > > > + close(dev->inflight_info.fd); > > > + > > > + dev->inflight_info.fd = fd; > > > + dev->inflight_info.addr = rc; > > > + dev->inflight_info.size = mmap_size; > > > + > > > + for (i = 0; i < num_queues; i++) { > > > + vq = dev->virtqueue[i]; > > > + vq->inflight = (VhostInflightInfo *)rc; > > > + rc = (void *)((char *)rc + pervq_inflight_size); > > > + } > > > + > > > + return RTE_VHOST_MSG_RESULT_OK; > > > +} [...] > > > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h > > > index 2a650fe4b..35f969b1b 100644 > > > --- a/lib/librte_vhost/vhost_user.h > > > +++ b/lib/librte_vhost/vhost_user.h > > > @@ -23,7 +23,8 @@ > > > (1ULL << > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \ > > > (1ULL << > > > VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \ > > > (1ULL << > > > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ > > > - (1ULL << > > > VHOST_USER_PROTOCOL_F_PAGEFAULT)) > > > + (1ULL << > > > VHOST_USER_PROTOCOL_F_PAGEFAULT) | \ > > > + (1ULL << > > > VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) > > > > It will advertise this feature for builtin net and crypto > > backends. It's probably not what you intended. > > > > Indeed, this feature is mainly used for spdk-like backends. You mean > this function is disabled by default? External backends should use rte_vhost_driver_set_protocol_features() to advertise the protocol features they support.