On Tue, 18 Jun 2019 15:54:33 +0300 Ivan Khoronzhuk <ivan.khoronz...@linaro.org> wrote:
> On Sun, Jun 16, 2019 at 10:56:25AM +0000, Tariq Toukan wrote:
> >
> > On 6/15/2019 12:33 PM, Ivan Khoronzhuk wrote:
> >> On Thu, Jun 13, 2019 at 08:28:42PM +0200, Jesper Dangaard Brouer wrote:
> >> [...]
> >>
> >> What would you recommend to do for the following situation:
> >>
> >> The same receive queue is shared between 2 network devices. The
> >> receive ring is filled with pages from the page_pool, but you don't
> >> know the actual port (ndev) filling this ring, because a device is
> >> recognized only after a packet is received.
> >>
> >> The API is such that an xdp rxq is bound to a network device and each
> >> frame holds a reference to it, so the rxq ndev must be static. That
> >> means each netdev has its own rxq instance even when there is no need
> >> for it. Thus, after your changes, a page must be returned to the pool
> >> it was taken from, or released from the old pool and somehow recycled
> >> into the new one.
> >>
> >> And that is an inconvenience at least. It's hard to move pages
> >> between pools w/o a performance penalty. No way to use a common pool
> >> either, as unreg_rxq now drops the pool and 2 rxqs can't reference
> >> the same pool.
> >
> > Within the single netdev, separate page_pool instances are anyway
> > created for different RX rings, working under different NAPIs.
>
> The circumstances are such that the same RX ring is shared between 2
> netdevs... and the netdev can be known only after a descriptor/packet
> is received. Thus, while filling the RX ring, there is no actual
> device, but when a packet is received it has to be recycled to the
> appropriate net device pool. Before this change it made no difference
> which pool the page was allocated from to fill the RX ring, as there
> was no owner. After this change there is an owner: the netdev page
> pool.

It's not really a dependency added in this patchset. A page_pool is
strictly bound to a single RX-queue, for performance, as this allows
us a NAPI fast-path return used for early drop (XDP_DROP).

I can see that the API xdp_rxq_info_reg_mem_model() makes it possible
to call it on different xdp_rxq_info structs with the same page_pool
pointer. But it was never intended to be used like that, and I
consider it an API usage violation. I originally wanted to add the
allocator pointer to the xdp_rxq_info_reg() call, but the API was
extended in different versions, so I didn't want to break users. I've
actually tried hard to catch, via WARN(), drivers that use the API
wrongly, but I guess you found a loophole.

Besides, we already have a dependency from the RX-queue to the netdev
in the xdp_rxq_info structure. E.g. xdp_rxq_info->dev is sort of
central: it is dereferenced by BPF-code to read
xdp_md->ingress_ifindex, and it is also used by cpumap when creating
SKBs.
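For reference, the intended pattern is a 1:1:1 binding between an
RX-queue, its xdp_rxq_info and its page_pool, each registered exactly
once. A minimal sketch of that driver-side setup follows; my_rxq,
my_rxq_setup and RX_RING_SIZE are made-up names, and the exact
page_pool_params fields and function signatures vary between kernel
versions:

#include <net/page_pool.h>
#include <net/xdp.h>

#define RX_RING_SIZE 256        /* made-up ring size */

struct my_rxq {                 /* hypothetical driver RX-queue state */
        struct xdp_rxq_info xdp_rxq;
        struct page_pool *page_pool;
};

static int my_rxq_setup(struct my_rxq *rxq, struct net_device *ndev,
                        struct device *dma_dev, u32 queue_idx)
{
        struct page_pool_params pp_params = {
                .order     = 0,                 /* order-0 pages */
                .flags     = PP_FLAG_DMA_MAP,   /* pool handles DMA map */
                .pool_size = RX_RING_SIZE,
                .nid       = NUMA_NO_NODE,
                .dev       = dma_dev,
                .dma_dir   = DMA_FROM_DEVICE,
        };
        int err;

        rxq->page_pool = page_pool_create(&pp_params);
        if (IS_ERR(rxq->page_pool))
                return PTR_ERR(rxq->page_pool);

        /* One xdp_rxq_info per RX-queue, bound to exactly one netdev */
        err = xdp_rxq_info_reg(&rxq->xdp_rxq, ndev, queue_idx);
        if (err)
                goto err_pool;

        /* ... and that single page_pool registered as its allocator */
        err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq,
                                         MEM_TYPE_PAGE_POOL,
                                         rxq->page_pool);
        if (err)
                goto err_rxq;
        return 0;

err_rxq:
        xdp_rxq_info_unreg(&rxq->xdp_rxq);
err_pool:
        page_pool_destroy(rxq->page_pool);
        return err;
}

With this 1:1 binding, pages returned in NAPI context can take the
lock-free recycle fast-path, which is the whole point of the pool.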
> For cpsw the dma unmap is common for both netdevs, and there is no
> difference who is freeing the page, but there is a difference which
> pool it's freed to.
>
> So, while filling the RX ring the page is taken from the page pool of
> ndev1, but the packet is received for ndev2; it has to be later
> returned/recycled to the page pool of ndev1, but when the xdp buffer
> is handed over to the xdp prog, the xdp_rxq_info has a reference on
> ndev2 ...
>
> And there is no way to predict the final ndev before the packet is
> received, so no way to choose the appropriate page pool, as it now
> becomes the page owner.
>
> So, while the RX ring is being filled, the page/dma recycling is
> needed, but there should be some way to identify the page owner only
> after receiving the packet.
>
> Roughly speaking, something like:
>
> pool->pages_state_hold_cnt++;
>
> outside of the page allocation API, after the packet is received.

Don't EVER manipulate the internal state outside of the page
allocation API. That kills the purpose of defining any API.

> and freeing of the counter during allocation (w/o owning the page).

Your use-case of two netdevs sharing the same RX-queue sounds dubious,
and very hardware specific. I'm not sure why we want to bend the APIs
to support this?

If we had to allow a page_pool to be registered twice, via
xdp_rxq_info_reg_mem_model(), then I guess we could extend page_pool
with a usage/users reference count, and then only really free the
page_pool when the refcnt reaches zero (rough sketch below). But it
just seems and looks wrong (in the code), as the whole trick to get
performance is to only have one user.
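To spell out that users-count idea, a hypothetical wrapper only, not
the in-tree page_pool API:

#include <linux/refcount.h>
#include <net/page_pool.h>

struct shared_page_pool {       /* illustration only */
        struct page_pool *pool;
        refcount_t users;       /* set to 1 via refcount_set() at create */
};

static void shared_pool_hold(struct shared_page_pool *sp)
{
        /* A second rxq registering the same pool just takes a reference */
        refcount_inc(&sp->users);
}

static void shared_pool_release(struct shared_page_pool *sp)
{
        /* Only really free the pool when the last user unregisters */
        if (refcount_dec_and_test(&sp->users))
                page_pool_destroy(sp->pool);
}

Note this would only make the double registration safe; it does not
solve your per-packet owner problem.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer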