On 6/15/2019 12:33 PM, Ivan Khoronzhuk wrote:
> On Thu, Jun 13, 2019 at 08:28:42PM +0200, Jesper Dangaard Brouer wrote:
> Hi, Jesper
>
>> This patch is needed before we can allow drivers to use page_pool for
>> DMA-mappings. Today with page_pool and the XDP return API, it is
>> possible to remove the page_pool object (from the rhashtable) while
>> there are still in-flight packet-pages. This is safely handled via RCU,
>> and failed lookups in __xdp_return() fall back to calling put_page()
>> when the page_pool object is gone. In case the page is still DMA
>> mapped, this will result in the page not getting correctly DMA
>> unmapped.
>>
>> To solve this, the page_pool is extended with tracking of in-flight
>> pages, and the XDP disconnect system queries the page_pool and waits,
>> via a workqueue, for all in-flight pages to be returned.
>>
>> To avoid killing performance when tracking in-flight pages, the
>> implementation uses two (unsigned) counters, placed on different
>> cache-lines, from which the number of in-flight packets can be deduced.
>> This is done by mapping the unsigned "sequence" counters onto signed
>> two's complement arithmetic operations. This is e.g. used by the
>> kernel's time_after macros, described in kernel commits 1ba3aab3033b
>> and 5a581b367b5, and also explained in RFC 1982.
>>
>> The trick is that these two incrementing counters only need to be read
>> and compared when checking whether it is safe to free the page_pool
>> structure, which will only happen after the driver has disconnected the
>> RX/alloc side. Thus, on a non-fast-path.
>>
>> Page_pool tracking is also enabled for the non-DMA use-case, as this
>> can be used for statistics later.
>>
>> After this patch, using page_pool requires a more strict resource
>> "release", e.g. via page_pool_release_page(), which was introduced in
>> this patchset, and previous patches implement/fix this more strict
>> requirement.
>>
>> Drivers no longer call page_pool_destroy().
>> Drivers already call xdp_rxq_info_unreg(), which calls
>> xdp_rxq_info_unreg_mem_model(), which will attempt to disconnect the
>> mem id, and if the attempt fails, schedule the disconnect for later via
>> a delayed workqueue.
>>
>> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    3 -
>>  include/net/page_pool.h                           |   41 ++++++++++---
>>  net/core/page_pool.c                              |   62 +++++++++++++-----
>>  net/core/xdp.c                                    |   65 ++++++++++++++---
>>  4 files changed, 136 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index 2f647be292b6..6c9d4d7defbc 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>
> [...]
>
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -38,6 +38,7 @@ struct xdp_mem_allocator {
>>  	};
>>  	struct rhash_head node;
>>  	struct rcu_head rcu;
>> +	struct delayed_work defer_wq;
>>  };
>>
>>  static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
>> @@ -79,13 +80,13 @@ static void __xdp_mem_allocator_rcu_free(struct rcu_head *rcu)
>>
>>  	xa = container_of(rcu, struct xdp_mem_allocator, rcu);
>>
>> +	/* Allocator has indicated it is safe to remove before this is called */
>> +	if (xa->mem.type == MEM_TYPE_PAGE_POOL)
>> +		page_pool_free(xa->page_pool);
>> +
>
> What would you recommend doing in the following situation?
>
> The same receive queue is shared between 2 network devices. The receive
> ring is filled with pages from the page_pool, but you don't know the
> actual port (ndev) filling this ring, because a device is recognized
> only after a packet is received.
>
> The API is such that an xdp rxq is bound to a network device, and each
> frame has a reference on it, so the rxq ndev must be static. That means
> each netdev has its own rxq instance even when there is no need for it.
> Thus, after your changes, a page must be returned to the pool it was
> taken from, or released from the old pool and recycled in the new one
> somehow.
>
> And that is an inconvenience at least. It's hard to move pages between
> pools w/o a performance penalty. There is no way to use a common pool
> either, as unreg_rxq now drops the pool, and 2 rxqs can't reference the
> same pool.
>
Within a single netdev, separate page_pool instances are anyway created
for different RX rings, working under different NAPIs. So I don't
understand your comment above about breaking some multi-netdev pool
sharing use case...

Tariq