On 2018年03月09日 17:35, Jesper Dangaard Brouer wrote:
On Fri, 9 Mar 2018 15:24:10 +0800
Jason Wang <jasow...@redhat.com> wrote:
On 2018年03月08日 21:07, Jesper Dangaard Brouer wrote:
Introduce an xdp_return_frame API, and convert over cpumap as
the first user, given it have queued XDP frame structure to leverage.
Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
---
include/net/xdp.h | 32 +++++++++++++++++++++++++++
kernel/bpf/cpumap.c | 60
+++++++++++++++++++++++++++++++--------------------
net/core/xdp.c | 18 +++++++++++++++
3 files changed, 86 insertions(+), 24 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index b2362ddfa694..3cb726a6dc5b 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -33,16 +33,48 @@
* also mandatory during RX-ring setup.
*/
+enum mem_type {
+ MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
+ MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */
+ // Possible new ideas for types:
+ // MEM_TYPE_PAGE_POOL, /* Will be added later */
+ // MEM_TYPE_AF_XDP,
+ // MEM_TYPE_DEVICE_OWNED -- invoking an dev->ndo?
+ MEM_TYPE_MAX,
+};
So if we plan to support dev->ndo, it looks to me two types AF_XDP and
DEVICE_OWNED are sufficient? Driver can do what it wants (e.g page pool
or ordinary page allocator) in ndo or what ever other callbacks.
So, the design is open to go both ways, we can figure out later.
The reason I'm not calling page_pool from a driver level dev->ndo, is
that I'm trying to avoid invoking code in a module, as the driver
module code can be (in the process of being) unloaded from the kernel,
while another driver is calling xdp_return_frame. The current design
in patch 10, uses the RCU period to guarantee that the allocator
pointer is valid as long as the ID lookup succeeded.
To do what you propose, we also need to guarantee that the net_device
cannot disappear so it is safe to invoke dev->ndo. To do so, the
driver likely have to add a rcu_barrier before unloading. I'm also
thinking that for MEM_TYPE_DEVICE_OWNED the allocator pointer ("under
protection") could be the net_device pointer, and we could take a
refcnt on dev (dev_hold/dev_put).
Then it looks like a device (e.g tun) can lock another device for
indefinite time.
Thus, I think it is doable, but lets
figure this out later.
Another important design consideration, is that the xdp core need to
know how to release memory in case the ID lookup failed.
Can we simply use put_page() in this case?
This is
another argument for keeping the memory release code inside the xdp
core, and not leave too much freedom to the drivers.
+struct xdp_mem_info {
+ u32 type; /* enum mem_type, but known size type */
+ u32 id; // Needed later (to lookup struct xdp_rxq_info)
+};
+
struct xdp_rxq_info {
struct net_device *dev;
u32 queue_index;
u32 reg_state;
+ struct xdp_mem_info mem;
} ____cacheline_aligned; /* perf critical, avoid false-sharing */
+
+static inline
+void xdp_return_frame(void *data, struct xdp_mem_info *mem)
+{
+ if (mem->type == MEM_TYPE_PAGE_SHARED)
+ page_frag_free(data);
+
+ if (mem->type == MEM_TYPE_PAGE_ORDER0) {
+ struct page *page = virt_to_page(data); /* Assumes order0 page*/
+
+ put_page(page);
+ }
+}
+
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
struct net_device *dev, u32 queue_index);
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+ enum mem_type type, void *allocator);
#endif /* __LINUX_NET_XDP_H__ */
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a4bb0b34375a..3e4bbcbe3e86 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
[...]
static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
{
atomic_inc(&rcpu->refcnt);
@@ -188,6 +168,10 @@ struct xdp_pkt {
u16 len;
u16 headroom;
u16 metasize;
+ /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
+ * while mem info is valid on remote CPU.
+ */
Can we simply move the xdp_mem_info to xdp_buff to avoid conversion?
No, xdp_buff is a stack allocated piece of memory, thus we do need a
conversion into another piece of memory.
Right but just duplicate xdp_buff content is sufficient in this case or
is there anything other I missed?
Thanks