On 08/07/2018 09:42 AM, Alexei Starovoitov wrote:
On Mon, Aug 06, 2018 at 03:58:30PM +0200, Mauricio Vasquez B wrote:
Bpf queue implements a LIFO/FIFO data containers for ebpf programs.
queue/stack datastructure would be a great addition.
It allows to push an element to the queue by using the update operation
and to pop an element from the queue by using the lookup operation.
A use case for this is to keep track of a pool of elements, like
network ports in a SNAT.
Signed-off-by: Mauricio Vasquez B <mauricio.vasq...@polito.it>
---
include/linux/bpf_types.h | 1
include/uapi/linux/bpf.h | 5 +
kernel/bpf/Makefile | 2
kernel/bpf/queuemap.c | 287 +++++++++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 61 +++++++---
kernel/bpf/verifier.c | 16 ++-
6 files changed, 353 insertions(+), 19 deletions(-)
create mode 100644 kernel/bpf/queuemap.c
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index c5700c2d5549..6c7a62f3fe43 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -58,3 +58,4 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
#endif
#endif
+BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0ebaaf7f3568..2c171c40eb45 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -120,6 +120,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CPUMAP,
BPF_MAP_TYPE_XSKMAP,
BPF_MAP_TYPE_SOCKHASH,
+ BPF_MAP_TYPE_QUEUE,
};
enum bpf_prog_type {
@@ -255,6 +256,10 @@ enum bpf_attach_type {
/* Flag for stack_map, store build_id+offset instead of pointer */
#define BPF_F_STACK_BUILD_ID (1U << 5)
+/* Flags for queue_map, type of queue */
+#define BPF_F_QUEUE_FIFO (1U << 16)
+#define BPF_F_QUEUE_LIFO (2U << 16)
the choice of flags looks odd.
Why start at bit 16 and why waste two bits?
It's either stack or queue.
May be instead define two map_types:
BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK
that share common implementation?
I like this solution, I'll spin a v2 with two different types of maps.
And how about adding three new helpers: push/pop/peek as well?
Reusing lookup/update is neat, but does lookup == pop
or does lookup == peek ?
I suspect it will be confusing.
Three new helpers cost nothing, but will make bpf progs easier to read.
I agree. I have one doubt here, update/lookup/delete is implemented in
all map types, if the operation is not supported it returns -EINVAL.
For push/pop/peek, should we implement it in all maps or is it better to
check the map type before invoking map->ops->push/pop/seek?
(Maybe checking if map->ops->xxx is NULL will also work)
Could you also add a flag for replacement policy?
In this implementation when max_entries limit is reached
the map_update_elem (aka push) will fail with e2big.
It would be useful to allow pushing and dropping elements from
the other side then such queue/stack can be used to keep
track of the last N events (the prog would unconditionally push
and user space would swap the queue for new one via map-in-map
and drain old queue).
I like it, will do.
Speaking of map-in-map, please add a test to make sure
queue/stack works with hash/array_of_maps.
Sure.
selftests in patch2 needs to have kernel side as well.
it's not enough to test syscall access only.
Will do.