On 08/09/2018 11:23 AM, Alexei Starovoitov wrote:
On Thu, Aug 09, 2018 at 09:51:49AM -0500, Mauricio Vasquez wrote:
Agree that existing ops are not the right alias, but deferring to user
space as inline function also doesn't really seem like a good fit, imho,
so I'd prefer rather to have something native. (Aside from that, the
above inline bpf_pop() would also race between CPUs.)
I think we should have push/pop/peek syscalls as well, having a bpf_pop()
that is race prone would create problems. Users expected maps operations to
be safe, so having one that is not will confuse them.
agree the races are not acceptable.
How about a mixed solution:
- introduce bpf_push/pop/peak helpers that programs will use, so
   they don't need to pass useless key=NULL
- introduce map->ops->lookup_and_delete and map->ops->lookup_or_init
   that prog-side helpers can use and syscall has 1-1 mapping for
I think if is a fair solution.
Native lookup_or_init() helper for programs and syscall is badly missing.
Most of the bcc scripts use it and bcc has a racy workaround.
Similarly lookup_and_delete() syscall is 1-1 to pop() for stack/queue
and useful for regular hash maps.

At the end for stack/queue map the programs will use:
int bpf_push(map, value);

Also flags should be passed here.

value_or_null = bpf_pop(map); // guaranteed non-racy for multi-cpu
value_or_null = bpf_peak(map); // racy if 2+ cpus doing it
Is there any reason for it to be racy?


from syscall:
bpf_map_lookup_elem(map, NULL, &value); // returns top of stack
bpf_map_lookup_and_delete_elem(map, NULL, &value); // returns top and deletes 
top atomically
bpf_map_update_elem(map, NULL, &value); // pushes new value into stack 
atomically

Eventually hash and other maps will implement bpf_map_lookup_and_delete()
for both bpf progs and syscall.

The main point that prog-side api doesn't have to match 1-1 to syscall-side,
since they're different enough already.
Like lookup_or_init() is badly needed for programs, but unnecessary for syscall.

Thoughts?

I agree with the idea, if there are not more thoughts on this I'd proceed to the implementation.

Reply via email to