From: Daniel Borkmann <dan...@iogearbox.net>
Date: Tue, 24 Nov 2015 21:28:15 +0100

> Currently, when having map file descriptors pointing to program arrays,
> there's still the issue that we unconditionally flush program array
> contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens
> when such a file descriptor is released and is independent of the map's
> refcount.
> 
> Having this flush independent of the refcount is for a reason: there
> can be arbitrary complex dependency chains among tail calls, also circular
> ones (direct or indirect, nesting limit determined during runtime), and
> we need to make sure that the map drops all references to eBPF programs
> it holds, so that the map's refcount can eventually drop to zero and
> initiate its freeing. Btw, a walk of the whole dependency graph would
> not be possible for various reasons, one being complexity and another
> one inconsistency, i.e. new programs can be added to parts of the graph
> at any time, so there's no guaranteed consistent state for the time of
> such a walk.
> 
> Now, the program array pinning itself works, but the issue is that each
> derived file descriptor on close would nevertheless call unconditionally
> into bpf_fd_array_map_clear(). Instead, keep track of users and postpone
> this flush until the last reference to a user is dropped. As this only
> concerns a subset of references (f.e. a prog array could hold a program
> that itself has reference on the prog array holding it, etc), we need to
> track them separately.
> 
> Short analysis on the refcounting: on map creation time usercnt will be
> one, so there's no change in behaviour for bpf_map_release(), if unpinned.
> If we already fail in map_create(), we are immediately freed, and no
> file descriptor has been made public yet. In bpf_obj_pin_user(), we need
> to probe for a possible map in bpf_fd_probe_obj() already with a usercnt
> reference, so before we drop the reference on the fd with fdput().
> Therefore, if actual pinning fails, we need to drop that reference again
> in bpf_any_put(), otherwise we keep holding it. When last reference
> drops on the inode, the bpf_any_put() in bpf_evict_inode() will take
> care of dropping the usercnt again. In the bpf_obj_get_user() case, the
> bpf_any_get() will grab a reference on the usercnt, still at a time when
> we have the reference on the path. Should we later on fail to grab a new
> file descriptor, bpf_any_put() will drop it, otherwise we hold it until
> bpf_map_release() time.
> 
> Joint work with Alexei.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Signed-off-by: Alexei Starovoitov <a...@kernel.org>

Applied, thanks a lot Daniel.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to