On Thu, Feb 21, 2019 at 11:29 AM Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
>
> On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> > On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <dan...@iogearbox.net> wrote:
> > > On 02/21/2019 06:31 AM, Kees Cook wrote:
> > > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov
> > > > <alexei.starovoi...@gmail.com> wrote:
> > > >>
> > > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov
> > > >> <alexei.starovoi...@gmail.com> wrote:
> > > >>>
> > > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> > > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is
> > > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does
> > > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does
> > > >>>> not make use of any functionality that would require this assertion. Fix this
> > > >>>> false positive by adding and using a SECCOMP_RUN() variant that does not have
> > > >>>> the cant_sleep(); check.
> > > >>>>
> > > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > > >>>> Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com
> > > >>>> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> > > >>>> Acked-by: Kees Cook <keesc...@chromium.org>
> > > >>>
> > > >>> Applied, Thanks
> > > >>
> > > >> Actually I think it's the wrong approach to go with long term,
> > > >> and I'm thinking of reverting it.
> > > >> I think it's better to disable preemption for the duration of
> > > >> the seccomp cBPF prog.
> > > >> It's short, and there is really no reason for it to be preemptible.
> > > >> When seccomp switches to eBPF we'll have this weird inconsistency.
> > > >> Let's just disable preemption for seccomp as well.
> > > >
> > > > A lot of changes will be needed for seccomp ebpf -- not the least of
> > > > which is convincing me there is a use-case. ;)
> > > >
> > > > But the main issue is that I'm not a huge fan of dropping two
> > > > barriers() across syscall entry. That seems pretty heavy-duty for
> > > > something that is literally not needed right now.
> > >
> > > Yeah, I think it's okay to add once actually technically needed. Last
> > > time I looked, if I recall correctly, at least Chrome installs some
> > > heavy duty seccomp programs that go close to the prog limit.
> >
> > Half of that is probably because that seccomp BPF code is so
> > inefficient, though.
> >
> > This snippet shows that those programs constantly recheck the high
> > halves of arguments:
> >
> > Some of the generated code is pointless because all reachable code
> > from that point on has the same outcome (the last "ret ALLOW" in the
> > following sample is unreachable because they've already checked that
> > the high bit of the low half is set, so the low half can't be 3):
>
> and with ebpf these optimizations will be available for free
> because llvm will remove unnecessary loads and simplify branches.
> There is no technical reason not to use ebpf in seccomp.
>
> When we discussed preemption of classic vs extended in the socket filters
> context, we agreed to make it a requirement that preemption must be
> disabled even though it's not strictly necessary. The RX side of socket
> filters was already non-preemptible while TX was preemptible.
> We must not make an exception to this rule for seccomp.
> Hence I've reverted this commit.
>
> Here is the actual fix for seccomp:
>
> From: Alexei Starovoitov <a...@kernel.org>
> Date: Thu, 21 Feb 2019 10:40:14 -0800
> Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog
>
> All BPF programs must be called with preemption disabled.
>
> Signed-off-by: Alexei Starovoitov <a...@kernel.org>
> ---
>  kernel/seccomp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index e815781ed751..a43c601ac252 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>  	 * All filters in the list are evaluated and the lowest BPF return
>  	 * value always takes priority (ignoring the DATA).
>  	 */
> +	preempt_disable();
>  	for (; f; f = f->prev) {
>  		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>
> @@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>  			*match = f;
>  		}
>  	}
> +	preempt_enable();
>  	return ret;
>  }
>  #endif /* CONFIG_SECCOMP_FILTER */
> --
>
> Doing a per-cpu increment of cache-hot data is practically free, and it
> makes seccomp play by the rules.
Other accesses should dominate the run time, yes. I'm still not a big fan
of unconditionally adding this, but I won't NAK. :P

-- 
Kees Cook