On Sun, 10 Jul 2016 15:27:38 -0500
Tom Herbert <t...@herbertland.com> wrote:

> On Sun, Jul 10, 2016 at 8:37 AM, Jesper Dangaard Brouer
> <bro...@redhat.com> wrote:
> > On Sat, 9 Jul 2016 08:47:52 -0500
> > Tom Herbert <t...@herbertland.com> wrote:
> >  
> >> On Sat, Jul 9, 2016 at 3:14 AM, Jesper Dangaard Brouer
> >> <bro...@redhat.com> wrote:  
> >> > On Thu,  7 Jul 2016 19:15:13 -0700
> >> > Brenden Blanco <bbla...@plumgrid.com> wrote:
> >> >  
> >> >> Add a new bpf prog type that is intended to run in early stages of the
> >> >> packet rx path. Only minimal packet metadata will be available, hence a
> >> >> new context type, struct xdp_md, is exposed to userspace. So far only
> >> >> expose the packet start and end pointers, and only in read mode.
> >> >>
> >> >> An XDP program must return one of the well known enum values, all other
> >> >> return codes are reserved for future use. Unfortunately, this
> >> >> restriction is hard to enforce at verification time, so take the
> >> >> approach of warning at runtime when such programs are encountered. The
> >> >> driver can choose to implement unknown return codes however it wants,
> >> >> but must invoke the warning helper with the action value.  
> >> >
> >> > I believe we should define a stronger semantics for unknown/future
> >> > return codes than the once stated above:
> >> >  "driver can choose to implement unknown return codes however it wants"
> >> >
> >> > The mlx4 driver implementation in:
> >> >  [PATCH v6 04/12] net/mlx4_en: add support for fast rx drop bpf program
> >> >
> >> > On Thu,  7 Jul 2016 19:15:16 -0700 Brenden Blanco <bbla...@plumgrid.com> 
> >> > wrote:
> >> >  
> >> >> +             /* A bpf program gets first chance to drop the packet. It 
> >> >> may
> >> >> +              * read bytes but not past the end of the frag.
> >> >> +              */
> >> >> +             if (prog) {
> >> >> +                     struct xdp_buff xdp;
> >> >> +                     dma_addr_t dma;
> >> >> +                     u32 act;
> >> >> +
> >> >> +                     dma = be64_to_cpu(rx_desc->data[0].addr);
> >> >> +                     dma_sync_single_for_cpu(priv->ddev, dma,
> >> >> +                                             
> >> >> priv->frag_info[0].frag_size,
> >> >> +                                             DMA_FROM_DEVICE);
> >> >> +
> >> >> +                     xdp.data = page_address(frags[0].page) +
> >> >> +                                                     
> >> >> frags[0].page_offset;
> >> >> +                     xdp.data_end = xdp.data + length;
> >> >> +
> >> >> +                     act = bpf_prog_run_xdp(prog, &xdp);
> >> >> +                     switch (act) {
> >> >> +                     case XDP_PASS:
> >> >> +                             break;
> >> >> +                     default:
> >> >> +                             bpf_warn_invalid_xdp_action(act);
> >> >> +                     case XDP_DROP:
> >> >> +                             goto next;
> >> >> +                     }
> >> >> +             }  
> >> >
> >> > Thus, mlx4 choice is to drop packets for unknown/future return codes.
> >> >
> >> > I think this is the wrong choice.  I think the choice should be
> >> > XDP_PASS, to pass the packet up the stack.
> >> >
> >> > I find "XDP_DROP" problematic because it happen so early in the driver,
> >> > that we lost all possibilities to debug what packets gets dropped.  We
> >> > get a single kernel log warning, but we cannot inspect the packets any
> >> > longer.  By defaulting to XDP_PASS all the normal stack tools (e.g.
> >> > tcpdump) is available.
> >> >  
> >>
> >> It's an API issue though not a problem with the packet. Allowing
> >> unknown return codes to pass seems like a major security problem also.  
> >
> > We have the full power and flexibility of the normal Linux stack to
> > drop these packets.  And from a usability perspective it gives insight
> > into what is wrong and counters metrics.  Would you rather blindly drop
> > e.g. 0.01% of the packets in your data-centers without knowing.
> >  
> This is not blindly dropping packets; the bad action should be logged,
> counters incremented, and packet could be passed to the stack as an
> error if deeper inspection is needed. 

Well, the patch only logs a single warning.  There is no method of
counting or passing to the stack in this proposal.  And adding such
things is a performance regression risk, and DoS vector in itself.

> IMO, I would rather drop
> something not understood than accept it-- determinism is a goal also.
> 
> > We already talk about XDP as an offload mechanism.  Normally when
> > loading a (XDP) "offload" program it should be rejected, e.g. by the
> > validator.  BUT we cannot validate all return eBPF codes, because they
> > can originate from a table lookup.  Thus, we _do_ allow programs to be
> > loaded, with future unknown return code.
> >  This then corresponds to only part of the program can be offloaded,
> > thus the natural response is to fallback, handling this is the
> > non-offloaded slower-path.
> >
> > I see the XDP_PASS fallback as a natural way of supporting loading
> > newer/future programs on older "versions" of XDP.  
> 
> Then in this model we could only add codes that allow passing packets.
> For instance, what if a new return code means "Drop this packet and
> log it as critical because if you receive it the stack will crash"?

Drop is drop. I don't see how we would need to drop in a "new" way.
If you need to log a critical event do it in the eBPF program.

> ;-) IMO ignoring something not understood for the sake of
> extensibility is a red herring. In the long run doing this actually
> limits are ability to extend things for both APIs and protocols (a
> great example of this is VLXAN that mandates  unknown flags are
> ignored in RX so VXLAN-GPE has a be a new incompatible protocol to get
> a next protocol field).
> 
> >   E.g. I can have a XDP program that have a valid filter protection
> > mechanism, but also use a newer mechanism, and my server fleet contains
> > different NIC vendors, some NICs only support the filter part.  Then I
> > want to avoid having to compile and maintain different XDP/eBPF
> > programs per NIC vendor. (Instead I prefer having a Linux stack
> > fallback mechanism, and transparently XDP offload as much as the NIC
> > driver supports).
> >  
> As Brenden points out, fallbacks easily become DOS vectors.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Reply via email to