On Thu, Feb 15, 2018 at 11:43:21PM +0100, Jesper Dangaard Brouer wrote: > The virtio_net code have three different RX code-paths in receive_buf(). > Two of these code paths can handle XDP, but one of them is broken for > at least XDP_REDIRECT. > > Function(1): receive_big() does not support XDP. > Function(2): receive_small() support XDP fully and uses build_skb(). > Function(3): receive_mergeable() broken XDP_REDIRECT uses napi_alloc_skb(). > > The simple explanation is that receive_mergeable() is broken because > it uses napi_alloc_skb(), which violates XDP given XDP assumes packet > header+data in single page and enough tail room for skb_shared_info. > > The longer explaination is that receive_mergeable() tries to > work-around and satisfy these XDP requiresments e.g. by having a > function xdp_linearize_page() that allocates and memcpy RX buffers > around (in case packet is scattered across multiple rx buffers). This > does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because > we have not implemented bpf_xdp_adjust_tail yet). > > The XDP_REDIRECT action combined with cpumap is broken, and cause hard > to debug crashes. The main issue is that the RX packet does not have > the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing > skb_shared_info to overlap the next packets head-room (in which cpumap > stores info). > > Reproducing depend on the packet payload length and if RX-buffer size > happened to have tail-room for skb_shared_info or not. But to make > this even harder to troubleshoot, the RX-buffer size is runtime > dynamically change based on an Exponentially Weighted Moving Average > (EWMA) over the packet length, when refilling RX rings. > > This patch only disable XDP_REDIRECT support in receive_mergeable() > case, because it can cause a real crash. > > But IMHO we should NOT support XDP in receive_mergeable() at all, > because the principles behind XDP are to gain speed by (1) code > simplicity, (2) sacrificing memory and (3) where possible moving > runtime checks to setup time. These principles are clearly being > violated in receive_mergeable(), that e.g. runtime track average > buffer size to save memory consumption.
As long as buffers we supply are large enough, we can handle mergeable by receive_small normally. The only issue is with outstanding buffers submitted before XDP was enabled. It should be possible to just copy these out. > Besides the described bug: > > Update(1): There is also a OOM leak in the XDP_REDIRECT code, which > receive_small() is likely also affected by. > > Update(2): Also observed a guest crash when redirecting out an > another virtio_net device, when device is down. > > Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") > Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> > --- > drivers/net/virtio_net.c | 7 ------- > 1 file changed, 7 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 626c27352ae2..0ca91942a884 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -677,7 +677,6 @@ static struct sk_buff *receive_mergeable(struct > net_device *dev, > struct bpf_prog *xdp_prog; > unsigned int truesize; > unsigned int headroom = mergeable_ctx_to_headroom(ctx); > - int err; > > head_skb = NULL; > > @@ -754,12 +753,6 @@ static struct sk_buff *receive_mergeable(struct > net_device *dev, > goto err_xdp; > rcu_read_unlock(); > goto xdp_xmit; > - case XDP_REDIRECT: > - err = xdp_do_redirect(dev, &xdp, xdp_prog); > - if (!err) > - *xdp_xmit = true; > - rcu_read_unlock(); > - goto xdp_xmit; > default: > bpf_warn_invalid_xdp_action(act); > case XDP_ABORTED: