> -----Original Message----- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Thursday, December 24, 2020 1:56 PM > To: wangyunjian <wangyunj...@huawei.com> > Cc: netdev@vger.kernel.org; m...@redhat.com; > willemdebruijn.ker...@gmail.com; virtualizat...@lists.linux-foundation.org; > Lilijun (Jerry) <jerry.lili...@huawei.com>; chenchanghu > <chenchan...@huawei.com>; xudingke <xudin...@huawei.com>; huangbin (J) > <brian.huang...@huawei.com> > Subject: Re: [PATCH net v4 2/2] vhost_net: fix tx queue stuck when sendmsg > fails > > > On 2020/12/24 下午12:37, wangyunjian wrote: > >> -----Original Message----- > >> From: Jason Wang [mailto:jasow...@redhat.com] > >> Sent: Thursday, December 24, 2020 11:10 AM > >> To: wangyunjian <wangyunj...@huawei.com>; netdev@vger.kernel.org; > >> m...@redhat.com; willemdebruijn.ker...@gmail.com > >> Cc: virtualizat...@lists.linux-foundation.org; Lilijun (Jerry) > >> <jerry.lili...@huawei.com>; chenchanghu <chenchan...@huawei.com>; > >> xudingke <xudin...@huawei.com>; huangbin (J) > >> <brian.huang...@huawei.com> > >> Subject: Re: [PATCH net v4 2/2] vhost_net: fix tx queue stuck when > >> sendmsg fails > >> > >> > >> On 2020/12/24 上午10:25, wangyunjian wrote: > >>> From: Yunjian Wang <wangyunj...@huawei.com> > >>> > >>> Currently the driver doesn't drop a packet which can't be sent by > >>> tun (e.g bad packet). In this case, the driver will always process > >>> the same packet lead to the tx queue stuck. > >>> > >>> To fix this issue: > >>> 1. in the case of persistent failure (e.g bad packet), the driver > >>> can skip this descriptor by ignoring the error. > >>> 2. in the case of transient failure (e.g -EAGAIN and -ENOMEM), the > >>> driver schedules the worker to try again. > >> > >> I might be wrong but looking at alloc_skb_with_frags(), it returns > >> -ENOBUFS > >> actually: > >> > >> *errcode = -ENOBUFS; > >> skb = alloc_skb(header_len, gfp_mask); > >> if (!skb) > >> return NULL; > > Yes, but the sock_alloc_send_pskb() returns - EAGAIN when no sndbuf space. > > So there is need to check return value which is -EAGAIN or -ENOMEM or - > EAGAIN? > > > > struct sk_buff *sock_alloc_send_pskb() { ... > > for (;;) { > > ... > > sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk); > > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); > > err = -EAGAIN; > > if (!timeo) > > goto failure; > > ... > > } > > skb = alloc_skb_with_frags(header_len, data_len, max_page_order, > > errcode, sk->sk_allocation); > > if (skb) > > skb_set_owner_w(skb, sk); > > return skb; > > ... > > *errcode = err; > > return NULL; > > } > > > -EAGAIN and -ENOBFS are fine. But I don't see how -ENOMEM can be returned.
The tun_build_skb() and tun_napi_alloc_frags() return -ENOMEM when memory allocation fails. Thanks > > Thanks > > > >> Thanks > >> > >> > >>> Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server") > >>> Signed-off-by: Yunjian Wang <wangyunj...@huawei.com> > >>> --- > >>> drivers/vhost/net.c | 16 ++++++++-------- > >>> 1 file changed, 8 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index > >>> c8784dfafdd7..e76245daa7f6 100644 > >>> --- a/drivers/vhost/net.c > >>> +++ b/drivers/vhost/net.c > >>> @@ -827,14 +827,13 @@ static void handle_tx_copy(struct vhost_net > >>> *net, > >> struct socket *sock) > >>> msg.msg_flags &= ~MSG_MORE; > >>> } > >>> > >>> - /* TODO: Check specific error and bomb out unless ENOBUFS? > */ > >>> err = sock->ops->sendmsg(sock, &msg, len); > >>> - if (unlikely(err < 0)) { > >>> + if (unlikely(err == -EAGAIN || err == -ENOMEM)) { > >>> vhost_discard_vq_desc(vq, 1); > >>> vhost_net_enable_vq(net, vq); > >>> break; > >>> } > >>> - if (err != len) > >>> + if (err >= 0 && err != len) > >>> pr_debug("Truncated TX packet: len %d != %zd\n", > >>> err, len); > >>> done: > >>> @@ -922,7 +921,6 @@ static void handle_tx_zerocopy(struct vhost_net > >> *net, struct socket *sock) > >>> msg.msg_flags &= ~MSG_MORE; > >>> } > >>> > >>> - /* TODO: Check specific error and bomb out unless ENOBUFS? > */ > >>> err = sock->ops->sendmsg(sock, &msg, len); > >>> if (unlikely(err < 0)) { > >>> if (zcopy_used) { > >>> @@ -931,11 +929,13 @@ static void handle_tx_zerocopy(struct > >>> vhost_net > >> *net, struct socket *sock) > >>> nvq->upend_idx = > >>> ((unsigned)nvq->upend_idx - 1) > >>> % UIO_MAXIOV; > >>> } > >>> - vhost_discard_vq_desc(vq, 1); > >>> - vhost_net_enable_vq(net, vq); > >>> - break; > >>> + if (err == -EAGAIN || err == -ENOMEM) { > >>> + vhost_discard_vq_desc(vq, 1); > >>> + vhost_net_enable_vq(net, vq); > >>> + break; > >>> + } > >>> } > >>> - if (err != len) > >>> + if (err >= 0 && err != len) > >>> pr_debug("Truncated TX packet: " > >>> " len %d != %zd\n", err, len); > >>> if (!zcopy_used)