Consider an skb which doesn't match a ptype_base/ptype_specific handler. If this skb is delivered to a ptype_all handler, it does not count as a drop. However, if the skb is also processed by an rx_handler which returns RX_HANDLER_PASS, the frame is now counted as a drop because pt_prev was reset. An example of this situation is an LLDP frame received on a bridge port while lldpd is listening on a packet socket with ETH_P_ALL (e.g. by specifying `lldpd -c`).
Fix by adding an extra condition variable to record whether the skb was delivered to a packet tap before running an rx_handler. The situation is similar for RX_HANDLER_EXACT frames, so their accounting is also changed. OTOH, the behavior is unchanged for RX_HANDLER_ANOTHER frames - they are accounted according to what happens with the new skb->dev.

Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Benjamin Poirier <bpoir...@cumulusnetworks.com>
---
The main problem (described above) involves RX_HANDLER_PASS and I think it's clear that it needs fixing, but I'm wondering if there are different views on what the behavior should be in related cases.

For RX_HANDLER_ANOTHER, considering an example with bonding: currently, with a use case similar to the one described above for the bridge, an LLDP frame received on a bond slave is counted as rx_dropped on the bond master, even if it was delivered to a ptype_all tap bound to the slave. That seems a bit iffy to me but kind of fair, because it's counted as a drop _on the master_.

For RX_HANDLER_EXACT, also considering an example with bonding: currently, a unicast frame received on a backup bond slave is counted as rx_nohandler on the slave, even if it was delivered to a ptype_all tap. I'd say that's not correct, so the patch changes that too.

Also, it looks to me like a better fix for these issues would be for rx_handlers to avoid spuriously unsharing skbs in cases where they are not trying to enqueue them (cf. commit 7b995651e373 "[BRIDGE]: Unshare skb upon entry") and to return something like a pt_prev func in other cases so that copies can be deferred. That looks like quite a bit more work, though.
 net/core/dev.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f36bd3b21997..13ff1933e791 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5061,10 +5061,10 @@ static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
 static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 				    struct packet_type **ppt_prev)
 {
+	bool deliver_exact = false, rx_tapped = false;
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
 	struct net_device *orig_dev;
-	bool deliver_exact = false;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -5155,12 +5155,14 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		if (pt_prev) {
 			ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = NULL;
+			rx_tapped = true;
 		}
 		switch (rx_handler(&skb)) {
 		case RX_HANDLER_CONSUMED:
 			ret = NET_RX_SUCCESS;
 			goto out;
 		case RX_HANDLER_ANOTHER:
+			rx_tapped = false;
 			goto another_round;
 		case RX_HANDLER_EXACT:
 			deliver_exact = true;
@@ -5231,11 +5233,13 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 			goto drop;
 		*ppt_prev = pt_prev;
 	} else {
+		if (!rx_tapped) {
 drop:
-		if (!deliver_exact)
-			atomic_long_inc(&skb->dev->rx_dropped);
-		else
-			atomic_long_inc(&skb->dev->rx_nohandler);
+			if (!deliver_exact)
+				atomic_long_inc(&skb->dev->rx_dropped);
+			else
+				atomic_long_inc(&skb->dev->rx_nohandler);
+		}
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
-- 
2.26.2