On 08/23/2017 07:23 PM, Florian Fainelli wrote: > On 08/23/2017 05:43 PM, Eric Dumazet wrote: >> On Wed, 2017-08-23 at 17:03 -0700, Florian Fainelli wrote: >> >>> Attached is the perf report --stdio of: >>> >>> # perf record -a -g -e skb:kfree_skb iperf -c 192.168.1.23 -b 800M -t 60 >>> WARNING: option -b implies udp testing >>> ------------------------------------------------------------ >>> Client connecting to 192.168.1.23, UDP port 5001 >>> Sending 1470 byte datagrams >>> UDP buffer size: 160 KByte (default) >>> ------------------------------------------------------------ >>> [ 4] local 192.168.1.66 port 36169 connected with 192.168.1.23 port 5001 >>> [ ID] Interval Transfer Bandwidth >>> [ 4] 0.0-60.0 sec 685 MBytes 95.8 Mbits/sec >>> [ 4] Sent 488633 datagrams >>> [ 4] Server Report: >>> [ 4] 0.0-74.4 sec 685 MBytes 77.2 Mbits/sec 0.187 ms 300/488632 >>> (0.061%) >>> [ 4] 0.0-74.4 sec 58 datagrams received out-of-order >>> >>> It measured 488644 events which is greater than the number of packets >>> transmitted by iperf but I only count 8 non-IP packets being sent >>> (protocl=2054 -> ETH_P_ARP) so I am not sure what the other 4 were and >>> why there are not accounted for. >>> >>> Almost all events came from net_tx_action() except 2 that came from >>> net/core/neighbour.c::neigh_probe() and 65 that came from >>> arch/arm/include/asm/irqflags.h::arch_local_irq_save() ?!? >> >> Oh you have too much noise and need this fix : >> >> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c >> b/drivers/net/ethernet/broadcom/bcmsysport.c >> index dc3052751bc1..fcfa8d991850 100644 >> --- a/drivers/net/ethernet/broadcom/bcmsysport.c >> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c >> @@ -597,7 +597,7 @@ static int bcm_sysport_set_coalesce(struct net_device >> *dev, >> >> static void bcm_sysport_free_cb(struct bcm_sysport_cb *cb) >> { >> - dev_kfree_skb_any(cb->skb); >> + dev_consume_skb_any(cb->skb); >> cb->skb = NULL; >> dma_unmap_addr_set(cb, dma_addr, 0); >> } >> > > Yay, now I am getting some sensible information, thanks! > > iperf reports 143 packets lost, and perf report gets me 146 entries for > kfree_skb() looking like that: > # > 2.74% 2.74% skbaddr=0xee5f30c0 protocol=2048 location=0xc0855cdc > | > ---0x2d > write > 0xe9fc2368 > kfree_skb > > > What is annoying is that 0xc0855cdc is resolved to > arch/arm/include/asm/irqflags.h::arch_local_irq_save() which does not > really help telling me where exactly in the stack the drop is happening, > although now I know it is somewhere down the path from write(2)... of > course it is :) > > Now what is disturbing too is that iperf does have its write() system > call get an error returned, write happily returned the number of bytes > written even though the perf trace indicates there was packet drops down > the road..
Because perf was not able to track down exactly where the location was with builtin_return_address(0) and always pointed to arch_local_irq_save(), I added a WARN() in kfree_skb(), and of course this made it impossible to see packet loss under the same conditions anymore. After increasing dramatically wmem_{default,max} I could start seeing significant drop made largely worse by the printks, so not very conclusive either. I took another approach and recorded net_dev_queue() events, and I can see only 976704 (gphy + eth0 devices) / 2 = 488352 dev_queue_events() were recorded. The total sent by iperf was 488898 and we lost 558 packets according to the server 488352 + 558 > 488898 which makes sense because neigh_probe() and others still run. This still confirms there is a drop occurring between UDP socket sendmsg and the SKB enqueuing... So my take away is that the CPU has a bursty write(2) behavior, and the larger the socket write buffer size, the more datagrams you can burst into and the more of these SKBs can be flat out being dropped when the transmit queue is congested. Adding a WARN() is enough of a slow down for the CPU that we are not able to reliably reproduce this burst. Eric, are there areas of the stack where we are allowed to drop packets, not propagate that back to write(2) and also not increment any counter either, or maybe I am not looking where I should... Thanks! -- Florian