On Mon, 2018-09-03 at 10:36 +0100, Jose Abreu wrote:
> Hi Jerome,
>
> On 03-09-2018 09:56, Jerome Brunet wrote:
> > On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> > > [ As for now this is only for testing! ]
> > >
> > > This follows David Miller's advice and tries to fix the coalesce timer
> > > in multi-queue scenarios.
> > >
> > > We are now using per-queue coalesce values and a per-queue TX timer.
> > > This assumes that tx_queues == rx_queues, which is not necessarily
> > > true. The official patch will need to have this fixed.
> > >
> > > The coalesce timer default value was changed to 1ms and the coalesce
> > > frames to 25.
> > >
> > > Tested in a B2B setup between XGMAC2 and GMAC5.
> >
> > Tested on Amlogic meson-axg-s400. No regression seen so far.
> > (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
> >
> > As far as I understand from the device tree parsing, this platform (and
> > all other Amlogic platforms) uses a single queue.
>
> Thanks for testing! I will send a formal patch once I get around
> the problem of rx queues != tx queues.
>
> >
> > ---
> >
> > Jose,
> >
> > On another topic: doing iperf3 tests on Amlogic devices, we have seen a
> > strange behavior.
> >
> > Doing a Tx or Rx test alone usually works fine (700 Mbps to 900 Mbps
> > depending on the platform). However, when doing both Rx and Tx at the
> > same time, we see the Tx throughput dropping significantly (~30 Mbps)
> > and a lot of TCP retries.
> >
> > Would you have any idea what our problem might be, or how to start
> > investigating it?
> >
>
> I'm not able to reproduce this here, but I'm using multiple queues.
> I will try with a single queue. In the meantime, please try this
> patch (it should be applied directly on top of this RFT):
No notable change. Rx is fine but Tx:

[  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes

I suppose the problem has something to do with the retries. When doing the
Tx test alone, we don't see such retries and the throughput is where we
expect it to be.

By the way, your mailer (and its automatic 80-column wrapping, I suppose)
made the patch below a bit harder to apply.

>
> --->8
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ae26a6e8608e..1407975320aa 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
>  		stmmac_init_tx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
>  				    tx_q->dma_tx_phy, chan);
>  
> -		tx_q->tx_tail_addr = tx_q->dma_tx_phy +
> -			    (DMA_TX_SIZE * sizeof(struct dma_desc));
> +		tx_q->tx_tail_addr = tx_q->dma_tx_phy;
>  		stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
>  				       tx_q->tx_tail_addr, chan);
>  	}
> @@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
>  
> +	tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
>  	stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
>  
>  	if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> @@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>  	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
>  
>  	stmmac_enable_dma_transmission(priv, priv->ioaddr);
> +
> +	tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
>  	stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
>  
>  	if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> --->8
>
> Thanks and Best Regards,
> Jose Miguel Abreu
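The patch quoted above stops programming the TX tail pointer once, at the end of the ring, during DMA init, and instead recomputes it on every transmit from `cur_tx`. The arithmetic can be sketched in isolation as follows. This is a standalone illustration, not driver code: `struct demo_dma_desc`, `DEMO_TX_SIZE` (512 here), `tail_end_of_ring()` and `tail_after_cur_tx()` are hypothetical stand-ins, and the sketch assumes, as the patch implies, that `cur_tx` has already been advanced to the next free descriptor when the tail pointer is written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-word descriptor, standing in for stmmac's struct dma_desc. */
struct demo_dma_desc {
	uint32_t des0;
	uint32_t des1;
	uint32_t des2;
	uint32_t des3;
};

/* The arithmetic below assumes a 16-byte descriptor (C11 static_assert). */
static_assert(sizeof(struct demo_dma_desc) == 16, "expected a 16-byte descriptor");

/* Hypothetical ring size, for illustration only. */
#define DEMO_TX_SIZE 512u

/*
 * Old behaviour: the tail pointer is written once at init time and
 * points one past the last descriptor of the whole ring.
 */
static uint64_t tail_end_of_ring(uint64_t dma_tx_phy)
{
	return dma_tx_phy + (uint64_t)DEMO_TX_SIZE * sizeof(struct demo_dma_desc);
}

/*
 * Patched behaviour: the tail pointer is recomputed on each transmit.
 * cur_tx is assumed to be the index of the next free descriptor, so the
 * result points just past the last descriptor handed to the DMA.
 */
static uint64_t tail_after_cur_tx(uint64_t dma_tx_phy, unsigned int cur_tx)
{
	return dma_tx_phy + (uint64_t)cur_tx * sizeof(struct demo_dma_desc);
}
```

For example, with a ring base of 0x1000 and three descriptors queued (`cur_tx == 3`), `tail_after_cur_tx()` gives 0x1030, whereas the old init-time value from `tail_end_of_ring()` is always 0x3000 regardless of how many descriptors are pending. With `cur_tx == 0` the per-transmit formula reduces to `dma_tx_phy`, which matches the new init-time value in the first hunk.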