On 16.03.2019 15:38, VDR User wrote: >> Part of the issue though is that we don't know how reliable that test >> was. I believe Derek he hasn't had any crashes, but he wasn't confident >> that it had actually resolved the issue. > > Previously I thought I could easily & consistently reproduce the crash > but the more testing I did, the more I realized that wasn't the case. > That's why my confidence was low in that reversing commit 5317d5c6d47e > ("r8169: use napi_consume_skb where possible") fixed it. I felt like I > needed to do a lot more testing over the weekend to be sure. But, I > can now confirm that reversing that commit did not solve the problem. > I didn't ifdown/ifup after the crash so the nic eventually recovered > on its own I guess. The `ethtool -S` output is: > > NIC statistics: > tx_packets: 5370650 > rx_packets: 57340787 > tx_errors: 0 > rx_errors: 0 > rx_missed: 26 > align_errors: 0 > tx_single_collisions: 0 > tx_multi_collisions: 0 > unicast: 57332905 > broadcast: 6409 > multicast: 1473 > tx_aborted: 0 > tx_underrun: 0 > [...] > > Please let me know if there's anything I can do to help. > Derek > Below are two patches. The first removes an extra PCI register read in the interrupt handler, the second one just adds some tracing for debugging.
Interesting would be whether patch 1 has an impact on the issue, and the trace output of patch 2 after the issue occurred (w/ or w/o patch 1). Trace output you find in /sys/kernel/debug/tracing/trace Heiner diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 761097710..46a4dc888 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -679,6 +679,7 @@ struct rtl8169_private { struct work_struct work; } wk; + unsigned irq_enabled:1; unsigned supports_gmii:1; dma_addr_t counters_phys_addr; struct rtl8169_counters *counters; @@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits) static void rtl_irq_disable(struct rtl8169_private *tp) { RTL_W16(tp, IntrMask, 0); + tp->irq_enabled = 0; } #define RTL_EVENT_NAPI_RX (RxOK | RxErr) @@ -1302,6 +1304,7 @@ static void rtl_irq_disable(struct rtl8169_private *tp) static void rtl_irq_enable(struct rtl8169_private *tp) { + tp->irq_enabled = 1; RTL_W16(tp, IntrMask, tp->irq_mask); } @@ -6521,9 +6524,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) { struct rtl8169_private *tp = dev_instance; u16 status = RTL_R16(tp, IntrStatus); - u16 irq_mask = RTL_R16(tp, IntrMask); - if (status == 0xffff || !(status & irq_mask)) + if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask)) return IRQ_NONE; if (unlikely(status & SYSErr)) { -- 2.21.0 diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 46a4dc888..5a40fa6f8 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -6258,6 +6258,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, * not miss a ring update when it notices a stopped queue. */ smp_wmb(); + trace_printk("stopping tx queue\n"); netif_stop_queue(dev); /* Sync with rtl_tx: * - publish queue status and cur_tx ring index (write barrier) @@ -6267,8 +6268,10 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, * can't. */ smp_mb(); - if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) + if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) { + trace_printk("waking tx queue\n"); netif_wake_queue(dev); + } } return NETDEV_TX_OK; @@ -6376,6 +6379,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, smp_mb(); if (netif_queue_stopped(dev) && rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) { + trace_printk("waking tx queue\n"); netif_wake_queue(dev); } /* -- 2.21.0