Hi Heiner,

Thanks for your response. Requested info follows.
> > Hi, after updating to kernel 5.0, the nic driver (r8169) has been
> > crashing whenever I start using heavy traffic on it (for example,
> > xferring large files to the box across my lan). The destination
> > harddrive may be sleeping and need to spin-up, or not, but the box
> > itself does not suspend/hibernate. The nic becomes completely
> > unresponsive and all connections to the box drop. After what I think
> > is several minutes, the connection comes back to life. The problem
> > happens consistently but seemingly not consistently at the same point.
> > For example, I can xfer a few 4gb files and it will crash at around
> > 2-3gb on the first file. The next time it might not crash until 2-3gb
> > on the second file. Prior to kernel 5.0 I was using 4.19.12 and this
> > problem didn't occur. I have since downgraded back to 4.19.12 pending
> > what response this post gets.
> >
> Thanks for the report. Helpful would be:
> - full dmesg output

Added as attachment.

> - "lspci -vv" output (as root) for the network card

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
    Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 17
    Region 0: I/O ports at c000 [size=256]
    Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
    Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00000800
    Capabilities: [d0] Vital Product Data
        Not readable
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [140 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                Status: NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 40-01-00-00-68-4c-e0-00
    Kernel driver in use: r8169

> - ethtool -S <if> output

Unfortunately, I just realized I did this _after_ ifdown/ifup'ing the nic to get it back online, so this is probably useless, but I'll include it anyway. If I get it to crash again, I'll try to remember to get this before restarting the nic:

NIC statistics:
     tx_packets: 8844844
     rx_packets: 23550316
     tx_errors: 0
     rx_errors: 0
     rx_missed: 13
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 23544796
     broadcast: 4420
     multicast: 1100
     tx_aborted: 0
     tx_underrun: 0

> Can you test a recent 4.20 kernel? This would narrow down the number
> of potentially problematic patches.

I compiled and tested 4.20.15 and didn't experience any crashing.
I then switched back to 5.0.0, and this time I had to transfer significantly more data before the crash occurred. I'm not sure, but it seems like the crashes happen when there's both outgoing and incoming traffic simultaneously.

Is the dmesg crash info helpful at all?

Thanks,
Derek
crash.dmesg.log