Hi Heiner,

I tried disabling ASPM using the pcie_aspm=off kernel parameter, and it
made no difference.
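
In case it's useful, this is roughly how I verified the ASPM state
(a sketch from memory rather than my exact shell history; 03:00.0 is
the NIC address from the lspci output quoted below):

    # confirm the parameter actually made it onto the command line
    grep -o 'pcie_aspm=off' /proc/cmdline

    # without the boot parameter, ASPM can also be steered at runtime;
    # the policy file only exists when CONFIG_PCIEASPM is enabled
    cat /sys/module/pcie_aspm/parameters/policy
    echo performance | sudo tee /sys/module/pcie_aspm/parameters/policy

    # and check what the device's link control register reports
    sudo lspci -vv -s 03:00.0 | grep ASPM
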
I tried compiling the 4.18.16 r8169.c against the 4.19.18 source and
subsequently loaded the module into the running 4.19.18 kernel. I can
confirm that this immediately resolved the issue, and access to the NFS
shares worked as expected. (Rough build steps are appended at the end
of this mail.) I presume this means it is an issue with the r8169
driver included in 4.19 onwards?

To answer your last questions:

Base Board Information
        Manufacturer: Alienware
        Product Name: 0PGRP5
        Version: A02

... and yes, the RTL8168 is the onboard network chip.

Regards,

Peter.

On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallwe...@gmail.com> wrote:
>
> Hi Peter,
>
> I think the vendor driver doesn't enable ASPM by default,
> so it's worth a try to disable ASPM in the BIOS or via sysfs.
> A few older systems seem to have issues with ASPM. What kind of
> system / mainboard are you using? Is the RTL8168 the onboard
> network chip?
>
> Rgds, Heiner
>
>
> On 29.01.2019 07:20, Peter Ceiley wrote:
> > Hi Heiner,
> >
> > Thanks, I'll do some more testing. It might not be the driver - I
> > assumed it was because using the r8168 driver 'resolves' the
> > issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> > a good idea.
> >
> > Cheers,
> >
> > Peter.
> >
> > On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallwe...@gmail.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> At first glance it doesn't look like a typical driver issue.
> >> What you could do:
> >>
> >> - Test the r8169.c from 4.18 on top of 4.19.
> >>
> >> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>
> >> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>
> >> Any specific reason why you think the root cause is in the driver
> >> and not elsewhere in the network subsystem?
> >>
> >> Heiner
> >>
> >>
> >> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> Thanks for getting back to me.
> >>>
> >>> No, I don't use jumbo packets.
> >>>
> >>> Bandwidth is *generally* good, and iperf to my NAS shows over
> >>> 900 Mbit/s under both kernels. The issue seems to appear when
> >>> establishing a connection and is most notable, for example, on my
> >>> mounted NFS shares, where it takes seconds (up to tens of seconds
> >>> on larger directories) to list the contents of each directory.
> >>> Once a transfer begins on a file, I appear to get good bandwidth.
> >>>
> >>> I'm unsure what data would be most useful for troubleshooting
> >>> this. Running
> >>>
> >>>     netstat -s | grep retransmitted
> >>>
> >>> shows a steady increase in retransmitted segments each time I list
> >>> the contents of a remote directory. For example, running 'ls' on a
> >>> directory containing 345 media files under kernel 4.19.18
> >>> increased retransmitted segments by 21, and 'time' showed:
> >>>
> >>>     real    0m19.867s
> >>>     user    0m0.012s
> >>>     sys     0m0.036s
> >>>
> >>> The same command shows no retransmitted segments under kernel
> >>> 4.18.16, and 'time' showed:
> >>>
> >>>     real    0m0.300s
> >>>     user    0m0.004s
> >>>     sys     0m0.007s
> >>>
> >>> ifconfig does not show any RX/TX errors or dropped packets in
> >>> either case.
> >>>
> >>> dmesg XID:
> >>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>
> >>> # lspci -vv
> >>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>         Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit
> >>>                 Ethernet Controller
> >>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>                 ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>                 <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>         Latency: 0, Cache Line Size: 64 bytes
> >>>         Interrupt: pin A routed to IRQ 19
> >>>         Region 0: I/O ports at d000 [size=256]
> >>>         Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>         Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>         Capabilities: [40] Power Management version 3
> >>>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>                         PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>                 Address: 0000000000000000  Data: 0000
> >>>         Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>                         <512ns, L1 <64us
> >>>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>                         SlotPowerLimit 10.000W
> >>>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>                         MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>                 DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+
> >>>                         TransPend-
> >>>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>                         Latency L0s unlimited, L1 <64us
> >>>                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>                 LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>                         ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>                 LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>                         OBFF Via message/WAKE#
> >>>                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>                         OBFF Disabled
> >>>                         AtomicOpsCtl: ReqEn-
> >>>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>                         Transmit Margin: Normal Operating Range,
> >>>                         EnterModifiedCompliance- ComplianceSOS-
> >>>                         Compliance De-emphasis: -6dB
> >>>                 LnkSta2: Current De-emphasis Level: -6dB,
> >>>                         EqualizationComplete-, EqualizationPhase1-
> >>>                         EqualizationPhase2-, EqualizationPhase3-,
> >>>                         LinkEqualizationRequest-
> >>>         Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>                 Vector table: BAR=4 offset=00000000
> >>>                 PBA: BAR=4 offset=00000800
> >>>         Capabilities: [d0] Vital Product Data
> >>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>                 Not readable
> >>>         Capabilities: [100 v1] Advanced Error Reporting
> >>>                 UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>                         RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>                 UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>                         RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>                         RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>                 CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+
> >>>                         AdvNonFatalErr-
> >>>                 CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> >>>                         AdvNonFatalErr+
> >>>                 AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>                         ECRCChkCap+ ECRCChkEn-
> >>>                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>                 HeaderLog: 00000000 00000000 00000000 00000000
> >>>         Capabilities: [140 v1] Virtual Channel
> >>>                 Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>                 Arb: Fixed- WRR32- WRR64- WRR128-
> >>>                 Ctrl: ArbSelect=Fixed
> >>>                 Status: InProgress-
> >>>                 VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>                         Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>                         Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>                         Status: NegoPending- InProgress-
> >>>         Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>         Capabilities: [170 v1] Latency Tolerance Reporting
> >>>                 Max snoop latency: 71680ns
> >>>                 Max no snoop latency: 71680ns
> >>>         Kernel driver in use: r8169
> >>>         Kernel modules: r8169
> >>>
> >>> Please let me know if you have any other ideas for testing.
> >>>
> >>> Thanks!
> >>>
> >>> Peter.
> >>>
> >>>
> >>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallwe...@gmail.com> wrote:
> >>>>
> >>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have been experiencing very poor network performance since kernel
> >>>>> 4.19, and I'm confident it's related to the r8169 driver.
> >>>>>
> >>>>> I have no issue with kernel versions 4.18 and prior. I am
> >>>>> experiencing this issue in kernels 4.19 and 4.20 (currently
> >>>>> running/testing with 4.20.4 & 4.19.18).
> >>>>>
> >>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>> issue related to loading of the PHY driver; however, my symptoms
> >>>>> differ in that I still have a network connection. I have attempted
> >>>>> to reload the driver on a running system, but this does not improve
> >>>>> the situation.
> >>>>>
> >>>>> Using the proprietary r8168 driver returns my device to proper
> >>>>> working order.
> >>>>>
> >>>>> lshw shows:
> >>>>>         description: Ethernet interface
> >>>>>         product: RTL8111/8168/8411 PCI Express Gigabit Ethernet
> >>>>>                 Controller
> >>>>>         vendor: Realtek Semiconductor Co., Ltd.
> >>>>>         physical id: 0
> >>>>>         bus info: pci@0000:03:00.0
> >>>>>         logical name: enp3s0
> >>>>>         version: 0c
> >>>>>         serial:
> >>>>>         size: 1Gbit/s
> >>>>>         capacity: 1Gbit/s
> >>>>>         width: 64 bits
> >>>>>         clock: 33MHz
> >>>>>         capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>                 ethernet physical tp aui bnc mii fibre 10bt 10bt-fd
> >>>>>                 100bt 100bt-fd 1000bt-fd autonegotiation
> >>>>>         configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>                 duplex=full firmware=rtl8168g-2_0.0.1 02/06/13
> >>>>>                 ip=192.168.1.25 latency=0 link=yes multicast=yes
> >>>>>                 port=MII speed=1Gbit/s
> >>>>>         resources: irq:19 ioport:d000(size=256)
> >>>>>                 memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>
> >>>>> Kind Regards,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>> Hi Peter,
> >>>>
> >>>> The description "poor network performance" is quite vague, therefore:
> >>>>
> >>>> - Can you provide any measurements?
> >>>>   - iperf results before and after
> >>>>   - statistics about dropped packets (rx and/or tx)
> >>>> - Do you use jumbo packets?
> >>>>
> >>>> Also helpful would be the "lspci -vv" output for the network card
> >>>> and the dmesg output line with the chip XID.
> >>>>
> >>>> Heiner
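
P.S. For anyone wanting to reproduce the module test mentioned above,
these are roughly the steps I used (a sketch; paths and versions match
my setup, adjust as needed):

    # build the single-file 4.18.16 r8169 driver as an external module
    # against the running 4.19.18 kernel's build tree
    wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.18.16.tar.xz
    tar xf linux-4.18.16.tar.xz
    mkdir r8169-test && cd r8169-test
    cp ../linux-4.18.16/drivers/net/ethernet/realtek/r8169.c .
    echo 'obj-m := r8169.o' > Makefile
    make -C /lib/modules/4.19.18/build M=$PWD modules

    # swap the in-tree module for the freshly built one
    sudo rmmod r8169
    sudo insmod ./r8169.ko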