On 03.04.2019 21:18, Frederick Lawler wrote: > Heiner Kallweit wrote on 4/3/19 12:45 PM: >> On 03.04.2019 15:14, Bjorn Helgaas wrote: >>> [+cc Frederick] >>> >>> On Wed, Apr 03, 2019 at 07:53:40AM +0200, Heiner Kallweit wrote: >>>> On 02.04.2019 23:57, Bjorn Helgaas wrote: >>>>> On Tue, Apr 02, 2019 at 10:41:20PM +0200, Heiner Kallweit wrote: >>>>>> On 02.04.2019 22:16, Florian Fainelli wrote: >>>>>>> On 4/2/19 12:55 PM, Heiner Kallweit wrote: >>>>>>>> There are numerous reports about different problems caused by ASPM >>>>>>>> incompatibilities between certain network chip versions and board >>>>>>>> chipsets. On the other hand on (especially mobile) systems where ASPM >>>>>>>> works properly it can significantly contribute to power-saving and >>>>>>>> increased battery runtime. >>>>>>>> One problem so far to make ASPM configurable was to find an acceptable >>>>>>>> way of configuration (e.g. module parameters are discouraged for that >>>>>>>> purpose). >>>>>>>> >>>>>>>> As a new attempt let's switch off ASPM per default and make it >>>>>>>> configurable by a sysfs attribute. The attribute is documented in >>>>>>>> new file Documentation/networking/device_drivers/realtek/r8169.txt. >>>>> >>>>> Both module parameters and sysfs attributes are a poor user >>>>> experience. It's very difficult for users to figure out that >>>>> a tweak is needed. >>>>> >>>>>>> I am not sure this is where it should be solved, there is >>>>>>> definitively a device specific aspect to properly supporting the >>>>>>> enabling of ASPM L0s, L1s etc, but the actual sysfs knobs should >>>>>>> belong to the pci_device itself, since this is something that >>>>>>> likely other drivers would want to be able to expose. You would >>>>>>> probably want to work with the PCI maintainers to come up with a >>>>>>> standard solution that applies beyond just r8169 since presumably >>>>>>> there must be a gazillion of devices with the same issues. >>>>> >>>>> The Linux PCI core support for ASPM is poor. Without more details, >>>>> it's impossible to tell whether these issues are hardware or firmware >>>>> defects on the device itself, or something that Linux is doing wrong. >>>>> There are several known defects, especially related to L1 substates >>>>> and hotplug. >>>>> >>>> The vendor refuses to release datasheets or errata. Only certain >>>> combinations of board chipsets (and maybe BIOS versions) and network >>>> chip versions (from the ~ 50 supported by the driver) seem to be >>>> affected. One typical symptom is missed RX packets, maybe the RX FIFO >>>> isn't big enough to buffer all packets until PCIe has woken up. >>>> The Windows vendor driver uses a hack, they dynamically disable ASPM >>>> under load. >>> >>> I'm not super sympathetic to vendors like that or to OEMs that work >>> with them. If we can make the NIC work reliably by disabling ASPM, >>> that's step one. If we can figure out how to extend battery life by >>> enabling ASPM in some cases, great, but we have to be careful to do it >>> in a way that is supportable and doesn't generate lots of user >>> complaints that require debugging. >>> >>> That said, I think Frederick has already started working on a plan for >>> the PCI core to expose sysfs files to manage ASPM. This is similar to >>> the link_state files enabled by CONFIG_PCIEASPM_DEBUG, but it will be >>> always enabled and probably structured slightly differently. The idea >>> is that this would be generic and would not require any driver >>> support. >>> >> Thanks, Bjorn! >> Frederick, is there anything you could share already? Or any timeline? > > Heiner, > > I've been on hold for this pet project for a bit. If you're willing to take > this one on, all the merrier :) > > The plan is the sysfs files would sit on the endpoint devices, and given a > supported state value, configure the link(s) for the endpoints upstream port > to its immediate root port or switch downstream port. Then, this allows > something else handle the walking of the tree via the ABI. We don't > necessarily want to configure sibling links under a switch. > > I'm currently traveling, and I can't get you patches until this weekend. > Thanks for the update. I'm not in a hurry, send me whatever you have already once you have time. Then I can check whether I think I can move forward with it.
>> Based on Bjorns info what seems to be best to me: >> 1. Disable ASPM for r8169 on stable (back to 4.19). >> 2. Once the generic ASPM sysfs attributes are available, reenable ASPM >> for r8169 in net-next. >> >>> Bjorn >>> >> Heiner >> > > Frederick Lawler > > Heiner