On 03.04.2019 21:18, Frederick Lawler wrote:
> Heiner Kallweit wrote on 4/3/19 12:45 PM:
>> On 03.04.2019 15:14, Bjorn Helgaas wrote:
>>> [+cc Frederick]
>>>
>>> On Wed, Apr 03, 2019 at 07:53:40AM +0200, Heiner Kallweit wrote:
>>>> On 02.04.2019 23:57, Bjorn Helgaas wrote:
>>>>> On Tue, Apr 02, 2019 at 10:41:20PM +0200, Heiner Kallweit wrote:
>>>>>> On 02.04.2019 22:16, Florian Fainelli wrote:
>>>>>>> On 4/2/19 12:55 PM, Heiner Kallweit wrote:
>>>>>>>> There are numerous reports about different problems caused by ASPM
>>>>>>>> incompatibilities between certain network chip versions and board
>>>>>>>> chipsets. On the other hand on (especially mobile) systems where ASPM
>>>>>>>> works properly it can significantly contribute to power-saving and
>>>>>>>> increased battery runtime.
>>>>>>>> So far, one obstacle to making ASPM configurable has been finding an
>>>>>>>> acceptable way to configure it (e.g. module parameters are discouraged
>>>>>>>> for that purpose).
>>>>>>>>
>>>>>>>> As a new attempt let's switch off ASPM by default and make it
>>>>>>>> configurable by a sysfs attribute. The attribute is documented in
>>>>>>>> new file Documentation/networking/device_drivers/realtek/r8169.txt.
>>>>>
>>>>> Both module parameters and sysfs attributes are a poor user
>>>>> experience.  It's very difficult for users to figure out that
>>>>> a tweak is needed.
>>>>>
>>>>>>> I am not sure this is where it should be solved, there is
>>>>>>> definitely a device specific aspect to properly supporting the
>>>>>>> enabling of ASPM L0s, L1s etc, but the actual sysfs knobs should
>>>>>>> belong to the pci_device itself, since this is something that
>>>>>>> likely other drivers would want to be able to expose. You would
>>>>>>> probably want to work with the PCI maintainers to come up with a
>>>>>>> standard solution that applies beyond just r8169 since presumably
>>>>>>> there must be a gazillion of devices with the same issues.
>>>>>
>>>>> The Linux PCI core support for ASPM is poor.  Without more details,
>>>>> it's impossible to tell whether these issues are hardware or firmware
>>>>> defects on the device itself, or something that Linux is doing wrong.
>>>>> There are several known defects, especially related to L1 substates
>>>>> and hotplug.
>>>>>
>>>> The vendor refuses to release datasheets or errata. Only certain
>>>> combinations of board chipsets (and maybe BIOS versions) and network
>>>> chip versions (from the ~ 50 supported by the driver) seem to be
>>>> affected. One typical symptom is missed RX packets, maybe the RX FIFO
>>>> isn't big enough to buffer all packets until PCIe has woken up.
>>>> The Windows vendor driver uses a hack, they dynamically disable ASPM
>>>> under load.
>>>
>>> I'm not super sympathetic to vendors like that or to OEMs that work
>>> with them.  If we can make the NIC work reliably by disabling ASPM,
>>> that's step one.  If we can figure out how to extend battery life by
>>> enabling ASPM in some cases, great, but we have to be careful to do it
>>> in a way that is supportable and doesn't generate lots of user
>>> complaints that require debugging.
>>>
>>> That said, I think Frederick has already started working on a plan for
>>> the PCI core to expose sysfs files to manage ASPM.  This is similar to
>>> the link_state files enabled by CONFIG_PCIEASPM_DEBUG, but it will be
>>> always enabled and probably structured slightly differently.  The idea
>>> is that this would be generic and would not require any driver
>>> support.
>>>
>> Thanks, Bjorn!
>> Frederick, is there anything you could share already? Or any timeline?
> 
> Heiner,
> 
> I've been on hold for this pet project for a bit. If you're willing to take 
> this one on, all the merrier :)
> 
> The plan is the sysfs files would sit on the endpoint devices, and given a 
> supported state value, configure the link(s) for the endpoint's upstream port 
> to its immediate root port or switch downstream port. Then, this allows 
> something else to handle the walking of the tree via the ABI. We don't 
> necessarily want to configure sibling links under a switch.
> 
> I'm currently traveling, and I can't get you patches until this weekend.
> 
Thanks for the update. I'm not in a hurry; send me whatever you already have 
once you have time.
Then I can check whether I can move forward with it.

>> Based on Bjorn's info, this is what seems best to me:
>> 1. Disable ASPM for r8169 on stable (back to 4.19).
>> 2. Once the generic ASPM sysfs attributes are available, reenable ASPM
>>     for r8169 in net-next.
>>
>>> Bjorn
>>>
>> Heiner
>>
> 
> Frederick Lawler
> 
> 
Heiner
