On Sun, 18 Oct 2020 at 12:35, Ard Biesheuvel <a...@kernel.org> wrote: > > On Sun, 18 Oct 2020 at 01:02, Andrew Lunn <and...@lunn.ch> wrote: > > > > On Sun, Oct 18, 2020 at 12:19:25AM +0200, Ard Biesheuvel wrote: > > > (cc'ing some folks who may care about functional networking on SynQuacer) > > > > > > On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <and...@lunn.ch> wrote: > > > > > > > > > So we can fix this firmware by just setting phy-mode to the empty > > > > > string, right? > > > > > > > > I've never actually tried it, but i think so. There are no DT files > > > > actually doing that, so you really do need to test it and see. And > > > > there might be some differences between device_get_phy_mode() and > > > > of_get_phy_mode(). > > > > > > > > > > Yes, that works fine. Fixed now in the latest firmware build [0] > > > > Thanks for reporting back on that. It probably needs to be something > > we always recommend for ACPI systems, either use "", or preferably no > > phy-mode at all, and let the driver default to NA. ACPI and networking > > is at a very early stage at the moment, since UEFA says nothing about > > it. It will take a while before we figure out the best practices, and > > some vendor gets something added to the ACPI specifications. > > > > You mean MDIO topology, right? Every x86 PC and server in the world > uses ACPI, and networking doesn't seem to be a problem there, so it is > simply a matter of choosing the right abstraction level. I know this > is much easier to achieve when all the network controllers are on PCIe > cards, but the point remains valid: exhaustively describing the entire > SoC like we do for DT is definitely not the right choice for ACPI > systems. This also means that ACPI is simply not the right fit for all > designs, and we should push back harder against the tick-the-box > exercises that are going on to use ACPI for describing unusual designs > that are never going to boot RHEL or other ACPI-only distros anyway. > > > > But that still leaves the question whether and how to work around this > > > for units in the field. Ignoring the PHY mode in the driver would > > > help, as all known hardware ships with firmware that configures the > > > PHY with the correct settings, but we will lose the ability to use > > > other PHY modes in the future, in case the SoC is ever used with DT > > > based minimal firmware that does not configure networking. > > > > > > Since ACPI implies rich firmware, we could make ACPI probing of the > > > driver ignore the phy-mode setting in the DSDT. > > > > I'm O.K. with that, for this driver, but please add a comment in the > > code about why ACPI ignores DSDT, because of older broken firmware. > > > > Sure. > > > > But if we don't do the same for DT, it would mean DT users are > > > forced to upgrade their firmware, and hopefully do so before > > > upgrading to a kernel that breaks networking. > > > > Nothing new there. As i said, we have been through this before with > > the Atheros PHY and others. > > > > One option would be to move the DT into the kernel and fix it. Most > > distributions already use the DT version shipped with the kernel, so > > they would automatically get the fixed DT at the same time as the > > kernel which needs the fix. > > > > The DT is not a flat file that you can simply source from elsewhere - > it is constructed at boot using firmware settings and other data that > is different between systems. > > > I do have a question about the history here, btw. As I understand it, > before commit f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx > delays config"), the phy-mode setting did not have any effect on > Realtek PHYs in the first place, right? Since otherwise, this platform > would never have worked from the beginning. > > So commit f81dadbcf7fd was backported to -stable, even though it > didn't actually work, and had to be fixed in bbc4d71d63549bcd ("net: > phy: realtek: fix rtl8211e rx/tx delay config"), which is the commit > that causes the regression on these boards. > > So why was commit f81dadbcf7fd sent to -stable in the first place? It > doesn't have a fixes tag or cc:stable, and given that it is not > actually correct to begin with, there seems to be little justification > for this. Surely, no platform exists in the field that requires this > commit (as it is broken) or the followup fix (since only was never > taken into account before) > > In summary, I think that - even if we agree that the way forward is to > fix the firmware and make the driver do the right thing without any > quirks - these patches do not belong in -stable, and should be > reverted, unless anyone can point out a system that was working > before, got broken and was fixed again by f81dadbcf7fd (i.e., a true > regression)
Oops - f81dadbcf7fd was never backported, only the fix was. Apologies for mixing that up. However, that leaves the question why bbc4d71d63549bcd was backported, although I understand why the discussion is a bit trickier there. But if it did not fix a regression, only broken code that never worked in the first place, I am not convinced it belongs in -stable.