On Fri, 13 Nov 2020 at 23:43, Andrew Lunn <and...@lunn.ch> wrote: > > Hi Arnd > > > Something of that sort. I also see a similar patch in KSZ9031 > > now, see 7dd8f0ba88fc ("arm: dts: imx6qdl-udoo: fix rgmii phy-mode > > for ksz9031 phy") > > > > As this exact mismatch between rgmii and rgmii-id mode is apparently > > a more widespread problem, the best workaround I can think of > > is that we redefine the phy-mode="rgmii" property to actually mean > > "use rgmii mode and let the phy driver decide the delay configuration", > > The problem is, the PHY driver has no idea what the delay > configuration should be. That is the whole point of the DT property. > > The MAC and the PHY have to work together to ensure one of them > inserts the delay. In most cases, the MAC driver reads the property > and passes it unmodified to the PHY. The PHY then does what it is > told. In some cases, the MAC decides to add the delay, it changes the > rgmii-id to rgmii before passing it onto the PHY. The PHY does as it > is told, and it works. And a very small number of boards simply have > longer clock lines than signal lines, so the PCB adds the delay. It is > not clearly defined how that should be described in DT, but it works > so far because most MAC drivers don't add delays, pass the 'rgmii' > from DT to the PHY and it does as it is told and does not add delays. > > There is one more case, which is not used very often. The PHY is > passed the NA values, which means, don't touch, something else has set > it up. So when the straps are doing the correct thing, you could pass > NA. However, some MAC drivers look at the phy mode, see it is one of > the 4 rgmii modes, and configure their end to rgmii, instead of gmii, > mii, sgmii, etc. How networking does ACPI is still very undefined, but > i think we need to push for ACPI to pass NA, and the firmware does all > the setup. That seems to be ACPI way. > > > with a new string to mean specifically rgmii mode with no delay. > > As you said, we have phy-mode="rgmii" 235 times. How many of those are > going to break when you change the definition of rgmii? I have no > idea, but my gut feeling is more than the number of boards which are > currently broken because of the problem with this PHY. > > And, as i said above, some MAC drivers look for one of the 4 RGMII > modes in order to configure their side. If you add a new string, you > need to review all the MAC drivers and make sure they check for all 5 > strings, not 4. Some of that is easy, modify > phy_interface_mode_is_rgmii(), but not all MAC use it, and it is no > help in a switch statement. > > And we are potentially going to get into the same problem > again. History has shown, we cannot get 4 properties right. Do you > think we will do any better getting 5 properties right? Especially > when phy-mode="rgmii" does not mean rgmii, but do whatever you think > might be correct? > > Having suffered the pain from the Atheros PHY, this is something i > review much more closely, so hopefully we are getting better at > this. But PHY drivers live for a long time, ksz9031 was added 7 years > ago, well before we started looking closely at delays. I expect more > similar problems will keep being found over the next decade. > > To some extent, we actually need DT writers to properly test their > DT. If both rgmii and rgmii-id works, there is a 50% chance whatever > they pick is wrong. And it would be nice if they told the networking > people so we can fix the PHY. >
One question that still has not been answered is how many actual platforms were fixed by backporting Realtek's follow up fix to -stable. My suspicion is none. That by itself should be enough justification to revert the backport of that change. I do agree that we should fix this properly going forward, and if we do manage to fix this in a backwards compatible way, we should backport that fix. But letting the current situation exist because nobody can be bothered to fix it properly is not the right solution IMHO.