Hi Russell, On 12/10/2017 08:47 AM, Russell King - ARM Linux wrote: > Guys, > > I've just tripped over a bug with the Micrel PHY driver, but it > really isn't specific to the Micrel PHY driver. > > When we suspend, we suspend the PHY and then the MAC driver (eg, > on the ZII board): > > [ 198.822751] 400d0000.ethernet-1:00: bus : mdio_bus_suspend+0x0/0x34 > [ 198.822859] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3100 > [ 198.822878] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3900 > ... > [ 198.826235] 400d0000.ethernet: bus : platform_pm_freeze+0x0/0x5c > [ 198.826354] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 9198 > [ 198.826374] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 9198 > [ 198.826503] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0000 > [ 198.826699] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > > When we resume, the order is reversed: > > [ 198.848300] 400d0000.ethernet: bus : platform_pm_thaw+0x0/0x54 > [ 198.849024] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > [ 198.849120] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 9198 > [ 198.849141] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 9198 > [ 198.849243] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0500 > [ 198.849401] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3900 > [ 198.849419] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > [ 198.849637] __mdiobus_read: 61 400d0000.ethernet-1 00 01 => 7849 > ... > [ 198.852677] 400d0000.ethernet-1:00: bus : mdio_bus_resume+0x0/0x34 > [ 198.852778] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3100 > [ 198.852797] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > > Now, the MAC driver calls phy_stop() and phy_start() from within its > own suspend/resume methods, and at this is done while the PHY is, > as far as the kernel PM code is concerned, suspended. > > However, phylib works around that by resuming the PHY itself when > phy_start() is called in this situation. That looks good, but it > really isn't in this case. Given the above sequence, we will be in > PHY_HALTED state. > > So, when phy_start() is called, the first thing it does is check the > state, and finds PHY_HALTED. It then tries to enable the PHY > interrupts. This cause the Micrel driver to write to the PHY to > enable interrupts - it reads 0x1b to ack any pre-existing interrupt, > modifies 0x1f to set the interrupt pin level, and then supposedly > writes 0x1b to enable interrupts. > > However, at this point, the PHY is still powered down, as we can see > in the following read of the BMCR - containing 0x3900. The write to > enable interrupts is ignored, and so interrupts remain disabled. > > The resume process continues, and the system resumes, but interrupts > on the PHY remain disabled, and the phylib state machine never > advances. You can do anything you like with cables etc, as far as > phylib is concerned, the link steadfastly remains "down". > > That's a bit of a problem if your platform is running root-NFS through > that network interface. > > It looks like some variants of the Micrel phy code work around this by > including in their resume path code to enable PHY interrupts > independently of phylib. > > phy_resume() can't be called inside the locked region in phy_start(), > where we enable interrupts, because genphy_resume() also wants to take > phydev->lock - and it wants to do that to safely read-modify-write the > BMCR register. This, I feel, comes back to an abuse of the phy state > machine lock to also protect atomic bus operations. > > If we move over to using the bus lock to protect bus atomic operations > (as I believe we should) then phy_resume() can be called while holding > phydev->lock from within bits of the code that are protecting themselves > from concurrency with the phylib state machine. It also means that we > can get rid of some of these boolean variables, and most importantly > in this particular case, call phy_resume() before we enable interrupts. > > With that arrangement, things look a lot better: > > [ 74.545584] 400d0000.ethernet: bus : platform_pm_thaw+0x0/0x54 > [ 74.546010] __mdiobus_read: 117 400d0000.ethernet-1 00 00 => 3900 > [ 74.546029] __mdiobus_write: 117 400d0000.ethernet-1 00 00 <= 3100 > [ 74.546264] __mdiobus_read: 117 400d0000.ethernet-1 00 1b => 0000 > [ 74.546354] __mdiobus_read: 117 400d0000.ethernet-1 00 1f => 8100 > [ 74.546373] __mdiobus_write: 117 400d0000.ethernet-1 00 1f <= 8100 > [ 74.546651] __mdiobus_write: 117 400d0000.ethernet-1 00 1b <= 0500 > > The patch for this is slightly larger than it needs to be because I've > converted more places that do read-modify-write to the new xxx_modify() > accessor, and it also ensures that phy_resume() is consistently called > with the phy state machine lock held. It also means we can get rid of > the interrupt enable hack for some micrel PHYs, which I suspect comes > from this same root cause.
Since this is a bug fix, I would really rather have the ability to target the "net" tree with a simpler fix that is contained within drivers/net/phy/phy.c and which does not require the use of phy_modify(). Thanks for doing all the detective work! -- Florian