If you have this problem and only care about solutions, jump to "workarounds" below.
### RECAP

For unlucky souls who come fresh upon this problem and don't want to read through the better part of a decade's worth of conflicting reports...

1. Due to a design issue, the BeagleBone Black and its descendants intermittently come up with the physical network interface chip (PHY) in one of various bad states that make the wired Ethernet port inaccessible. There is no way to get it to recover using only software; a power cycle or hardware reset is required.
2. One way the PHY can go bad is that its address gets assigned a different value than expected. The latest kernel versions scan all possible addresses and find the PHY no matter which address it happens to get, so this failure mode is no longer part of the issue as long as you use one of these new kernels. (BTW, I have an elegant solution that reassigns the PHY back to the expected address and works with any kernel version, if you need it. It also avoids the current kludge that hacks up the device tree to match the newly found PHY address.)
3. There are still some bad states the PHY chip can come up in that the new kernels do not address. As far as I know there is no software-only workaround for these; a power cycle or hardware reset is required.
4. In my personal experience, the bad state seems significantly less likely when the board is powered through the barrel connector (or USB on the BeagleBone Green) than when it is powered via the pin on the P9 header. I've also noticed that most people in this thread power their boards via a cape or a header-connected power supply, which makes sense since these people tend to see the problem more often. Note that the non-recoverable bad state can still happen even on a board powered via the barrel; it is just less likely.
5. In my personal experience, the bad state seems more likely on some individual boards than on others. I have one board that comes up in the bad state about 50% of the time, while other boards only come up in the bad state about 1 in 100 times.
6. In my experience, the bad state seems significantly less likely if *nothing* is connected to the Ethernet port at power-up. I really mean not connected: even an unpowered device on the other end of the network cable makes the bad state occur more often. The cable must be unplugged at one end or the other.
7. Bit 13 in register 18 seems to be a 100% indication that you are in the bad state. I have never seen a board with that bit set recover, and I have never seen a non-recoverable board without that bit set (except for a couple of seconds if you manually clear it before it sets itself again). This bit is "reserved" in the datasheet, and so far Microchip has offered no hints as to what it might mean that could lead to a better understanding of the issue.
8. In the bad state, it is possible to get the PHY to link by manually configuring it for 10 Mb/s half duplex (no autonegotiation). While the link light comes on and the "link active" bit is set, the PHY does not appear to decode incoming packets, so this is not a useful workaround.

### WORKAROUNDS

In order of effectiveness/desirability:

1. Use a different board. The BeagleBone Black and all its commercially available descendants share this design issue, so maybe look at the Raspberry Pi or one of the other ARM-based SBCs.
2. Spin your own version of the board. This problem could be completely resolved by connecting the PHY's reset pin to a GPIO on the ARM. That way the ARM could carefully control the required timing sequence for bringing up the PHY chip, and also hardware-reset the chip if there are any problems.
3. Use a USB Ethernet adapter rather than the on-board eth0 port. Compatible adapters can be found for less than $10.
4. Connect a GPIO pin to the reset pin on header P9.
   That reset pin is tied to the hardware reset pin of the PHY chip, so you can reset the PHY under software control. gpio 60 happens to be physically very close, making for a very easy jumper connection. Then you need a script that tests for the bad state and, if it finds it, activates the GPIO to trigger a reset. Note that the reset pin also resets the ARM, so the BB will reboot every time you do this, but it should eventually come up (and stay up) with the PHY in the good state.
5. Unplug the Ethernet cable during power-up, check for the bad state after the board comes up, and keep power cycling it until it comes up in a good state; then reconnect the network cable.
6. Power the board through the barrel connector or USB rather than through the headers.

Through a combination of 5 and 6, I was able to get my bank of boards to come up in a good state better than 80% of the time on the first try. Yona Applegate (of LEDscape fame) reports being able to get his large collection of BBBs to all come up with good networking 100% of the time using #4, although the time it takes for all boards to reach the good state is indeterminate.

### FUTURE DIRECTIONS

There are likely other possible workarounds if someone wants to invest more time in this issue.

Here is a tool that lets you easily inspect and modify registers in the PHY:
https://github.com/bigjosh/phyreg

Here are all my notes from debugging this issue:
https://www.evernote.com/pub/bigjosh2/bbbphyproblem

I am happy to try to help anyone who wants to dig in deeper. I personally would love to not have to unplug and replug 72 Ethernet cables every time I have to power cycle my bank of BBBs!

-josh

On Tuesday, November 26, 2013 at 5:22:42 PM UTC-5, AndrewTaneGlen wrote:
>
> Hello,
>
> I have noticed very rare cases (~1/50) of the ethernet phy on the
> Beaglebone Black not being detected on boot, and requiring a hard reset (as
> opposed to calling 'reset' from the command line) to get it to work/be
> detected again.
>
> This problem has been mentioned in a couple of other threads (below)
> concerning different topics (i.e. problems getting the BBB to boot, and the
> ethernet phy 'dying' some time after initially working fine), with no
> solution/workaround for this specific problem being suggested - so I
> thought I'd start a thread specifically for it.
> https://groups.google.com/forum/#!msg/beagleboard/Vp4pxwHm8BU/Iaw3p5xm0MoJ
> https://groups.google.com/forum/#!topic/beagleboard/aXv6An1xfqI
>
> In the first thread mlc/Mike discussed his response to the problem as
> follows:
>
> *"I had issues with the network not coming up on boot, and it was
> traced down to problems with the SYS_RESETn line. I had a level translator
> connected to SYS_RESETn, to drive a 5V chip. It was powered by a 5V rail.
> If the 5V rail powered up "differently" than the 3.3V rail (not sure of the
> exact relationship), I guess it pulled the SYS_RESETn line to weird levels
> that affected the network chip but not the main processor. I'm now using a
> GPIO to drive the external 5V chip now, instead of the SYS_RESETn
> line. Anyway, the moral is be very, very careful with SYS_RESETn, because
> it can cause hard-to-trace problems with networking.*"
>
> I see that the A6 Revision of the Beaglebone Black has some changes to the
> SYS_RESETn line:
>
> "*Based on notification from TI, in random instances there could be a
> glitch in the SYS_RESETn signal from the processor where the SYS_RESETn
> signal was taken high for a momentary amount of time before it was supposed
> to.
> To prevent this, the signal was ORed with the PORZn (Power On reset).*"
> (http://elinux.org/Beagleboard:BeagleBoneBlack#Revision_A6_.28Production_Version.29)
>
> Is it likely that this modification will improve/resolve the issue I am
> seeing with the ethernet phy not resetting/powering-up correctly, seeing as
> the SYS_RESETn signal also feeds into the nRST pin on the ethernet phy? (The
> SYS_RESETn line is left untouched in my application.)
>
> Some additional observations from dmesg concerning this issue:
>
> On a good phy boot I see the following:
> [ 2.810749] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
> [ 2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
> [ 2.833517] libphy: 4a101000.mdio: probed
> [ 2.837871] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver unknown
>
> Followed later by:
> [ 21.286920] net eth0: initializing cpsw version 1.12 (0)
> [ 21.301166] net eth0: phy found : id is : 0x7c0f1
>
> On a 'bad phy' boot I see the following (differences highlighted):
> [ 2.806763] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
> [ 2.813213] davinci_mdio 4a101000.mdio: detected phy mask *fffffffb*
> [ 2.829512] libphy: 4a101000.mdio: probed
> [ 2.833875] davinci_mdio 4a101000.mdio: phy[2]: device 4a101000.mdio:02, driver unknown
>
> Followed later by:
> [ 21.346861] net eth0: initializing cpsw version 1.12 (0)
> [ 21.354379] *libphy: PHY 4a101000.mdio:00 not found*
> [ 21.359469] *net eth0: phy 4a101000.mdio:00 not found on slave 0*
>
> So it looks like the 'davinci_mdio_reset' function sees the phy in both
> instances, but reports differently on the bad boot. I am not sure what to
> make of this.
>
> I am using the Debian 7.2 Rootfs and the 'RobertCNelson' kernel
> '3.12.0-bone8'.
>
> Regards,
> Andrew.
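Andrew's logs show the address-shift failure mode from item 2 of the recap in action. In both boots, the clear bit in the `detected phy mask` value lines up with the `phy[N]` that davinci_mdio reports next (bit 0 for `fffffffe`, bit 2 for `fffffffb`), while eth0 still looks for `4a101000.mdio:00`. Here is a minimal Python sketch for spotting this in boot logs, assuming the message format shown above; the function names are made up for illustration and are not part of any existing tool:

```python
import re
from typing import List, Optional

# Matches the davinci_mdio boot message quoted above, e.g.
# "davinci_mdio 4a101000.mdio: detected phy mask fffffffe"
MASK_RE = re.compile(r"detected phy mask ([0-9a-fA-F]+)")


def live_phy_addresses(mask: int) -> List[int]:
    """Return the MDIO addresses (0-31) whose bit is CLEAR in the mask."""
    return [addr for addr in range(32) if not (mask >> addr) & 1]


def phy_addresses_from_dmesg(text: str) -> Optional[List[int]]:
    """Decode the first 'detected phy mask' line in text, or None if absent."""
    match = MASK_RE.search(text)
    if match is None:
        return None
    return live_phy_addresses(int(match.group(1), 16))


def phy_moved(addresses: List[int], expected: int = 0) -> bool:
    """True if nothing answered at the address eth0 expects (:00 in the logs)."""
    return expected not in addresses
```

You could feed `phy_addresses_from_dmesg` the text printed by `dmesg` after boot. With a newer kernel that scans every address (item 2), a moved PHY is no longer fatal by itself, so this check mainly helps when you are stuck on an older kernel like the 3.12 one above.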
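Workaround #4 above needs a script that tests for the bad state and, if it finds it, pulses the gpio jumpered to the P9 reset pin. Below is a minimal Python sketch of that logic. It assumes the legacy sysfs GPIO interface (`/sys/class/gpio`) is available on your kernel, that gpio 60 is muxed as a GPIO and jumpered to the reset pin, and that bit 13 of register 18 is the bad-state indicator described in item 7 of the recap. How register 18 actually gets read is left to a caller-supplied function (`read_reg`, hypothetical); the phyreg tool linked above is one way to get at PHY registers. All function names here are illustrative.

```python
import time
from pathlib import Path
from typing import Callable

BAD_STATE_REG = 18  # PHY register to inspect (recap item 7)
BAD_STATE_BIT = 13  # "reserved" bit that flags the non-recoverable state


def phy_in_bad_state(reg18_value: int) -> bool:
    """True if bit 13 of the register 18 value is set."""
    return bool((reg18_value >> BAD_STATE_BIT) & 1)


def pulse_reset(gpio: int = 60, sysfs_root: str = "/sys/class/gpio",
                hold_s: float = 0.1) -> None:
    """Drive the gpio low for hold_s seconds, then release it.

    The reset line is active low. Writing "low" to the direction file makes
    the pin an output that starts low, so it never glitches high first.
    On a real board this also resets the ARM, so the lines after that write
    may never run.
    """
    root = Path(sysfs_root)
    pin = root / f"gpio{gpio}"
    if not pin.exists():
        # Ask the kernel to create the gpioN directory for this pin.
        (root / "export").write_text(str(gpio))
    (pin / "direction").write_text("low")
    time.sleep(hold_s)
    # Back to an input so the gpio stops driving the reset line.
    (pin / "direction").write_text("in")


def check_and_reset(read_reg: Callable[[int], int],
                    reset: Callable[[], None]) -> bool:
    """Read register 18 and call reset() if the bad-state bit is set.

    Returns True if a reset was triggered.
    """
    if phy_in_bad_state(read_reg(BAD_STATE_REG)):
        reset()
        return True
    return False
```

On a board you might call `check_and_reset(my_read_reg, pulse_reset)` (with `my_read_reg` being your own hypothetical register reader) once from a boot-time service. Because the reset reboots the whole board, the check naturally repeats on every boot until the PHY comes up good, which matches the "should eventually come up (and stay up)" behavior described in #4.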
