If you have this problem and only care about solutions, jump to 
"workarounds" below.

### RECAP 

For unlucky souls who come fresh upon this problem and don't want to read 
through the better part of a decade's worth of conflicting reports...

1. Due to a design issue, the BeagleBone Black and descendants have a 
problem where they intermittently come up with various bad states set in the 
physical network connection chip (PHY) that make the wired Ethernet port 
inaccessible, and there is no way to get it to recover using only software - 
a power cycle or hardware reset is required. 

2. One of the ways that the PHY can have bad state is that its address can 
be assigned a different value than expected. The latest versions of the 
kernel will scan all possible addresses and find the PHY no matter what 
address it happens to get, so this failure mode is no longer part of the 
issue as long as you use one of these new kernels. (BTW, I have an elegant 
solution to reassign the PHY back to the expected address which will work 
with any kernel version if you need it. It also avoids the current kluge 
that hacks up the device tree to match the newly found PHY address.) 

3. There are still some bad states that the PHY chip can come up in that 
are not addressed by the new kernel. As far as I know there is no 
software-only workaround for these - a power cycle or hardware reset is 
required. 

4. In my personal experience, the bad state seems to be significantly less 
likely when the board is powered through the barrel connector (or USB on 
BeagleBone Green) than when it is powered via the pin on the P9 header. I've 
also noticed that most people in this thread are powering their boards via 
a cape or header-connected power supply, which makes sense since these 
people tend to see the problem more often. Note that the non-recoverable 
bad state can still happen even on a board powered via the barrel - it is 
just less likely. 

5. In my personal experience, the bad state seems to be more likely on 
certain individual boards than others. I have a board that comes up in the 
bad state about 50% of the time, while other boards only come up in the bad 
state 1 in 100 times. 

6. In my experience, the bad state seems to be significantly less likely if 
*nothing* is connected to the Ethernet port at power up. I really mean not 
connected - even if there is an unpowered device connected to the other end 
of the network cable, the bad state occurs more often. The cable must 
be unplugged at one end or the other. 

7. Bit 13 in register 18 seems to be a 100% indication that you are in the 
bad state. I have never seen a board with that bit set recover, and I have 
never seen a non-recoverable board without that bit set (except for a 
couple of seconds if you manually clear it before it sets itself on again). 
This bit is "reserved" in the datasheet, and so far there have been no hints 
from Microchip as to what it means that might lead to a better 
understanding of the issue. 

8. In the bad state, it is possible to get the PHY to link by manually 
configuring it to 10 Mb/s half duplex (no auto-negotiation). While the link 
light comes on and the "link active" bit is set, it does not appear to be 
decoding incoming packets, so this is not a useful workaround. 
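
Points 7 and 8 come down to simple bit arithmetic on PHY register values. Here is a minimal sketch of that arithmetic only; it does not touch hardware. The function and constant names are made up for illustration, and getting the raw register value in or out of the PHY is left to whatever MDIO tool you use. The register 0 bit positions are the standard IEEE 802.3 clause 22 ones; the meaning of bit 13 in register 18 is only the empirical observation from point 7.

```python
# Sketch only: pure bit arithmetic, no hardware access.
BAD_STATE_REGISTER = 18   # register number from point 7
BAD_STATE_BIT = 13        # the "reserved" bit from point 7

# Standard basic control register (register 0) bits.
# Bit 12 (auto-negotiation enable) is deliberately left clear below.
BMCR_SPEED_100 = 1 << 13
BMCR_FULL_DUPLEX = 1 << 8


def is_bad_state(reg18_value: int) -> bool:
    """Return True if bit 13 is set in a 16-bit value read from register 18."""
    return bool(reg18_value & (1 << BAD_STATE_BIT))


def forced_mode_bmcr(speed_100: bool, full_duplex: bool) -> int:
    """Build a register 0 value for a forced link mode (auto-negotiation off)."""
    value = 0
    if speed_100:
        value |= BMCR_SPEED_100
    if full_duplex:
        value |= BMCR_FULL_DUPLEX
    return value


if __name__ == "__main__":
    print(is_bad_state(0x2000))                                       # True
    print(hex(forced_mode_bmcr(speed_100=False, full_duplex=False)))  # 0x0
```

So forcing 10 Mb/s half duplex amounts to writing 0 to register 0, which, per point 8, gets a link light but no usable traffic.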

### WORKAROUNDS

In order of effectiveness/desirability. 

1. Use a different board. All the commercially available BeagleBone Black 
and descendants share this design issue, so look at maybe the Raspberry Pi 
or one of the other ARM based SBCs.

2. Spin your own version of the board. This problem could be completely 
resolved by adding a connection between the reset pin of the PHY and a gpio 
on the ARM. This way the ARM would be able to carefully control the required 
timing sequence for bringing up the PHY chip - and also be able to hardware 
reset the chip in case there are any problems. 

3. Use a USB Ethernet adapter rather than the on-board eth0 port. 
Compatible adapters can be found for less than $10. 

4. Connect a gpio pin to the reset pin on header P9. That reset pin is tied 
to the hardware reset pin of the PHY chip, so you can reset it under 
software control. gpio 60 happens to be very close physically, making for a 
very easy jumper connection. Then you need a script to test for the bad 
state and activate the gpio to reset if it is found. Note that the reset 
pin will also reset the ARM, so the BB will reboot every time you do this 
but should eventually come up (and stay up) with the PHY in the good 
state. 

5. Unplug the cable from the Ethernet port during power up, check for the 
bad state after the board comes up, and keep power cycling it until it comes 
up in a good state, then reconnect the network cable.

6. Power the board through the barrel or USB rather than through the headers.

Through a combination of 5 & 6, I was able to get my bank of boards to come 
up with a better than 80% good state rate on the first try. Yona Applegate 
(of LEDscape fame) reports being able to get his large collection of BBs to 
all come up with good networking 100% of the time using #4, although the 
amount of time it takes for all boards to get to the good state is 
indeterminate. 
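
For anyone trying #4, here is a rough sketch of what the check-and-reset script might look like. It is built on several assumptions, not a tested recipe: it assumes the legacy sysfs GPIO interface is available on your kernel, that gpio 60 is jumpered to the reset pin as described above, and that you supply your own `read_reg18` callable (a hypothetical name) that returns the current value of PHY register 18 using whatever MDIO tool you have. The function names are made up for illustration.

```python
import os

GPIO_NUMBER = 60                     # the pin suggested in workaround 4
SYSFS_GPIO_ROOT = "/sys/class/gpio"  # legacy sysfs GPIO interface (assumed present)


def assert_reset(gpio_root: str = SYSFS_GPIO_ROOT, gpio: int = GPIO_NUMBER) -> None:
    """Drive the GPIO low, pulling the jumpered reset line low."""
    gpio_dir = os.path.join(gpio_root, f"gpio{gpio}")
    if not os.path.isdir(gpio_dir):
        # Ask the kernel to expose the pin if it is not exported yet.
        with open(os.path.join(gpio_root, "export"), "w") as f:
            f.write(str(gpio))
    # Writing "low" makes the pin an output with an initial value of 0.
    with open(os.path.join(gpio_dir, "direction"), "w") as f:
        f.write("low")


def reset_if_bad(read_reg18, reset=assert_reset) -> bool:
    """Request a reset when bit 13 of register 18 is set.

    read_reg18 is a caller-supplied (hypothetical) function returning the
    16-bit register value. Returns True if a reset was requested.
    """
    if read_reg18() & (1 << 13):
        reset()
        return True
    return False
```

On a real board nothing after the reset will run, since the ARM restarts too; the idea is to run this early in boot so the board keeps rebooting itself until the PHY comes up clean.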

### FUTURE DIRECTIONS

There are likely other workarounds possible if someone wants to invest more 
time working on this issue. 

Here is a tool that lets you easily inspect and modify registers in the 
PHY...
https://github.com/bigjosh/phyreg

Here are all my notes from debugging this issue...
https://www.evernote.com/pub/bigjosh2/bbbphyproblem

I am happy to try and help anyone who wants to dig in deeper. I personally 
would love to not have to unplug/replug 72 Ethernet cables every time I 
have to power cycle my bank of BBBs!

-josh

On Tuesday, November 26, 2013 at 5:22:42 PM UTC-5, AndrewTaneGlen wrote:
>
> Hello,
>
> I have noticed very rare cases (~1/50) of the ethernet phy on the 
> Beaglebone Black not being detected on boot, and requiring a hard reset (as 
> opposed to calling 'reset' from the command line) to get it to work/be 
> detected again.
>
> This problem has been mentioned in a couple of other threads (below) 
> concerning different topics (i.e. problems getting the BBB to boot, and the 
> ethernet phy 'dying' some time after initially working fine), with no 
> solution/workaround for this specific problem being suggested - so I 
> thought I'd start a thread specifically for it.
> https://groups.google.com/forum/#!msg/beagleboard/Vp4pxwHm8BU/Iaw3p5xm0MoJ
> https://groups.google.com/forum/#!topic/beagleboard/aXv6An1xfqI
>
> In the first thread mlc/Mike discussed his response to the problem as 
> follows:
>
> *"I had issues with the network not coming up on boot, and it was 
> traced down to problems with the SYS_RESETn line. I had a level translator 
> connected to SYS_RESETn, to drive a 5V chip. It was powered by a 5V rail. 
> If the 5V rail powered up "differently" than the 3.3V rail (not sure of the 
> exact relationship), I guess it pulled the SYS_RESETn line to weird levels 
> that affected the network chip but not the main processor. I'm now using a 
> GPIO to drive the external 5V chip now, instead of the SYS_RESETn 
> line. Anyway, the moral is be very, very careful with SYS_RESETn, because 
> it can cause hard-to-trace problems with networking.*"
>
> I see that the A6 Revision of the Beaglebone Black has some changes to the 
> SYS_RESETn line:
>
> "*Based on notification from TI, in random instances there could be a 
> glitch in the SYS_RESETn signal from the processor where the SYS_RESETn 
> signal was taken high for a momentary amount of time before it was supposed 
> to. To prevent this, the signal was ORed with the PORZn (Power On reset).*"
> (http://elinux.org/Beagleboard:BeagleBoneBlack#Revision_A6_.28Production_Version.29)
>
> Is it likely that this modification will improve/resolve the issue I am 
> seeing with the ethernet phy not resetting/powering-up correctly, seeing as 
> the SYS_RESETn signal also feeds into the nRST pin on the ethernet phy? (The 
> SYS_RESETn line is left untouched in my application.)
>
>
> Some additional observations from dmesg concerning this issue:
>
> On a good phy boot I see the following:
> [    2.810749] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
> [    2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
> [    2.833517] libphy: 4a101000.mdio: probed
> [    2.837871] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver unknown
>
> Followed later by:
> [   21.286920] net eth0: initializing cpsw version 1.12 (0)
> [   21.301166] net eth0: phy found : id is : 0x7c0f1
>
> On a 'bad phy' boot I see the following (differences highlighted):
> [    2.806763] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
> [    2.813213] davinci_mdio 4a101000.mdio: detected phy mask *fffffffb*
> [    2.829512] libphy: 4a101000.mdio: probed
> [    2.833875] davinci_mdio 4a101000.mdio: phy[2]: device 4a101000.mdio:02, driver unknown
>
> Followed later by:
> [   21.346861] net eth0: initializing cpsw version 1.12 (0)
> [   21.354379] *libphy: PHY 4a101000.mdio:00 not found*
> [   21.359469] *net eth0: phy 4a101000.mdio:00 not found on slave 0*
>
>
> So it looks like the 'davinci_mdio_reset' function sees the phy in both 
> instances, but reports differently on the bad boot. I am not sure what to 
> make of this.
>
> I am using the Debian 7.2 Rootfs and the 'RobertCNelson' kernel 
> '3.12.0-bone8'.
>
>
>
> Regards,
> Andrew.
>
>
>

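For reference, the "detected phy mask" values in the quoted dmesg output above can be decoded by hand. Here is a small sketch, assuming that a cleared bit in the 32-bit mask marks an MDIO address where a PHY answered. That reading is an assumption, but it matches the quoted logs, where mask fffffffe goes with phy[0] and fffffffb goes with phy[2]. The function name is made up for illustration.

```python
def phy_addresses_from_mask(mask_hex: str) -> list[int]:
    """Return the MDIO addresses whose bit is cleared in a 32-bit phy mask."""
    mask = int(mask_hex, 16)
    return [addr for addr in range(32) if not (mask >> addr) & 1]


if __name__ == "__main__":
    print(phy_addresses_from_mask("fffffffe"))  # good boot: [0]
    print(phy_addresses_from_mask("fffffffb"))  # bad boot:  [2]
```

This is the address-moved failure from point 2 of the recap: the PHY is alive, just at address 2 instead of the address 0 that the rest of the stack goes looking for.
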