On Wed, Oct 19, 2016 at 12:54 PM, William Hermans wrote:
>
>
> On Wed, Oct 19, 2016 at 3:24 AM, Graham wrote:
>>
>> I have two BBG units that I use as headless servers, accessed only
>> through Ethernet.  Both have been running for multiple months without a
>> reboot and without any issues.  I think that I mentioned that I did have a BBB do
>> exactly what you describe, while running as a headless server last year, but
>> at the time there was a thunderstorm in the area, and lightning strikes in
>> the neighborhood. It recovered on reboot, and has never repeated the
>> symptom.
>>
>> So, my conclusion is that it can happen, but rarely, and in my case it
>> was probably caused by an electrical transient coming in on the Ethernet
>> connection, which is routed from a cable modem to the outside world.
>>
>> For a high-reliability application, consider adding some extra transient
>> protection on the Ethernet connection, and some kind of "ping monitor"
>> that can auto-reboot the BBG.
>>
>> --- Graham
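Graham's "ping monitor" idea could look roughly like the minimal Python sketch below. It is an illustration under stated assumptions and is not taken from this thread. The target host, failure limit and interval are hypothetical placeholders. `ping -c 1 -W 2` assumes the Linux iputils (or BusyBox) `ping`, and calling `reboot` requires root. The sketch reboots only after several consecutive failed probes, so a single lost reply (for example the first ping after an idle period) does not trigger it.

```python
import subprocess
import time

# Hypothetical placeholders: adjust for your own network and tolerance.
TARGET = "192.168.1.1"   # hypothetical host to probe, e.g. the upstream router
MAX_FAILURES = 3         # consecutive failed probes before rebooting
INTERVAL_S = 60          # seconds between probes


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class PingWatchdog:
    """Counts consecutive ping failures and says when a reboot is due."""

    def __init__(self, max_failures: int):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True once the limit is reached."""
        # Any successful reply resets the streak of failures.
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.max_failures


def main() -> None:
    dog = PingWatchdog(MAX_FAILURES)
    while True:
        if dog.record(ping_once(TARGET)):
            # Needs root. This restarts the board; it does not fix the cause.
            subprocess.run(["reboot"])
            return
        time.sleep(INTERVAL_S)
```

To use it, call `main()` from a script started as root at boot, such as a systemd service. Because the failure counter lives in its own class, the reboot decision can be tested without sending any pings.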
>
>
> I didn't have a BBG to play with until the last 2-3 months. Since then
> I've had ~30 of them over the last 2 months to observe this behavior on,
> and again, it has only happened once. So I attributed it to accidentally
> knocking the board around a little, until I talked with another person I
> know who has experienced this issue multiple times, with multiple kernels,
> over the last . . . I don't know, maybe 6 months.
>
> So first I installed the same Debian image he was using, then changed
> kernels to the *bone* LTS kernel, and removed g_ether by removing Robert's
> custom boot script for the 335x EVM board. After that I got the project
> files from this person and duplicated his software setup, which is an MQTT
> application with a custom cape.
>
> Anyway, I was running this software last night, and then I downloaded and
> ran nload from an ssh session, but I kept getting ssh "Broken pipe" errors.
> That is not necessarily a concern; I've seen it before. I intend to hook up
> a serial debug cable and run nload from that, but just have not gotten
> around to it yet.
>
> One thing on my mind is that perhaps the software this person wrote is
> somehow failing to deal with a "busy network" properly. That is, if the
> internet connection is bandwidth-saturated and the application is for some
> reason unable to deal with a "stale connection", how will it act? However,
> I would not expect that to make the hardware fail, and a hardware failure
> is what it looks like to me: the Ethernet traffic LEDs stop working and the
> Ethernet connection becomes non-functional. What I have been able to
> observe so far is that this application sends around 8-9 kbit/s of data
> and gets 2-3 kbit/s back.
>
> Another concern: MQTT is by default an inherently insecure protocol, and
> this app currently runs as root. However, there are several caveats to
> this concern. First, the application is a peer-to-peer design in that only
> the MQTT broker can communicate with the board, whether it is sending
> commands or collecting data back from the board. Second, MQTT is able to
> use certificates, but I do not think this software does that *YET*. I have
> given this person the standard security lecture on running as root,
> locking things down, etc.; we just have not acted on it yet.
>
> With all of that said: when I ran into this issue myself, I was not
> running anything other than a stock image and the stock software that
> comes with it, and the board had just been idling for 5-6 days, maybe a
> little longer. I ran uptime from an ssh session and it reported "5 days
> . . .", and some time after that this happened. So I'm inclined to think
> this is most likely not a userspace application issue.
>
> I'm not even sure where to go from here as far as tracking this issue
> down. All I can really do is throw everything I know / have at the board,
> and hope I catch an error in the live kernel log over the serial console.
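One way to trap that error is to capture the kernel log and filter it for messages from the Ethernet drivers. The log could come from the serial console, or from `dmesg --follow` while the network is still up. Below is a minimal Python filter sketch. The regular expression is an assumption about what such messages might contain: on the AM335x the Ethernet MAC is handled by `cpsw` and the PHY bus by `davinci_mdio`. It is not a list of messages known to accompany this fault, so adjust it to what actually shows up in your own log.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumed patterns only: driver names and phrases that *might* appear when
# the Ethernet link drops on an AM335x board. Tune these to your own logs.
SUSPECT = re.compile(
    r"cpsw|davinci_mdio|Link is Down|net eth0|NETDEV WATCHDOG",
    re.IGNORECASE,
)


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines matching one of the suspect patterns."""
    for line in lines:
        if SUSPECT.search(line):
            yield line.rstrip("\n")


def main() -> None:
    # Read log lines from stdin and echo the matches immediately.
    for line in suspicious_lines(sys.stdin):
        print(line, flush=True)
```

With a trailing `main()` call, it can be fed by `dmesg --follow` or by a log file captured from the serial console on another machine. The serial capture is the one that survives the network going down.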

I think it's related to suspend/cpuidle. I know another user was having
issues where they had to ping the board twice, because the first ping would
never get a response.

One thing that might help: remove the sleep pinmuxes from the mac and
davinci_mdio nodes:

https://github.com/RobertCNelson/dtb-rebuilder/blob/4.4-ti/src/arm/am335x-bone-common.dtsi#L370-L383

Regards,

-- 
Robert Nelson
https://rcn-ee.com/
