We are working on an ARMv7 embedded system running kernel 4.1, with patches that upgrade dsa/mv88e6xxx to kernel version 4.3 (5acf4d0, Wed, 27 May 2015 15:32:15 -0700) "[PATCH] blk: rq_data_dir() should not return a boolean."
This is the layout of the system:

+---------------------+   eth0
|                    +--+
|                    |  |
|  Embedded system   +--+
|                     |
|       ARMv7         |
|                     |   Marvell 88E8057(sky2)      +------------------+
|                    +--+                           +--+               +--+ eth1@marvell
|                    |  +---------------------------+  |               |  +-------+
|                    +--+         CPU port          +--+   mv88e6176   +--+
+------+--+-----------+                              |                  |
       |  |  emulated                                |                  |
       +--+  GPIO-MDIO                              +--+               +--+ eth2@marvell
        +-------------------------------------------+  |               |  +-------+
                      MDIO                          +--+               +--+
                                                     +------------------+

There is a bridge (br-lan) which includes eth0, eth1 and eth2.

From time to time we see a link go down and come back up within about 1 s. These are the messages the kernel logs:

[  312.769399] dsa dsa@0 eth2: Link is Down
[  312.773372] br-lan: port 3(eth2) entered disabled state
[  312.947274] dsa dsa@0 eth2: link up, 100 Mb/s, full duplex, flow control disabled
[  312.963807] br-lan: port 3(eth2) entered forwarding state
[  312.969276] br-lan: port 3(eth2) entered forwarding state
[  313.777815] dsa dsa@0 eth2: Link is Up - 100Mbps/Full - flow control rx/tx
[  314.966277] br-lan: port 3(eth2) entered forwarding state

Moreover, we run a reboot loop test which boots the system, pings the unit and, if it responds, reboots it again. After many reboots, we found that the bridge stops forwarding packets.
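
The reboot loop test described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script actually used: the function names (`ping_once`, `wait_for_ping`, `reboot_loop`), the retry counts and the way the reboot is triggered are all assumptions, and `ping_once` assumes a Linux `ping` that accepts `-c` and `-W`.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo to `host` is answered.

    Assumption: a Linux `ping` binary that accepts `-c` and `-W`.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_ping(ping, attempts, delay_s, sleep=time.sleep):
    """Call `ping()` up to `attempts` times; True as soon as one succeeds."""
    for _ in range(attempts):
        if ping():
            return True
        sleep(delay_s)
    return False


def reboot_loop(ping, reboot, max_cycles, attempts=30, delay_s=2.0,
                sleep=time.sleep):
    """Boot, ping and reboot repeatedly.

    `ping` and `reboot` are caller-supplied callables (hypothetical hooks).
    Returns the number of the first cycle in which the unit never answered,
    or None if every cycle answered.
    """
    for cycle in range(1, max_cycles + 1):
        if not wait_for_ping(ping, attempts, delay_s, sleep):
            return cycle  # the unit stopped responding in this cycle
        reboot()
    return None
```

A caller would pass something like `lambda: ping_once("192.0.2.1")` as `ping` and a function that reboots the unit as `reboot`; the address here is only a placeholder.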
Looking at the 88e6176 registers, we saw the following (rows are register numbers, columns 0 to 6 are the switch ports, all values in hex):

     GLOBAL GLOBAL2       0       1       2       3       4       5       6
 0:    c820       0    de0f    5d0f    500f    500f    500f    4e07    4007
 1:       3       0      3e       3       3       3       3       3       3
 2:       0    ffff       0       0       0       0       0       0       0
 3:       0    ffff    1761    1761    1761    1761    1761    1761    1761
 4:    6000     258    373f     433     430     433     433     433     433
 5:    1000    c12f       0       0       0       0       0       0       0
 6:    c000    1f0f    101e    3005    3003    4001    5001    6001    7001
 7:       0    707f       0       0       0       0       0       0       0
 8:       0    7800    2480    2480    2480    2480    2480    2480    2480
 9:       0    1600       1       1       1       1       1       1       1
 a:     148       0       0       0       0       0       0       0       0
 b:    6000    1000       1       2       4       8      10      20      40
 c:       0      22       0       0       0       0       0       0       0
 d:    ffff     507       0       0       0       0       0       0       0
 e:    ffff      36       0       0       0       0       0       0       0
 f:    ffff     f00    dada    dada    dada    dada    dada    dada    dada
10:       0       0       0       0       0       0       0       0       0
11:       0       0       0       0       0       0       0       0       0
12:    5555       0       0       0       0       0       0       0       0
13:    5555       0     34d    8b18     54d       0       0       0       0
14:    aaaa     400       0       0       0       0       0       0       0
15:    aaaa       0       0       0       0       0       0       0       0
16:    ffff       0      33      33      33      33      33      33       0
17:    ffff       0       0       0       0       0       0       0       0
18:    fa41    1884    3210    3210    3210    3210    3210    3210    3210
19:       0     5e1    7654    7654    7654    7654    7654    7654    7654
1a:       0       0       0       0       0       0       0       0       0
1b:     1fc    f869    8000    8000    8000    8000    8000    8000    8000
1c:       0    4c00       0       0       0       0       0       0       0
1d:    5ce0       0       0       0       0       0       0       0       0
1e:       0       0       0       0       0       0       0       0       0
1f:       0       0       0       0       0       0       0       0       0

The main difference is in register 5 of GLOBAL2. Right after initialization the driver sets this register to 00ff, but when the issue occurs its value is c12f. We have a patch that lets us write register values: if we change c12f back to 00ff, ping works; otherwise it does not. We do not know what is changing the register value; the driver does not appear to do it.

Stranger still, sometimes register 5 of GLOBAL2 is set to 00ff and the bridge still does not forward packets. We have not yet worked out which other register is involved.

Finally, the strangest behaviour we see is that the unit does not detect a link change: register 0 of ports 1 and 2 does not update its status.

Have you seen a similar issue on your side?
Could these short link outages be the cause of the bad value in register 5 of GLOBAL2?
Has this been fixed in a newer Linux kernel version?

Thanks in advance.
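
As a reading aid for the dump above, the following Python sketch decodes register 0 of each port and shows which bits differ between the expected and observed values of register 5 of GLOBAL2. The bit positions for link, duplex and speed are an assumption based on the PORT_STATUS_* definitions in the Linux mv88e6xxx driver header and should be checked against the datasheet; the helper name `decode_port_status` is made up for this example.

```python
# Assumed bit layout of port register 0 (port status), taken from the
# PORT_STATUS_* definitions in the Linux mv88e6xxx driver header.
# Verify against the datasheet before relying on it.
PORT_STATUS_LINK = 1 << 11
PORT_STATUS_DUPLEX = 1 << 10
PORT_STATUS_SPEED_MASK = 0x0300
SPEED_MBPS = {0x0000: 10, 0x0100: 100, 0x0200: 1000}


def decode_port_status(value):
    """Decode a 16-bit port status word into link, duplex and speed."""
    return {
        "link": bool(value & PORT_STATUS_LINK),
        "full_duplex": bool(value & PORT_STATUS_DUPLEX),
        # None means the speed field holds a value not listed above.
        "speed_mbps": SPEED_MBPS.get(value & PORT_STATUS_SPEED_MASK),
    }


# Register 0 of ports 0..6, copied from the dump.
port_status = [0xDE0F, 0x5D0F, 0x500F, 0x500F, 0x500F, 0x4E07, 0x4007]

for port, value in enumerate(port_status):
    print(f"port {port}: {value:04x} {decode_port_status(value)}")

# Register 5 of GLOBAL2: value set at init vs. value seen when forwarding fails.
expected = 0x00FF
observed = 0xC12F
changed_bits = expected ^ observed  # a set bit marks a bit that differs
print(f"GLOBAL2 reg 5 changed bits: {changed_bits:04x}")
```

Under these assumptions, 5d0f decodes as link up at 100 Mb/s full duplex and 500f as link down, and the bits that differ between 00ff and c12f are c1d0.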