On 12/22/2017 02:19 PM, Tim Harvey wrote:
On Tue, Dec 19, 2017 at 12:52 PM, Andrew Lunn <and...@lunn.ch> wrote:
On Mon, Dec 18, 2017 at 01:53:47PM -0800, Tim Harvey wrote:
On Wed, Dec 13, 2017 at 11:43 AM, Andrew Lunn <and...@lunn.ch> wrote:
The NIC appears to work fine (pings, TCP, etc.) until a performance
test is attempted.
When an iperf bandwidth test is attempted, the NIC ends up in a state
where truncated-ip packets are being sent out (per a tcpdump from
another board):
Hi Tim
Are pause frames supported? Have you tried turning them off?
Can you reproduce the issue with UDP? Or is it TCP only?
Andrew,
Pause frames don't appear to be supported yet, and the issue occurs
with UDP as well as TCP. I'm not sure what the best way to
troubleshoot this is.
Hi Tim
Is pause being negotiated? In theory, it should not be. The PHY should
not offer it if the MAC has not enabled it. But some PHY drivers are
probably broken and offer pause when they should not.
Also, can you trigger the issue using UDP at, say, 75% of the maximum
bandwidth? That should be low enough that the peer never even tries to
use pause.
All this pause stuff is just a stab in the dark. Something else to try
is to turn off various forms of acceleration with ethtool -K and see if
that makes a difference.
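Whether pause is actually negotiated can be decided from two standard PHY registers: the local advertisement (MII register 4) and the link-partner ability (MII register 5), where bit 10 is symmetric PAUSE and bit 11 is asymmetric PAUSE. A minimal sketch of the full-duplex resolution rules from IEEE 802.3 Annex 28B, making the same decision as the kernel's mii_resolve_flowctrl_fdx() helper; the constant, enum and function names are hypothetical, and how the two register values are read is left out:

```c
#include <stdint.h>

/* Pause bits, at the same positions in MII register 4 (advertisement)
 * and MII register 5 (link partner ability). Local names, not kernel ones. */
#define MII_PAUSE_BIT      0x0400u  /* bit 10: symmetric PAUSE  */
#define MII_ASYM_PAUSE_BIT 0x0800u  /* bit 11: asymmetric PAUSE */

enum pause_result {
    PAUSE_NONE = 0,
    PAUSE_RX   = 1,  /* local MAC should honor pause frames it receives */
    PAUSE_TX   = 2,  /* local MAC may send pause frames                  */
};

/* Resolve full-duplex pause from the local advertisement and the
 * link partner's ability word. */
int pause_resolve(uint16_t local_adv, uint16_t partner_lpa)
{
    /* Both sides offer symmetric pause: pause in both directions. */
    if (local_adv & partner_lpa & MII_PAUSE_BIT)
        return PAUSE_RX | PAUSE_TX;

    /* Both sides offer asymmetric pause: direction depends on who
     * also offers symmetric pause. */
    if (local_adv & partner_lpa & MII_ASYM_PAUSE_BIT) {
        if (local_adv & MII_PAUSE_BIT)
            return PAUSE_RX;
        if (partner_lpa & MII_PAUSE_BIT)
            return PAUSE_TX;
    }
    return PAUSE_NONE;
}
```

If the PHY's register 4 has neither bit set, as it should when the MAC does not support pause, the result is PAUSE_NONE regardless of what the peer offers.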
Andrew,
Currently I'm not using the DP83867_PHY driver (after verifying that the
issue occurs with or without that driver).
It does not occur if I rate-limit UDP (i.e. to 950 Mbps). I disabled all
offloads and the issue still occurs.
I have found that once the issue occurs, I can recover to a working
state by clearing and then setting BGX_CMRX_CFG[BGX_EN], and once I have
recovered that way, I can never trigger the issue again.
If I toggle that register bit at power-up, before the issue occurs, the
issue still occurs.
The CN80XX reference manual describes BGX_CMRX_CFG[BGX_EN] as:
- when cleared all dedicated BGX context state for LMAC (state
machine, FIFOs, counters etc) are reset and LMAC access to shared BGX
resources (data path, serdes lanes) is disabled
- when set LMAC operation is enabled (link bring-up, sync, and tx/rx
of idles and fault sequences)
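The clear-then-set recovery described above is a read-modify-write of a single bit. A minimal sketch, operating on a plain variable that stands in for the memory-mapped BGX_CMRX_CFG register; the function name is hypothetical, and the BGX_EN mask is passed in rather than assumed, so its position should come from the CN80XX reference manual:

```c
#include <stdint.h>

/* Hypothetical model of the recovery step: clear BGX_CMRX_CFG[BGX_EN],
 * then set it again, leaving every other bit as it was when read.
 * 'cfg' stands in for the mapped CSR; 'bgx_en_mask' is the BGX_EN bit. */
void bgx_en_toggle(volatile uint64_t *cfg, uint64_t bgx_en_mask)
{
    uint64_t v = *cfg;

    /* BGX_EN = 0: resets the LMAC's dedicated state (state machine,
     * FIFOs, counters) and detaches it from shared BGX resources. */
    *cfg = v & ~bgx_en_mask;

    /* BGX_EN = 1: re-enables LMAC operation (link bring-up, sync, ...).
     * This also enables the LMAC if BGX_EN was clear when read. */
    *cfg = v | bgx_en_mask;
}
```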
You could try looking at
BGXX_GMP_PCS_INTX
BGXX_GMP_GMI_RXX_INT
BGXX_GMP_GMI_TXX_INT
Those are all W1C registers that should contain all zeros. If they
don't, just write back to them to clear before running a test.
If there are bits asserting in these when the thing gets wedged up, it
might point to a possible cause.
You could also look at these RO registers:
BGXX_GMP_PCS_TXX_STATES
BGXX_GMP_PCS_RXX_STATES
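For the W1C interrupt registers listed earlier, "write back to clear" works because each 1 written clears that bit while each 0 is ignored. A small C model of that behavior (a model only, not hardware access; the function names are hypothetical):

```c
#include <stdint.h>

/* Model of a write-1-to-clear (W1C) register such as
 * BGXX_GMP_GMI_RXX_INT. 'current' is the register contents at the
 * moment of the write, 'written' is the value software writes. */
uint64_t w1c_apply_write(uint64_t current, uint64_t written)
{
    return current & ~written;
}

/* "Read it, then write the value back": clears exactly the bits that
 * were set at read time. A bit that asserts between the read and the
 * write-back survives, so a second read shows whether it is still firing. */
uint64_t w1c_read_and_clear(uint64_t at_read, uint64_t at_write)
{
    return w1c_apply_write(at_write, at_read);
}
```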
I'm told that the particular Cavium reference board with an SGMII PHY
doesn't show this issue (I don't have that specific board to do my own
testing or comparisons against our board), so I'm inclined to think it
has something to do with an interaction with the DP83867 PHY. I would
like to start poking at PHY registers to see if I can find anything
unusual. The best way to do that from userspace is via
SIOCGMIIREG/SIOCSMIIREG, right? The thunderx nic doesn't currently
support ioctls, so I guess I'll have to add that support unless
there's a way to get at PHY registers from userspace through a PHY
driver?
Regards,
Tim