Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-18 David Woodhouse
On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> The TxDmaOkLowDesc register may tell if the Tx dma part is still
> making any progress. I have added a TxPoll request. See below.

It isn't making any progress. And TxPoll doesn't help. The only thing I've found that restarts it is to cle...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-18 David Woodhouse
On Fri, 2015-09-18 at 02:04 +0100, David Woodhouse wrote:
> On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> > The TxDmaOkLowDesc register may tell if the Tx dma part is still
> > making any progress. I have added a TxPoll request. See below.
>
> I've just added that into the original...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 David Woodhouse
On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> The TxDmaOkLowDesc register may tell if the Tx dma part is still
> making any progress. I have added a TxPoll request. See below.

I've just added that into the original TX timeout handler, since that doesn't seem to be crashing the box f...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Francois Romieu
David Woodhouse :
[...]
> And of course, even if I fix the TX timeout handling, I'd still like to
> know why it's happening in the first place...

So do I.

The TxDmaOkLowDesc register may tell if the Tx dma part is still making any progress. I have added a TxPoll request. See below.

diff --git a...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 David Woodhouse
On Thu, 2015-09-17 at 22:44 +0200, Francois Romieu wrote:
> David Woodhouse :
> > On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> > >
> > > Thanks; I'll try that. In fact since updating to 4.2 the problem has
> > > got worse — now the whole machine dies:
> >
> > There is something ve...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Francois Romieu
David Woodhouse :
> On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> >
> > Thanks; I'll try that. In fact since updating to 4.2 the problem has
> > got worse — now the whole machine dies:
>
> There is something very strange going on here. I've found two ways to
> make it stop crashing...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 David Woodhouse
On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
>
> Thanks; I'll try that. In fact since updating to 4.2 the problem has
> got worse — now the whole machine dies:

There is something very strange going on here. I've found two ways to make it stop crashing when cp_tx_timeout() hits the 'p...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 David Woodhouse
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
> Instant (untested) hack below.

That seems to trigger a lot, but ultimately doesn't help...

[ 250.998980] 8139cp :00:0b.0 eth1: Timeout head=000b, tail=000a
[ 252.637287] net_ratelimit: 5 callbacks suppressed

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 David Woodhouse
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
>
> [...]
> > [308309.574551] 8139cp :00:0b.0 eth1: Transmit timeout, status
> c 2b0 80ff
>
> Rx and Tx are enabled.
>
> Instant (untested) hack below.

Thanks; I'll try that. In fact since updating to 4.2 the problem has got w...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-14 Francois Romieu
David Woodhouse :
[...]
> Did you ever work this out ?

Not specifically.

> I'm seeing something similar on the inward-facing interface on my home
> router under high load — and it doesn't automatically recover.
[...]
> [308309.457239] Pid: 0, comm: swapper Not tainted 3.7.1 #1

It's unrelated b...

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-14 David Woodhouse
On Mon, 2013-05-20 at 17:27 -0700, Stephen Hemminger wrote:
> On Mon, 20 May 2013 23:37:28 +0200
> Francois Romieu wrote:
>
> > cp_stop_hw includes netdev_reset_queue.
> >
> > You have imho exhibited a start_xmit after cp_stop_hw race - not sure if
> > it happens in cp_tx_timeout or cp_change_mt...