Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-26 David Miller
From: Daniel Borkmann Date: Wed, 21 Dec 2016 18:04:11 +0100 Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-24 Daniel Borkmann
On 12/24/2016 08:34 AM, Cong Wang wrote: On Thu, Dec 22, 2016 at 4:26 PM, Daniel Borkmann wrote: On 12/22/2016 08:05 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann wrote: Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-23 Cong Wang
On Thu, Dec 22, 2016 at 4:26 PM, Daniel Borkmann wrote: On 12/22/2016 08:05 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann wrote: Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Daniel Borkmann
On 12/22/2016 08:05 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann wrote: Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain via cops->tcf_chain() to the end of the function, which [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Daniel Borkmann
On 12/22/2016 06:50 PM, John Fastabend wrote: On 16-12-22 08:53 AM, David Miller wrote: From: Daniel Borkmann Date: Wed, 21 Dec 2016 22:07:48 +0100 Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Daniel Borkmann
On 12/22/2016 02:16 PM, Shahar Klein wrote: On 12/21/2016 7:04 PM, Daniel Borkmann wrote: Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Cong Wang
On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann wrote: Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain via cops->tcf_chain() to the end of the function, which is where most of the complexity [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 John Fastabend
On 16-12-22 08:53 AM, David Miller wrote: From: Daniel Borkmann Date: Wed, 21 Dec 2016 22:07:48 +0100 Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain via cops->tcf_chain() to the end of [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 David Miller
From: Daniel Borkmann Date: Wed, 21 Dec 2016 22:07:48 +0100 Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain via cops->tcf_chain() to the end of the function, which is where most of the complexity [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Shahar Klein
On 12/21/2016 7:04 PM, Daniel Borkmann wrote: Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Daniel Borkmann
On 12/21/2016 09:47 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 12:02 PM, Daniel Borkmann wrote: On 12/21/2016 08:10 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 10:51 AM, Cong Wang wrote: On Wed, Dec 21, 2016 at 9:04 AM, Daniel Borkmann wrote: What happens is that in tc_ctl_tfilter(), thread [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Cong Wang
On Wed, Dec 21, 2016 at 12:02 PM, Daniel Borkmann wrote: On 12/21/2016 08:10 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 10:51 AM, Cong Wang wrote: On Wed, Dec 21, 2016 at 9:04 AM, Daniel Borkmann wrote: What happens is that in tc_ctl_tfilter(), thread A allocates [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Daniel Borkmann
On 12/21/2016 08:10 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 10:51 AM, Cong Wang wrote: On Wed, Dec 21, 2016 at 9:04 AM, Daniel Borkmann wrote: What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Cong Wang
On Wed, Dec 21, 2016 at 10:51 AM, Cong Wang wrote: On Wed, Dec 21, 2016 at 9:04 AM, Daniel Borkmann wrote: What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Daniel Borkmann
On 12/21/2016 07:51 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 9:04 AM, Daniel Borkmann wrote: What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Cong Wang
On Wed, Dec 21, 2016 at 9:04 AM, Daniel Borkmann wrote: What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with [...]

Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Eric Dumazet
On Wed, 2016-12-21 at 18:04 +0100, Daniel Borkmann wrote: Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under [...]

[PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-21 Daniel Borkmann
Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread [...]
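The sequence described in the patch (a fresh tp with tp_created set, an unlock/lock of the rtnl mutex inside change(), another request getting in meanwhile) can be modelled in plain C. This is a single-threaded sketch, not kernel code: `struct tp_model`, `find_slot()` and `request()` are hypothetical stand-ins for `struct tcf_proto` and the lookup and link steps of tc_ctl_tfilter(), and the other thread's insertion during the unlocked window is simulated inline. The snippets stop before the end of the explanation, so the sketch assumes the callback returns -EAGAIN after dropping the lock and that the request is then replayed. If `tp_created` survives the replay, the second round finds the other request's tp at the same priority, still treats it as new, and re-links it through a `back` slot that already points at it, so `tp->next` ends up equal to `tp`. The `fixed` path (resetting the flag at the top of every round) is shown only as one way to break that sequence, not as the actual diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcf_proto: only the chain link and
 * the priority used to order the chain are modelled. */
struct tp_model {
	struct tp_model *next;
	unsigned int prio;
};

/* Walk a chain sorted by ascending prio. On return *backp is the slot
 * ("back") holding either the filter with this prio or the node that
 * would follow a new one. Returns the matching node, or NULL. */
static struct tp_model *find_slot(struct tp_model **chain, unsigned int prio,
				  struct tp_model ***backp)
{
	struct tp_model **back = chain;
	struct tp_model *tp;

	for (; (tp = *back) != NULL; back = &tp->next) {
		if (tp->prio >= prio) {
			if (tp->prio != prio)
				tp = NULL;	/* insertion point, no match */
			break;
		}
	}
	*backp = back;
	return tp;
}

/* One simulated add-filter request. 'fresh' is the node this request
 * would allocate, 'racer' the node another request links while the
 * lock is dropped. 'fixed' selects whether tp_created is reset on
 * every (re)play. */
void request(struct tp_model **chain, unsigned int prio,
	     struct tp_model *fresh, struct tp_model *racer, bool fixed)
{
	struct tp_model **back, *tp;
	bool tp_created = false;
	bool first_round = true;

	assert(chain != NULL && fresh != NULL && racer != NULL);
replay:
	if (fixed)
		tp_created = false;
	tp = find_slot(chain, prio, &back);
	if (tp == NULL) {
		/* No filter with this prio yet: "allocate" a new one. */
		tp = fresh;
		tp->prio = prio;
		tp->next = NULL;
		tp_created = true;
	}
	if (first_round) {
		/* Simulated unlock window: another request links 'racer'
		 * with the same prio; this request then gives up on its
		 * never-linked 'fresh' node (the assumed -EAGAIN) and
		 * replays. */
		first_round = false;
		find_slot(chain, prio, &back);
		racer->prio = prio;
		racer->next = *back;
		*back = racer;
		goto replay;
	}
	/* Link step, roughly mirroring RCU_INIT_POINTER(tp->next, *back)
	 * followed by rcu_assign_pointer(*back, tp). If tp was found
	 * rather than created, *back == tp and tp->next becomes tp. */
	if (tp_created) {
		tp->next = *back;
		*back = tp;
	}
}
```

On an empty chain, `request(&chain, 1, &fresh, &racer, false)` leaves `chain == &racer` with `racer.next == &racer`, a one-node cycle that a tc_classify()-style walk would never leave; passing `true` instead leaves `racer.next == NULL`.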

Re: Soft lockup in tc_classify

2016-12-21 Shahar Klein
On 12/21/2016 12:15 PM, Daniel Borkmann wrote: On 12/21/2016 08:03 AM, Cong Wang wrote: On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein wrote: [...] Looks like you added a debug printk inside tcf_destroy() too, which seems racy with filter creation, it should not happen since in both cases we [...]

Re: Soft lockup in tc_classify

2016-12-21 Daniel Borkmann
On 12/21/2016 01:58 PM, Shahar Klein wrote: On 12/21/2016 12:15 PM, Daniel Borkmann wrote: On 12/21/2016 08:03 AM, Cong Wang wrote: On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein wrote: [...] Looks like you added a debug printk inside tcf_destroy() too, which seems racy with filter creation, [...]

Re: Soft lockup in tc_classify

2016-12-21 Shahar Klein
On 12/21/2016 9:03 AM, Cong Wang wrote: On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein wrote: Tried it with same results This piece is pretty interesting: [ 408.554689] DEBUGG:SK thread-2853[cpu-1] setting tp_created to 1 tp=94b5b02805a0 back=94b9ea932060 [ 408.574258] DEBUGG:SK [...]

Re: Soft lockup in tc_classify

2016-12-21 Daniel Borkmann
On 12/21/2016 08:03 AM, Cong Wang wrote: On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein wrote: [...] Looks like you added a debug printk inside tcf_destroy() too, which seems racy with filter creation, it should not happen since in both cases we take RTNL lock. Don't know if changing all RCU_[...]

Re: Soft lockup in tc_classify

2016-12-21 Shahar Klein
On 12/20/2016 1:47 PM, Daniel Borkmann wrote: Hi Shahar, On 12/20/2016 07:22 AM, Shahar Klein wrote: On 12/19/2016 7:58 PM, Cong Wang wrote: On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein wrote: On 12/13/2016 12:51 AM, Cong Wang wrote: On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: [...]

Re: Soft lockup in tc_classify

2016-12-20 Cong Wang
On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein wrote: Tried it with same results This piece is pretty interesting: [ 408.554689] DEBUGG:SK thread-2853[cpu-1] setting tp_created to 1 tp=94b5b02805a0 back=94b9ea932060 [ 408.574258] DEBUGG:SK thread-2853[cpu-1] add/change filter by: f[...]

Re: Soft lockup in tc_classify

2016-12-20 Shahar Klein
On 12/19/2016 7:58 PM, Cong Wang wrote: Hello, On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein wrote: On 12/13/2016 12:51 AM, Cong Wang wrote: On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann wrote: Note that there's still the RCU fix [...]

Re: Soft lockup in tc_classify

2016-12-20 Daniel Borkmann
Hi Shahar, On 12/20/2016 07:22 AM, Shahar Klein wrote: On 12/19/2016 7:58 PM, Cong Wang wrote: On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein wrote: On 12/13/2016 12:51 AM, Cong Wang wrote: On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann [...]

Re: Soft lockup in tc_classify

2016-12-19 Shahar Klein
On 12/13/2016 12:51 AM, Cong Wang wrote: On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann wrote: Note that there's still the RCU fix missing for the deletion race that Cong will still send out, but you say that the only thing you do is to [...]

Re: Soft lockup in tc_classify

2016-12-19 Cong Wang
Hello, On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein wrote: On 12/13/2016 12:51 AM, Cong Wang wrote: On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann wrote: Note that there's still the RCU fix missing for the deletion [...]

Re: Soft lockup in tc_classify

2016-12-13 Shahar Klein
On 12/12/2016 9:07 PM, Cong Wang wrote: On Mon, Dec 12, 2016 at 8:04 AM, Shahar Klein wrote: On 12/12/2016 3:28 PM, Daniel Borkmann wrote: Hi Shahar, On 12/12/2016 10:43 AM, Shahar Klein wrote: Hi All, sorry for the spam, the first time was sent with html part and was rejected. We observed [...]

Soft lockup in tc_classify

2016-12-12 Shahar Klein
Hi All, sorry for the spam, the first time was sent with html part and was rejected. We observed an issue where a classifier instance next member is pointing back to itself, causing a CPU soft lockup. We found it by running traffic on many udp connections and then adding a new flower rule using [...]
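The report describes a classifier whose `next` member points back to itself. As a rough illustration only, the C sketch below models the chain with a hypothetical `struct tp_model` (not the kernel's `struct tcf_proto`) and walks it with a `tp = tp->next` loop in the style of tc_classify(). The `max_steps` cap is an assumption added for the demo; without it, a node with `tp->next == tp` keeps the loop spinning on one CPU, which is consistent with the reported soft lockup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcf_proto: only the link matters. */
struct tp_model {
	struct tp_model *next;
	unsigned int prio;
};

/* Walks the chain like a "for (; tp; tp = tp->next)" classify loop,
 * but gives up after max_steps nodes so a corrupted chain cannot spin
 * forever. Returns the number of nodes visited, or -1 if the cap was
 * exceeded (e.g. because some node has tp->next == tp). */
int walk_chain(const struct tp_model *tp, int max_steps)
{
	int steps = 0;

	assert(max_steps > 0);
	for (; tp != NULL; tp = tp->next) {
		if (++steps > max_steps)
			return -1;
	}
	return steps;
}
```

For example, with `a.next = &b` and `b.next = NULL`, `walk_chain(&a, 100)` returns 2; after setting `b.next = &b` it returns -1.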

Re: Soft lockup in tc_classify

2016-12-12 Cong Wang
On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann wrote: Note that there's still the RCU fix missing for the deletion race that Cong will still send out, but you say that the only thing you do is to add a single rule, but no other operation [...]

Re: Soft lockup in tc_classify

2016-12-12 Or Gerlitz
On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann wrote: Note that there's still the RCU fix missing for the deletion race that Cong will still send out, but you say that the only thing you do is to add a single rule, but no other operation is involved during that test? What's missing to ha[...]

Re: Soft lockup in tc_classify

2016-12-12 Cong Wang
On Mon, Dec 12, 2016 at 8:04 AM, Shahar Klein wrote: On 12/12/2016 3:28 PM, Daniel Borkmann wrote: Hi Shahar, On 12/12/2016 10:43 AM, Shahar Klein wrote: Hi All, sorry for the spam, the first time was sent with html part and was rejected. We observed an [...]

Re: Soft lockup in tc_classify

2016-12-12 Shahar Klein
On 12/12/2016 3:28 PM, Daniel Borkmann wrote: Hi Shahar, On 12/12/2016 10:43 AM, Shahar Klein wrote: Hi All, sorry for the spam, the first time was sent with html part and was rejected. We observed an issue where a classifier instance next member is pointing back to itself, causing a CPU soft [...]

Re: Soft lockup in tc_classify

2016-12-12 Daniel Borkmann
Hi Shahar, On 12/12/2016 10:43 AM, Shahar Klein wrote: Hi All, sorry for the spam, the first time was sent with html part and was rejected. We observed an issue where a classifier instance next member is pointing back to itself, causing a CPU soft lockup. We found it by running traffic on many [...]