Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-24 Thread Tejun Heo
Hello, David. On Thu, Sep 24, 2015 at 12:11:42PM -0700, David Miller wrote: > From: Herbert Xu > Date: Tue, 22 Sep 2015 11:38:56 +0800 > > > The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink: > > Fix autobind race condition that leads to zero port ID") created > > some new races that

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-24 Thread David Miller
From: Herbert Xu Date: Tue, 22 Sep 2015 11:38:56 +0800 > The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink: > Fix autobind race condition that leads to zero port ID") created > some new races that can occur due to inconcsistencies between the > two port IDs. > > Tejun is right that a

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Herbert Xu
On Wed, Sep 23, 2015 at 11:43:21PM -0400, Tejun Heo wrote: > On Thu, Sep 24, 2015 at 11:42:14AM +0800, Herbert Xu wrote: > > Well I disagree so let's leave it at that. > > Leaving things disagreed is fine but there's still a patch to commit > here, so I get that you're still dead against just appl

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Tejun Heo
On Thu, Sep 24, 2015 at 11:42:14AM +0800, Herbert Xu wrote: > Well I disagree so let's leave it at that. Leaving things disagreed is fine but there's still a patch to commit here, so I get that you're still dead against just applying the pattern? -- tejun -- To unsubscribe from this list: send t

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Herbert Xu
On Wed, Sep 23, 2015 at 11:41:16PM -0400, Tejun Heo wrote: > On Thu, Sep 24, 2015 at 11:31:17AM +0800, Herbert Xu wrote: > > No this isn't what happened. My error was trying to see if there > > is a way to do it without barriers. In the end there wasn't. This > > has nothing to do with using pri

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Tejun Heo
On Thu, Sep 24, 2015 at 11:31:17AM +0800, Herbert Xu wrote: > No this isn't what happened. My error was trying to see if there > is a way to do it without barriers. In the end there wasn't. This > has nothing to do with using primitives. Hmmm... yeah, you can say that, but it still was a failur

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Herbert Xu
On Wed, Sep 23, 2015 at 11:29:28PM -0400, Tejun Heo wrote: > > So, while that also has been a common failure mode that we've been > seeing with barrier usages, what you're suggesting isn't the right > balance either. It's error-prone in a different way as amply > exemplified in this very thread.

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Tejun Heo
Hello, On Thu, Sep 24, 2015 at 11:21:00AM +0800, Herbert Xu wrote: > Well we'll have to agree to disagree on that one. I have seen too > many instances over the years where people post patches that use > primitives such as RCU and think that they must be safe because > it compiles with no warning

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Herbert Xu
On Wed, Sep 23, 2015 at 11:06:09PM -0400, Tejun Heo wrote: > > I think this is where we're not agreeing. My point is that better > understanding and lower likelihood of bug doesn't equate specializing > each usage site. That's a lot more likely to lead to unnecessary > cognition overhead and nat

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Tejun Heo
Hello, On Thu, Sep 24, 2015 at 10:54:36AM +0800, Herbert Xu wrote: > What I am concerned about is the next guy who comes along and > does a rewrite like the one that introduced the netlink_bind > bug. That person needs to fully understand what each primitive > is protecting against. > > Using pr

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Herbert Xu
On Wed, Sep 23, 2015 at 10:46:08PM -0400, Tejun Heo wrote: > > Hmm... It looks like I'm failing at communicating. Lemme try again. > There are two situations where we do this. > > 1. When there are different locking contexts. In this case, the write >path is. It's already protected by the

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Tejun Heo
Hello, Herbert. On Thu, Sep 24, 2015 at 10:30:11AM +0800, Herbert Xu wrote: > Well if someone provided helpers which > > 1) uses smp_wmb and smp_rmb instead of full barriers; This part is fine. > 2) provides raw variants for the cases that barriers aren't needed Hmm... It looks like I'm failin

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Herbert Xu
On Wed, Sep 23, 2015 at 11:54:40AM -0400, Tejun Heo wrote: > > Hmm... lemme try again. When using barriers or RCU, it's desirable to > establish certain invariants because it usually is extremely easy to > miss corner cases. It is helpful to have an abstraction, even if just > conceptual, where

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-23 Thread Tejun Heo
Hello, Herbert. On Wed, Sep 23, 2015 at 02:13:42PM +0800, Herbert Xu wrote: > It doesn't matter on x86 but the semantics of smp_load_acquire > and smp_store_release explicitly includes a full barrier which > is something that we don't need. Yeah, I was confused there. I was thinking smp_rmb() wa

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Herbert Xu
On Tue, Sep 22, 2015 at 12:10:56PM -0400, Tejun Heo wrote: > > That's a pentium pro era errata. Virtually no working machine is > affected by that anymore and nobody builds kernel with that option. > In most cases, store_release and load_acquire are cheaper as they're > more specific. On x86, sto

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Linus Torvalds
On Tue, Sep 22, 2015 at 1:36 PM, Bjørn Mork wrote: > > http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf > > Says "NoFix" for erratas 66 and 92. Yeah, 66 and 92 do look like they could cause the apparent ordering of accesses to be violated. That said, both of them seem to

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Bjørn Mork
Linus Torvalds writes: > The PPRO_FENCE_BUG thing is rather questionable. I'm not sure it's > even documented, but I can't find the official PPro errata now. I > think I discussed it with Andy Glew back when Intel was starting to > document the memory ordering rules. > > I'd be willing to get rid

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Linus Torvalds
On Tue, Sep 22, 2015 at 12:50 PM, Tejun Heo wrote: > > Hmmm? We're doing that. PPRO_FENCE makes both acquire and release > use smp_mb(). Oh, right you are. I just grepped for rmb, and missed it because as you say, it's a full mb. The PPRO_FENCE_BUG thing is rather questionable. I'm not sure it

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Tejun Heo
Hello, On Tue, Sep 22, 2015 at 12:28:45PM -0700, Linus Torvalds wrote: > Both acquire and smp_rmb() are equally free on x86. Was this always like this? Ah, okay, from 2007. Was thinking it's still doing an actual rmb. Talk about being confused. > It appears that we don't do the X86_PPRO_FENCE

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Linus Torvalds
On Tue, Sep 22, 2015 at 11:53 AM, Tejun Heo wrote: > > I see. The write path here is cold so the competition is between rmb > and acquire. Unless some significant archs completely screwed it up, > acquire still seems like the better option. It's essentially free on > x86 after all. Both acquir

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Tejun Heo
Hello, Linus. On Tue, Sep 22, 2015 at 11:42:33AM -0700, Linus Torvalds wrote: ... > smp_rmb() should generally be about the same cost as an acquire. It > can go either way. > > So *if* the algorithm is amenable to smp_wmb()/smp_rmb() kind of > barriers, that's actually quite possibly better than

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Linus Torvalds
On Tue, Sep 22, 2015 at 9:10 AM, Tejun Heo wrote: > > That's a pentium pro era errata. Virtually no working machine is > affected by that anymore and nobody builds kernel with that option. > In most cases, store_release and load_acquire are cheaper as they're > more specific. On x86, store_relea

Re: [PATCH v2] netlink: Replace rhash_portid with bound

2015-09-22 Thread Tejun Heo
Hello, Herbert. On Tue, Sep 22, 2015 at 11:38:56AM +0800, Herbert Xu wrote: > On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote: > > store_release and load_acquire are different from the usual memory > > barriers and can't be paired this way. You have to pair store_release > > and load_ac

[PATCH v2] netlink: Replace rhash_portid with bound

2015-09-21 Thread Herbert Xu
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote: > > store_release and load_acquire are different from the usual memory > barriers and can't be paired this way. You have to pair store_release > and load_acquire. Besides, it isn't a particularly good idea to OK I've decided to drop the

Re: netlink: Replace rhash_portid with bound

2015-09-21 Thread Tejun Heo
Hello, Herbert. On Mon, Sep 21, 2015 at 09:34:16PM +0800, Herbert Xu wrote: > @@ -1119,7 +1120,11 @@ static int netlink_insert(struct sock *sk, u32 portid) > goto err; > } > > - nlk_sk(sk)->portid = portid; > + /* rhashtable_insert carries an implicit write memory bar

netlink: Replace rhash_portid with bound

2015-09-21 Thread Herbert Xu
On Sun, Sep 20, 2015 at 11:11:04PM -0700, David Miller wrote: > > Yeah at this point incremental patches work the best. OK here is the patch: ---8<--- The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink: Fix autobind race condition that leads to zero port ID") created some new races tha