On Mon, Jun 01, 2015 at 10:00:28AM -0400, Neil Horman wrote: > On Fri, May 29, 2015 at 01:50:37PM -0300, Marcelo Ricardo Leitner wrote: > > On Fri, May 29, 2015 at 09:17:26AM -0400, Neil Horman wrote: > > > On Thu, May 28, 2015 at 11:46:29AM -0300, Marcelo Ricardo Leitner wrote: > > > > On Thu, May 28, 2015 at 10:27:32AM -0300, Marcelo Ricardo Leitner wrote: > > > > > On Thu, May 28, 2015 at 08:17:27AM -0300, Marcelo Ricardo Leitner > > > > > wrote: > > > > > > On Thu, May 28, 2015 at 06:15:11AM -0400, Neil Horman wrote: > > > > > > > On Wed, May 27, 2015 at 09:52:17PM -0300, mleit...@redhat.com > > > > > > > wrote: > > > > > > > > From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> > > > > > > > > > > > > > > > > ->auto_asconf_splist is per namespace and mangled by functions > > > > > > > > like > > > > > > > > sctp_setsockopt_auto_asconf() which doesn't guarantee any > > > > > > > > serialization. > > > > > > > > > > > > > > > > Also, the call to inet_sk_copy_descendant() was backuping > > > > > > > > ->auto_asconf_list through the copy but was not honoring > > > > > > > > ->do_auto_asconf, which could lead to list corruption if it was > > > > > > > > different between both sockets. > > > > > > > > > > > > > > > > This commit thus fixes the list handling by adding a spinlock > > > > > > > > to protect > > > > > > > > against multiple writers and converts the list to be protected > > > > > > > > by RCU > > > > > > > > too, so that we don't have a lock inverstion issue at > > > > > > > > sctp_addr_wq_timeout_handler(). > > > > > > > > > > > > > > > > And as this list now uses RCU, we cannot do such backup and > > > > > > > > restore > > > > > > > > while copying descendant data anymore as readers may be > > > > > > > > traversing the > > > > > > > > list meanwhile. We fix this by simply ignoring/not copying > > > > > > > > those fields, > > > > > > > > placed at the end of struct sctp_sock, so we can just ignore it > > > > > > > > together > > > > > > > > with struct ipv6_pinfo data. For that we create > > > > > > > > sctp_copy_descendant() > > > > > > > > so we don't clutter inet_sk_copy_descendant() with SCTP info. > > > > > > > > > > > > > > > > Issue was found with a test application that kept flipping > > > > > > > > sysctl > > > > > > > > default_auto_asconf on and off. > > > > > > > > > > > > > > > > Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") > > > > > > > > Signed-off-by: Marcelo Ricardo Leitner > > > > > > > > <marcelo.leit...@gmail.com> > > > > > > > > --- > > > > > > > > include/net/netns/sctp.h | 6 +++++- > > > > > > > > include/net/sctp/structs.h | 2 ++ > > > > > > > > net/sctp/protocol.c | 6 +++++- > > > > > > > > net/sctp/socket.c | 39 > > > > > > > > ++++++++++++++++++++++++++------------- > > > > > > > > 4 files changed, 38 insertions(+), 15 deletions(-) > > > > > > > > > > > > > > > > diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h > > > > > > > > index > > > > > > > > 3573a81815ad9e0efb6ceb721eb066d3726419f0..e080bebb3147af39c8275261f57018eb01e917b0 > > > > > > > > 100644 > > > > > > > > --- a/include/net/netns/sctp.h > > > > > > > > +++ b/include/net/netns/sctp.h > > > > > > > > @@ -30,12 +30,15 @@ struct netns_sctp { > > > > > > > > struct list_head local_addr_list; > > > > > > > > struct list_head addr_waitq; > > > > > > > > struct timer_list addr_wq_timer; > > > > > > > > - struct list_head auto_asconf_splist; > > > > > > > > + struct list_head __rcu auto_asconf_splist; > > > > > > > You should use the addr_wq_lock here instead of creating a new > > > > > > > lock, as thats > > > > > > > already used to protect most accesses to the list you are > > > > > > > concerned about. > > > > > > > > > > > > Ok, that works too. > > > > > > > > > > > > > Though truthfully, that shouldn't be necessecary. The list in > > > > > > > question is only > > > > > > > read in one location and only written in one location. You can > > > > > > > likely just > > > > > > > rcu-ify, as the write side is in process context and protected by > > > > > > > lock_sock. > > > > > > > > > > > > It should, it's not protected by lock_sock as this list resides in > > > > > > netns_sctp structure, which lock_sock doesn't cover. Write side is > > > > > > in > > > > > > process context yes, but this list is written in sctp_init_sock(), > > > > > > sctp_destroy_sock() and sctp_setsockopt_auto_asconf(), so one could > > > > > > trigger this by either creating/destroying sockets if > > > > > > default_auto_asconf=1 or just by creating a bunch of sockets and > > > > > > flipping asconf via setsockopt (or a combination of these > > > > > > operations). > > > > > > (I'll point this out in the changelog) > > > > > > > > > > Hmm.. by reusing addr_wq_lock we don't need to rcu-ify the list, as > > > > > the > > > > > reader is inside that lock too, so I can just protect > > > > > auto_asconf_splist > > > > > writers with addr_wq_lock. > > > > > > > > > > Nice, thanks Neil. > > > > > > > > Cannot really do that.. as that creates a lock inversion between > > > > sctp_destroy_sock() (which already holds lock_sock) and > > > > sctp_addr_wq_timeout_handler(), which first grabs addr_wq_lock and then > > > > locks socket by socket. > > > > > > > > Due to that, I'm afraid reusing this lock is not possible, and we should > > > > stick with the patch.. what do you think? (though I have to fix the nits > > > > in there) > > > > > > > I don't think thats accurate. You are correct in that the the locks are > > > taken > > > in opposing order, which would imply a lock inversion that could result in > > > deadlock, but we can avoid that by deferring the asconf list removal > > > until after > > > sk_common_release and unlock_sock_bh is called in sctp_close. That will > > > make > > > the lock ordering consistent. Alternatively, we can pre-emptively take > > > the > > > asconf_lock in sctp_close before locking the socket. > > > > For your first approach, deferring the asconf list removal, we can only > > do that reliably via some work queue, because we initialize asconf stuff > > on sctp_init_sock() and it should be de-initialized on its counterpart, > > sctp_destroy_sock(), as we have code like: > > > > (same for ipv4) > > sctp_v6_create_accept_sk() > > { > > ... > > if (newsk->sk_prot->init(newsk)) { > > sk_common_release(newsk); > > newsk = NULL; > > } > > ... > > } > > > > and at inet6_create() too: > > if (sk->sk_prot->init) { > > err = sk->sk_prot->init(sk); > > if (err) > > sk_common_release(sk); > > } > > > > Or we (kind of) abuse of knowing that sctp_init_sock() cannot fail after > > initializing asconf and move asconf stuff from sctp_destroy_sock() to > > sctp_close(). AFAICT it could be enough, I'm just not a big fan of not > > having that similarity. > > > > If we try to lock addr_wq_lock early in sctp_close(), > > sctp_destroy_sock() would be unprotected on above situations, but if we > > know that sctp_init_sock() won't fail after initilizating asconf, it > > wouldn't be a problem... > > > > > I'd really rather avoid creating an additional lock here if we don't have > > > to > > > > I understand. I'm just not seeing another way out so far... I'll keep > > trying, but please I'm all ears to ideas ;) > > > > Marcelo > rcu-ify auto_asconf_splist, then just use the addr_wq_lock to protect the list > modification sites, as those are all handled at locations that already hold a > socket lock. sctp_addr_wq_timeout_handler is a read-only function for > auto_asconf_splist, and so the spin_lock_bh there can be replaced with an > rcu_read_lock_bh call, breaking the lock inversion > Neil
That's still the lock inversion, because at sctp_addr_wq_timeout_handler() we are not grabbing addr_wq_lock just for this patch, it's already there and later inside that for_each it will lock socket by socket; while on sctp_destroy_sock() , it is called with socket lock already held and we would be trying to grab ->addr_wq_lock in order to protect the list change. We could rcu-ify addr_waitq too. Let sctp_addr_wq_timeout_handler() work on the list protected by rcu_read_lock_bh() and only at the bottom of that for() we grab addr_wq_lock and do the list change. Would require more changes than just that in order to work properly (and avoid double-frees through that timer and sctp_free_addr_wq() for example) Marcelo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html