On Thu, Aug 10, 2006 at 01:28:27PM -0700, Stephen Hemminger wrote:
> >
> >On Wed, Aug 09, 2006 at 11:31:42AM -0700, Stephen Hemminger wrote:
> >> Replace the gross custom locking done in socket code for net_family[]
> >> with simple RCU usage. Some reordering necessary to avoid sleep
> >> issues with sock_alloc.
> >
> >Definitely a good use of RCU from a read-intensive standpoint -- does
> >anyone other than Linux-kernel networking developers change the elements
> >of the net_family[] array except at boot and shutdown? ;-)
> >
> >Some comments included below. Looks good, but one question about
> >things like atalk_create() being able to sleep and a place or two
> >where a comment would be good.
> >
>
> ...
>
> >>
> >> + /*
> >> + * Allocate the socket and allow the family to set things up. if
> >> + * the protocol is 0, the family is instructed to select an
> >> appropriate
> >> + * default.
> >> + */
> >> + sock = sock_alloc();
> >> + if (!sock) {
> >> + printk(KERN_WARNING "socket: no more sockets\n");
> >> + return -ENFILE; /* Not exactly a match, but its the
> >> + closest posix thing */
> >> + }
> >> +
> >> + sock->type = type;
> >> +
> >> #if defined(CONFIG_KMOD)
> >> /* Attempt to load a protocol module if the find failed.
> >> *
> >> @@ -1166,70 +1138,59 @@
> >> * requested real, full-featured networking support upon configuration.
> >> * Otherwise module support will break!
> >> */
> >> - if (net_families[family] == NULL) {
> >> + if (net_families[family] == NULL)
> >> request_module("net-pf-%d", family);
> >
> >OK, I'll bite...
> >
> >What happens if the module is not present? Or is this what the
> >"Otherwise module support will break" comment is getting at?
>
> request_module loads the module (and blocks). One would
> expect that the module loaded would set net_families[] via
> sock_register, so later reference would succeed. Comment is
> historical since intention was to make base socket code itself modular
> which never was done, and is probably a bad idea to even consider.
>
> If module is not present, then net_families[] will still be NULL.
Got it!
> >Also, this reference to "net_families[family]" is done without
> >rcu_dereference() and without any clear update-side lock. This
> >just happens to be OK, since we are only testing for NULL, but
> >should at least have a comment.
>
> >> - }
> >> #endif
> >>
> >> - net_family_read_lock();
> >> - if (net_families[family] == NULL) {
> >> - err = -EAFNOSUPPORT;
> >> - goto out;
> >> - }
> >> -
> >> -/*
> >> - * Allocate the socket and allow the family to set things up. if
> >> - * the protocol is 0, the family is instructed to select an
> >> appropriate
> >> - * default.
> >> - */
> >> -
> >> - if (!(sock = sock_alloc())) {
> >> - printk(KERN_WARNING "socket: no more sockets\n");
> >> - err = -ENFILE; /* Not exactly a match, but its the
> >> - closest posix thing */
> >> - goto out;
> >> - }
> >> -
> >> - sock->type = type;
> >> + rcu_read_lock();
> >> + pf = rcu_dereference(net_families[family]);
> >
> >OK, so the elements of the net_families array are protected by RCU.
> >All references should either be under rcu_read_lock() and accessed
> >via rcu_dereference() or under the update-side lock, whatever that
> >might be.
> >
>
> Yes, the net_family_lock
Good.
> >>
> >> -/*
> >> +/**
> >> + * sock_unregister - remove a protocol handler
> >> + * @family: protocol family to remove
> >> + *
> >> * This function is called by a protocol handler that wants to
> >> * remove its address family, and have it unlinked from the
> >> - * SOCKET module.
> >> + * new socket creation.
> >> + *
> >> + * If protocol handler is a module, then it can use module
> >> reference
> >> + * counts to protect against new references. If protocol handler
> >> is not
> >> + * a module then it needs to provide its own protection in
> >> + * the ops->create routine.
> >> */
> >> -
> >> int sock_unregister(int family)
> >> {
> >> if (family < 0 || family >= NPROTO)
> >> - return -1;
> >> + return -EINVAL;
> >>
> >> - net_family_write_lock();
> >> + spin_lock(&net_family_lock);
> >> net_families[family] = NULL;
> >
> >And this one is covered by net_families_lock, so we are set, since this
> >is the last one.
> >
> >> - net_family_write_unlock();
> >> + spin_unlock(&net_family_lock);
> >> +
> >> + synchronize_rcu();
> >
> >OK, and the caller is presumably going to free up whatever needs to be
> >freed.
> >
> >Or, if nothing need be freed, beyond this point, we know that all
> >non-sleeping code paths through any of the net_protocol_family
> >functions have completed.
> >
> >(So, are all of the functions non-sleeping, or do we care? The
> >definition of net_protocol_family in include/linux/net.h doesn't say
> >that they need to be non-sleeping...)
> >
> >atalk_create() can potentially sleep in the following line of code:
> >
> > sk = sk_alloc(PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
>
> The module reference counts are used to prevent that. Since
> appletalk module can't be unloaded until there are no more appletalk
> sockets open (ie ref count of appletalk module is zero). To prevent
> new references there is a call to try_module_get() before the
> net_families[family]->create() call. This happens inside
> rcu_read_lock.
>
> >What prevents atalk_create() running concurrently with sock_unregister()?
>
> Module ref counts. I didn't want to start ref counting the net_families
> because that would cause another atomic op, and would penalize the normal
> case of non-modular IPV4 (and non-moduler IPV6).
OK.
> >(One possible reason is that ->create is only called in __sock_create(),
> >and that there is something preventing sock_unregister() from being called
> >before __sock_create() returns -- but I must defer to people who understand
> >networking better than do I.)
>
> That is why I added a comment to sock_unregister() describing how for the
> normal case of protocol as module, this is safe. But if protocol wants
> to call sock_unregister() other times it needs to do it's own ref counting.
>
> There are a couple of calls to sock_unregister() from module init routines,
> but these are in benign error unwind cases. The code in these cases
> could be reorganized to do sock_register() last during initilization,
> but it hardly matters.
Fair enough!
Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html