On Thu, Aug 10, 2006 at 01:28:27PM -0700, Stephen Hemminger wrote:
> >
> >On Wed, Aug 09, 2006 at 11:31:42AM -0700, Stephen Hemminger wrote:
> >> Replace the gross custom locking done in socket code for net_family[]
> >> with simple RCU usage. Some reordering necessary to avoid sleep
> >> issues with sock_alloc.
> >
> >Definitely a good use of RCU from a read-intensive standpoint -- does
> >anyone other than Linux-kernel networking developers change the elements
> >of the net_family[] array except at boot and shutdown?  ;-)
> >
> >Some comments included below.  Looks good, but one question about
> >things like atalk_create() being able to sleep and a place or two
> >where a comment would be good.
> >
> 
> ...
> 
> >>  
> >> +  /*
> >> +   *      Allocate the socket and allow the family to set things up. if
> >> +   *      the protocol is 0, the family is instructed to select an 
> >> appropriate
> >> +   *      default.
> >> +   */
> >> +  sock = sock_alloc();
> >> +  if (!sock) {
> >> +          printk(KERN_WARNING "socket: no more sockets\n");
> >> +          return -ENFILE; /* Not exactly a match, but its the
> >> +                             closest posix thing */
> >> +  }
> >> +
> >> +  sock->type = type;
> >> +
> >>  #if defined(CONFIG_KMOD)
> >>    /* Attempt to load a protocol module if the find failed.
> >>     *
> >> @@ -1166,70 +1138,59 @@
> >>     * requested real, full-featured networking support upon configuration.
> >>     * Otherwise module support will break!
> >>     */
> >> -  if (net_families[family] == NULL) {
> >> +  if (net_families[family] == NULL)
> >>            request_module("net-pf-%d", family);
> >
> >OK, I'll bite...
> >
> >What happens if the module is not present?  Or is this what the
> >"Otherwise module support will break" comment is getting at?
> 
> request_module loads the module (and blocks). One would
> expect that the module loaded would set net_families[] via
> sock_register, so later reference would succeed. Comment is
> historical since intention was to make base socket code itself modular
> which never was done, and is probably a bad idea to even consider.
> 
> If module is not present, then net_families[] will still be NULL.

Got it!

> >Also, this reference to "net_families[family]" is done without
> >rcu_dereference() and without any clear update-side lock.  This
> >just happens to be OK, since we are only testing for NULL, but
> >should at least have a comment.
> 
> >> -  }
> >>  #endif
> >>  
> >> -  net_family_read_lock();
> >> -  if (net_families[family] == NULL) {
> >> -          err = -EAFNOSUPPORT;
> >> -          goto out;
> >> -  }
> >> -
> >> -/*
> >> - *        Allocate the socket and allow the family to set things up. if
> >> - *        the protocol is 0, the family is instructed to select an 
> >> appropriate
> >> - *        default.
> >> - */
> >> -
> >> -  if (!(sock = sock_alloc())) {
> >> -          printk(KERN_WARNING "socket: no more sockets\n");
> >> -          err = -ENFILE;  /* Not exactly a match, but its the
> >> -                             closest posix thing */
> >> -          goto out;
> >> -  }
> >> -
> >> -  sock->type = type;
> >> +  rcu_read_lock();
> >> +  pf = rcu_dereference(net_families[family]);
> >
> >OK, so the elements of the net_families array are protected by RCU.
> >All references should either be under rcu_read_lock() and accessed
> >via rcu_dereference() or under the update-side lock, whatever that
> >might be.
> >
> 
> Yes, the net_family_lock

Good.

> >>  
> >> -/*
> >> +/**
> >> + *        sock_unregister - remove a protocol handler
> >> + *        @family: protocol family to remove
> >> + *
> >>   *        This function is called by a protocol handler that wants to
> >>   *        remove its address family, and have it unlinked from the
> >> - *        SOCKET module.
> >> + *        new socket creation.
> >> + *
> >> + *        If protocol handler is a module, then it can use module 
> >> reference
> >> + *        counts to protect against new references. If protocol handler 
> >> is not
> >> + *        a module then it needs to provide its own protection in
> >> + *        the ops->create routine.
> >>   */
> >> -
> >>  int sock_unregister(int family)
> >>  {
> >>    if (family < 0 || family >= NPROTO)
> >> -          return -1;
> >> +          return -EINVAL;
> >>  
> >> -  net_family_write_lock();
> >> +  spin_lock(&net_family_lock);
> >>    net_families[family] = NULL;
> >
> >And this one is covered by net_families_lock, so we are set, since this
> >is the last one.
> >
> >> -  net_family_write_unlock();
> >> +  spin_unlock(&net_family_lock);
> >> +
> >> +  synchronize_rcu();
> >
> >OK, and the caller is presumably going to free up whatever needs to be
> >freed.
> >
> >Or, if nothing need be freed, beyond this point, we know that all
> >non-sleeping code paths through any of the net_protocol_family
> >functions have completed.
> >
> >(So, are all of the functions non-sleeping, or do we care?  The
> >definition of net_protocol_family in include/linux/net.h doesn't say
> >that they need to be non-sleeping...)
> >
> >atalk_create() can potentially sleep in the following line of code:
> >
> >     sk = sk_alloc(PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
> 
> The module reference counts are used to prevent that. Since
> appletalk module can't be unloaded until there are no more appletalk
> sockets open (ie ref count of appletalk module is zero). To prevent
> new references there is a call to try_module_get() before the
> net_families[family]->create() call. This happens inside
> rcu_read_lock.
> 
> >What prevents atalk_create() running concurrently with sock_unregister()?
> 
> Module ref counts. I didn't want to start ref counting the net_families
> because that would cause another atomic op, and would penalize the normal
> case of non-modular IPV4 (and non-moduler IPV6).

OK.

> >(One possible reason is that ->create is only called in __sock_create(),
> >and that there is something preventing sock_unregister() from being called
> >before __sock_create() returns -- but I must defer to people who understand
> >networking better than do I.)
> 
> That is why I added a comment to sock_unregister() describing how for the
> normal case of protocol as module, this is safe. But if protocol wants
> to call sock_unregister() other times it needs to do it's own ref counting.
> 
> There are a couple of calls to sock_unregister() from module init routines,
> but these are in benign error unwind cases. The code in these cases
> could be reorganized to do sock_register() last during initilization,
> but it hardly matters.

Fair enough!

                                                        Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to