Hi Patrick,

I have done allmost all changes to the code as you suggested.  The
changes to use the return value of can_rx_register() also fixed a
minor flax with failing bind() and setsockopt() on raw sockets.

But there are two things left I would like to ask/understand:

Patrick McHardy <[EMAIL PROTECTED]> writes:

> > When the module is unloaded it calls can_proto_unregister() which
> > clears the pointer.  Do you see a race condition here?
> 
> Yes, you do request_module, load the module, get the cp pointer
> from proto_tab, the module is unloaded again. cp points to
> stable memory. Using module references would fix this.

How would I use the module reference counter?  Somehow with
try_module_get()?  I have thought something like

        cp = proto_tab[protocol];
        if (!cp ...)
                return ...;

        if (!try_module_get(cp->prot->owner))
                return ...;

        sk = sk_alloc(...)

        module_put(...);
        return ret;

But here I see two problems:

1. Between the check !cp...  and referencing cp->prot->owner the
   module could get unloaded and the reference be invalid.  Is there
   some lock I can hold that prevents module unloading?  I haven't
   found something like this in include/linux/module.h

2. If the module gets unloaded after the first check and
   request_module() but before the call to try_module_get() the
   socket() syscall will return with error, although module auto
   loading would normally be successful.  How can I prevent that?

> > find_dev_rcv_lists() is called in one place from can_rcv() with RCU
> > lock held, as you write.  The other two calls to find_dev_rcv_lists()
> > are from can_rx_register/unregister() functions which change the
> > receive lists.  Therefore, we can't only use RCU but need protection
> > against simultanous writes.  We do this with the spin_lock_bh().  The
> > _bh variant, because can_rcv() runs in interrupt and we need to block
> > that.  I thought this is pretty standard.
> > 
> > I'll check this again tomorrow, but I have put much time in these
> > locking issues already, changed it quite a few times and hoped to have
> > got it right finally.
> 
> 
> I'm not saying you should use *only* RCU, you need the lock
> for additions/removal of course, but since the receive path
> doesn't take that lock and relies on RCU, you need to use
> the _rcu list walking variant to avoid races with concurrent
> list changes.

I have no objections to add the _rcu suffix for the code changing the
receive lists, but I don't see why it's necessary.  When I do a
spin_lock_bh() before writing, can't I be sure that there is no
interrupt routine running in parallel while I hold this spinlock?  If
so, there is no reader in parallel because the can_rcv() function runs
in a softirq.  I'd really like to understand why you think the writers
should also use the _rcu variant.  I'm sorry if I miss something
obvious here, but could you try to explain it to me?

urs
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to