On Fri, Feb 09, 2007 at 09:52:04AM -0800, Stephen Hemminger wrote:
> On Fri, 9 Feb 2007 08:42:11 +0100
> Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> 
> > On 07-02-2007 23:09, Stephen Hemminger wrote:
> > > On Wed, 7 Feb 2007 12:52:16 -0800
> > > Andrew Morton <[EMAIL PROTECTED]> wrote:
> > ...
> > >> Feb  7 21:20:18 plop kernel: BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
> > >> Feb  7 21:20:18 plop kernel:  printing eip:
> > >> Feb  7 21:20:18 plop kernel: *pde = 00000000
> > >> Feb  7 21:20:18 plop kernel: Oops: 0000 [#1]
> > >> Feb  7 21:20:18 plop kernel: CPU:    0
> > >> Feb  7 21:20:19 plop kernel: EIP:    0060:[pg0+814360305/1067136000]    Not tainted VLI
> > >> Feb  7 21:20:19 plop kernel: EIP:    0060:[<f0eed6f1>]    Not tainted VLI
> > >> Feb  7 21:20:19 plop kernel: EFLAGS: 00010202   (2.6.20.0.rc7-1mdv #1)
> > >> Feb  7 21:20:19 plop kernel: EIP is at port_carrier_check+0x22/0x75 [bridge]
> > >> Feb  7 21:20:19 plop kernel: eax: 6b6b6b6b   ebx: 6b6b6b6b   ecx: 00000000
> > 
> > I think it's caused by pending delayed workqueue
> > trying to use dev after kfree (POISON_FREE in eax, ebx). 
> > 
> > > static void port_carrier_check(struct work_struct *work)
> > > {
> > >        struct net_bridge_port *p;
> > >        struct net_device *dev;
> > >        struct net_bridge *br;
> > >
> > >        dev = container_of(work, struct net_bridge_port,
> > >                           carrier_check.work)->dev;
> > >        work_release(work);
> > >
> > >        rtnl_lock();
> > >        p = dev->br_port;
> > >        if (!p)
> > >                goto done;
> > >        br = p->br;
> > >
> > >        if (netif_carrier_ok(dev))
> > >                p->path_cost = port_cost(dev);
> > >
> > >        if (br->dev->flags & IFF_UP) {
> > 
> > My investigation seems to point at this line (p == ebx
> > but not NULL because of mem debugging on, probably).

Sorry, I pasted one line too many. This is the line I meant:

-->        br = p->br;
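
To illustrate why the "if (!p)" test doesn't catch this: with slab
debugging on, a freed object is filled with POISON_FREE (0x6b), so a
pointer read back from freed memory is 0x6b6b6b6b rather than NULL.
Below is a minimal userspace sketch of that effect only. The struct
and function names are made up for illustration and are not the
kernel's; the 32-bit field stands in for a pointer on i386.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* byte value used by slab debugging on free */

/* Hypothetical stand-in for an object with a pointer-sized field. */
struct fake_dev {
	uint32_t br_port;	/* models a 32-bit pointer, as on i386 */
};

/* Model what slab debugging does to an object when it is freed. */
static void poison_on_free(struct fake_dev *d)
{
	assert(d != NULL);
	memset(d, POISON_FREE, sizeof(*d));
}

/* Return the value a late reader sees in the field after the "free". */
static uint32_t read_after_free(void)
{
	struct fake_dev d = { .br_port = 0 };	/* was NULL while alive */

	poison_on_free(&d);
	return d.br_port;	/* every byte is now 0x6b */
}
```

So a stale p passes the NULL check and the oops comes on the first
dereference, which matches eax/ebx in the report.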

> The carrier_check is canceled by removal of port from bridge.
> Perhaps there is something broken in rcu assumptions under Qemu

If you mean this:

> static void del_nbp(struct net_bridge_port *p)
> {
> ...
>        cancel_delayed_work(&p->carrier_check);

it's not sufficient. According to workqueue.h:

> /*
>  * Kill off a pending schedule_delayed_work().  Note that the work callback
>  * function may still be running on return from cancel_delayed_work().  Run
>  * flush_scheduled_work() to wait on it.
>  */
> static inline int cancel_delayed_work(struct delayed_work *work)

I can't see how RCU could help here: the pointer to dev is handed
to the delayed work outside of any RCU read-side section.

IMHO dev_hold/dev_put (or something similar) is needed here.
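
A rough userspace sketch of the idea, not a patch: take a reference
when the work is queued and drop it when the work has run or has been
cancelled, so dev cannot go away in between. All model_* names are
hypothetical, and "freed when the count hits zero" is a simplification
of what the real netdev refcount does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct net_device. */
struct model_dev {
	int refcnt;	/* plays the role of the dev reference count */
	bool freed;	/* set instead of calling free(), so it can be checked */
};

struct model_work {
	struct model_dev *dev;
	bool pending;
};

static void model_dev_hold(struct model_dev *dev)
{
	dev->refcnt++;
}

static void model_dev_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);
	if (--dev->refcnt == 0)
		dev->freed = true;	/* last user gone: now it may be released */
}

/* Queueing the work pins the device until the work is done with it. */
static void model_schedule(struct model_work *w, struct model_dev *dev)
{
	if (w->pending)
		return;
	model_dev_hold(dev);
	w->dev = dev;
	w->pending = true;
}

/* A successful cancel drops the reference the pending work was holding. */
static bool model_cancel(struct model_work *w)
{
	if (!w->pending)
		return false;
	w->pending = false;
	model_dev_put(w->dev);
	return true;
}

/* The callback; returns true if dev was still alive when it ran. */
static bool model_run(struct model_work *w)
{
	bool was_alive;

	assert(w->pending);
	w->pending = false;
	was_alive = !w->dev->freed;
	model_dev_put(w->dev);
	return was_alive;
}
```

With this, the owner dropping its own reference while the work is
still pending leaves dev alive until the callback (or the cancel)
drops the last one.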

Regards,
Jarek P.