On Thu, 2019-04-18 at 15:05 +0800, Firo wrote:
> 
> On 4/2/19 12:25 AM, Saeed Mahameed wrote:
> > On Mon, 2019-04-01 at 20:24 +0800, Firo Yang wrote:
> > > This crash is triggered by a user-after-free since lake of
> > > the synchronization of a race condition between 
> > > be_update_queues() modifying multi-purpose channels of
> > > network device and be_tx_timeout().
> > > 
> > > BUG: unable to handle kernel NULL pointer dereference at (null)
> > > Call Trace:
> > > be_tx_timeout+0xa5/0x360 [be2net]
> > > dev_watchdog+0x1d8/0x210
> > > call_timer_fn+0x32/0x140
> > > 
> > > To fix it, detach the interface before modifying
> > > multi-purpose channels of network device.
> > > 
> > > Signed-off-by: Firo Yang <fy...@suse.com>
> > > ---
> > >  drivers/net/ethernet/emulex/benet/be_main.c | 12 ++++++++----
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/emulex/benet/be_main.c
> > > b/drivers/net/ethernet/emulex/benet/be_main.c
> > > index d5026909dec5..25d0128bf684 100644
> > > --- a/drivers/net/ethernet/emulex/benet/be_main.c
> > > +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> > > @@ -4705,6 +4705,8 @@ int be_update_queues(struct be_adapter
> > > *adapter)
> > >   struct net_device *netdev = adapter->netdev;
> > >   int status;
> > >  
> > > + netif_device_detach(netdev);
> > > +
> > 
> > This will reduce the probability, but will not do the trick.
> > since this will not guarantee that the dev_watchdog is disabled.
> Hi Saeed,
> 
> What about using dev_watchdog_down/up() to temporarily disable the
> dev_watchdog?
> 
> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> @@ -4697,6 +4697,8 @@ int be_update_queues(struct be_adapter
> *adapter)
>         struct net_device *netdev = adapter->netdev;
>         int status;
> 
> +       dev_watchdog_down();
> +

there is no such API, currently this is a static function, and i don't
think it is a good idea to mess around with the watchdog.

If you want to avoid deferred work and explicit locking, you need
something similar to what you did with the device_detach to flag to the
watchdog to not look at your tx queues.

by looking at be_close() I see that it calls netif_tx_disable(netdev);
which provides some kind of state synchronization with the watchdog.

so maybe netif_device_detach() will work, but it is a very heavy gun!
netif_carrier_off(netdev) should do the work, and at the end you will
need to restore the original carrier state.

carrier_ok = netif_carrier_ok(netdev);
/* must be called before netif_tx_disable() */
netif_carrier_off(netdev); 

// do stuff

if (carrier_ok)
        netif_carrier_on(netdev);

>         if (netif_running(netdev))
>                 be_close(netdev);
> 
> @@ -4711,21 +4713,21 @@ int be_update_queues(struct be_adapter
> *adapter)
>         be_clear_queues(adapter);
>         status = be_cmd_if_destroy(adapter, adapter->if_handle,  0);
>         if (status)
> -               return status;
> +               goto out;
> 
>         if (!msix_enabled(adapter)) {
>                 status = be_msix_enable(adapter);
>                 if (status)
> -                       return status;
> +                       goto out;
>         }
> 
>         status = be_if_create(adapter);
>         if (status)
> -               return status;
> +               goto out;
> 
>         status = be_setup_queues(adapter);
>         if (status)
> -               return status;
> +               goto out;
> 
>         be_schedule_worker(adapter);
> 
> @@ -4741,6 +4743,9 @@ int be_update_queues(struct be_adapter
> *adapter)
>         if (netif_running(netdev))
>                 status = be_open(netdev);
> 
> +out:
> +       dev_watchdog_up();
> 
> Thanks,
> Firo
> 
> > what you need is proper locking mechanism and/or scheduling the
> > tx_timeout handling out of atomic context if a mutex will be
> > required.
> > 
> > netif_device_detach is a too heavy hammer for such synchronizations
> > tasks.
> > 
> > >   if (netif_running(netdev))
> > >           be_close(netdev);
> > >  
> > > @@ -4719,21 +4721,21 @@ int be_update_queues(struct be_adapter
> > > *adapter)
> > >   be_clear_queues(adapter);
> > >   status = be_cmd_if_destroy(adapter, adapter->if_handle,  0);
> > >   if (status)
> > > -         return status;
> > > +         goto out;
> > >  
> > >   if (!msix_enabled(adapter)) {
> > >           status = be_msix_enable(adapter);
> > >           if (status)
> > > -                 return status;
> > > +                 goto out;
> > >   }
> > >  
> > >   status = be_if_create(adapter);
> > >   if (status)
> > > -         return status;
> > > +         goto out;
> > >  
> > >   status = be_setup_queues(adapter);
> > >   if (status)
> > > -         return status;
> > > +         goto out;
> > >  
> > >   be_schedule_worker(adapter);
> > >  
> > > @@ -4748,6 +4750,8 @@ int be_update_queues(struct be_adapter
> > > *adapter)
> > >   if (netif_running(netdev))
> > >           status = be_open(netdev);
> > >  
> > > +out:
> > > + netif_device_attach(netdev);
> > >   return status;
> > >  }
> > >  

Reply via email to