> -----Original Message-----
> From: netdev-ow...@vger.kernel.org <netdev-ow...@vger.kernel.org>
> On Behalf Of Kaige Li
> Sent: Tuesday, June 23, 2020 8:41 PM
> To: David Miller <da...@davemloft.net>
> Cc: Christian Benvenuti (benve) <be...@cisco.com>; _gov...@gmx.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> lixuef...@loongson.cn; yangtie...@loongson.cn
> Subject: Re: [PATCH RESEND] net/cisco: Fix a sleep-in-atomic-context bug in
> enic_init_affinity_hint()
> 
> 
> On 06/24/2020 11:23 AM, David Miller wrote:
> > From: Kaige Li <lika...@loongson.cn>
> > Date: Wed, 24 Jun 2020 09:56:47 +0800
> >
> >> On 06/24/2020 06:26 AM, David Miller wrote:
> >>> From: David Miller <da...@davemloft.net>
> >>> Date: Tue, 23 Jun 2020 14:33:11 -0700 (PDT)
> >>>
> >>>> Calling a NIC driver open function from a context holding a
> >>>> spinlock is very much the real problem, so many operations have to
> >>>> sleep and in face that ->ndo_open() method is defined as being
> >>>> allowed to sleep and that's why the core networking never invokes
> >>>> it with spinlocks
> >>>                                                         ^^^^
> >>>
> >>> I mean "without" of course. :-)
> >>>
> >>>> held.
> >> Did you mean that open function should be out of spinlock? If so, I
> >> will send V2 patch.
> > Yes, but only if that is safe.
> >
> > You have to analyze the locking done by this driver and fix it properly.
> > I anticipate it is not just a matter of changing where the spinlock
> > is held, you will have to rearchitect things a bit.
> 
> Okay, I will careful analyze this question, and make a suitable patch in V2.
> 
> Thank you.

Hi David, Kaige,
I assume you are referring to the enic_api_lock spin_lock used in

  enic_reset()
  which is used to hard-reset the interface when the driver receives an error 
interrupt

and

  enic_tx_hang_reset()
  which is used to soft-reset the interface when the stack detects the TX 
timeout.

Both reset functions above are called in the context of a workqueue.
However, the same spin_lock (enic_api_lock) is taken by the 
enic_api_devcmd_proxy_by_index() api that is exported by the enic driver and 
that the usnic_verbs driver uses to send commands to the firmware.
This spin_lock was likely added to guarantee that no firmware command is sent 
by usnic_verbs to an enic interface that is undergoing a reset.
Unfortunately changing that spin_lock to a mutex will likely not work on the 
usnic_verbs side, and removing the spin_lock will require a rearchitect of the 
code as mentioned by David.

Kaige, V2 is of course more than welcome and we can test it too.
We/Cisco will also look into it, hopefully a small code reorg will be 
sufficient.

David, from your previous emails on this 3D I assume
- we can leave request_irq() in ndo_open (ie, no need to move it to pci/probe), 
which is done by a number of other drivers too.
- no need to change GFP_KERNEL to GFP_ATOMIC as it was suggested in the 
original patch.

Thanks!
/Chris

Reply via email to