Jakub Kicinski <k...@kernel.org> writes:
On Thu, 13 Aug 2020 15:51:46 +0300 Shay Agroskin wrote:
Long answer:
The ena_destroy_device() function is called with rtnl_lock()
held,
so it cannot run in parallel with the reset function. Also the
destroy function clears the bit ENA_FLAG_TRIGGER_RESET without
which the reset function just exits without doing anything.
A problem can then only happen when some routine sets the
ENA_FLAG_TRIGGER_RESET bit before the reset function is
executed,
the following describes all functions from which this bit can
be
set:
ena_fw_reset_device() runs from a workqueue, it can be preempted
right
before it tries to take the rtnl_lock. Then after arbitrarily
long
delay it will start again, take the lock, and dereference
adapter->flags. But adapter could have been long freed at this
point.
Missed that the check for the 'flags' field also requires that
netdev_priv field (adapter variable) would be allocated. Thank you
for pointing that out, this indeed needs to be fixed. I'll add
reset work destruction in next patchset.
Thank you for reviewing it
Unless you flush a workqueue or cancel_work_sync() you can never
be
sure it's not scheduled. And I can only see a flush when module
is
unloaded now.