From: Jiri Pirko <j...@resnulli.us>
Date: Mon, 16 Oct 2017 16:28:28 +0200

> From: Ido Schimmel <ido...@mellanox.com>
> 
> When an EMAD is transmitted, a timeout work item is scheduled with a
> delay of 200ms, so that the EMAD is retransmitted if no response
> arrives in time, up to a maximum of five retries.
> 
> In certain situations, it's possible for the function waiting on the
> EMAD to be associated with a work item that is queued on the same
> workqueue (`mlxsw_core`) as the timeout work item. This results in a
> work item flushing another work item queued on the same workqueue.
> 
> According to commit e159489baa71 ("workqueue: relax lockdep annotation
> on flush_work()") the above may lead to a deadlock if the workqueue
> has only one worker active or if the system is under memory pressure and
> the rescue worker is in use. The latter explains the very rare and
> random nature of the lockdep splats we have been seeing:
 ...
> Fix this by creating another workqueue for EMAD timeouts, thereby
> preventing the situation of a work item trying to flush a work item
> queued on the same workqueue.
> 
> Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD register access")
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
> Reported-by: Jiri Pirko <j...@mellanox.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>

Applied.
