From: Jiri Pirko <j...@resnulli.us>
Date: Mon, 16 Oct 2017 16:28:28 +0200
> From: Ido Schimmel <ido...@mellanox.com>
>
> When an EMAD is transmitted, a timeout work item is scheduled with a
> delay of 200ms, so that another EMAD will be retried until a maximum of
> five retries.
>
> In certain situations, it's possible for the function waiting on the
> EMAD to be associated with a work item that is queued on the same
> workqueue (`mlxsw_core`) as the timeout work item. This results in
> flushing a work item on the same workqueue.
>
> According to commit e159489baa71 ("workqueue: relax lockdep annotation
> on flush_work()") the above may lead to a deadlock in case the workqueue
> has only one worker active or if the system is under memory pressure and
> the rescue worker is in use. The latter explains the very rare and
> random nature of the lockdep splats we have been seeing:
 ...
> Fix this by creating another workqueue for EMAD timeouts, thereby
> preventing the situation of a work item trying to flush a work item
> queued on the same workqueue.
>
> Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD
> register access")
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
> Reported-by: Jiri Pirko <j...@mellanox.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>

Applied.
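
For readers who want to see the hazard outside the kernel, here is a rough userspace analogy in Python. It is not the mlxsw code and does not use the kernel workqueue API. A single-threaded `ThreadPoolExecutor` stands in for a workqueue with only one active worker, and the names `timeout_work`, `waiter` and `run` are made up for this sketch. The wait is bounded at 0.2 seconds only so that the example terminates; a real `flush_work()` in the same situation would block with no such bound.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def timeout_work():
    # Hypothetical stand-in for the EMAD timeout handler.
    return "timeout handled"


def waiter(timeout_pool):
    # Hypothetical stand-in for a work item that waits on the timeout item.
    pending = timeout_pool.submit(timeout_work)
    try:
        # Bounded wait so the demo ends; the kernel flush has no such bound.
        return pending.result(timeout=0.2)
    except TimeoutError:
        return "stalled"


def run(shared):
    # One worker per pool models a workqueue with a single active worker.
    main_pool = ThreadPoolExecutor(max_workers=1)
    timeout_pool = main_pool if shared else ThreadPoolExecutor(max_workers=1)
    try:
        return main_pool.submit(waiter, timeout_pool).result()
    finally:
        main_pool.shutdown()
        if timeout_pool is not main_pool:
            timeout_pool.shutdown()


if __name__ == "__main__":
    print("shared queue:  ", run(shared=True))
    print("separate queue:", run(shared=False))
```

With a shared pool the waiter occupies the only worker, so the item it waits on cannot start until the waiter gives up. With a separate pool for the timeout item, which is the shape of the fix described in the commit message, the wait completes.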