On 10/28/25 21:16, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 10:32:54AM +0100, Christian König wrote:
>> On 10/21/25 23:39, Matthew Brost wrote:
>>> Drivers often use workqueues that are in the reclaim path (e.g., DRM
>>> scheduler workqueues). It is useful to teach lockdep that memory cannot
>>> be allocated on these workqueues. Add an interface to taint workqueue
>>> lockdep with reclaim.
>>
>> Oh that is so wonderfully evil. I'm absolutely in favor of doing this.
>>
>> But can't we check for the existing WQ_MEM_RECLAIM flag in the workqueue
>> handling instead?
>>
>
> Tejun suggested tying the lockdep annotation to WQ_MEM_RECLAIM, but the
> entire kernel explodes because many workqueues throughout Linux don't
> adhere to this rule. Here's a link to my latest reply to Tejun [1].
>
> [1] https://patchwork.freedesktop.org/patch/682494/?series=156284&rev=1#comment_1255380
>
Sorry, my fault, I hadn't read up to the latest discussion when I wrote
the mail.

My educated guess is that a lot of wqs just set WQ_MEM_RECLAIM to be
guaranteed to start even under memory pressure.

So yeah, probably best to keep your approach here for now, and somebody
from core MM should take a look at cleaning it up later on.

>> Additional to that we should also make sure that the same wq is used for
>> timeout and free and that this wq is single threaded *and* has the
>> WQ_MEM_RECLAIM flag set.
>>
>
> Currently, free runs on the same work queue as run_job. We could look
> into moving it to a separate queue, but that's a separate issue.

We really need to make sure the free and timeout wq are the same and
single threaded. The hack the scheduler currently does with removing and
re-inserting the job on a timeout is something we should really try to
fix.

> IIRC the workqueue_struct is private and so we can't fish that out in
> the DRM scheduler without adding helpers to the workqueue layer. Ofc we
> could do that too if you think this would be helpful.

I might be wrong, but IIRC there was a helper to get the flags from the
wq. That should be enough to test if it is single threaded or not.

>
>> Otherwise we run into the same lifetime issue with the job and memory
>> reclaim during device reset as well.
>>
>
> My patches in this series taint the submit_wq and timeout_wq in the DRM
> scheduler [2]. I have a solid understanding of reclaim rules, and this
> change helped uncover some convoluted cases in Xe, specifically in our
> device reset code involving power management and reclaim [3]. So I can
> confirm this has been quite helpful.

Yeah, completely agree. We most likely have quite a bunch of issues in
our reset code path as well.

Regards,
Christian.

>
> Matt
>
> [2] https://patchwork.freedesktop.org/patch/682496/?series=156284&rev=1
> [3] https://patchwork.freedesktop.org/series/156292/
>
>> Regards,
>> Christian.
>>
>>>
>>> Cc: Tejun Heo <[email protected]>
>>> Cc: Lai Jiangshan <[email protected]>
>>> Signed-off-by: Matthew Brost <[email protected]>
>>> ---
>>>  include/linux/workqueue.h | 19 +++++++++++++++++++
>>>  kernel/workqueue.c        |  9 +++++++++
>>>  2 files changed, 28 insertions(+)
>>>
>>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>>> index dabc351cc127..954c7eb7e225 100644
>>> --- a/include/linux/workqueue.h
>>> +++ b/include/linux/workqueue.h
>>> @@ -553,6 +553,25 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags, int max_active,
>>>  				       1, lockdep_map, ##args))
>>>  #endif
>>>
>>> +
>>> +#ifdef CONFIG_LOCKDEP
>>> +/**
>>> + * taint_reclaim_workqueue - taint workqueue lockdep map with reclaim
>>> + * @wq: workqueue to taint with reclaim
>>> + * @gfp: gfp taint
>>> + *
>>> + * Drivers often use workqueues that are in the reclaim path (e.g., DRM
>>> + * scheduler workqueues). It is useful to teach lockdep that memory cannot be
>>> + * allocated on these workqueues.
>>> + */
>>> +extern void taint_reclaim_workqueue(struct workqueue_struct *wq, gfp_t gfp);
>>> +#else
>>> +static inline void taint_reclaim_workqueue(struct workqueue_struct *wq,
>>> +					   gfp_t gfp)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>  /**
>>>   * alloc_ordered_workqueue - allocate an ordered workqueue
>>>   * @fmt: printf format for the name of the workqueue
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index 45320e27a16c..fea410c20b71 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -5846,6 +5846,15 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags,
>>>  	return wq;
>>>  }
>>>  EXPORT_SYMBOL_GPL(alloc_workqueue_lockdep_map);
>>> +
>>> +void taint_reclaim_workqueue(struct workqueue_struct *wq, gfp_t gfp)
>>> +{
>>> +	fs_reclaim_acquire(gfp);
>>> +	lock_map_acquire(wq->lockdep_map);
>>> +	lock_map_release(wq->lockdep_map);
>>> +	fs_reclaim_release(gfp);
>>> +}
>>> +EXPORT_SYMBOL_GPL(taint_reclaim_workqueue);
>>>  #endif
>>>
>>>  static bool pwq_busy(struct pool_workqueue *pwq)
>>
