On Thu, Mar 5, 2026 at 1:23 AM Boris Brezillon <[email protected]> wrote:
>
> On Wed, 4 Mar 2026 14:51:39 -0800
> Chia-I Wu <[email protected]> wrote:
>
> > Hi,
> >
> > Our system compositor (surfaceflinger on android) submits gpu jobs
> > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > workqueue threads are SCHED_NORMAL, the scheduling latency from
> > submit to run_job can sometimes cause frame misses. We are seeing
> > this on panthor and xe, but the issue should be common to all
> > drm_sched users.
> >
> > Using a WQ_HIGHPRI workqueue helps, but it is still not RT (and
> > won't meet future android requirements). It seems either workqueue
> > needs to gain RT support, or drm_sched needs to support
> > kthread_worker.
> >
> > I know drm_sched switched from kthread_worker to workqueue for
> > better scaling when xe was introduced.
>
> Actually, it went from a plain kthread with open-coded "work" support
> to workqueues. The kthread_worker+kthread_work model looks closer to
> what workqueues provide, so transitioning drivers to it shouldn't be
> too hard. The scalability issue you mentioned (one thread per GPU
> context doesn't scale) doesn't apply, because we can pretty easily
> share the same kthread_worker for all drm_gpu_scheduler instances,
> just like we can share the same workqueue for all drm_gpu_scheduler
> instances today. Luckily, it seems that no one so far has been using
> WQ_PERCPU workqueues, so that's one less thing we need to worry
> about. The last remaining drawback with a kthread_work[er]-based
> solution is the fact that workqueues can adjust the number of worker
> threads on demand based on the load. If we really need this
> flexibility (a non-static number of threads per prio level per
> driver), that's something we'll have to add support for.

Wait, I thought this was the exact scaling issue that workqueue solved
for xe and panthor?
We needed to execute run_jobs for N drm_gpu_scheduler instances, where
N is entirely under userspace control. We didn't want to serialize the
executions onto a single thread.
Granted, panthor holds a lock in its run_job callback and does not
benefit from a workqueue. I don't know how xe's run_job behaves,
though.

>
> For Panthor, the way I see it, we could start with one thread per
> group priority, and then pick the worker thread to use at
> drm_sched_init() based on the group prio. If we need something with a
> thread pool, then drm_sched will have to know about those threads,
> and do some load balancing when queueing the works...
>
> Note that someone at Collabora is working on dynamic context priority
> support, meaning we'll have to be able to change the drm_gpu_scheduler
> kthread_worker at runtime.
>
> TLDR; All of this is doable, but it's more work (for us, DRM devs)
> than asking for RT prio support to be added to workqueues.

It looks like WQ_RT was last brought up in

  https://lore.kernel.org/all/[email protected]/

Maybe adding some form of bring-your-own-worker-pool support to
workqueue would be acceptable?

>
> > But if drm_sched can support either
> > workqueue or kthread_worker during drm_sched_init, drivers can
> > selectively use kthread_worker only for RT gpu queues. And because
> > drivers require CAP_SYS_NICE for RT gpu queues, this should not
> > cause scaling issues.
>
> I think, whatever we choose to go for, we probably don't want to keep
> both models around, because that's going to be a pain to maintain.
