On Thu, Mar 5, 2026 at 1:23 AM Boris Brezillon
<[email protected]> wrote:
>
> On Wed, 4 Mar 2026 14:51:39 -0800
> Chia-I Wu <[email protected]> wrote:
>
> > Hi,
> >
> > Our system compositor (surfaceflinger on android) submits gpu jobs
> > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > workqueue threads are SCHED_NORMAL, the scheduling latency from submit
> > to run_job can sometimes cause frame misses. We are seeing this on
> > panthor and xe, but the issue should be common to all drm_sched users.
> >
> > Using a WQ_HIGHPRI workqueue helps, but it is still not RT (and won't
> > meet future android requirements). It seems either workqueue needs to
> > gain RT support, or drm_sched needs to support kthread_worker.
> >
> > I know drm_sched switched from kthread_worker to workqueue for better
> > scaling when xe was introduced.
>
> Actually, it went from a plain kthread with open-coded "work" support to
> workqueues. The kthread_worker+kthread_work model looks closer to what
> workqueues provide, so transitioning drivers to it shouldn't be too
> hard. The scalability issue you mentioned (one thread per GPU context
> doesn't scale) doesn't apply, because we can pretty easily share the
> same kthread_worker for all drm_gpu_scheduler instances, just like we
> can share the same workqueue for all drm_gpu_scheduler instances today.
> Luckily, it seems that no one so far has been using
> WQ_PERCPU workqueues, so that's one less thing we need to worry about.
> The last remaining drawback with a kthread_work[er]-based solution is
> the fact that workqueues can adjust the number of worker threads on
> demand based on the load. If we really need this flexibility (a
> non-static number of threads per prio level per driver), that's
> something we'll have to add support for.
Wait, I thought this was the exact scaling issue that workqueues solved
for xe and panthor? We needed to execute run_job for N
drm_gpu_scheduler instances, where N is entirely under userspace's
control, and we didn't want to serialize those executions onto a
single thread.

Granted, panthor holds a lock in its run_job callback and so does not
benefit from a workqueue there. I don't know how xe's run_job behaves,
though.
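For concreteness, the shared-RT-worker idea discussed above could look
roughly like this (a sketch only; the worker name and the wiring into
drm_sched's submit work are my assumptions, untested):

```c
/*
 * Sketch only: one SCHED_FIFO kthread_worker shared by all
 * drm_gpu_scheduler instances of a given priority level.
 * Names and integration points are assumptions, not drm_sched code.
 */
#include <linux/kthread.h>
#include <linux/sched.h>

static struct kthread_worker *rt_submit_worker;

static int rt_submit_worker_init(void)
{
	rt_submit_worker = kthread_create_worker(0, "drm-sched-rt");
	if (IS_ERR(rt_submit_worker))
		return PTR_ERR(rt_submit_worker);

	/* Bound the submit-to-run_job latency: make the worker SCHED_FIFO. */
	sched_set_fifo(rt_submit_worker->task);
	return 0;
}

/*
 * Each scheduler's submit work would then be queued with something like
 * kthread_queue_work(rt_submit_worker, &sched->work_run_job) instead of
 * queue_work() on a workqueue (which also means work_run_job would have
 * to become a kthread_work).
 */
```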

>
> For Panthor, the way I see it, we could start with one thread per-group
> priority, and then pick the worker thread to use at drm_sched_init()
> based on the group prio. If we need something with a thread pool, then
> drm_sched will have to know about those threads, and do some load
> balancing when queueing the works...
>
> Note that someone at Collabora is working on dynamic context priority
> support, meaning we'll have to be able to change the drm_gpu_scheduler
> kthread_worker at runtime.
>
> TLDR; All of this is doable, but it's more work (for us, DRM devs)
> than asking for RT prio support to be added to workqueues.

It looks like WQ_RT was last brought up in

  https://lore.kernel.org/all/[email protected]/

Maybe adding some form of bring-your-own-worker-pool support to
workqueue would be acceptable?
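To make "bring-your-own-worker-pool" concrete, I mean something along
these lines (entirely hypothetical; no such workqueue API exists today,
and both the attach function and my_rt_pool are made up):

```c
/*
 * Hypothetical API, not real workqueue code: let the caller attach a
 * worker pool whose threads it has configured (e.g. SCHED_FIFO), while
 * keeping the familiar queue_work() interface on top.
 */
static int example_rt_wq_setup(void)
{
	struct workqueue_struct *wq;

	wq = alloc_workqueue("drm-sched-rt", WQ_UNBOUND, 0);
	if (!wq)
		return -ENOMEM;

	/*
	 * Imaginary call: replace the default unbound pool with a
	 * caller-managed pool of RT threads. Neither this function
	 * nor my_rt_pool exist today.
	 */
	workqueue_attach_worker_pool(wq, my_rt_pool);
	return 0;
}
```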

>
> > But if drm_sched can support either
> > workqueue or kthread_worker during drm_sched_init, drivers can
> > selectively use kthread_worker only for RT gpu queues. And because
> > drivers require CAP_SYS_NICE for RT gpu queues, this should not cause
> > scaling issues.
>
> I think, whatever we choose to go for, we probably don't want to keep
> both models around, because that's going to be a pain to maintain.
