On Thu, 2026-03-05 at 09:27 +0100, Boris Brezillon wrote:
> Hi Matthew,
> 
> On Wed, 4 Mar 2026 18:04:25 -0800
> Matthew Brost <[email protected]> wrote:
> 
> > On Wed, Mar 04, 2026 at 02:51:39PM -0800, Chia-I Wu wrote:
> > > Hi,
> > > 
> > > Our system compositor (surfaceflinger on android) submits gpu jobs
> > > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > > workqueue threads are SCHED_NORMAL, the scheduling latency from submit
> > > to run_job can sometimes cause frame misses. We are seeing this on
> > > panthor and xe, but the issue should be common to all drm_sched users.
> > 
> > I'm going to assume that since this is a compositor, you do not pass
> > input dependencies to the page-flip job. Is that correct?
> > 
> > If so, I believe we could fairly easily build an opt-in DRM sched path
> > that directly calls run_job in the exec IOCTL context (I assume this is
> > SCHED_FIFO) if the job has no dependencies.
> 
> I guess by ::run_job() you mean something slightly more involved that
> checks if:
> 
> - other jobs are pending
> - enough credits (AKA ringbuf space) is available
> - and probably other stuff I forgot about
> 
> > This would likely break some of Xe's submission-backend assumptions
> > around mutual exclusion and ordering based on the workqueue, but that
> > seems workable. I don't know how the Panthor code is structured or
> > whether they have similar issues.
> 
> Honestly, I'm not thrilled by this fast-path/call-run_job-directly idea
> you're describing. There's just so many things we can forget that would
> lead to races/ordering issues that will end up being hard to trigger and
> debug.
+1

I'm not thrilled either. More like the opposite of thrilled, actually.

Even if we could get that to work, this is more of a maintainability
issue. The scheduler is full of insane performance hacks for this or
that driver: lockless accesses, a special lockless queue only used by
that one party in the kernel (a lockless queue which is nowadays, after
N reworks, being used with a lock. Ah well).

In past discussions, Danilo and I made it clear that any more major
features in _new_ patch series aimed at getting merged into drm/sched
must be preceded by cleanup work to address some of the scheduler's
major problems. That's especially true for features aimed at
performance buffs.

P.
