On Thu, Mar 05, 2026 at 09:38:16AM +0100, Philipp Stanner wrote:
> On Thu, 2026-03-05 at 09:27 +0100, Boris Brezillon wrote:
> > Hi Matthew,
> > 
> > On Wed, 4 Mar 2026 18:04:25 -0800
> > Matthew Brost <[email protected]> wrote:
> > 
> > > On Wed, Mar 04, 2026 at 02:51:39PM -0800, Chia-I Wu wrote:
> > > > Hi,
> > > > 
> > > > Our system compositor (surfaceflinger on android) submits gpu jobs
> > > > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > > > workqueue threads are SCHED_NORMAL, the scheduling latency from submit
> > > > to run_job can sometimes cause frame misses. We are seeing this on
> > > > panthor and xe, but the issue should be common to all drm_sched users.
> > > > 
> > > 
> > > I'm going to assume that since this is a compositor, you do not pass
> > > input dependencies to the page-flip job. Is that correct?
> > > 
> > > If so, I believe we could fairly easily build an opt-in DRM sched path
> > > that directly calls run_job in the exec IOCTL context (I assume this is
> > > SCHED_FIFO) if the job has no dependencies.
> > 
> > I guess by ::run_job() you mean something slightly more involved that
> > checks if:
> > 
> > - other jobs are pending
Yes.

> > - enough credits (AKA ringbuf space) is available

Yes.

> > - and probably other stuff I forgot about

The scheduler is not stopped; serialize the bypass path with scheduler
stop/start.

> > > 
> > > This would likely break some of Xe's submission-backend assumptions
> > > around mutual exclusion and ordering based on the workqueue, but that
> > > seems workable. I don't know how the Panthor code is structured or
> > > whether they have similar issues.
> > 
> > Honestly, I'm not thrilled by this fast-path/call-run_job-directly idea
> > you're describing. There's just so many things we can forget that would
> > lead to races/ordering issues that will end up being hard to trigger and
> > debug.
> 
> +1
> 
> I'm not thrilled either. More like the opposite of thrilled actually.
> 
> Even if we could get that to work. This is more of a maintainability
> issue.
> 
> The scheduler is full of insane performance hacks for this or that
> driver. Lockless accesses, a special lockless queue only used by that
> one party in the kernel (a lockless queue which is nowadays, after N
> reworks, being used with a lock. Ah well).

This is not relevant to this discussion; see below. In general, I agree
that the lockless tricks in the scheduler are not great, nor is the fact
that the scheduler became a dumping ground for driver-specific features.
But again, that is not what we're talking about here; see below.

> In the past discussions Danilo and I made it clear that more major
> features in _new_ patch series aimed at getting merged into drm/sched
> must be preceded by cleanup work to address some of the scheduler's
> major problems.

Ah, we've moved to dictatorship quickly. Noted.

I can't say I agree with either of you here.
In about an hour, I seemingly have a bypass path working in DRM sched +
Xe, and my diff is:

108 insertions(+), 31 deletions(-)

About 40 lines of the insertions are kernel-doc, so I'm not buying that
this is a maintenance issue or a major feature - it is literally a
single new function.

I understand a bypass path can create issues; for example, on certain
queues in Xe I definitely can't use the bypass path, so Xe simply
wouldn't use it in those cases. It is the driver's choice whether to use
it. If a driver doesn't know how to use the scheduler, well, that's on
the driver.

Providing a simple, documented function as a fast path really isn't some
crazy idea. The alternative (asking for RT workqueues or changing the
design to use kthread_worker) actually is.

> That's especially true if it's features aimed at performance buffs.

With the above mindset, I'm actually very confused why this series [1]
would even be considered, as it is an order of magnitude greater in
complexity than my suggestion here.

Matt

[1] https://patchwork.freedesktop.org/series/159025/

> P.
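For illustration, here is a minimal userspace model of the admission
checks discussed above for such a direct-run bypass (no input
dependencies, no other jobs pending for the entity, enough ringbuf
credits, scheduler not stopped). The struct layouts and the helper name
can_run_direct are hypothetical; this is a sketch of the decision logic,
not the real drm_sched API.

```c
/*
 * Userspace model of the bypass admission checks. All names here are
 * illustrative (model_*), not actual DRM scheduler structures.
 */
#include <stdbool.h>

struct model_sched {
	bool stopped;      /* bypass must be serialized with stop/start */
	int credits_free;  /* available ringbuf credits */
};

struct model_entity {
	int jobs_pending;  /* jobs already queued for this entity */
};

struct model_job {
	int num_deps;      /* unresolved input dependencies */
	int credits;       /* ringbuf space this job needs */
};

/*
 * run_job may only be called directly in submit (exec IOCTL) context
 * when every condition holds; otherwise the job must take the normal
 * workqueue path.
 */
static bool can_run_direct(const struct model_sched *s,
			   const struct model_entity *e,
			   const struct model_job *j)
{
	return j->num_deps == 0 &&              /* no input dependencies  */
	       e->jobs_pending == 0 &&          /* nothing queued ahead   */
	       s->credits_free >= j->credits && /* enough ringbuf space   */
	       !s->stopped;                     /* scheduler not stopped  */
}
```

Anything beyond these checks (fence handling, the actual hand-off to the
backend) is where the driver-specific ordering assumptions come in, and
is exactly what an opt-in driver would have to audit.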
