Quoting Tvrtko Ursulin (2018-05-10 18:26:31)
>
> On 10/05/2018 17:25, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
> >>
> >> On 09/05/2018 15:27, Chris Wilson wrote:
> >>> Bypass using the tasklet to submit the first request to HW, as the
> >>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
> >>> excess of 10us (and maybe tens of milliseconds) to our execution
> >>> latency. This latency reduction is most notable when execution flows
> >>> between engines.
> >>>
> >>> v2: Beware handling preemption completion from the direct submit path as
> >>> well.
> >>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
> >>>
> >>> Suggested-by: Tvrtko Ursulin <[email protected]>
> >>> Signed-off-by: Chris Wilson <[email protected]>
> >>> Cc: Tvrtko Ursulin <[email protected]>
> >>> ---
> >>> drivers/gpu/drm/i915/i915_tasklet.h | 24 +++++++
> >>> drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
> >>> drivers/gpu/drm/i915/intel_lrc.c | 71 +++++++++++++++++----
> >>> 3 files changed, 89 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h
> >>> b/drivers/gpu/drm/i915/i915_tasklet.h
> >>> index 42b002b88edb..99e2fa2241ba 100644
> >>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
> >>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> >>> @@ -8,8 +8,11 @@
> >>> #define _I915_TASKLET_H_
> >>>
> >>> #include <linux/atomic.h>
> >>> +#include <linux/bitops.h>
> >>> #include <linux/interrupt.h>
> >>>
> >>> +#include "i915_gem.h"
> >>> +
> >>> /**
> >>> * struct i915_tasklet - wrapper around tasklet_struct
> >>> *
> >>> @@ -19,6 +22,8 @@
> >>> */
> >>> struct i915_tasklet {
> >>> struct tasklet_struct base;
> >>> + unsigned long flags;
> >>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
> >>
> >> I would suggest a more generic name for the bit since i915_tasklet is
> >> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
> >> callback has been invoked directly and not (necessarily) from softirq
> >> context. Then it is for each user to know what that means for them
> >> specifically.
> >
> > Problem is we have two direct invocations, only one is special. It
> > really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
> > see why I didn't propose that.
>
> TBC...
>
> >>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> >>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
> >>> {
> >>> engine->execlists.queue_priority = prio;
> >>> +}
> >>
> >> Why is this called wakeup? Plans to add something in it later?
> >
> > Yes. It's called wakeup because it's setting the value that the dequeue
> > wakes up at. First name was kick_queue, but it doesn't kick either.
> >
> > The later side-effect involves controlling timers.
> >
> > __restart_queue()?
>
> __update_queue_priority? :)
It doesn't just update the priority...
Now a choice between restart_queue and update_queue.
> >>> +static void __schedule_queue(struct intel_engine_cs *engine)
> >>> +{
> >>> i915_tasklet_schedule(&engine->execlists.tasklet);
> >>> }
> >>>
> >>> +static bool __direct_submit(struct intel_engine_execlists *const
> >>> execlists)
> >>> +{
> >>> + struct i915_tasklet * const t = &execlists->tasklet;
> >>> +
> >>> + if (!tasklet_trylock(&t->base))
> >>> + return false;
> >>> +
> >>> + t->flags |= I915_TASKLET_DIRECT_SUBMIT;
> >>> + i915_tasklet_run(t);
> >>> + t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
> >>> +
> >>> + tasklet_unlock(&t->base);
> >>
> >> Feels like this whole sequence belongs to i915_tasklet since it touches
> >> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
> >
> > Keep reading the series and you'll see just why this is so special and
> > confined to execlists.
>
> ... TBC here.
>
> Having peeked ahead, it feels a bit not generic enough as it is, a bit
> too hacky.
>
> Would it work to pass context together with the invocation. Like:
>
> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);
>
> i915_tasklet.flags field namespace would then be owned by the caller
> completely. And the tasklet func itself would have more context on what
> to do.
That doesn't apply very well to the use case either. It's not the
tasklet being called from irq/process that's significant but whether we
are calling it with the engine/data locked.
I keep wanting to use LOCKED, but that has no meaning to the tasklet,
and tasklet_trylock means something entirely different.
> Following form that, i915_tasklet_run_or_schedule(.., flags).
>
> bool i915_taskle_try(tasklet, flags)
> {
> if (!trylock)
> return false;
>
> t->flags |= flags;
> i915_tasklet_run(...);
> t->flags &= ~flags;
>
> tasklet_unlock(...);
>
> return true;
> }
>
>
> void i915_tasklet_run_or_schedule(..., flags)
> {
> if (!i915_tasklet_try(..., flags))
> i915_tasklet_schedule(...);
> }
>
> ?
>
> Leaves a question of a tasklet_is_enabled check in your tasklet_try,
> which I don't quite get since that check wasn't there before. So why it
> is needed?
Concurrent reset can happen at irq time, but we know cannot happen from
the submit path.
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx