Quoting Bloomfield, Jon (2019-10-02 14:52:32)
> 
> 
> > -----Original Message-----
> > From: Chris Wilson <[email protected]>
> > Sent: Wednesday, October 2, 2019 4:20 AM
> > To: [email protected]
> > Cc: Chris Wilson <[email protected]>; Joonas Lahtinen
> > <[email protected]>; Winiarski, Michal
> > <[email protected]>; Bloomfield, Jon <[email protected]>
> > Subject: [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close
> > 
> > Normally, we rely on our hangcheck to prevent persistent batches from
> > hogging the GPU. However, if the user disables hangcheck, this mechanism
> > breaks down. Despite our insistence that this is unsafe, the users are
> > equally insistent that they want to use endless batches and will disable
> > the hangcheck mechanism. We are looking at perhaps replacing hangcheck
> > with a softer mechanism, that sends a pulse down the engine to check if
> > it is well. We can use the same preemptive pulse to flush an active
> > persistent context off the GPU upon context close, preventing resources
> > being lost and unkillable requests remaining on the GPU after process
> > termination. To avoid changing the ABI and accidentally breaking
> > existing userspace, we make the persistence of a context explicit and
> > enable it by default (matching current ABI). Userspace can opt out of
> > persistent mode (forcing requests to be cancelled when the context is
> > closed by process termination or explicitly) by a context parameter. To
> > facilitate existing use-cases of disabling hangcheck, if the modparam is
> > disabled (i915.enable_hangcheck=0), we disable persistence mode by
> > default.  (Note, one of the outcomes for supporting endless mode will be
> > the removal of hangchecking, at which point opting into persistent mode
> > will be mandatory, or maybe the default perhaps controlled by cgroups.)
> > 
> > Testcase: igt/gem_ctx_persistence
> > Signed-off-by: Chris Wilson <[email protected]>
> > Cc: Joonas Lahtinen <[email protected]>
> > Cc: MichaƂ Winiarski <[email protected]>
> > Cc: Jon Bloomfield <[email protected]>
> > Reviewed-by: Jon Bloomfield <[email protected]>
> > ---
> >  drivers/gpu/drm/i915/Makefile                 |   1 +
> >  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 122 ++++++++++++++++++
> >  drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 +++
> >  .../gpu/drm/i915/gem/i915_gem_context_types.h |   1 +
> >  .../gpu/drm/i915/gem/selftests/mock_context.c |   2 +
> >  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  56 ++++++++
> >  .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  14 ++
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   2 +-
> >  drivers/gpu/drm/i915/i915_priolist_types.h    |   1 +
> >  include/uapi/drm/i915_drm.h                   |  15 +++
> >  10 files changed, 228 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> >  create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 1f87c70a2842..561fa2bb3006 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -76,6 +76,7 @@ gt-y += \
> >       gt/intel_breadcrumbs.o \
> >       gt/intel_context.o \
> >       gt/intel_engine_cs.o \
> > +     gt/intel_engine_heartbeat.o \
> >       gt/intel_engine_pm.o \
> >       gt/intel_engine_pool.o \
> >       gt/intel_engine_sysfs.o \
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 0ab416887fc2..e832238a72e5 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> 
> <SNIP>
> 
> > +static int
> > +set_persistence(struct i915_gem_context *ctx,
> > +             const struct drm_i915_gem_context_param *args)
> > +{
> > +     if (args->size)
> > +             return -EINVAL;
> > +
> > +     if (args->value) {
> > +             i915_gem_context_set_persistence(ctx);
> > +             return 0;
> > +     }
> > +
> > +     /* To cancel a context we use "preempt-to-idle" */
> > +     if (!(ctx->i915->caps.scheduler &
> > I915_SCHEDULER_CAP_PREEMPTION))
> > +             return -ENODEV;
> > +
> > +     i915_gem_context_clear_persistence(ctx);
> > +     return 0;
> > +}
> 
> Hmmn. Given that disabling hangcheck is an explicit operation, and we already 
> change the default setting, can't we make it a hard requirement that 
> persistence requires hangcheck? You should not really be able to opt back in 
> to persistence if hangcheck is disabled. In fact you could just test for 
> hangcheck when deciding whether to kill the context, and force-kill if it is 
> off - that way if hangcheck is disabled after a context starts it will get 
> cleaned up.

I hear the toctou argument, but I really, really want to avoid coupling
in the modparam as much possible (I'd rather kill off the modparam
entirely for being too coarse).

But certainly having a check here saying you cannot re-enable
persistence without hangcheck seems valid. Let's see how that looks.
(That takes a bit of lalala to ignore the toctou.)
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to