Re: [Mesa-dev] [Intel-gfx] [RFC PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-06-11 Thread Matthew Brost
On Fri, Jun 04, 2021 at 07:59:05PM +0200, Daniel Vetter wrote:
> On Wed, May 26, 2021 at 04:33:57PM -0700, Matthew Brost wrote:
> > Add entry for i915 new parallel submission uAPI plan.
> > 
> > v2:
> >  (Daniel Vetter):
> >   - Expand logical order explaination
> >   - Add dummy header
> >   - Only allow N BBs in execbuf IOCTL
> >   - Configure parallel submission per slot not per gem context
> > v3:
> >  (Marcin Ślusarz):
> >   - Lot's of typos / bad english fixed
> >  (Tvrtko Ursulin):
> >   - Consistent pseudo code, clean up wording in descriptions
> > 
> > Cc: Tvrtko Ursulin 
> > Cc: Tony Ye 
> > CC: Carl Zhang 
> > Cc: Daniel Vetter 
> > Cc: Jason Ekstrand 
> > Signed-off-by: Matthew Brost 
> > ---
> >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 145 ++
> >  Documentation/gpu/rfc/i915_scheduler.rst  |  55 ++-
> >  2 files changed, 198 insertions(+), 2 deletions(-)
> >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > 
> > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > new file mode 100644
> > index ..20de206e3ab4
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > @@ -0,0 +1,145 @@
> > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > i915_context_engines_parallel_submit */
> > +
> > +/*
> > + * i915_context_engines_parallel_submit:
> 
> So the idea is to make these kerneldoc and pull them into the rfc section.
> Then when we merge, move them to the real uapi section, like what Matt has
> done for lmem.
> 

Yep, will fix in next rev.

> > + *
> > + * Setup a slot in the context engine map to allow multiple BBs to be 
> > submitted
> > + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on 
> > the GPU
> > + * in parallel. Multiple hardware contexts are created internally in the 
> > i915
> > + * run these BBs. Once a slot is configured for N BBs only N BBs can be
> > + * submitted in each execbuf IOCTL and this is implicit behavior e.g. The 
> > user
> > + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL know 
> > how
> > + * many BBs there are based on the slots configuration. The N BBs are the 
> > last N
> > + * buffer objects for first N if I915_EXEC_BATCH_FIRST is set.
> 
> s/for/or/
> 
> > + *
> > + * There are two currently defined ways to control the placement of the
> > + * hardware contexts on physical engines: default behavior (no flags) and
> > + * I915_PARALLEL_IMPLICIT_BONDS (a flag). More flags may be added the in 
> > the
> > + * future as new hardware / use cases arise. Details of how to use this
> > + * interface above the flags field in this structure.
> > + *
> > + * Returns -EINVAL if hardware context placement configuration is invalid 
> > or if
> > + * the placement configuration isn't supported on the platform / submission
> > + * interface.
> > + * Returns -ENODEV if extension isn't supported on the platform / 
> > submission
> > + * inteface.
> > + */
> > +struct i915_context_engines_parallel_submit {
> > +   struct i915_user_extension base;
> > +
> > +   __u16 engine_index; /* slot for parallel engine */
> 
> Kernel doc here for the inline comments too.
>

Yep.
 
> > +   __u16 width;/* number of contexts per parallel engine */
> > +   __u16 num_siblings; /* number of siblings per context */
> > +   __u16 mbz16;
> > +/*
> > + * Default placement behavior (currently unsupported):
> > + *
> > + * Allow BBs to be placed on any available engine instance. In this case 
> > each
> > + * context's engine mask indicates where that context can be placed. It is
> > + * implied in this mode that all contexts have mutual exclusive placement.
> > + * e.g. If one context is running CSX[0] no other contexts can run on 
> > CSX[0]).
> > + *
> > + * Example 1 pseudo code:
> > + * CSX,Y[N] = generic engine class X or Y, logical instance N
> > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > + * set_engines(INVALID)
> > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > + * engines=CSX[0],CSX[1],CSY[0],CSY[1])
> > + *
> > + * Results in the following valid placements:
> > + * CSX[0], CSY[0]
> > + * CSX[0], CSY[1]
> > + * CSX[1], CSY[0]
> > + * CSX[1], CSY[1]
> > + *
> > + * This can also be thought of as 2 virtual engines described by 2-D array 
> > in
> > + * the engines the field:
> > + * VE[0] = CSX[0], CSX[1]
> > + * VE[1] = CSY[0], CSY[1]
> > + *
> > + * Example 2 pseudo code:
> > + * CSX[Y] = generic engine of same class X, logical instance N
> > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > + * set_engines(INVALID)
> > + * set_parallel(engine_index=0, width=2, num_siblings=3,
> > + * engines=CSX[0],CSX[1],CSX[2],CSX[0],CSX[1],CSX[2])
> > + *
> > + * Results in the following valid placements:
> > + * CSX[0], CSX[1]
> > + * CSX[0], CSX[2]
> > + * CSX[1], CSX[0]
> > + * CSX[1], CSX[2]

[Mesa-dev] [PATCH 0/2] GuC submission / DRM scheduler integration plan + new uAPI

2021-06-11 Thread Matthew Brost
Subject and patches say it all.

v2: Address comments, patches have details of changes
v3: Address comments, patches have details of changes
v4: Address comments, patches have details of changes

Signed-off-by: Matthew Brost 

Matthew Brost (2):
  drm/doc/rfc: i915 GuC submission / DRM scheduler
  drm/doc/rfc: i915 new parallel submission uAPI plan

 Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++
 Documentation/gpu/rfc/i915_scheduler.rst  | 148 ++
 Documentation/gpu/rfc/index.rst   |   4 +
 3 files changed, 269 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
 create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst

-- 
2.28.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] drm/doc/rfc: i915 GuC submission / DRM scheduler

2021-06-11 Thread Matthew Brost
Add entry for i915 GuC submission / DRM scheduler integration plan.
Follow up patch with details of new parallel submission uAPI to come.

v2:
 (Daniel Vetter)
  - Expand explaination of why bonding isn't supported for GuC
submission
  - CC some of the DRM scheduler maintainers
  - Add priority inheritance / boosting use case
  - Add reasoning for removing in order assumptions
 (Daniel Stone)
  - Add links to priority spec
v4:
 (Tvrtko)
  - Add TODOs section
 (Daniel Vetter)
  - Pull in 1 line from following patch

Cc: Christian König 
Cc: Luben Tuikov 
Cc: Alex Deucher 
Cc: Steven Price 
Cc: Jon Bloomfield 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Matthew Brost 
Reviewed-by: Daniel Vetter 
Acked-by: Dave Airlie 
---
 Documentation/gpu/rfc/i915_scheduler.rst | 91 
 Documentation/gpu/rfc/index.rst  |  4 ++
 2 files changed, 95 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst

diff --git a/Documentation/gpu/rfc/i915_scheduler.rst 
b/Documentation/gpu/rfc/i915_scheduler.rst
new file mode 100644
index ..7acd386a6b49
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_scheduler.rst
@@ -0,0 +1,91 @@
+=
+I915 GuC Submission/DRM Scheduler Section
+=
+
+Upstream plan
+=
+For upstream the overall plan for landing GuC submission and integrating the
+i915 with the DRM scheduler is:
+
+* Merge basic GuC submission
+   * Basic submission support for all gen11+ platforms
+   * Not enabled by default on any current platforms but can be enabled via
+ modparam enable_guc
+   * Lots of rework will need to be done to integrate with DRM scheduler so
+ no need to nit pick everything in the code, it just should be
+ functional, no major coding style / layering errors, and not regress
+ execlists
+   * Update IGTs / selftests as needed to work with GuC submission
+   * Enable CI on supported platforms for a baseline
+   * Rework / get CI heathly for GuC submission in place as needed
+* Merge new parallel submission uAPI
+   * Bonding uAPI completely incompatible with GuC submission, plus it has
+ severe design issues in general, which is why we want to retire it no
+ matter what
+   * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
+ which configures a slot with N contexts
+   * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
+ a slot in a single execbuf IOCTL and the batches run on the GPU in
+ paralllel
+   * Initially only for GuC submission but execlists can be supported if
+ needed
+* Convert the i915 to use the DRM scheduler
+   * GuC submission backend fully integrated with DRM scheduler
+   * All request queues removed from backend (e.g. all backpressure
+ handled in DRM scheduler)
+   * Resets / cancels hook in DRM scheduler
+   * Watchdog hooks into DRM scheduler
+   * Lots of complexity of the GuC backend can be pulled out once
+ integrated with DRM scheduler (e.g. state machine gets
+ simplier, locking gets simplier, etc...)
+   * Execlists backend will minimum required to hook in the DRM scheduler
+   * Legacy interface
+   * Features like timeslicing / preemption / virtual engines would
+ be difficult to integrate with the DRM scheduler and these
+ features are not required for GuC submission as the GuC does
+ these things for us
+   * ROI low on fully integrating into DRM scheduler
+   * Fully integrating would add lots of complexity to DRM
+ scheduler
+   * Port i915 priority inheritance / boosting feature in DRM scheduler
+   * Used for i915 page flip, may be useful to other DRM drivers as
+ well
+   * Will be an optional feature in the DRM scheduler
+   * Remove in-order completion assumptions from DRM scheduler
+   * Even when using the DRM scheduler the backends will handle
+ preemption, timeslicing, etc... so it is possible for jobs to
+ finish out of order
+   * Pull out i915 priority levels and use DRM priority levels
+   * Optimize DRM scheduler as needed
+
+TODOs for GuC submission upstream
+=
+
+* Need an update to GuC firmware / i915 to enable error state capture
+* Open source tool to decode GuC logs
+* Public GuC spec
+
+New uAPI for basic GuC submission
+=
+No major changes are required to the uAPI for basic GuC submission. The only
+change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
+This attribute i

[Mesa-dev] [PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-06-11 Thread Matthew Brost
Add entry for i915 new parallel submission uAPI plan.

v2:
 (Daniel Vetter):
  - Expand logical order explaination
  - Add dummy header
  - Only allow N BBs in execbuf IOCTL
  - Configure parallel submission per slot not per gem context
v3:
 (Marcin Ślusarz):
  - Lot's of typos / bad english fixed
 (Tvrtko Ursulin):
  - Consistent pseudo code, clean up wording in descriptions
v4:
 (Daniel Vetter)
  - Drop flags
  - Add kernel doc
  - Reword a few things / fix typos
 (Tvrtko)
  - Reword a few things / fix typos

Cc: Tvrtko Ursulin 
Cc: Tony Ye 
CC: Carl Zhang 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Signed-off-by: Matthew Brost 
Acked-by: Daniel Vetter 
---
 Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++
 Documentation/gpu/rfc/i915_scheduler.rst  |  59 -
 2 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index ..c22af3a359e4
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,117 @@
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
+
+/**
+ * struct drm_i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the 
GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of same engine class and 
these
+ * contexts must be in logically contiguous order. Examples of the placement
+ * behavior described below. Lastly, the default is to not allow BBs to
+ * preempted mid BB rather insert coordinated preemption on all hardware
+ * contexts between each set of BBs. Flags may be added in the future to change
+ * bott of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * inteface.
+ *
+ * .. code-block::
+ *
+ * Example 1 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=1,
+ *  engines=CS[0],CS[1])
+ *
+ * Results in the following valid placement:
+ * CS[0], CS[1]
+ *
+ * Example 2 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ *  engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ * Results in the following valid placements:
+ * CS[0], CS[1]
+ * CS[2], CS[3]
+ *
+ * This can also be thought of as 2 virtual engines described by 2-D array
+ * in the engines the field with bonds placed between each index of the
+ * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ * CS[3].
+ * VE[0] = CS[0], CS[2]
+ * VE[1] = CS[1], CS[3]
+ *
+ * Example 3 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ *  engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ * Results in the following valid and invalid placements:
+ * CS[0], CS[1]
+ * CS[1], CS[3] - Not logical contiguous, return -EINVAL
+ */
+struct drm_i915_context_engines_parallel_submit {
+   /**
+* @base: base user extension.
+*/
+   struct i915_user_extension base;
+
+   /**
+* @engine_index: slot for parallel engine
+*/
+   __u16 engine_index;
+
+   /**
+* @width: number of contexts per parallel engine
+*/
+   __u16 width;
+
+   /**
+* @num_siblings: number of siblings per context
+*/
+   __u16 num_siblings;
+
+   /**
+* @mbz16: reserved for future use; mu