Re: [Mesa-dev] [Intel-gfx] [RFC PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan
On Fri, Jun 04, 2021 at 07:59:05PM +0200, Daniel Vetter wrote:
> On Wed, May 26, 2021 at 04:33:57PM -0700, Matthew Brost wrote:
> > Add entry for i915 new parallel submission uAPI plan.
> >
> > v2:
> >  (Daniel Vetter):
> >   - Expand logical order explanation
> >   - Add dummy header
> >   - Only allow N BBs in execbuf IOCTL
> >   - Configure parallel submission per slot not per gem context
> > v3:
> >  (Marcin Ślusarz):
> >   - Lots of typos / bad English fixed
> >  (Tvrtko Ursulin):
> >   - Consistent pseudo code, clean up wording in descriptions
> >
> > Cc: Tvrtko Ursulin
> > Cc: Tony Ye
> > CC: Carl Zhang
> > Cc: Daniel Vetter
> > Cc: Jason Ekstrand
> > Signed-off-by: Matthew Brost
> > ---
> >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 145 ++
> >  Documentation/gpu/rfc/i915_scheduler.rst      |  55 ++-
> >  2 files changed, 198 insertions(+), 2 deletions(-)
> >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> >
> > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > new file mode 100644
> > index ..20de206e3ab4
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > @@ -0,0 +1,145 @@
> > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
> > +
> > +/*
> > + * i915_context_engines_parallel_submit:
>
> So the idea is to make these kerneldoc and pull them into the rfc section.
> Then when we merge, move them to the real uapi section, like what Matt has
> done for lmem.
>

Yep, will fix in next rev.

> > + *
> > + * Setup a slot in the context engine map to allow multiple BBs to be submitted
> > + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
> > + * in parallel. Multiple hardware contexts are created internally in the i915 to
> > + * run these BBs. Once a slot is configured for N BBs only N BBs can be
> > + * submitted in each execbuf IOCTL and this is implicit behavior, e.g. the user
> > + * doesn't tell the execbuf IOCTL there are N BBs; the execbuf IOCTL knows how
> > + * many BBs there are based on the slot's configuration. The N BBs are the last N
> > + * buffer objects for first N if I915_EXEC_BATCH_FIRST is set.
>
> s/for/or/
>
> > + *
> > + * There are two currently defined ways to control the placement of the
> > + * hardware contexts on physical engines: default behavior (no flags) and
> > + * I915_PARALLEL_IMPLICIT_BONDS (a flag). More flags may be added in the
> > + * future as new hardware / use cases arise. Details of how to use this
> > + * interface are above the flags field in this structure.
> > + *
> > + * Returns -EINVAL if hardware context placement configuration is invalid or if
> > + * the placement configuration isn't supported on the platform / submission
> > + * interface.
> > + * Returns -ENODEV if extension isn't supported on the platform / submission
> > + * interface.
> > + */
> > +struct i915_context_engines_parallel_submit {
> > +        struct i915_user_extension base;
> > +
> > +        __u16 engine_index;        /* slot for parallel engine */
>
> Kernel doc here for the inline comments too.
>

Yep.

> > +        __u16 width;               /* number of contexts per parallel engine */
> > +        __u16 num_siblings;        /* number of siblings per context */
> > +        __u16 mbz16;
> > +/*
> > + * Default placement behavior (currently unsupported):
> > + *
> > + * Allow BBs to be placed on any available engine instance. In this case each
> > + * context's engine mask indicates where that context can be placed. It is
> > + * implied in this mode that all contexts have mutually exclusive placement,
> > + * e.g. if one context is running on CSX[0] no other context can run on CSX[0].
> > + *
> > + * Example 1 pseudo code:
> > + * CSX,Y[N] = generic engine class X or Y, logical instance N
> > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > + * set_engines(INVALID)
> > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > + *              engines=CSX[0],CSX[1],CSY[0],CSY[1])
> > + *
> > + * Results in the following valid placements:
> > + * CSX[0], CSY[0]
> > + * CSX[0], CSY[1]
> > + * CSX[1], CSY[0]
> > + * CSX[1], CSY[1]
> > + *
> > + * This can also be thought of as 2 virtual engines described by a 2-D array
> > + * in the engines field:
> > + * VE[0] = CSX[0], CSX[1]
> > + * VE[1] = CSY[0], CSY[1]
> > + *
> > + * Example 2 pseudo code:
> > + * CSX[N] = generic engine of same class X, logical instance N
> > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > + * set_engines(INVALID)
> > + * set_parallel(engine_index=0, width=2, num_siblings=3,
> > + *              engines=CSX[0],CSX[1],CSX[2],CSX[0],CSX[1],CSX[2])
> > + *
> > + * Results in the following valid placements:
> > + * CSX[0], CSX[1]
> > + * CSX[0], CSX[2]
> > + * CSX[1], CSX[0]
> > + * CSX[1], CSX[2]
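For reference, a minimal C sketch of how the flattened engines array from Example 1 above might be filled in by userspace, assuming the row-major engines[width][num_siblings] layout the examples describe (row i holds the siblings of VE[i]); the render and copy classes are placeholders standing in for the generic CSX / CSY classes:

#include <drm/i915_drm.h>

/* width=2, num_siblings=2: four entries, row-major, one row per context */
static const struct i915_engine_class_instance engines[2 * 2] = {
        /* VE[0] = CSX[0], CSX[1] */
        { .engine_class = I915_ENGINE_CLASS_RENDER, .engine_instance = 0 },
        { .engine_class = I915_ENGINE_CLASS_RENDER, .engine_instance = 1 },
        /* VE[1] = CSY[0], CSY[1] */
        { .engine_class = I915_ENGINE_CLASS_COPY, .engine_instance = 0 },
        { .engine_class = I915_ENGINE_CLASS_COPY, .engine_instance = 1 },
};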
[Mesa-dev] [PATCH 0/2] GuC submission / DRM scheduler integration plan + new uAPI
Subject and patches say it all.

v2: Address comments, patches have details of changes
v3: Address comments, patches have details of changes
v4: Address comments, patches have details of changes

Signed-off-by: Matthew Brost

Matthew Brost (2):
  drm/doc/rfc: i915 GuC submission / DRM scheduler
  drm/doc/rfc: i915 new parallel submission uAPI plan

 Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++
 Documentation/gpu/rfc/i915_scheduler.rst      | 148 ++
 Documentation/gpu/rfc/index.rst               |   4 +
 3 files changed, 269 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
 create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst

--
2.28.0
[Mesa-dev] [PATCH 1/2] drm/doc/rfc: i915 GuC submission / DRM scheduler
Add entry for i915 GuC submission / DRM scheduler integration plan. Follow-up
patch with details of the new parallel submission uAPI to come.

v2:
 (Daniel Vetter)
  - Expand explanation of why bonding isn't supported for GuC submission
  - CC some of the DRM scheduler maintainers
  - Add priority inheritance / boosting use case
  - Add reasoning for removing in-order assumptions
 (Daniel Stone)
  - Add links to priority spec
v4:
 (Tvrtko)
  - Add TODOs section
 (Daniel Vetter)
  - Pull in 1 line from following patch

Cc: Christian König
Cc: Luben Tuikov
Cc: Alex Deucher
Cc: Steven Price
Cc: Jon Bloomfield
Cc: Jason Ekstrand
Cc: Dave Airlie
Cc: Daniel Vetter
Cc: Jason Ekstrand
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Matthew Brost
Reviewed-by: Daniel Vetter
Acked-by: Dave Airlie
---
 Documentation/gpu/rfc/i915_scheduler.rst | 91
 Documentation/gpu/rfc/index.rst          |  4 ++
 2 files changed, 95 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst

diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst
new file mode 100644
index ..7acd386a6b49
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_scheduler.rst
@@ -0,0 +1,91 @@
+=========================================
+I915 GuC Submission/DRM Scheduler Section
+=========================================
+
+Upstream plan
+=============
+For upstream the overall plan for landing GuC submission and integrating the
+i915 with the DRM scheduler is:
+
+* Merge basic GuC submission
+  * Basic submission support for all gen11+ platforms
+  * Not enabled by default on any current platforms but can be enabled via
+    modparam enable_guc
+  * Lots of rework will need to be done to integrate with DRM scheduler so
+    no need to nit pick everything in the code, it just should be
+    functional, no major coding style / layering errors, and not regress
+    execlists
+  * Update IGTs / selftests as needed to work with GuC submission
+  * Enable CI on supported platforms for a baseline
+  * Rework / get CI healthy for GuC submission in place as needed
+* Merge new parallel submission uAPI
+  * Bonding uAPI completely incompatible with GuC submission, plus it has
+    severe design issues in general, which is why we want to retire it no
+    matter what
+  * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
+    which configures a slot with N contexts
+  * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
+    a slot in a single execbuf IOCTL and the batches run on the GPU in
+    parallel
+  * Initially only for GuC submission but execlists can be supported if
+    needed
+* Convert the i915 to use the DRM scheduler
+  * GuC submission backend fully integrated with DRM scheduler
+    * All request queues removed from backend (e.g. all backpressure
+      handled in DRM scheduler)
+    * Resets / cancels hook in DRM scheduler
+    * Watchdog hooks into DRM scheduler
+    * Lots of complexity of the GuC backend can be pulled out once
+      integrated with DRM scheduler (e.g. state machine gets
+      simpler, locking gets simpler, etc...)
+  * Execlists backend will do the minimum required to hook in the DRM
+    scheduler
+    * Legacy interface
+    * Features like timeslicing / preemption / virtual engines would
+      be difficult to integrate with the DRM scheduler and these
+      features are not required for GuC submission as the GuC does
+      these things for us
+    * ROI low on fully integrating into DRM scheduler
+    * Fully integrating would add lots of complexity to DRM
+      scheduler
+  * Port i915 priority inheritance / boosting feature in DRM scheduler
+    * Used for i915 page flip, may be useful to other DRM drivers as
+      well
+    * Will be an optional feature in the DRM scheduler
+  * Remove in-order completion assumptions from DRM scheduler
+    * Even when using the DRM scheduler the backends will handle
+      preemption, timeslicing, etc... so it is possible for jobs to
+      finish out of order
+  * Pull out i915 priority levels and use DRM priority levels
+  * Optimize DRM scheduler as needed
+
+TODOs for GuC submission upstream
+=================================
+
+* Need an update to GuC firmware / i915 to enable error state capture
+* Open source tool to decode GuC logs
+* Public GuC spec
+
+New uAPI for basic GuC submission
+=================================
+No major changes are required to the uAPI for basic GuC submission. The only
+change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
+This attribute i
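The new cap would presumably be reported alongside the existing I915_SCHEDULER_CAP_* bits via the I915_PARAM_SCHEDULER_CAPS getparam; a minimal sketch of the userspace check, assuming exactly that wiring (fd is an open i915 render node, and on error the cap is treated as absent):

#include <stdbool.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static bool has_static_priority_map(int fd)
{
        int caps = 0;
        struct drm_i915_getparam gp = {
                .param = I915_PARAM_SCHEDULER_CAPS,
                .value = &caps,
        };

        /* Query the scheduler capability bitmask from the kernel */
        if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
                return false;

        return caps & I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
}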
[Mesa-dev] [PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan
Add entry for i915 new parallel submission uAPI plan.

v2:
 (Daniel Vetter):
  - Expand logical order explanation
  - Add dummy header
  - Only allow N BBs in execbuf IOCTL
  - Configure parallel submission per slot not per gem context
v3:
 (Marcin Ślusarz):
  - Lots of typos / bad English fixed
 (Tvrtko Ursulin):
  - Consistent pseudo code, clean up wording in descriptions
v4:
 (Daniel Vetter)
  - Drop flags
  - Add kernel doc
  - Reword a few things / fix typos
 (Tvrtko)
  - Reword a few things / fix typos

Cc: Tvrtko Ursulin
Cc: Tony Ye
CC: Carl Zhang
Cc: Daniel Vetter
Cc: Jason Ekstrand
Signed-off-by: Matthew Brost
Acked-by: Daniel Vetter
---
 Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++
 Documentation/gpu/rfc/i915_scheduler.rst      |  59 -
 2 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index ..c22af3a359e4
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,117 @@
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+
+/**
+ * struct drm_i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915 to
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior, e.g. the user
+ * doesn't tell the execbuf IOCTL there are N BBs; the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of the same engine class and
+ * these contexts must be in logically contiguous order. Examples of the
+ * placement behavior are described below. Lastly, the default is to not allow
+ * BBs to be preempted mid-BB; rather, coordinated preemption is inserted on
+ * all hardware contexts between each set of BBs. Flags may be added in the
+ * future to change both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block::
+ *
+ * Example 1 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=1,
+ *              engines=CS[0],CS[1])
+ *
+ * Results in the following valid placement:
+ * CS[0], CS[1]
+ *
+ * Example 2 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ *              engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ * Results in the following valid placements:
+ * CS[0], CS[1]
+ * CS[2], CS[3]
+ *
+ * This can also be thought of as 2 virtual engines described by a 2-D array
+ * in the engines field with bonds placed between each index of the
+ * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ * CS[3].
+ * VE[0] = CS[0], CS[2]
+ * VE[1] = CS[1], CS[3]
+ *
+ * Example 3 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ *              engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ * Results in the following valid and invalid placements:
+ * CS[0], CS[1]
+ * CS[1], CS[3] - Not logically contiguous, return -EINVAL
+ */
+struct drm_i915_context_engines_parallel_submit {
+        /**
+         * @base: base user extension.
+         */
+        struct i915_user_extension base;
+
+        /**
+         * @engine_index: slot for parallel engine
+         */
+        __u16 engine_index;
+
+        /**
+         * @width: number of contexts per parallel engine
+         */
+        __u16 width;
+
+        /**
+         * @num_siblings: number of siblings per context
+         */
+        __u16 num_siblings;
+
+        /**
+         * @mbz16: reserved for future use; must be zero
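To make the configuration flow concrete, here is a hypothetical userspace sketch of the set_engines(INVALID) + set_parallel() sequence from Example 1, written against the RFC header above and chained through the existing I915_CONTEXT_PARAM_ENGINES setparam; the trailing engines[] member on the extension struct is an assumption (the quoted header is truncated before it), and error handling is elided:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>
#include "i915_parallel_execbuf.h" /* the RFC header above */

static void setup_parallel_slot(int fd, __u32 ctx_id)
{
        /* set_parallel(engine_index=0, width=2, num_siblings=1,
         * engines=CS[0],CS[1]): slot 0 takes 2 BBs per execbuf and each
         * context has a single placement. The engines[] member is an
         * assumed trailing array on the RFC extension struct.
         */
        struct {
                struct drm_i915_context_engines_parallel_submit ext;
                struct i915_engine_class_instance engines[2];
        } parallel;

        /* Engine map with a single INVALID slot, i.e. set_engines(INVALID) */
        I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = { 0 };
        struct drm_i915_gem_context_param arg = { 0 };

        memset(&parallel, 0, sizeof(parallel));
        parallel.ext.base.name = I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT;
        parallel.ext.engine_index = 0;
        parallel.ext.width = 2;
        parallel.ext.num_siblings = 1;
        parallel.engines[0].engine_class = I915_ENGINE_CLASS_RENDER; /* CS[0] */
        parallel.engines[0].engine_instance = 0;
        parallel.engines[1].engine_class = I915_ENGINE_CLASS_RENDER; /* CS[1] */
        parallel.engines[1].engine_instance = 1;

        /* Chain the parallel-submit extension off the engine map */
        engines.extensions = (uintptr_t)&parallel;
        engines.engines[0].engine_class = I915_ENGINE_CLASS_INVALID;
        engines.engines[0].engine_instance = I915_ENGINE_CLASS_INVALID_NONE;

        arg.ctx_id = ctx_id;
        arg.param = I915_CONTEXT_PARAM_ENGINES;
        arg.size = sizeof(engines);
        arg.value = (uintptr_t)&engines;

        ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

After this setparam succeeds, an execbuf targeting slot 0 must supply exactly two batch buffers: the last two buffer objects, or the first two if I915_EXEC_BATCH_FIRST is set.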