On Sat, Jan 09, 2016 at 05:00:21PM +0530, [email protected] wrote:
> From: Akash Goel <[email protected]>
> 
> Gen9 has an additional address translation hardware support in form of
> Tiled Resource Translation Table (TR-TT) which provides an extra level
> of abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use is bound to the physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
> 
> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
> for a new PPGTT instance, but TR-TT may not enabled for every context.
> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
> which such chunk to use is conveyed to HW through a register.
> Any GFX address, which lies in that reserved 44 bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical address,
> so the output of translation from TR-TT will be a PPGTT offset.
> 
> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
> each level is contained within a 4KB page hence L3 and L2 is composed of
> 512 64b entries and L1 is composed of 1024 32b entries.
> 
> There is a provision to keep TR-TT Tables in virtual space, where the pages of
> TRTT tables will be mapped to PPGTT.
> Currently this is the supported mode, in this mode UMD will have a full 
> control
> on TR-TT management, with bare minimum support from KMD.
> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
> The entries of L1 table will contain the PPGTT offset of BOs actually backing
> the Sparse resources.

> The assumption here is that UMD only will do the complete PPGTT address space
> management and use the Soft Pin API for all the buffer objects associated with
> a given Context.

That is a poor assumption, and not one required for this to work.

> So UMD will also have to allocate the L3/L2/L1 table pages
> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin 
> API.
> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
> program the relevant entries of L3/L2/L1 tables.

This only applies to te TR-TT L1-L3 cache, right?

> Any space in TR-TT segment not bound to any Sparse texture, will be handled
> through Invalid tile, User is expected to initialize the entries of a new
> L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding 
> to
> the holes in the Sparse texture resource will be set with the Null tile 
> pattern
> The improper programming of TRTT should only lead to a recoverable GPU hang,
> eventually leading to banning of the culprit context without victimizing 
> others.
> 
> The association of any Sparse resource with the BOs will be known only to UMD,
> and only the Sparse resources shall be assigned an offset from the TR-TT 
> segment
> by UMD. The use of TR-TT segment or mapping of Sparse resources will be
> abstracted from the KMD,

s/abstracted from/transparent to/ s/,/;/

> UMD can do the address assignment from TR-TT segment

s/can/will/

> autonomously and KMD will be oblivious of it.
> The BOs must not be assigned an address from TR-TT segment, they will be 
> mapped

s/The BOs/Any object/

> to PPGTT in a regular way by KMD

s/using the Soft Pin offset provided by UMD// as this is irrelevant.

> This patch provides an interface through which UMD can convey KMD to enable
> TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param has been
> added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> UMD will have to pass the GFX address of L3 table page,

+along with the

> pattern value for the
> Null & invalid Tile registers.
> 
> Testcase: igt/gem_trtt
> 
> Signed-off-by: Akash Goel <[email protected]>
> ---
>  drivers/gpu/drm/i915/i915_dma.c         |  3 ++
>  drivers/gpu/drm/i915/i915_drv.h         | 12 +++++++
>  drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.c     | 57 
> +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h     |  6 ++++
>  drivers/gpu/drm/i915/i915_reg.h         | 19 +++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 41 ++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h             |  8 +++++
>  8 files changed, 191 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 988a380..c247c25 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void 
> *data,
>       case I915_PARAM_HAS_EXEC_SOFTPIN:
>               value = 1;
>               break;
> +     case I915_PARAM_HAS_TRTT:
> +             value = HAS_TRTT(dev);
> +             break;

Should we do this here, or just query the context? In fact you are
missing the context getparam path any way.

>       default:
>               DRM_DEBUG("Unknown parameter %d\n", param->param);
>               return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c6dd4db..12c612e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
>  #define CONTEXT_NO_ZEROMAP (1<<0)
> +#define CONTEXT_USE_TRTT   (1<<1)

Make flags unsigned whilst you are here, and fix the holes!

>  /**
>   * struct intel_context - as the name implies, represents a context.
>   * @ref: reference count.
> @@ -881,6 +882,15 @@ struct intel_context {
>               int pin_count;
>       } engine[I915_NUM_RINGS];
>  
> +     /* TRTT info */
> +     struct {

Give this a name now, we will be thankful in the future.

> +             uint32_t invd_tile_val;
> +             uint32_t null_tile_val;
> +             uint64_t l3_table_address;
> +             struct i915_vma *vma;
> +             bool update_trtt_params;
> +     } trtt_info;
> +
>       struct list_head link;
>  };
>  
> @@ -2626,6 +2636,8 @@ struct drm_i915_cmd_table {
>                                !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
>                                !IS_BROXTON(dev))
>  
> +#define HAS_TRTT(dev)                (IS_GEN9(dev))
> +
>  #define INTEL_PCH_DEVICE_ID_MASK             0xff00
>  #define INTEL_PCH_IBX_DEVICE_ID_TYPE         0x3b00
>  #define INTEL_PCH_CPT_DEVICE_ID_TYPE         0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 900ffd0..ae9fc34 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -146,6 +146,9 @@ static void i915_gem_context_clean(struct intel_context 
> *ctx)
>               if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
>                       break;
>       }
> +
> +     if (ctx->flags & CONTEXT_USE_TRTT)
> +             i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);

Sould be in context free.

>  }
>  
>  void i915_gem_context_free(struct kref *ctx_ref)
> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private 
> *file_priv, u32 id)
>       return ctx;
>  }
>  
> +static int
> +i915_setup_trtt_ctx(struct intel_context *ctx,
> +                 struct drm_i915_gem_context_trtt_param *trtt_params)
> +{
> +     if (ctx->flags & CONTEXT_USE_TRTT)
> +             return -EEXIST;
> +
> +     /* basic sanity checks for the l3 table pointer */
> +     if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
> +         (ctx->trtt_info.l3_table_address <
> +                     (GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))

Presumably l3_table has an actual size and you want to do a range
overlap test, not just the start address.

> +             return -EINVAL;
> +
> +     if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
> +             return -EINVAL;

These are worth adding DRM_DEBUG() or even better start using dev_debug()
so that we can debug userspace startup issues.

> @@ -952,6 +984,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
> *dev, void *data,
>  {
>       struct drm_i915_file_private *file_priv = file->driver_priv;
>       struct drm_i915_gem_context_param *args = data;
> +     struct drm_i915_gem_context_trtt_param trtt_params;
>       struct intel_context *ctx;
>       int ret;
>  
> @@ -983,6 +1016,18 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
> *dev, void *data,
>                       ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
>               }
>               break;
> +     case I915_CONTEXT_PARAM_ENABLE_TRTT:

Bump this case to i915_setup_trtt_ctx.
i.e. just have
        ret = i915_setup_trtt_ctx(ctx, args);
        break;

Otherwise this function will become very unwieldly very quickly.

>  int i915_ppgtt_init_hw(struct drm_device *dev)
>  {
> +     if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> +             struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +             I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> +                        GEN9_TRTT_BYPASS_DISABLE);

Shouldn't this be a context specific register? In which case you need to
set it in the context image instead.

Hmm. given you already do the context image tweaks, how does work with
non-trtt contexts?

> +struct i915_vma *
> +i915_gem_setup_trtt_vma(struct i915_address_space *vm)
> +{
> +     struct i915_vma *vma;
> +     int ret;
> +
> +     vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
> +     if (vma == NULL)
> +             return ERR_PTR(-ENOMEM);
> +
> +     INIT_LIST_HEAD(&vma->vma_link);
> +     INIT_LIST_HEAD(&vma->mm_list);
> +     INIT_LIST_HEAD(&vma->exec_list);
> +     vma->vm = vm;
> +     i915_ppgtt_get(i915_vm_to_ppgtt(vm));

Tempted to write a patch to allow

        vma->vm = i915_ppggtt_get(i915_vm_to_ppgtt(vm));
?

> +     /* Mark the vma as perennially pinned */

s/perennially/permanently/

We don't want to lose the reservation as opposed to having it grow back
next year.

> +     vma->pin_count = 1;
> +
> +     /* Reserve from the 48 bit PPGTT space */
> +     vma->node.start = GEN9_TRTT_SEGMENT_START;
> +     vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
> +     ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> +     if (ret) {
> +             ret = i915_gem_evict_for_vma(vma);
> +             if (ret == 0)
> +                     ret = drm_mm_reserve_node(&vm->mm, &vma->node);

Good. I think we want i915_vm_reserve_node(vm, START, SIZE, &vma) - but
have a look at the other callsites to see if we have a common interface.
Looks like this would improve i915_vgpu.

> +struct drm_i915_gem_context_trtt_param {
> +     __u64 l3_table_address;
> +     __u32 invd_tile_val;
> +     __u32 null_tile_val;
> +};

Passes the ABI structure sanity checks.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to