Patch #1-#4 are Reviewed-by: Christian König <[email protected]>, please commit those so that we can consider them as done.

Patch #5:
+int amdgpu_bo_sync_between_bo_and_shadow(struct amdgpu_device *adev,
+                                        struct amdgpu_ring *ring,
+                                        struct amdgpu_bo *bo,
+                                        struct reservation_object *resv,
+                                        struct fence **fence)
Maybe rename this to amdgpu_bo_backup_shadow(), when talking about sync at least I always think about synchronization using fences.

+
+       if (bo->backup_shadow & AMDGPU_BO_SHADOW_TO_NONE) {
That condition will never be true, cause AMDGPU_BO_SHADOW_TO_NONE is defined as 0. You probably wanted to use "==" here, don't you?

+       r = amdgpu_bo_pin(bo, bo->prefered_domains, &bo_addr);
+       if (r) {
+               DRM_ERROR("Failed to pin bo object\n");
+               goto err1;
+       }
+       r = amdgpu_bo_pin(bo->shadow, bo->shadow->prefered_domains, 
&shadow_addr);
+       if (r) {
+               DRM_ERROR("Failed to pin bo shadow object\n");
+               goto err2;
+       }
Again don't use amdgpu_bo_pin() here.

Patch #6:
+       r = amdgpu_bo_reserve(vm->page_directory, false);
+       if (r)
+               return r;
You need to double check if the BOs aren't swapped out. See amdgpu_gem_va_update_vm() on how to do this.

+       fence_put(vm->shadow_sync_fence);
+       vm->shadow_sync_fence = fence_get(fence);
I wouldn't remember the backup operation at all, just return the fence to the caller instead.

+               r = amdgpu_bo_sync_between_bo_and_shadow(adev, ring, bo,
+                                                        NULL, &fence);
As discussed we don't want to use the scheduler for the recovery operation, so we need a flag to send the commands directly to the ring here.

Patch #7 and following. Thinking about this more the whole idea of using a separate entity is a NAK.

The basic problem is that you can't know which updates are completed, crashed or in flight when the the GPU recovery hits us. So the shadow needs to be updated before the real page table and none of this can be done out of order.

I suggest you just switch back to doing the updates twice for now.

Regards,
Christian.

Am 12.08.2016 um 08:38 schrieb Chunming Zhou:
Since we cannot ensure VRAM is consistent after a GPU reset, page
table shadowing is necessary. Shadowed page tables are, in a sense, a
method to recover the consistent state of the page tables before the
reset occurred.

We need to allocate GTT bo as the shadow of VRAM bo when creating page table,
and make them the same. After gpu reset, we will need to use SDMA to copy GTT bo
content to VRAM bo, then page table will be recoveried.


V2:
Shadow bo uses a shadow entity running on normal run queue, after gpu reset,
we need to wait for all shadow jobs finished first, then recovery page table 
from shadow.

V3:
Addressed Christian comments for shadow bo part.

Chunming Zhou (18):
   drm/amdgpu: add shadow bo support V2
   drm/amdgpu: validate shadow as well when validating bo
   drm/amdgpu: allocate shadow for pd/pt bo V2
   drm/amdgpu: add shadow flag V2
   drm/amdgpu: sync bo and shadow
   drm/amdgpu: implement vm recovery function from shadow
   drm/amdgpu: add shadow_entity for shadow page table updates
   drm/amdgpu: update pd shadow bo
   drm/amdgpu: update pt shadow
   drm/amd: add last fence in sched entity
   drm/amdgpu: link all vm clients
   drm/amdgpu: add vm_list_lock
   drm/amd: add block entity function
   drm/amdgpu: add shadow fence owner
   drm/amd: block entity
   drm/amdgpu: recover page tables after gpu reset
   drm/amdgpu: add need backup function
   drm/amdgpu: add backup condition for vm

  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  25 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  82 +++++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  88 +++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    | 100 ++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |   5 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c      |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 281 +++++++++++++++++++-------
  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  38 +++-
  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |   5 +
  include/uapi/drm/amdgpu_drm.h                 |   2 +
  10 files changed, 524 insertions(+), 105 deletions(-)


_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to