On 19.09.19 at 10:00, Jesse Zhang wrote:
When a compute fence does not signal, the compute ring cannot detect a
hardware hang because its timeout value is infinite by default.

In SR-IOV and passthrough mode, if the user does not declare a custom timeout
value for the compute ring, use the gfx ring timeout value as the default, so
that when there is a true hardware hang, the compute ring can detect it.

Change-Id: I794ec0868c6c0aad407749457260ecfee0617c10
Signed-off-by: Jesse Zhang <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  5 +----
  drivers/gpu/drm/amd/amdgpu/soc15.c        | 10 ++++++++++
  2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index cbcaa7c..963b6d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -468,10 +468,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
                         * For sriov case, always use the timeout
                         * as gfx ring
                         */

Please also remove the comment since that is now stale.

Apart from that, it looks good to me,
Christian.

-                       if (!amdgpu_sriov_vf(ring->adev))
-                               timeout = adev->compute_timeout;
-                       else
-                               timeout = adev->gfx_timeout;
+                       timeout = adev->compute_timeout;
                        break;
                case AMDGPU_RING_TYPE_SDMA:
                        timeout = adev->sdma_timeout;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7c7e9f5..6cd5548 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -687,6 +687,16 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
        adev->rev_id = soc15_get_rev_id(adev);
        adev->nbio.funcs->detect_hw_virt(adev);
+       /*
+        * If running under SR-IOV or passthrough mode and the user did not set
+        * a custom value for the compute ring timeout, set it to the gfx ring
+        * timeout so that the compute ring can detect a true hardware
+        * hang.
+        */
+       if ((amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev)) &&
+               (adev->compute_timeout == MAX_SCHEDULE_TIMEOUT))
+               adev->compute_timeout = adev->gfx_timeout;
+
        if (amdgpu_sriov_vf(adev))
                adev->virt.ops = &xgpu_ai_virt_ops;
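
For readers following the thread, the fallback added in soc15_set_ip_blocks() can be sketched outside the kernel roughly as below. This is only an illustration under stated assumptions, not the driver code: struct sketch_device, its fields, and sketch_fixup_compute_timeout() are hypothetical stand-ins for struct amdgpu_device and the inline check in the patch, and SKETCH_MAX_SCHEDULE_TIMEOUT mirrors the kernel's MAX_SCHEDULE_TIMEOUT (LONG_MAX).

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT, i.e. "wait forever". */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical, heavily reduced stand-in for struct amdgpu_device. */
struct sketch_device {
	bool is_sriov_vf;      /* models amdgpu_sriov_vf(adev) */
	bool is_passthrough;   /* models amdgpu_passthrough(adev) */
	long gfx_timeout;      /* gfx ring timeout */
	long compute_timeout;  /* compute ring timeout, infinite by default */
};

/*
 * Under SR-IOV or passthrough, replace a still-infinite compute timeout
 * with the gfx timeout so that a hung compute ring is eventually noticed.
 * A finite compute timeout is treated as a user choice and left untouched.
 */
static void sketch_fixup_compute_timeout(struct sketch_device *dev)
{
	if ((dev->is_sriov_vf || dev->is_passthrough) &&
	    dev->compute_timeout == SKETCH_MAX_SCHEDULE_TIMEOUT)
		dev->compute_timeout = dev->gfx_timeout;
}
```

Note that in this sketch, as in the patch, an infinite compute timeout is taken to mean "not set by the user"; an explicit request for an infinite timeout would look the same to the check.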
