Thank you all,

Let's keep pushing the hardware team; hopefully the MC or RLCV side can discover
something interesting.

/Monk

-----Original Message-----
From: Christian König [mailto:[email protected]] 
Sent: March 28, 2018 1:18
To: Alex Deucher <[email protected]>; Koenig, Christian 
<[email protected]>
Cc: Deng, Emily <[email protected]>; amd-gfx list 
<[email protected]>; Liu, Monk <[email protected]>
Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV

On 27.03.2018 at 18:56, Alex Deucher wrote:
> On Tue, Mar 27, 2018 at 12:30 PM, Christian König 
> <[email protected]> wrote:
>> On 27.03.2018 at 17:52, Alex Deucher wrote:
>>> [SNIP]
>>>>> 2. Add the new callback implementation to gfx9 and gfx8 (I think
>>>>> gfx8 will need this as well, since we support SR-IOV there too).
>>>>
>>>> gfx8 doesn't have the hardware bug that seems to make this
>>>> necessary, nor does it have the same VMHUB design as gfx9.
>>> Oh, right, in this case it's the req/ack engines which were new for 
>>> soc15.  We may want the same fix for sdma4 though.
>>
>> And that is exactly one of the reasons why this workaround doesn't
>> work correctly.
>>
>> The SDMA is not directly connected to the GFXHUB, so even if the SDMA
>> provided a single command for this, the write/wait would still be
>> executed as two separate operations.
> I'm not sure I follow.  I think there are two issues: the hw bug you
> are referring to, and the SR-IOV requirement that the req and the ack
> can't be split by a world switch.  I believe the world switch happens
> at least at packet granularity, so I think using a single packet should
> handle the SR-IOV requirement.

The problem is that, to me, it looks like there is no SR-IOV requirement that the
req and ack not be split. The hardware is duplicated per VF, and I suggested that
Emily test my multiple-write workaround.

Since I didn't hear back, I strongly assume that this worked as well, and that
can only mean that we are indeed running into the same hw issue again.

That in turn means that not only the GFXHUB is affected, but that ANY (GRBM?)
register write could be silently dropped. I can't imagine how we would build a
stable driver around this.

Unfortunately, I can't reliably reproduce the issue on bare metal any more. It
would probably be best if we could set up a call with some of the hardware guys
to come up with a plan to narrow down this issue further.

Regards,
Christian.

>
>> In other words, we can run into the problem again, and the same thing
>> applies to CPU-based updates.
> Yeah, CPU-based updates could indeed be an issue for the SR-IOV
> requirement, but in that case it's easier to read back and retry.
>
> Alex
>
>
>> The only real workaround would be to write the request, read the
>> register back, and if the write didn't succeed, write it again.
>>
>> But seriously, remember that this issue is not limited to the VMHUB
>> registers. Do you want to write and read back every register to make
>> sure the write succeeded?
>>
>> Regards,
>> Christian.
> _______________________________________________
> amd-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
