[Public]

-----Original Message-----
From: Kuehling, Felix <[email protected]>
Sent: Saturday, October 22, 2022 4:53 AM
To: Liang, Prike <[email protected]>; [email protected]
Cc: Deucher, Alexander <[email protected]>; Zhang, Yifan 
<[email protected]>; Huang, Ray <[email protected]>; Liu, Aaron 
<[email protected]>
Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property 
asic

On 2022-10-21 09:05, Liang, Prike wrote:
> [Public]
>
> -----Original Message-----
> From: Kuehling, Felix <[email protected]>
> Sent: Friday, October 21, 2022 1:11 PM
> To: Liang, Prike <[email protected]>; [email protected]
> Cc: Deucher, Alexander <[email protected]>; Zhang, Yifan
> <[email protected]>; Huang, Ray <[email protected]>; Liu, Aaron
> <[email protected]>
> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
> property asic
>
> Am 2022-10-20 um 21:50 schrieb Liang, Prike:
>> [Public]
>>
>> -----Original Message-----
>> From: Kuehling, Felix <[email protected]>
>> Sent: Friday, October 21, 2022 12:03 AM
>> To: Liang, Prike <[email protected]>; [email protected]
>> Cc: Deucher, Alexander <[email protected]>; Zhang, Yifan
>> <[email protected]>; Huang, Ray <[email protected]>; Liu, Aaron
>> <[email protected]>
>> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
>> property asic
>>
>>
>> Am 2022-10-20 um 05:15 schrieb Prike Liang:
>>> This dummy cache info will enable kfd base function support.
>>>
>>> Signed-off-by: Prike Liang <[email protected]>
>>> ---
>>>     drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 55 +++++++++++++++++++++++++--
>>>     1 file changed, 52 insertions(+), 3 deletions(-)
>>>
[snip]
>>> @@ -1528,7 +1574,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev 
>>> *kdev,
>>>                                 
>>> kfd_fill_gpu_cache_info_from_gfx_config(kdev, pcache_info);
>>>                         break;
>>>                 default:
>>> -                     return -EINVAL;
>>> +                     pcache_info = dummy_cache_info;
>>> +                     num_of_cache_types = ARRAY_SIZE(dummy_cache_info);
>>> +                     pr_warn("dummy cache info is used temporarily and 
>>> real cache info need update later.\n");
>>> +                     break;
>> Could we make this return an error if the amdgpu.exp_hw_support module 
>> parameter is not set?
>>
>> Regards,
>>      Felix
>>
>> [Prike] It's fine to protect this dummy info by checking the parameter 
>> amdgpu_exp_hw_support, but it may not friendly to end user by adding the 
>> parameter and some guys will still report KFD not enabled for this parameter 
>> setting problem. The original idea is the end user will not aware the dummy 
>> cache info and only alert the warning message to developer.
> I thought the intention was to simplify bring-up but make sure that valid 
> cache info is available by the time a chip goes into production.
> Therefore, normal end users should never need to set the 
> amdgpu_exp_hw_support option. I think you're saying that we would go to 
> production with dummy info. That seems like a bad idea to me.
>
> Regards,
>     Felix
>
> [Prike]  Sorry for the confusion. In fact, this dummy cache info only used 
> before production and meanwhile needn't require any parameter setting for CQE 
> do KFD test. Anyway if you still have concern on this solution I will add the 
> condition of checking amdgpu_exp_hw_support.

The idea to control this with a module parameter was to cause a more obvious 
failure if we don't have correct cache info before going to production. Just a 
warning in the log file is too easy to miss or ignore. Of course, if QA gets 
into the habit of testing with amdgpu_exp_hw_support, then this may not solve 
the problem. You need to have at least one test pass without 
amdgpu_exp_hw_support to catch missing cache info.

Regards,
   Felix

Get your point. As to the KFD support on a NPI will be tracked by a ticket 
which make sure the real cache info update later after this dummy cache info 
assigned in the early BU phase.

Thanks,
Prike

Reply via email to