[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: Zhou1, Tao <[email protected]>
> Sent: Wednesday, April 2, 2025 11:02 PM
> To: Skvortsov, Victor <[email protected]>;
> [email protected]
> Cc: Zhang, Hawking <[email protected]>; Zhao, Victor
> <[email protected]>
> Subject: RE: [PATCH] drm/amdgpu: Disable ACA on VFs
>
> > -----Original Message-----
> > From: Skvortsov, Victor <[email protected]>
> > Sent: Thursday, April 3, 2025 6:16 AM
> > To: [email protected]
> > Cc: Zhang, Hawking <[email protected]>; Zhao, Victor
> > <[email protected]>; Zhou1, Tao <[email protected]>; Skvortsov,
> > Victor <[email protected]>
> > Subject: [PATCH] drm/amdgpu: Disable ACA on VFs
> >
> > VFs query RAS error counts directly from the host with
> > AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY. When ACA is enabled, an unusable
> > aca_sysfs is created instead of the sysfs set up by
> > amdgpu_ras_sysfs_create().
> >
> > Likewise, VFs depend on host support to query CPERs, rather than on
> > the ACA component.
> >
> > Signed-off-by: Victor Skvortsov <[email protected]>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 4 ++--
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 ++++++----
> > 2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > index 360e07a5c7c1..5a234eadae8b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > @@ -549,7 +549,7 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
> > {
> > int r;
> >
> > - if (!amdgpu_aca_is_enabled(adev))
> > + if (!amdgpu_aca_is_enabled(adev) &&
> > + !amdgpu_sriov_ras_cper_en(adev))
>
> [Tao] Can we fold the amdgpu_sriov_ras_cper_en() check into amdgpu_aca_is_enabled()?
[Victor] That would cause problems inside amdgpu_ras_sysfs_create(): VFs
use the legacy sysfs to report IP block error counts through
AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY, so amdgpu_aca_is_enabled() has to
stay false on VFs.
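
To illustrate the concern, here is a rough standalone model in plain
user-space C, not driver code. struct mock_dev, its fields, and all helper
names are hypothetical. It also assumes that the legacy sysfs nodes are
skipped whenever the ACA check reports enabled, which is the behaviour the
commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few device properties that matter here. */
struct mock_dev {
	bool is_vf;          /* models amdgpu_sriov_vf(adev) */
	bool aca_capable_ip; /* MP0 is 13.0.6 / 13.0.12 / 13.0.14 */
	bool host_cper;      /* models amdgpu_sriov_ras_cper_en(adev) */
};

/* ACA check as in this patch: never enabled on a VF. */
static bool aca_enabled_patch(const struct mock_dev *d)
{
	return !d->is_vf && d->aca_capable_ip;
}

/* ACA check with the CPER condition folded in, as suggested. */
static bool aca_enabled_folded(const struct mock_dev *d)
{
	return aca_enabled_patch(d) || d->host_cper;
}

/* Assumption: legacy sysfs is only created while ACA reports disabled. */
static bool legacy_sysfs_created(bool aca_enabled)
{
	return !aca_enabled;
}

/* CPER gate as in this patch: ACA or host CPER support is sufficient. */
static bool cper_initialized(const struct mock_dev *d)
{
	return aca_enabled_patch(d) || d->host_cper;
}

/* Walk through the VF case discussed above. */
static void check_vf_case(void)
{
	struct mock_dev vf = {
		.is_vf = true, .aca_capable_ip = true, .host_cper = true,
	};

	/* CPER is still initialized on the VF. */
	assert(cper_initialized(&vf));
	/* With the patch, the VF keeps the legacy sysfs. */
	assert(legacy_sysfs_created(aca_enabled_patch(&vf)));
	/* With the folded check, the VF would lose the legacy sysfs. */
	assert(!legacy_sysfs_created(aca_enabled_folded(&vf)));
}
```

Keeping the two conditions separate lets CPER come up on a VF without
flipping the ACA state that the sysfs path depends on.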
>
> > return 0;
> >
> > r = amdgpu_cper_ring_init(adev);
> > @@ -568,7 +568,7 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
> >
> > int amdgpu_cper_fini(struct amdgpu_device *adev)
> > {
> > - if (!amdgpu_aca_is_enabled(adev))
> > + if (!amdgpu_aca_is_enabled(adev) &&
> > + !amdgpu_sriov_ras_cper_en(adev))
> > return 0;
> >
> > adev->cper.enabled = false;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index ebf1f63d0442..5bb7673fd28e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -3794,10 +3794,12 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev)
> > adev->ras_hw_enabled & amdgpu_ras_mask;
> >
> > /* aca is disabled by default except for psp v13_0_6/v13_0_12/v13_0_14 */
> > - adev->aca.is_enabled =
> > - (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) ||
> > - amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) ||
> > - amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14));
> > + if (!amdgpu_sriov_vf(adev)) {
> > + adev->aca.is_enabled =
> > + (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) ||
> > + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) ||
> > + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14));
> > + }
> >
> > /* bad page feature is not applicable to specific app platform */
> > if (adev->gmc.is_app_apu &&
> > --
> > 2.34.1
>