Thank you for your contribution! Sashiko AI review found 1 potential issue to
consider:
- [Medium] Unconditional emission of spurious zero-delta RAS netlink events for 
unsupported or uncounted errors.
--

commit 64cb8a4b396dc586545ef894e855aeb177686e56
Author: Riana Tauro <[email protected]>

drm/xe/xe_drm_ras: Add error-event support for PVC

This patch reports drm_ras error events to userspace when an error occurs.
It also adds support for core-compute and SoC errors in PVC.

> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index 4a4b363fc844b..a833cecc74ec5 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
[ ... ]
> @@ -495,6 +495,9 @@ static void hw_error_source_handler(struct xe_tile *tile, const enum hardware_er
>                       gt_hw_error_handler(tile, hw_err, error_id);
>               if (err_bit == XE_SOC_ERROR)
>                       soc_hw_error_handler(tile, hw_err, error_id);
> +
> +             value = atomic_read(&info[error_id].counter);
> +             xe_drm_ras_event(xe, error_id, severity, value, GFP_ATOMIC);

[Severity: Medium]
Does this unconditionally emit spurious zero-delta RAS netlink events for
unsupported or uncounted errors?

When gt_hw_error_handler() or soc_hw_error_handler() is called, it might
not increment the error counter. For example, gt_hw_error_handler() returns
early on non-PVC platforms:

drivers/gpu/drm/xe/xe_hw_error.c:gt_hw_error_handler() {
        ...
        if (xe->info.platform != XE_PVC)
                return;
        ...
}

Similarly, soc_hw_error_handler() skips counter increments for correctable
errors:

drivers/gpu/drm/xe/xe_hw_error.c:soc_hw_error_handler() {
        ...
        if (hw_err == HARDWARE_ERROR_CORRECTABLE) {
                xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
                ...
                goto unmask_gsysevtctl;
        }
        ...
}

In these cases, xe_drm_ras_event() is still called with the unchanged counter
value. Could this cause userspace to receive confusing zero-delta event
updates?
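
One possible way to avoid this is to emit the event only when the counter
actually moved. The following is a minimal userspace sketch of that idea, not
the driver's code: struct ras_counter, fake_handler(), fake_ras_event() and
handle_error() are hypothetical stand-ins, and C11 atomics are used in place
of the kernel's atomic_t. In the real handler a before/after comparison could
also observe increments from a concurrent path, so having the handlers report
whether they counted anything may be a cleaner alternative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of the per-error counter table. */
struct ras_counter {
	atomic_int counter;
};

/* Bookkeeping so the effect of the guard can be observed. */
static int events_emitted;
static int last_value;

/* Hypothetical stand-in for the event emitter: records what was sent. */
static void fake_ras_event(int value)
{
	events_emitted++;
	last_value = value;
}

/*
 * Hypothetical stand-in for a handler that may return early
 * (unsupported platform, correctable SoC error) without counting.
 */
static void fake_handler(struct ras_counter *c, bool counted)
{
	if (!counted)
		return;
	atomic_fetch_add(&c->counter, 1);
}

/* Snapshot the counter around the handler and emit only on a change. */
static void handle_error(struct ras_counter *c, bool counted)
{
	int before = atomic_load(&c->counter);
	int after;

	fake_handler(c, counted);

	after = atomic_load(&c->counter);
	if (after != before)
		fake_ras_event(after);
}
```

With this guard, a call where the handler bails out early leaves
events_emitted unchanged, while a counted error emits one event carrying the
new counter value.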

>       }
>  
>  clear_reg:

-- 
Sashiko AI review ·
https://sashiko.dev/#/patchset/[email protected]?part=2
