Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: - [High] Potential console flood due to unratelimited `drm_err` in atomic context when netlink multicast fails. - [Medium] Emitting RAS events with unchanged/zero counter values for uncounted errors. - [Medium] Pointer arithmetic is performed on a potentially NULL pointer before checking if the DRM RAS feature is enabled. --
commit 948141f8cea37215cdae8a5d0e089b0c6b181396 Author: Riana Tauro <[email protected]> drm/xe/xe_drm_ras: Add error-event support for PVC Report drm_ras error event to userspace when an error occurs. Add support for core-compute and SoC errors in PVC. > diff --git a/drivers/gpu/drm/xe/xe_drm_ras.c b/drivers/gpu/drm/xe/xe_drm_ras.c > index 7937d8ba0ed9c..36afdfb5e4120 100644 > --- a/drivers/gpu/drm/xe/xe_drm_ras.c > +++ b/drivers/gpu/drm/xe/xe_drm_ras.c [ ... ] > @@ -185,6 +185,36 @@ static int register_nodes(struct xe_device *xe) > return ret; > } > > +/** > + * xe_drm_ras_event() - Report drm-ras error event to userspace > + * @xe: xe device structure > + * @component: error component (see &enum drm_xe_ras_error_component) > + * @severity: error severity (see &enum drm_xe_ras_error_severity) > + * @value: value of error counter > + * @flags: flags for allocation > + * > + * Report an error-event to userspace. > + */ > +void xe_drm_ras_event(struct xe_device *xe, u32 component, u32 severity, u32 > value, gfp_t flags) > +{ > + struct xe_drm_ras *ras = &xe->ras; > + struct xe_drm_ras_counter *info = ras->info[severity]; > + struct drm_ras_node *node = &ras->node[severity]; [Severity: Medium] Is it safe to compute the node pointer here before checking if the DRM RAS feature is enabled? If xe->info.has_drm_ras is false, xe_drm_ras_init() might not have been called, leaving ras->node as NULL. Evaluating &ras->node[severity] on a NULL pointer results in undefined behavior which could be flagged by UBSAN or compiler optimizations. > + int ret; > + > + /* Event is supported only if drm_ras is enabled */ > + if (!xe->info.has_drm_ras) > + return; > + > + if (!info || !info[component].name) > + return; > + > + ret = drm_ras_nl_error_event(node, component, info[component].name, > value, flags); > + if (ret) > + drm_err(&xe->drm, "RAS error-event failed: %d for %s %s\n", ret, > + info[component].name, error_severity[severity]); [Severity: High] Could this unratelimited drm_err() cause a console flood? Since xe_drm_ras_event() is called from hw_error_source_handler() in hardware error interrupt context, an interrupt storm could occur. If drm_ras_nl_error_event() fails (e.g., due to GFP_ATOMIC allocation failures under memory pressure), this will rapidly print errors to the console. Should this use drm_err_ratelimited() similar to how the caller handles its logging? > +} > + > /** > * xe_drm_ras_init() - Initialize DRM RAS > * @xe: xe device instance > diff --git a/drivers/gpu/drm/xe/xe_hw_error.c > b/drivers/gpu/drm/xe/xe_hw_error.c > index 4a4b363fc844b..a833cecc74ec5 100644 > --- a/drivers/gpu/drm/xe/xe_hw_error.c > +++ b/drivers/gpu/drm/xe/xe_hw_error.c [ ... ] > @@ -495,6 +495,9 @@ static void hw_error_source_handler(struct xe_tile *tile, > const enum hardware_er > gt_hw_error_handler(tile, hw_err, error_id); > if (err_bit == XE_SOC_ERROR) > soc_hw_error_handler(tile, hw_err, error_id); > + > + value = atomic_read(&info[error_id].counter); > + xe_drm_ras_event(xe, error_id, severity, value, GFP_ATOMIC); [Severity: Medium] Will this emit an event with an unchanged or zero counter value for correctable SOC errors? In soc_hw_error_handler(), if the error is HARDWARE_ERROR_CORRECTABLE, it clears the registers and jumps to unmask_gsysevtctl without incrementing the error counter: soc_hw_error_handler() { ... if (hw_err == HARDWARE_ERROR_CORRECTABLE) { ... goto unmask_gsysevtctl; } ... } However, here in hw_error_source_handler(), xe_drm_ras_event() is called unconditionally after soc_hw_error_handler() returns. Does this result in userspace receiving an error event where the counter value hasn't actually changed? > } > > clear_reg: -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=2
