Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues: - [High] Missing rollback of hardware state in suspend paths when operations fail, leaving the device hung. -- commit ed5c98b04aafc61a13ebb42fe49a1f15121c1b4d Author: Lyude Paul <[email protected]> drm/nouveau/gsp/r570: Never enter Gcoff state It turns out that the only reason our previous fixes looked like they worked for this was because we would occasionally set the Gcoff state to 0 in the normal S3 path, which fixed suspend/resume on desktops - but not on machines using runtime suspend. The proper fix is to just never set this flag. Our current guess for the reasoning behind this is that Gcoff likely coincides with GC6, and not literally power off. > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c > b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c > index 2945d5b4e5707..af5aa5065c3dd 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c > @@ -81,7 +81,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, > u64 size) > ctrl->hClient = gsp->internal.client.object.handle; > ctrl->hSysMem = memlist.handle; > ctrl->sysmemAddrOfSuspendResumeData = gsp->sr.meta.addr; > - ctrl->bEnteringGcoffState = 1; > + ctrl->bEnteringGcoffState = 0; > > ret = nvkm_gsp_rm_ctrl_wr(&gsp->internal.device.subdevice, ctrl); > if (ret) [Severity: High] This is a pre-existing issue, but I have a question about the suspend path's error handling. If nvkm_gsp_rm_ctrl_wr() fails here, r570_fbsr_init() returns an error back to r570_fbsr_suspend(): drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c:r570_fbsr_suspend() { ... r570_fbsr_suspend_channels(gsp, true); ... device->bar->bar2 = false; ... ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size); if (ret) { nvkm_gsp_sg_free(device, &gsp->sr.fbsr); return ret; } } The same error path exists earlier in r570_fbsr_suspend() if nvkm_instobj_save() or nvkm_gsp_sg() fail. Since the PM core doesn't call the resume callback for devices that fail to suspend, does this leave the GPU hung with suspended channels and BAR2 disabled? Would it make sense to restore the channel and BAR state on error paths in r570_fbsr_suspend() before returning? > return ret; > > nvkm_gsp_rm_free(&memlist); > return 0; > } -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=2
