On Mon, May 13, 2024 at 08:10:32PM +0200, Florian Obser wrote:
> OCR'ed and edited a bit, there might be mistakes.
> Picture: https://dump.sha256.net/dump/unhibernating_panic.jpg
>
> unhibernating @ block 50329599 length 243MB
> uvm_fault(0xffffffff826b2860, 0x38, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at ttm_resource_manager_evict_all+0x5e: cmpq %rbx, 0x38(%r14)
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> * 0 0 0 0x100000 0x20 0K swapper
> ttm_resource_manager_evict_all(ffff80000017f260,0,dba63e95861e671,ffff800000170000,ffff800000170058,2)
> at ttm_resource_manager_evict_all+0x5e
> amdgpu_device_prepare(ffff800000170058,ffff800000170058,fac0345246af9871,
> ffff800000170058,0,2) at amdgpu_device_prepare+0x61
> amdgpu_activate(ffff800000170000,2,b6a78044d3a303c5,0,ffff80000014400,
> ffffffff8228acc8) at amdgpu_activate+0x55
> config_activate_children(ffff800000144c00,2,172aac03cc1e75dd,0,ffff80000014a000,2)
> at config_activate_children+0x85
> config_activate_children(ffff80000014a000,2,172aac03cc1e75dd,0,ffff800000144100,2)
> at config_activate_children+0x85
> config_activate_children(ffff800000144100,2,172aac03cc1e75dd,0,
> ffff800000030280,2) at config_activate_children+0x85
> config_activate_children(ffff800000030280,2,172aac03cc1e7256,2,ffff800000030280,0)
> config_suspend_all(2,2,72519cb31f5203,ffffffff82a94a38,0,bfff50) at
> config_suspend_all+0x1ae
> hibernate_resume(8c03129a1118d1c,ffffffff82a9460,ffff800000142200,0,0,0) at
> hibernate_resume+0x1b4
> diskconf(25bada1afa9d6262,8,ffffffff82538360,
> ffffffff82a8008,400056f4b50,8) at diskconf+0x188
> main(0,0,1001000, ffff800037c871f0,ffffffff81fda030,ffffffff82a94f40) at
> main+0x510
>
> I've bisected it to this changeset:
> https://codeberg.org/OpenBSD/src/commit/36668b1581688d40ad5fd6631f4f503e6d36091d
>
> suspend / resume seems to be unaffected by this, reverting makes
> hibernate / unhibernate work again.

On unhibernate, DVACT_QUIESCE/DVACT_SUSPEND is done from
diskconf()/hibernate_resume() before config_process_deferred_mountroot()
attaches most of the driver, so don't attempt to do anything in
amdgpu_activate() in that case.

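The ordering problem can be shown with a toy userland model. This is only
a sketch and not the amdgpu code: every name in it (toy_softc, toy_activate,
toy_deferred_attach, the attached flag) is made up for illustration. It
assumes a driver whose attach is split in two, with the second half deferred,
and an activate handler that can be called in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
enum toy_act { TOY_QUIESCE, TOY_SUSPEND, TOY_RESUME };

struct toy_softc {
	bool attached;		/* set once the deferred attach has run */
	int *resource_mgr;	/* NULL until the deferred attach has run */
	int evictions;		/* counts suspend work actually performed */
};

static int toy_backing;

/* Second half of attach, deferred until later in boot in this model. */
static void
toy_deferred_attach(struct toy_softc *sc)
{
	assert(sc != NULL);
	sc->resource_mgr = &toy_backing;
	sc->attached = true;
}

/* Always returns 0; skips all work while the device is half attached. */
static int
toy_activate(struct toy_softc *sc, enum toy_act act)
{
	assert(sc != NULL);

	/* Nothing has been set up yet, so there is nothing to suspend. */
	if (!sc->attached)
		return 0;

	switch (act) {
	case TOY_QUIESCE:
	case TOY_SUSPEND:
		/* Without the guard above this would dereference NULL. */
		*sc->resource_mgr += 1;
		sc->evictions++;
		break;
	case TOY_RESUME:
		break;
	}
	return 0;
}
```

Calling toy_activate() with TOY_SUSPEND on a zeroed toy_softc returns 0
without touching resource_mgr. After toy_deferred_attach() the same call
does the eviction step.
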
Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c,v
diff -u -p -r1.43 amdgpu_drv.c
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c	11 Apr 2024 03:24:40 -0000	1.43
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c 14 May 2024 02:50:02 -0000
@@ -3665,7 +3665,7 @@ amdgpu_activate(struct device *self, int
 	struct drm_device *dev = &adev->ddev;
 	int rv = 0;
 
-	if (dev->dev == NULL || amdgpu_fatal_error)
+	if (dev->dev == NULL || amdgpu_fatal_error || adev->shutdown)
 		return (0);
 
 	switch (act) {