On Mon, 2023-05-15 at 10:43 +0930, Christian Gelinek wrote:
> Hi,
> 
> I encountered my Debian frozen this morning. This is the 2nd time this 
> happened, the 1st one was on April 10, with very similar symptoms: The 
> PC was still running, but moving the mouse or typing didn't wake up my 
> screens and I couldn't connect to it via SSH.
> 
> After force-rebooting, I had a look at journalctl and these are the 
> messages before the reboot:
> 
> May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
> May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED" 
> operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd" 
> capability=12  capname="net_admin"
> May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
> May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32): 
> apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912 
> comm="cupsd" capability=12  capname="net_admin"
> May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make 
> remote CUPS printers available locally.
> May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated successfully.
> May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log 
> files.
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened 
> for user root(uid=0) by (uid=0)
> May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report 
> /etc/cron.hourly)
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
> for user root
> May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change 
> power state from D3hot to D0, device inaccessible
> May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
> May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* gt: timed out waiting for forcewake ack to clear.
> May 14 00:54:16 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:17 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT: 
> Corrupted descriptor head=4294967295 tail=4294967295 status=0xffffffff
> May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* gt: timed out waiting for forcewake ack to clear.
> May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck 
> for 26s! [kworker/15:1:233]
> May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer 
> snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 
> dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc 
> binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
> May 14 00:54:26 gar kernel:  intel_uncore ee1004 pcspkr watchdog snd 
> soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core 
> acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop 
> efi_pstore configfs efivarfs ip_tables x_tables autof>
> May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted: 
> G     U  W          6.1.0-8-amd64 #1  Debian 6.1.25-1
> May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co., 
> Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022
> May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
> May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
> May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 
> 66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf 
> 4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8 
> ea ff ff ff
> May 14 00:54:26 gar kernel: RSP: 0018:ffffa9d000947cc0 EFLAGS: 00000286
> May 14 00:54:26 gar kernel: RAX: 00000000ffffffff RBX: 0000000000400000 
> RCX: 0000000000000ffc
> May 14 00:54:26 gar kernel: RDX: 00000000000000ff RSI: 0000000000000004 
> RDI: 0000000000000000
> May 14 00:54:26 gar kernel: RBP: ffffa9d000947cfc R08: 0000000000000004 
> R09: ffffa9d000947cfc
> May 14 00:54:26 gar kernel: R10: 0000000000000004 R11: ffffffffbb7a6b80 
> R12: 0000000000000ffc
> May 14 00:54:26 gar kernel: R13: 0000000000000000 R14: 0000000000000004 
> R15: 0000000000000000
> May 14 00:54:26 gar kernel: FS:  0000000000000000(0000) 
> GS:ffff967f1fbc0000(0000) knlGS:0000000000000000
> May 14 00:54:26 gar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> May 14 00:54:26 gar kernel: CR2: 000055ba02054018 CR3: 0000000109b4c004 
> CR4: 0000000000770ee0
> May 14 00:54:26 gar kernel: PKRU: 55555554
> May 14 00:54:26 gar kernel: Call Trace:
> May 14 00:54:26 gar kernel:  <TASK>
> May 14 00:54:26 gar kernel:  pci_bus_read_config_dword+0x46/0x80
> May 14 00:54:26 gar kernel:  pci_find_next_ext_capability+0x82/0xe0
> May 14 00:54:26 gar kernel:  ? pci_conf1_read+0x9b/0xf0
> May 14 00:54:26 gar kernel:  pci_restore_state.part.0+0x5d/0x3a0
> May 14 00:54:26 gar kernel:  pci_pm_runtime_resume+0x41/0xe0
> May 14 00:54:26 gar kernel:  ? pci_pm_restore_noirq+0xc0/0xc0
> May 14 00:54:26 gar kernel:  __rpm_callback+0x41/0x170
> May 14 00:54:26 gar kernel:  ? pci_pm_restore_noirq+0xc0/0xc0
> May 14 00:54:26 gar kernel:  rpm_callback+0x5d/0x70
> May 14 00:54:26 gar kernel:  ? pci_pm_restore_noirq+0xc0/0xc0
> May 14 00:54:26 gar kernel:  rpm_resume+0x5df/0x820
> May 14 00:54:26 gar kernel:  pm_runtime_work+0x6c/0xa0
> May 14 00:54:26 gar kernel:  process_one_work+0x1c4/0x380
> May 14 00:54:26 gar kernel:  worker_thread+0x4d/0x380
> May 14 00:54:26 gar kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
> May 14 00:54:26 gar kernel:  ? rescuer_thread+0x3a0/0x3a0
> May 14 00:54:26 gar kernel:  kthread+0xe6/0x110
> May 14 00:54:26 gar kernel:  ? kthread_complete_and_exit+0x20/0x20
> May 14 00:54:26 gar kernel:  ret_from_fork+0x1f/0x30
> May 14 00:54:26 gar kernel:  </TASK>
> -- Boot 846264f027214bbfbb81c66db4ff1c81 --
> 
> It seems to be an issue with the i915 driver, potentially triggered by 
> snd_hda_intel.
> 
> `sudo lspci -v` reports (among others):
> 
> 03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A750] (rev 
> 08) (prog-if 00 [VGA controller])
>          Subsystem: Intel Corporation DG2 [Arc A750]
>          Flags: bus master, fast devsel, latency 0, IRQ 153, IOMMU group 14
>          Memory at 80000000 (64-bit, non-prefetchable) [size=16M]
>          Memory at 4000000000 (64-bit, prefetchable) [size=8G]
>          Expansion ROM at 81000000 [disabled] [size=2M]
>          Capabilities: [40] Vendor Specific Information: Len=0c <?>
>          Capabilities: [70] Express Endpoint, MSI 00
>          Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
>          Capabilities: [d0] Power Management version 3
>          Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
>          Capabilities: [420] Physical Resizable BAR
>          Capabilities: [400] Latency Tolerance Reporting
>          Kernel driver in use: i915
>          Kernel modules: i915
> 
> 00:1f.3 Audio device: Intel Corporation Device 7a50 (rev 11)
>          DeviceName: Onboard - Sound
>          Subsystem: Micro-Star International Co., Ltd. [MSI] Device 9e02
>          Flags: bus master, fast devsel, latency 32, IRQ 158, IOMMU group 10
>          Memory at 4200920000 (64-bit, non-prefetchable) [size=16K]
>          Memory at 4200800000 (64-bit, non-prefetchable) [size=1M]
>          Capabilities: [50] Power Management version 3
>          Capabilities: [80] Vendor Specific Information: Len=14 <?>
>          Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
>          Kernel driver in use: snd_hda_intel
>          Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
> 
> I'm using firmware-misc-nonfree version 20230210-4,
> `sudo dmesg |grep i915` returns
> 
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64 
> root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
> [    0.018130] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64 
> root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
> [    1.379955] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3 
> - HuC is not supported!
> [    1.380780] i915 0000:03:00.0: [drm] VT-d active for gfx access
> [    1.380845] i915 0000:03:00.0: vgaarb: deactivate vga console
> [    1.380869] i915 0000:03:00.0: [drm] Local memory IO size: 
> 0x00000001fc000000
> [    1.380870] i915 0000:03:00.0: [drm] Local memory available: 
> 0x00000001fc000000
> [    1.393505] i915 0000:03:00.0: vgaarb: changed VGA decodes: 
> olddecodes=io+mem,decodes=io+mem:owns=none
> [    1.393643] i915 0000:03:00.0: firmware: direct-loading firmware 
> i915/dg2_dmc_ver2_07.bin
> [    1.396144] i915 0000:03:00.0: [drm] Finished loading DMC firmware 
> i915/dg2_dmc_ver2_07.bin (v2.7)
> [    1.404739] i915 0000:03:00.0: firmware: direct-loading firmware 
> i915/dg2_guc_70.bin
> [    1.484762] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist 
> Class(1):Compute(4)!
> [    1.484763] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist 
> Instance(2):Compute(4)!
> [    1.487222] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist 
> Class(1):Compute(4)!
> [    1.487223] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist 
> Instance(2):Compute(4)!
> [    1.488237] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.bin 
> version 70.5.1
> [    1.488347] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist 
> Class(1):Compute(4)!
> [    1.488348] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist 
> Instance(2):Compute(4)!
> [    1.500565] i915 0000:03:00.0: [drm] GuC submission enabled
> [    1.500565] i915 0000:03:00.0: [drm] GuC SLPC enabled
> [    1.500891] i915 0000:03:00.0: [drm] GuC RC: enabled
> [    1.521026] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on 
> minor 0
> [    2.234182] fbcon: i915drmfb (fb0) is primary device
> [    2.326912] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
> [    4.824372] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops 
> i915_audio_component_bind_ops [i915])
> 
> Is anyone else seeing a similar problem? What can I do to avoid this? Do 
> we need anything else to narrow it down further?
> 
> Thanks for your time!
> 

Hi,

A little research shows that this is not uncommon. A suggested workaround
is to disable power saving in the snd_hda_intel audio driver, as follows.

Create a file (such as): /etc/modprobe.d/snd-intel-disable-power-management.conf

Add the following line: options snd_hda_intel power_save=0

Reboot.
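
In case it is useful, the steps above could be scripted roughly as follows.
This is only a sketch: the function names write_conf and show_power_save and
the optional directory argument are my own inventions for illustration, not
part of any Debian tool. The file name and the option line are the ones given
above, and the sysfs path is where the kernel exposes the parameter once the
snd_hda_intel module is loaded.

```shell
#!/bin/sh
# Sketch only: helper names and the directory argument are hypothetical.

# Write the option file. Defaults to /etc/modprobe.d (needs root);
# pass another directory as $1 to try it out without touching the system.
write_conf() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf 'options snd_hda_intel power_save=0\n' \
        > "$conf_dir/snd-intel-disable-power-management.conf"
}

# Print the value the loaded module is currently using.
# 0 means power saving is disabled.
show_power_save() {
    cat /sys/module/snd_hda_intel/parameters/power_save
}
```

Run write_conf as root, reboot, and then show_power_save should report 0 if
the option was picked up. If it still shows another value, something else
(a power-management tool, for instance) may be overriding it.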

Hopefully this helps.

Regards

Phil

-- 
*** Playing the game for the game's own sake. ***


Associations:

* Debian Maintainer (DM)
* Fedora/EPEL Maintainer.
* Contributor member of the AlmaLinux foundation.

WWW: https://kathenas.org

Buy Me a Coffee: https://www.buymeacoffee.com/kathenasorg

Twitter: @kathenasorg

Instagram: @kathenasorg

IRC: kathenas

GPG: 724AA9B52F024C8B
