On Mon, 2023-05-15 at 10:43 +0930, Christian Gelinek wrote:
> Hi,
>
> I found my Debian machine frozen this morning. This is the 2nd time this
> has happened; the 1st was on April 10, with very similar symptoms: The
> PC was still running, but moving the mouse or typing didn't wake up my
> screens and I couldn't connect to it via SSH.
>
> After force-rebooting, I had a look at journalctl and these are the last
> messages before the reboot:
>
> May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
> May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED"
> operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd"
> capability=12 capname="net_admin"
> May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
> May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32):
> apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912
> comm="cupsd" capability=12 capname="net_admin"
> May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make
> remote CUPS printers available locally.
> May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated successfully.
> May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log
> files.
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened
> for user root(uid=0) by (uid=0)
> May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report
> /etc/cron.hourly)
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed
> for user root
> May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
> power state from D3hot to D0, device inaccessible
> May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
> May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
> *ERROR* gt: timed out waiting for forcewake ack to clear.
> May 14 00:54:16 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:17 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT:
> Corrupted descriptor head=4294967295 tail=4294967295 status=0xffffffff
> May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
> *ERROR* gt: timed out waiting for forcewake ack to clear.
> May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck
> for 26s! [kworker/15:1:233]
> May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer
> snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc
> binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
> May 14 00:54:26 gar kernel: intel_uncore ee1004 pcspkr watchdog snd
> soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core
> acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop
> efi_pstore configfs efivarfs ip_tables x_tables autof>
> May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted:
> G U W 6.1.0-8-amd64 #1 Debian 6.1.25-1
> May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co.,
> Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022
> May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
> May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
> May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0
> 66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf
> 4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8
> ea ff ff ff
> May 14 00:54:26 gar kernel: RSP: 0018:ffffa9d000947cc0 EFLAGS: 00000286
> May 14 00:54:26 gar kernel: RAX: 00000000ffffffff RBX: 0000000000400000
> RCX: 0000000000000ffc
> May 14 00:54:26 gar kernel: RDX: 00000000000000ff RSI: 0000000000000004
> RDI: 0000000000000000
> May 14 00:54:26 gar kernel: RBP: ffffa9d000947cfc R08: 0000000000000004
> R09: ffffa9d000947cfc
> May 14 00:54:26 gar kernel: R10: 0000000000000004 R11: ffffffffbb7a6b80
> R12: 0000000000000ffc
> May 14 00:54:26 gar kernel: R13: 0000000000000000 R14: 0000000000000004
> R15: 0000000000000000
> May 14 00:54:26 gar kernel: FS: 0000000000000000(0000)
> GS:ffff967f1fbc0000(0000) knlGS:0000000000000000
> May 14 00:54:26 gar kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> May 14 00:54:26 gar kernel: CR2: 000055ba02054018 CR3: 0000000109b4c004
> CR4: 0000000000770ee0
> May 14 00:54:26 gar kernel: PKRU: 55555554
> May 14 00:54:26 gar kernel: Call Trace:
> May 14 00:54:26 gar kernel: <TASK>
> May 14 00:54:26 gar kernel: pci_bus_read_config_dword+0x46/0x80
> May 14 00:54:26 gar kernel: pci_find_next_ext_capability+0x82/0xe0
> May 14 00:54:26 gar kernel: ? pci_conf1_read+0x9b/0xf0
> May 14 00:54:26 gar kernel: pci_restore_state.part.0+0x5d/0x3a0
> May 14 00:54:26 gar kernel: pci_pm_runtime_resume+0x41/0xe0
> May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
> May 14 00:54:26 gar kernel: __rpm_callback+0x41/0x170
> May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
> May 14 00:54:26 gar kernel: rpm_callback+0x5d/0x70
> May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
> May 14 00:54:26 gar kernel: rpm_resume+0x5df/0x820
> May 14 00:54:26 gar kernel: pm_runtime_work+0x6c/0xa0
> May 14 00:54:26 gar kernel: process_one_work+0x1c4/0x380
> May 14 00:54:26 gar kernel: worker_thread+0x4d/0x380
> May 14 00:54:26 gar kernel: ? _raw_spin_lock_irqsave+0x23/0x50
> May 14 00:54:26 gar kernel: ? rescuer_thread+0x3a0/0x3a0
> May 14 00:54:26 gar kernel: kthread+0xe6/0x110
> May 14 00:54:26 gar kernel: ? kthread_complete_and_exit+0x20/0x20
> May 14 00:54:26 gar kernel: ret_from_fork+0x1f/0x30
> May 14 00:54:26 gar kernel: </TASK>
> -- Boot 846264f027214bbfbb81c66db4ff1c81 --
>
> It seems to be an issue with the i915 driver, potentially triggered by
> snd_hda_intel.
>
> `sudo lspci -v` reports (among others):
>
> 03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A750] (rev
> 08) (prog-if 00 [VGA controller])
> 	Subsystem: Intel Corporation DG2 [Arc A750]
> 	Flags: bus master, fast devsel, latency 0, IRQ 153, IOMMU group 14
> 	Memory at 80000000 (64-bit, non-prefetchable) [size=16M]
> 	Memory at 4000000000 (64-bit, prefetchable) [size=8G]
> 	Expansion ROM at 81000000 [disabled] [size=2M]
> 	Capabilities: [40] Vendor Specific Information: Len=0c <?>
> 	Capabilities: [70] Express Endpoint, MSI 00
> 	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> 	Capabilities: [d0] Power Management version 3
> 	Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
> 	Capabilities: [420] Physical Resizable BAR
> 	Capabilities: [400] Latency Tolerance Reporting
> 	Kernel driver in use: i915
> 	Kernel modules: i915
>
> 00:1f.3 Audio device: Intel Corporation Device 7a50 (rev 11)
> 	DeviceName: Onboard - Sound
> 	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 9e02
> 	Flags: bus master, fast devsel, latency 32, IRQ 158, IOMMU group 10
> 	Memory at 4200920000 (64-bit, non-prefetchable) [size=16K]
> 	Memory at 4200800000 (64-bit, non-prefetchable) [size=1M]
> 	Capabilities: [50] Power Management version 3
> 	Capabilities: [80] Vendor Specific Information: Len=14 <?>
> 	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 	Kernel driver in use: snd_hda_intel
> 	Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
>
> I'm using firmware-misc-nonfree version 20230210-4, and
> `sudo dmesg | grep i915` returns:
>
> [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64
> root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
> [ 0.018130] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64
> root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
> [ 1.379955] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3
> - HuC is not supported!
> [ 1.380780] i915 0000:03:00.0: [drm] VT-d active for gfx access
> [ 1.380845] i915 0000:03:00.0: vgaarb: deactivate vga console
> [ 1.380869] i915 0000:03:00.0: [drm] Local memory IO size:
> 0x00000001fc000000
> [ 1.380870] i915 0000:03:00.0: [drm] Local memory available:
> 0x00000001fc000000
> [ 1.393505] i915 0000:03:00.0: vgaarb: changed VGA decodes:
> olddecodes=io+mem,decodes=io+mem:owns=none
> [ 1.393643] i915 0000:03:00.0: firmware: direct-loading firmware
> i915/dg2_dmc_ver2_07.bin
> [ 1.396144] i915 0000:03:00.0: [drm] Finished loading DMC firmware
> i915/dg2_dmc_ver2_07.bin (v2.7)
> [ 1.404739] i915 0000:03:00.0: firmware: direct-loading firmware
> i915/dg2_guc_70.bin
> [ 1.484762] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
> Class(1):Compute(4)!
> [ 1.484763] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
> Instance(2):Compute(4)!
> [ 1.487222] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
> Class(1):Compute(4)!
> [ 1.487223] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
> Instance(2):Compute(4)!
> [ 1.488237] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.bin
> version 70.5.1
> [ 1.488347] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
> Class(1):Compute(4)!
> [ 1.488348] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
> Instance(2):Compute(4)!
> [ 1.500565] i915 0000:03:00.0: [drm] GuC submission enabled
> [ 1.500565] i915 0000:03:00.0: [drm] GuC SLPC enabled
> [ 1.500891] i915 0000:03:00.0: [drm] GuC RC: enabled
> [ 1.521026] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on
> minor 0
> [ 2.234182] fbcon: i915drmfb (fb0) is primary device
> [ 2.326912] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
> [ 4.824372] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops
> i915_audio_component_bind_ops [i915])
>
> Is anyone else seeing a similar problem? What can I do to avoid this? Do
> we need anything else to narrow it down further?
>
> Thanks for your time!
>
Hi,

A little research shows that this is not that uncommon. A commonly suggested workaround is to disable power saving in the snd_hda_intel driver, as follows.

Create a file such as:

/etc/modprobe.d/snd-intel-disable-power-management.conf

containing the following line:

options snd_hda_intel power_save=0

Then reboot.

Hopefully this helps.

Regards

Phil

-- 

*** Playing the game for the game's own sake. ***

Associations:

* Debian Maintainer (DM)
* Fedora/EPEL Maintainer.
* Contributor member of the AlmaLinux foundation.

WWW: https://kathenas.org

Buy Me a Coffee: https://www.buymeacoffee.com/kathenasorg

Twitter: @kathenasorg

Instagram: @kathenasorg

IRC: kathenas

GPG: 724AA9B52F024C8B
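The modprobe.d workaround described above can also be scripted. The following is a minimal POSIX sh sketch, assuming a standard Debian layout; the helper names `install_hda_power_save_off` and `show_hda_power_save` are hypothetical, not part of any Debian tool. The first writes the same `options snd_hda_intel power_save=0` line into the drop-in file; the second reads back, after a reboot, the value the loaded driver is actually using from sysfs.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of any Debian tool.

# Write the snd_hda_intel option into a modprobe.d drop-in file and print its path.
# $1: target directory (defaults to /etc/modprobe.d, which needs root to write).
install_hda_power_save_off() {
    dir="${1:-/etc/modprobe.d}"
    file="$dir/snd-intel-disable-power-management.conf"
    # modprobe reads "options <module> <param>=<value>" lines from *.conf files
    # in this directory and applies them the next time the module is loaded.
    printf '%s\n' 'options snd_hda_intel power_save=0' > "$file" || return 1
    printf '%s\n' "$file"
}

# Print the power_save value the loaded driver is currently using (0 = disabled).
show_hda_power_save() {
    cat /sys/module/snd_hda_intel/parameters/power_save
}
```

Saved as, say, hda-power.sh, it could be run with `sudo sh -c '. ./hda-power.sh && install_hda_power_save_off'`; after rebooting, `. ./hda-power.sh && show_hda_power_save` should print 0 if the option was picked up. If snd_hda_intel happens to be loaded from the initramfs on a given system, running `sudo update-initramfs -u` before rebooting may also be needed.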