Control: tags -1 + moreinfo

Hi,
On Thu, Jul 11, 2024 at 09:50:18AM -0500, S. wrote:
> Had the same thing happen with the 6.1.0-21 kernel, so I tried the 6.7.12
> kernel from Backports. Then while booted into that kernel it happened again,
> but this time I SSHed into the system from another computer and found a
> coredump from i915:
>
> ----------------------------------------
>
> Jul 10 15:17:01 IntelNUC9 CRON[44323]: pam_unix(cron:session): session opened
> for user root(uid=0) by (uid=0)
> Jul 10 15:17:01 IntelNUC9 CRON[44325]: (root) CMD (cd / && run-parts --report
> /etc/cron.hourly)
> Jul 10 15:17:01 IntelNUC9 CRON[44323]: pam_unix(cron:session): session closed
> for user root
> Jul 10 15:18:18 IntelNUC9 kernel: i915 0000:00:02.0: [drm] *ERROR* media:
> timed out waiting for forcewake ack request.
> Jul 10 15:18:18 IntelNUC9 kernel: i915 0000:00:02.0: [drm:add_taint_for_CI
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x1f5/0x250 [i915]
> Jul 10 15:18:18 IntelNUC9 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for
> preemption time out
> Jul 10 15:18:18 IntelNUC9 kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0
> reset request timed out: {request: 00000001, RESET_CTL: 00000001}
> Jul 10 15:18:18 IntelNUC9 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode
> 9:1:eedfffff, in zoom.real [39829]
> Jul 10 15:18:28 IntelNUC9 kernel: Asynchronous wait on fence
> 0000:00:02.0:cinnamon[1948]:137a40 timed out (hint:intel_atomic_commit_ready
> [i915])
> Jul 10 15:18:31 IntelNUC9 wpa_supplicant[1392]: wlan0:
> CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-69 noise=9999 txrate=1000
> Jul 10 15:18:34 IntelNUC9 kernel: BUG: kernel NULL pointer dereference,
> address: 0000000000000270
> Jul 10 15:18:34 IntelNUC9 kernel: #PF: supervisor read access in kernel mode
> Jul 10 15:18:34 IntelNUC9 kernel: #PF: error_code(0x0000) - not-present page
> Jul 10 15:18:34 IntelNUC9 kernel: PGD 800000018e8a8067 P4D 800000018e8a8067
> PUD 6bf133067 PMD 6bf041067 PTE 0
> Jul 10 15:18:34 IntelNUC9 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
> Jul 10 15:18:34 IntelNUC9 kernel: CPU: 5 PID: 264 Comm: kworker/5:1H Tainted:
> G W 6.7.12+bpo-amd64 #1 Debian 6.7.12-1~bpo12+1
> Jul 10 15:18:34 IntelNUC9 kernel: Hardware name: Intel(R) Client Systems
> LAPQC71A/LAPQC71A, BIOS QCCFL357.0144.2022.0124.1433 01/24/2022
> Jul 10 15:18:34 IntelNUC9 kernel: Workqueue: events_highpri heartbeat [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: RIP: 0010:__i915_gpu_coredump+0x227/0x760
> [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: Code: 44 24 08 85 c0 79 37 49 8b 74 24 08
> 48 8b 44 24 20 49 8d 54 24 18 48 8b 36 48 8b 48 20 4c 8b 40 28 48 8b 7e 08 48
> 8b 74 24 18 <44> 0f b7 8e 70 02 00 00 48 c7 c6 50 88 38 c1 e8 95 29 85 d1 48
> 8b
> Jul 10 15:18:34 IntelNUC9 kernel: RSP: 0000:ffffb08580793c88 EFLAGS: 00010286
> Jul 10 15:18:34 IntelNUC9 kernel: RAX: ffff9980db2e2680 RBX: ffff9983fa32ac00
> RCX: 0000000000001155
> Jul 10 15:18:34 IntelNUC9 kernel: RDX: ffff997e80f74018 RSI: 0000000000000000
> RDI: ffff997e81fb70c0
> Jul 10 15:18:34 IntelNUC9 kernel: RBP: 0000000000000000 R08: 000000000000398c
> R09: 00000000ffffffff
> Jul 10 15:18:34 IntelNUC9 kernel: R10: 0000000000000000 R11: 000000000000e164
> R12: ffff997e80f74000
> Jul 10 15:18:34 IntelNUC9 kernel: R13: ffff997f10e63000 R14: ffff9980030df000
> R15: ffff997f10e63c00
> Jul 10 15:18:34 IntelNUC9 kernel: FS: 0000000000000000(0000)
> GS:ffff99861db40000(0000) knlGS:0000000000000000
> Jul 10 15:18:34 IntelNUC9 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Jul 10 15:18:34 IntelNUC9 kernel: CR2: 0000000000000270 CR3: 0000000415a66004
> CR4: 00000000003706f0
> Jul 10 15:18:34 IntelNUC9 kernel: Call Trace:
> Jul 10 15:18:34 IntelNUC9 kernel: <TASK>
> Jul 10 15:18:34 IntelNUC9 kernel: ? __die+0x23/0x70
> Jul 10 15:18:34 IntelNUC9 kernel: ? page_fault_oops+0x171/0x4e0
> Jul 10 15:18:34 IntelNUC9 kernel: ? intel_gt_mcr_lock+0x42/0x140 [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: ? exc_page_fault+0x77/0x170
> Jul 10 15:18:34 IntelNUC9 kernel: ? asm_exc_page_fault+0x26/0x30
> Jul 10 15:18:34 IntelNUC9 kernel: ? __i915_gpu_coredump+0x227/0x760 [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: ? __i915_gpu_coredump+0x1fc/0x760 [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: i915_capture_error_state+0x61/0xd0 [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: intel_gt_handle_error+0x3c7/0x3e0 [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: ? execlists_submission_tasklet+0xfd/0x1740
> [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: heartbeat+0x3c2/0x3d0 [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: process_one_work+0x17c/0x350
> Jul 10 15:18:34 IntelNUC9 kernel: worker_thread+0x27b/0x3a0
> Jul 10 15:18:34 IntelNUC9 kernel: ? __pfx_worker_thread+0x10/0x10
> Jul 10 15:18:34 IntelNUC9 kernel: kthread+0xe5/0x120
> Jul 10 15:18:34 IntelNUC9 kernel: ? __pfx_kthread+0x10/0x10
> Jul 10 15:18:34 IntelNUC9 kernel: ret_from_fork+0x31/0x50
> Jul 10 15:18:34 IntelNUC9 kernel: ? __pfx_kthread+0x10/0x10
> Jul 10 15:18:34 IntelNUC9 kernel: ret_from_fork_asm+0x1b/0x30
> Jul 10 15:18:34 IntelNUC9 kernel: </TASK>
> Jul 10 15:18:34 IntelNUC9 kernel: Modules linked in: cpuid ufs qnx4 hfsplus
> hfs cdrom minix msdos jfs nls_ucs2_utils xfs ext4 mbcache jbd2 iptable_mangle
> xt_CHECKSUM xt_multiport iptable_nat tun xt_nat xt_tcpudp veth uinput
> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype
> nft_compat nf_tables nfnetlink br_netfilter bridge stp llc ctr ccm rfcomm
> snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq
> snd_seq_device cmac algif_hash algif_skcipher af_alg bnep overlay btusb btrtl
> btintel btbcm zstd btmtk zram bluetooth uvcvideo videobuf2_vmalloc uvc
> videobuf2_memops videobuf2_v4l2 sha3_generic videodev jitterentropy_rng drbg
> videobuf2_common sg ansi_cprng mc ecdh_generic ecc crc16 binfmt_misc
> nls_ascii nls_cp437 vfat fat snd_sof_pci_intel_cnl snd_sof_intel_hda_common
> soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink
> soundwire_cadence snd_sof_intel_hda snd_sof_pci iwlmvm snd_sof_xtensa_dsp
> snd_sof snd_sof_utils
> Jul 10 15:18:34 IntelNUC9 kernel: soundwire_bus snd_soc_skl mac80211
> snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp
> intel_rapl_msr intel_rapl_common snd_soc_acpi_intel_match snd_soc_acpi
> x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_core kvm_intel
> snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_hda_codec_hdmi
> snd_compress snd_pcm_dmaengine kvm snd_hda_intel snd_intel_dspcfg
> snd_intel_sdw_acpi iwlwifi irqbypass snd_hda_codec rapl snd_hda_core
> intel_cstate snd_hwdep asus_wmi snd_pcm_oss joydev snd_mixer_oss
> ledtrig_audio mei_hdcp mei_pxp intel_uncore sparse_keymap iTCO_wdt cfg80211
> snd_pcm platform_profile intel_wmi_thunderbolt wmi_bmof intel_pmc_bxt
> iTCO_vendor_support snd_timer mei_me ee1004 snd watchdog rfkill mei soundcore
> intel_pch_thermal intel_pmc_core ac acpi_pad acpi_tad hid_multitouch
> serio_raw evdev msr parport_pc ppdev lp dm_mod parport loop configfs
> efi_pstore efivarfs ip_tables x_tables autofs4 sd_mod btrfs blake2b_generic
> xor raid6_pq libcrc32c crc32c_generic uas usb_storage usbhid i915
> Jul 10 15:18:34 IntelNUC9 kernel: nouveau nvme drm_gpuvm drm_exec nvme_core
> gpu_sched drm_buddy crc32_pclmul i2c_algo_bit t10_pi crc32c_intel
> drm_display_helper ahci cec crc64_rocksoft_generic libahci rc_core
> crc64_rocksoft xhci_pci hid_generic ghash_clmulni_intel drm_ttm_helper
> crc_t10dif libata ttm sha512_ssse3 i2c_hid_acpi crct10dif_generic r8169
> i2c_hid sha512_generic crct10dif_pclmul xhci_hcd drm_kms_helper scsi_mod hid
> crc64 realtek intel_lpss_pci i2c_i801 sha256_ssse3 mdio_devres thunderbolt
> usbcore drm mxm_wmi psmouse sha1_ssse3 libphy i2c_smbus intel_lpss
> crct10dif_common scsi_common idma64 usb_common battery video wmi button
> aesni_intel crypto_simd cryptd
> Jul 10 15:18:34 IntelNUC9 kernel: CR2: 0000000000000270
> Jul 10 15:18:34 IntelNUC9 kernel: ---[ end trace 0000000000000000 ]---
> Jul 10 15:18:34 IntelNUC9 kernel: RIP: 0010:__i915_gpu_coredump+0x227/0x760
> [i915]
> Jul 10 15:18:34 IntelNUC9 kernel: Code: 44 24 08 85 c0 79 37 49 8b 74 24 08
> 48 8b 44 24 20 49 8d 54 24 18 48 8b 36 48 8b 48 20 4c 8b 40 28 48 8b 7e 08 48
> 8b 74 24 18 <44> 0f b7 8e 70 02 00 00 48 c7 c6 50 88 38 c1 e8 95 29 85 d1 48
> 8b
> Jul 10 15:18:34 IntelNUC9 kernel: RSP: 0000:ffffb08580793c88 EFLAGS: 00010286
> Jul 10 15:18:34 IntelNUC9 kernel: RAX: ffff9980db2e2680 RBX: ffff9983fa32ac00
> RCX: 0000000000001155
> Jul 10 15:18:34 IntelNUC9 kernel: RDX: ffff997e80f74018 RSI: 0000000000000000
> RDI: ffff997e81fb70c0
> Jul 10 15:18:34 IntelNUC9 kernel: RBP: 0000000000000000 R08: 000000000000398c
> R09: 00000000ffffffff
> Jul 10 15:18:34 IntelNUC9 kernel: R10: 0000000000000000 R11: 000000000000e164
> R12: ffff997e80f74000
> Jul 10 15:18:34 IntelNUC9 kernel: R13: ffff997f10e63000 R14: ffff9980030df000
> R15: ffff997f10e63c00
> Jul 10 15:18:34 IntelNUC9 kernel: FS: 0000000000000000(0000)
> GS:ffff99861db40000(0000) knlGS:0000000000000000
> Jul 10 15:18:34 IntelNUC9 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Jul 10 15:18:34 IntelNUC9 kernel: CR2: 0000000000000270 CR3: 0000000415a66004
> CR4: 00000000003706f0
> Jul 10 15:18:34 IntelNUC9 kernel: note: kworker/5:1H[264] exited with irqs
> disabled
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:zoom.real[39829]:398c!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:Xorg[1493]:833fa8!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:wined3d_cs[42767]:10e80!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:Xorg[1493]:833faa!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:cinnamon[1948]:137a40!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:Xorg[1493]:833fac!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:zoom.real[39829]:398e!
> Jul 10 15:18:37 IntelNUC9 kernel: Fence expiration time out
> i915-0000:00:02.0:wined3d_cs[42767]:10e82!
> Jul 10 15:18:47 IntelNUC9 wpa_supplicant[1392]: wlan0:
> CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-72 noise=9999 txrate=1000

Thanks for reporting back.

Can you test both with the most recent 6.1.y kernel in bookworm (6.1.99-1) and, ideally, with a recent upstream version: either 6.9.9-1 in unstable or 6.10-1~exp1 in experimental? Is the problem still reproducible?

If so, can you forward your report upstream? The following would be a good set of recipients to reach out to:

David Airlie <airl...@gmail.com> (maintainer:DRM DRIVERS)
Daniel Vetter <dan...@ffwll.ch> (maintainer:DRM DRIVERS)
intel-...@lists.freedesktop.org (open list:INTEL DRM I915 DRIVER (Meteor Lake, DG2 and old...)
dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)
linux-ker...@vger.kernel.org (open list)

Regards,
Salvatore