I have an issue running an Ubuntu 24.04 guest on a Proxmox VE 8.4 host. When using SPICE as the display driver, the system boots and presents the login screen. I then type in my password, hit enter, and the display completely freezes. The VM itself is still running fine; it is just completely frozen on the display.
I don't know how to prove or disprove it, but it appears to be the same issue, or linked to the one reported here. I have encountered this issue on numerous Ubuntu 24.04 VMs and have always had to switch the display driver to something else. Disappointing, as SPICE is very much the preferred option. Incidentally, I have just spun up a VM running Kubuntu 25 (kernel 6.14) and it is working perfectly fine via spice/qxl, so it seems reasonable to conclude that this is a kernel driver bug.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2065153

Title:
  [qxl] Ubuntu 24.04 VM guest console freezes after some hours

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Confirmed
Status in linux source package in Noble:
  Confirmed

Bug description:

Thank you @dreibh for reporting the original description and filing the bug!

[ Impact ]

* The qxl driver currently has a bug that causes console freezes on qxl paravirtualized GPUs. The issue does not cause a full system hang, since the system is still accessible via other means such as SSH, but it does cause the virtual console output to hang. The following dmesg output is seen when the issue occurs:

[ 280.618452] [TTM] Buffer eviction failed
[ 280.618463] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[ 280.618466] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

* The issue was caused by commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait"), which does not add any new code but tries to simplify the already existing function. Due to the problems it has caused, this commit has been reverted upstream with 07ed11afb68d ("Revert "drm/qxl: simplify qxl_fence_wait""). The revert also adds back the DMA_FENCE_WARN macro due to its usage in the restored code; the macro was originally removed with d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2").
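Going by the versions named in this report (the regression is reproducible from Jammy's 5.15 series onward, and the upstream revert is included in kernel 6.14, per the test plan), whether a given kernel is likely affected can be estimated from its version string alone. A minimal, hypothetical Python sketch of that check; `likely_affected` is an illustrative helper, not part of any tooling:

```python
def likely_affected(kernel_release: str) -> bool:
    """Heuristic based on this report: the qxl_fence_wait regression
    affects kernels from Jammy's 5.15 up to, but not including, 6.14,
    which carries the upstream revert (07ed11afb68d)."""
    major, minor = (int(part) for part in kernel_release.split(".")[:2])
    return (5, 15) <= (major, minor) < (6, 14)

# e.g. with the kernels mentioned in this bug:
print(likely_affected("6.8.0-31"))  # Noble's 6.8 kernel: affected
print(likely_affected("6.14"))      # Plucky's 6.14 kernel: has the fix
print(likely_affected("5.15"))      # Jammy's 5.15 kernel: affected
```

Note this is only a rough version gate; whether a given stable kernel actually carries the backported revert can only be confirmed from its changelog.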
[ Test Plan ]

To reproduce the bug, follow the steps below:

1. Install an Ubuntu version with an affected kernel in a VM and make sure that the QXL video driver is in use instead of virtio. The server edition is enough for the reproducer; no DE needs to be installed. The issue is reproducible on Jammy (5.15) and above, except Plucky, since the fix is included in kernel 6.14.

2. Create a script with the following content and make it executable:

```
#!/bin/bash
chvt 3
for j in $(seq 80); do
    echo "$(date) starting round $j"
    if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
        echo "bug was reproduced after $j tries"
        exit 1
    fi
    for i in $(seq 100); do
        dmesg > /dev/tty3
    done
done
echo "bug could not be reproduced"
exit 0
```

3. Execute the script from the virtual console and, from an SSH session, monitor the dmesg logs until you see the following:

[ 280.618452] [TTM] Buffer eviction failed
[ 280.618463] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[ 280.618466] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

[ Where problems could occur ]

* Virtual displays might still freeze or hang.
* Warning messages related to the qxl driver might occur.

[ Other Info ]

* The patch does cause a warning message to show up on boot when using the qxl video driver.
The warning itself is harmless and does not seem to have any negative effects in my testing:

[ 5.011445] WARNING: CPU: 15 PID: 822 at kernel/workqueue.c:2985 check_flush_dependency.part.0+0xde/0x140
[ 5.011449] Modules linked in: qrtr cfg80211 binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm snd_hda_codec_generic irqbypass snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl snd_hda_codec snd_hda_core snd_hwdep snd_pcm joydev snd_timer snd qxl i2c_i801 soundcore drm_ttm_helper i2c_smbus lpc_ich ttm input_leds mac_hid serio_raw sch_fq_codel dm_multipath msr efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 ahci sha1_ssse3 libahci psmouse virtio_rng xhci_pci xhci_pci_renesas aesni_intel crypto_simd cryptd
[ 5.011493] CPU: 15 PID: 822 Comm: kworker/u65:1 Not tainted 6.8.0-999-generic #70
[ 5.011495] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 5.011496] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 5.011501] RIP: 0010:check_flush_dependency.part.0+0xde/0x140
[ 5.011502] Code: 24 18 4d 89 f0 49 8d 8d b0 00 00 00 48 c7 c7 e0 8f e6 8a c6 05 f3 90 8c 02 01 48 8b 70 08 48 81 c6 b0 00 00 00 e8 a2 5e fd ff <0f> 0b eb 91 0f b6 1d d9 90 8c 02 80 fb 01 0f 87 38 57 0a 01 83 e3
[ 5.011503] RSP: 0018:ffffbd85c0ce7c28 EFLAGS: 00010046
[ 5.011505] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5.011506] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 5.011506] RBP: ffffbd85c0ce7c48 R08: 0000000000000000 R09: 0000000000000000
[ 5.011507] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f308158a540
[ 5.011508] R13: ffff9f30801cea00 R14: ffffffffc0946570 R15: 0000000000000000
[ 5.011509] FS: 0000000000000000(0000) GS:ffff9f31f7d80000(0000) knlGS:0000000000000000
[ 5.011510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.011510] CR2: 000000c000a02000 CR3: 0000000108cf8000 CR4: 0000000000750ef0
[ 5.011514] PKRU: 55555554
[ 5.011514] Call Trace:
[ 5.011516] <TASK>
[ 5.011518] ? show_regs+0x6d/0x80
[ 5.011521] ? __warn+0x89/0x160
[ 5.011523] ? check_flush_dependency.part.0+0xde/0x140
[ 5.011524] ? report_bug+0x17e/0x1b0
[ 5.011527] ? handle_bug+0x6e/0xb0
[ 5.011529] ? exc_invalid_op+0x18/0x80
[ 5.011532] ? asm_exc_invalid_op+0x1b/0x20
[ 5.011535] ? __pfx_qxl_gc_work+0x10/0x10 [qxl]
[ 5.011539] ? check_flush_dependency.part.0+0xde/0x140
[ 5.011540] ? check_flush_dependency.part.0+0xde/0x140
[ 5.011541] start_flush_work+0xba/0x340
[ 5.011543] flush_work+0x5f/0xb0
[ 5.011545] qxl_queue_garbage_collect+0x8c/0x90 [qxl]
[ 5.011548] qxl_fence_wait+0xa3/0x1b0 [qxl]
[ 5.011552] dma_fence_wait_timeout+0x64/0x140
[ 5.011555] dma_resv_wait_timeout+0x7f/0xf0
[ 5.011556] ttm_bo_delayed_delete+0x2a/0xc0 [ttm]
[ 5.011560] process_one_work+0x181/0x3a0
[ 5.011562] worker_thread+0x306/0x440
[ 5.011563] ? __pfx_worker_thread+0x10/0x10
[ 5.011565] kthread+0xef/0x120
[ 5.011569] ? __pfx_kthread+0x10/0x10
[ 5.011572] ret_from_fork+0x44/0x70
[ 5.011574] ? __pfx_kthread+0x10/0x10
[ 5.011578] ret_from_fork_asm+0x1b/0x30
[ 5.011581] </TASK>
[ 5.011582] ---[ end trace 0000000000000000 ]---

* The Jammy version of the patch (5.15) does not need the re-introduction of the DMA_FENCE_WARN macro, since it already exists.

[Original Description]

I made simple Ubuntu 24.04 LTS Server installations as guests in an up-to-date Proxmox. No Xorg/Wayland, just CLI! The virtual graphics card is qxl, with 16 MiB of memory (standard settings).

Opening the console in the Proxmox GUI, or via remote-viewer, is initially fine. However, after some time (usually hours), the console just locks up, while SSH into the guest machine remains fine.
Ubuntu 22.04 and 20.04 are fine; the issue only occurs with the new Ubuntu 24.04, and is reproducible with all Ubuntu 24.04 VMs. A reboot of the VM makes the console usable again, until the issue occurs again (usually after some hours).

Unusual observation from dmesg:

...
[522890.748557] [TTM] Buffer eviction failed
[522890.748981] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522890.749336] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522906.108616] [TTM] Buffer eviction failed
[522906.109045] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522906.109386] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522921.468729] [TTM] Buffer eviction failed
[522921.469154] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522921.469512] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522936.828783] [TTM] Buffer eviction failed
[522936.829207] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
[522936.829630] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
...
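Two things stand out in the dmesg excerpt above: the fixed error signature, and the fact that the eviction failures recur at a steady cadence of roughly 15 seconds, which looks like a periodic retry rather than load-dependent behaviour. A small hypothetical Python sketch that pulls both out of dmesg-style text (`sample` below reuses timestamps from the excerpt; the helpers are illustrative, not part of any tooling):

```python
import re

# Timestamped "[TTM] Buffer eviction failed" lines, as emitted by dmesg
# when the qxl bug triggers.
EVICTION = re.compile(r"^\[\s*(\d+\.\d+)\] \[TTM\] Buffer eviction failed")

def has_qxl_failure(dmesg_text: str) -> bool:
    """True if the qxl VRAM allocation failure signature is present."""
    return "failed to allocate VRAM BO" in dmesg_text

def eviction_intervals(dmesg_text: str) -> list[float]:
    """Seconds between consecutive '[TTM] Buffer eviction failed' lines."""
    times = [float(m.group(1))
             for line in dmesg_text.splitlines()
             if (m := EVICTION.match(line))]
    return [round(later - earlier, 2) for earlier, later in zip(times, times[1:])]

sample = """\
[522890.748557] [TTM] Buffer eviction failed
[522890.749336] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[522906.108616] [TTM] Buffer eviction failed
[522921.468729] [TTM] Buffer eviction failed
[522936.828783] [TTM] Buffer eviction failed
"""

print(has_qxl_failure(sample))     # True
print(eviction_intervals(sample))  # [15.36, 15.36, 15.36]
```

The signature check is the same condition the reproducer script greps for via `journalctl --boot`.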
nornetpp@hansa:~$ uname -a
Linux hansa.management.crnalab.net 6.8.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

nornetpp@hansa:~$ lsmod | grep qxl
qxl                86016  0
drm_ttm_helper     12288  1 qxl
ttm               110592  2 qxl,drm_ttm_helper

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: xorg (not installed)
ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
Date: Wed May 8 11:05:07 2024
InstallationDate: Installed on 2024-03-12 (57 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Daily amd64 (20240312)
ProcEnviron:
 LANG=en_IE.UTF-8
 LANGUAGE=nb:de:en_US
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2065153/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

