I have an issue running Ubuntu 24.04 guest on Proxmox VE 8.4 host. When
using SPICE as the display driver the system boots and presents the
login screen. I then type in my password and hit enter and the display
completely freezes. The VM is still running ok, it's just completely
frozen on the display.

I don't know how to prove or disprove it, but it appears to be the
same issue as, or linked to, the one reported here. I have encountered
this issue on numerous Ubuntu 24.04 VMs and always have to switch the
display driver to something else. Disappointing, as SPICE is very much
the preferred option.
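For reference, switching the display device can be done per-VM from the
Proxmox host with the qm CLI (a sketch; VMID 100 is a placeholder):

```shell
# On the Proxmox host: switch VM 100's display device from SPICE/qxl to
# the standard VGA device as a workaround (100 is a placeholder VMID).
qm set 100 --vga std
```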

Incidentally, I have just spun up a VM running Kubuntu 25 (kernel 6.14)
and it's working perfectly fine via SPICE/qxl, so it seems reasonable
to suspect that this is a kernel driver bug.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2065153

Title:
  [qxl] Ubuntu 24.04 VM guest console freezes after some hours

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Confirmed
Status in linux source package in Noble:
  Confirmed

Bug description:
  Thank you @dreibh for the original description and for reporting the
  bug!

  [ Impact ]

  * The qxl driver currently has a bug that causes console freezes on qxl 
paravirtualized GPUs. This does not cause a full system hang, since the system 
remains accessible via other means such as SSH, but it does cause the virtual 
console output to hang. The following dmesg output is seen when the issue 
occurs:
  [  280.618452] [TTM] Buffer eviction failed
  [  280.618463] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
  [  280.618466] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
VRAM BO

  * The issue was caused by commit 5a838e5d5825 ("drm/qxl: simplify 
qxl_fence_wait"), which does not add any new code but tries to simplify the 
already existing function.
  Due to the problems it caused, this commit has been reverted upstream with 
07ed11afb68d ('Revert "drm/qxl: simplify qxl_fence_wait"'). The revert also 
adds back the DMA_FENCE_WARN macro due to its usage in the restored code; the 
macro had originally been removed with d72277b6c37d ("dma-buf: nuke 
DMA_FENCE_TRACE macros v2").
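  To check whether a given kernel tree already carries the revert, its git
  log can be searched for the commit subject. A minimal sketch (a throwaway
  repository stands in for the real kernel tree here; in practice, run the
  final `git log` line inside your kernel source checkout):

  ```
  # A throwaway repo stands in for the kernel tree in this sketch.
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git -c user.email=you@example.com -c user.name=you commit -q --allow-empty \
      -m 'Revert "drm/qxl: simplify qxl_fence_wait"'
  # The fix is present if a Revert entry appears:
  git log --oneline --grep='drm/qxl: simplify qxl_fence_wait'
  ```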

  [ Test Plan ]

  To reproduce the bug, follow the steps below:

  1. Install an Ubuntu release with an affected kernel in a VM and make
  sure that the QXL video driver is in use instead of virtio. The Server
  edition is enough for the reproducer; no DE needs to be installed. The
  issue is reproducible on Jammy (5.15) and later, except Plucky, since
  the fix is included in kernel 6.14.

  2. Create a script and make it executable with the following content:

  ```
  #!/bin/bash

  # Switch to virtual console 3 and hammer it with dmesg output until the
  # qxl allocation failure shows up in the kernel log.
  chvt 3
  for j in $(seq 80); do
      echo "$(date) starting round $j"
      if journalctl --boot | grep -q "failed to allocate VRAM BO"; then
          echo "bug was reproduced after $j tries"
          exit 1
      fi
      for i in $(seq 100); do
          dmesg > /dev/tty3
      done
  done

  echo "bug could not be reproduced"
  exit 0
  ```

  3. Execute the script from the virtual console; from an SSH session, 
monitor the dmesg logs until you see the following:
  [  280.618452] [TTM] Buffer eviction failed
  [  280.618463] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
  [  280.618466] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
VRAM BO
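
  The detection step can also be scripted on its own; a minimal sketch,
  where a sample line stands in for live `journalctl --boot` output:

  ```
  # Detect the qxl VRAM allocation failure signature in a log stream.
  # A sample line stands in here for real journalctl/dmesg output.
  sample='[drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO'
  if printf '%s\n' "$sample" | grep -q 'failed to allocate VRAM BO'; then
      echo 'qxl freeze signature found'
  fi
  ```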

  [ Where problems could occur ]

  * Virtual displays might still freeze or hang
  * Warning messages related to the qxl driver might occur.

  [ Other Info ]

  * The patch does cause a warning message to show up on boot when using
  the qxl video driver. The warning itself is harmless and does not seem
  to have any negative effects in my testing:

  [    5.011445] WARNING: CPU: 15 PID: 822 at kernel/workqueue.c:2985 
check_flush_dependency.part.0+0xde/0x140
  [    5.011449] Modules linked in: qrtr cfg80211 binfmt_misc intel_rapl_msr 
intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec 
pmt_telemetry pmt_class kvm_intel kvm snd_hda_codec_generic irqbypass 
snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl snd_hda_codec 
snd_hda_core snd_hwdep snd_pcm joydev snd_timer snd qxl i2c_i801 soundcore 
drm_ttm_helper i2c_smbus lpc_ich ttm input_leds mac_hid serio_raw sch_fq_codel 
dm_multipath msr efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables 
autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic 
usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic 
ghash_clmulni_intel sha256_ssse3 ahci sha1_ssse3 libahci psmouse virtio_rng 
xhci_pci xhci_pci_renesas aesni_intel crypto_simd cryptd
  [    5.011493] CPU: 15 PID: 822 Comm: kworker/u65:1 Not tainted 
6.8.0-999-generic #70
  [    5.011495] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.3-debian-1.16.3-2 04/01/2014
  [    5.011496] Workqueue: ttm ttm_bo_delayed_delete [ttm]
  [    5.011501] RIP: 0010:check_flush_dependency.part.0+0xde/0x140
  [    5.011502] Code: 24 18 4d 89 f0 49 8d 8d b0 00 00 00 48 c7 c7 e0 8f e6 8a 
c6 05 f3 90 8c 02 01 48 8b 70 08 48 81 c6 b0 00 00 00 e8 a2 5e fd ff <0f> 0b eb 
91 0f b6 1d d9 90 8c 02 80 fb 01 0f 87 38 57 0a 01 83 e3
  [    5.011503] RSP: 0018:ffffbd85c0ce7c28 EFLAGS: 00010046
  [    5.011505] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
  [    5.011506] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
  [    5.011506] RBP: ffffbd85c0ce7c48 R08: 0000000000000000 R09: 
0000000000000000
  [    5.011507] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffff9f308158a540
  [    5.011508] R13: ffff9f30801cea00 R14: ffffffffc0946570 R15: 
0000000000000000
  [    5.011509] FS:  0000000000000000(0000) GS:ffff9f31f7d80000(0000) 
knlGS:0000000000000000
  [    5.011510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [    5.011510] CR2: 000000c000a02000 CR3: 0000000108cf8000 CR4: 
0000000000750ef0
  [    5.011514] PKRU: 55555554
  [    5.011514] Call Trace:
  [    5.011516]  <TASK>
  [    5.011518]  ? show_regs+0x6d/0x80
  [    5.011521]  ? __warn+0x89/0x160
  [    5.011523]  ? check_flush_dependency.part.0+0xde/0x140
  [    5.011524]  ? report_bug+0x17e/0x1b0
  [    5.011527]  ? handle_bug+0x6e/0xb0
  [    5.011529]  ? exc_invalid_op+0x18/0x80
  [    5.011532]  ? asm_exc_invalid_op+0x1b/0x20
  [    5.011535]  ? __pfx_qxl_gc_work+0x10/0x10 [qxl]
  [    5.011539]  ? check_flush_dependency.part.0+0xde/0x140
  [    5.011540]  ? check_flush_dependency.part.0+0xde/0x140
  [    5.011541]  start_flush_work+0xba/0x340
  [    5.011543]  flush_work+0x5f/0xb0
  [    5.011545]  qxl_queue_garbage_collect+0x8c/0x90 [qxl]
  [    5.011548]  qxl_fence_wait+0xa3/0x1b0 [qxl]
  [    5.011552]  dma_fence_wait_timeout+0x64/0x140
  [    5.011555]  dma_resv_wait_timeout+0x7f/0xf0
  [    5.011556]  ttm_bo_delayed_delete+0x2a/0xc0 [ttm]
  [    5.011560]  process_one_work+0x181/0x3a0
  [    5.011562]  worker_thread+0x306/0x440
  [    5.011563]  ? __pfx_worker_thread+0x10/0x10
  [    5.011565]  kthread+0xef/0x120
  [    5.011569]  ? __pfx_kthread+0x10/0x10
  [    5.011572]  ret_from_fork+0x44/0x70
  [    5.011574]  ? __pfx_kthread+0x10/0x10
  [    5.011578]  ret_from_fork_asm+0x1b/0x30
  [    5.011581]  </TASK>
  [    5.011582] ---[ end trace 0000000000000000 ]---

  * The Jammy version of the patch (5.15) does not need the re-
  introduction of the DMA_FENCE_WARN macro, since it already exists.

  
  [Original Description]
  I made simple Ubuntu 24.04 LTS Server installations as guests on an 
up-to-date Proxmox host. No Xorg/Wayland, just the CLI! The virtual graphics 
card is qxl with 16 MiB of memory (standard settings). Opening the console in 
the Proxmox GUI, or via remote-viewer, is initially fine. However, after some 
time (usually hours), the console just locks up, while SSH into the guest 
machine remains fine.

  Ubuntu 22.04 and 20.04 are fine; the issue only occurs with the new
  Ubuntu 24.04 and is reproducible with all Ubuntu 24.04 VMs. A reboot
  of the VM makes the console usable again, until the issue occurs
  again (usually after some hours).

  Unusual observation from dmesg:
  ...
  [522890.748557] [TTM] Buffer eviction failed
  [522890.748981] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
  [522890.749336] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
VRAM BO
  [522906.108616] [TTM] Buffer eviction failed
  [522906.109045] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
  [522906.109386] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
VRAM BO
  [522921.468729] [TTM] Buffer eviction failed
  [522921.469154] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
  [522921.469512] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
VRAM BO
  [522936.828783] [TTM] Buffer eviction failed
  [522936.829207] qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
  [522936.829630] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
VRAM BO
  ...

  nornetpp@hansa:~$ uname -a
  Linux hansa.management.crnalab.net 6.8.0-31-generic #31-Ubuntu SMP 
PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  nornetpp@hansa:~$ lsmod | grep qxl
  qxl                    86016  0
  drm_ttm_helper         12288  1 qxl
  ttm                   110592  2 qxl,drm_ttm_helper

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: xorg (not installed)
  ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1
  Uname: Linux 6.8.0-31-generic x86_64
  ApportVersion: 2.28.1-0ubuntu2
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Wed May  8 11:05:07 2024
  InstallationDate: Installed on 2024-03-12 (57 days ago)
  InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Daily amd64 
(20240312)
  ProcEnviron:
   LANG=en_IE.UTF-8
   LANGUAGE=nb:de:en_US
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
  SourcePackage: xorg
  Symptom: display
  Title: Xorg freeze
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2065153/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
