On 11.09.25 17:11, Marek Marczykowski-Górecki wrote:
Hi,

The steps:
1. Have domU netfront ("untrusted" here) and domU netback
("sys-firewall-alt" here).
2. Pause frontend
3. Shutdown backend
4. Unpause frontend
5. Detach network (in my case attaching another one follows just after,
but I believe it's not relevant).

This gives the following on the frontend side:

     ------------[ cut here ]------------
     WARNING: CPU: 1 PID: 141 at include/linux/mm.h:1328 
xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
     Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device 
snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 
nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables intel_rapl_msr intel_rapl_common 
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery 
pmt_class intel_pmc_ssram_telemetry intel_vsec 
polyval_clmulni ghash_clmulni_intel xen_netfront pcspkr xen_scsiback 
target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback 
xen_evtchn i2c_dev loop fuse nfnetlink overlay xen_blkfront
     CPU: 1 UID: 0 PID: 141 Comm: xenwatch Not tainted 
6.17.0-0.rc5.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
     RIP: 0010:xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
     Code: 00 0f 83 93 03 00 00 48 8b 94 dd 90 10 00 00 48 8b 4a 08 f6 c1 01 75 79 66 
90 0f b6 4a 33 81 f9 f5 00 00 00 0f 85 f3 fe ff ff <0f> 0b 49 81 ff 00 01 00 00 
0f 82 01 ff ff ff 4c 89 fe 48 c7 c7 e0
     RSP: 0018:ffffc90001123cf8 EFLAGS: 00010246
     RAX: 0000000000000010 RBX: 0000000000000001 RCX: 00000000000000f5
     RDX: ffffea0000a05200 RSI: 0000000000000001 RDI: ffffffff82528d60
     RBP: ffff888041400000 R08: ffff888005054c80 R09: ffff888005054c80
     R10: 0000000000150013 R11: ffff88801851cd80 R12: 0000000000000000
     R13: ffff888053619000 R14: ffff888005d61a80 R15: 0000000000000001
     FS:  0000000000000000(0000) GS:ffff8880952c6000(0000) 
knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 00006182a11f3328 CR3: 000000001084c006 CR4: 0000000000770ef0
     PKRU: 55555554
     Call Trace:
      <TASK>
      xennet_remove+0x1e/0x80 [xen_netfront]
      xenbus_dev_remove+0x6e/0xf0
      device_release_driver_internal+0x19c/0x200
      bus_remove_device+0xc6/0x130
      device_del+0x160/0x3e0
      ? _raw_spin_unlock+0xe/0x30
      ? klist_iter_exit+0x18/0x30
      ? __pfx_xenwatch_thread+0x10/0x10
      device_unregister+0x17/0x60
      xenbus_dev_changed+0x1d7/0x240
      xenwatch_thread+0x8f/0x1c0
      ? __pfx_autoremove_wake_function+0x10/0x10
      kthread+0xf9/0x240
      ? __pfx_kthread+0x10/0x10
      ret_from_fork+0x152/0x180
      ? __pfx_kthread+0x10/0x10
      ret_from_fork_asm+0x1a/0x30
      </TASK>
     ---[ end trace 0000000000000000 ]---
     xen_netfront: backend supports XDP headroom
     vif vif-0: bouncing transmitted data to zeroed pages

The last two lines are likely related to the following attach, not the detach.

The same happens on 6.15 too, so it isn't a new thing.

Shutting down the backend without detaching first is not really a normal
operation, and doing that while the frontend is paused is even less so. But
is the above the expected outcome? If I read it right, it's the
WARN_ON_ONCE(folio_test_slab(folio)) in get_page(), which I find
confusing.

Originally reported at 
https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080


Hmm, with this scenario I imagine you could manage to have
xennet_disconnect_backend() running multiple times for the same device
concurrently.

How reliably can this be reproduced? How many vCPUs does the guest have?

Maybe the fix is as simple as adding a lock in xennet_disconnect_backend().


Juergen
