During my development, I enabled some Linux kernel checkers, specifically
the “sleep in atomic” checker.
I ran into an unrelated issue that appears to be a result of commit
463713eb6164b6 ("VMCI: dma dg: add support for DMA datagrams receive").
IIUC, vmci_read_data() calls wait_event(), which may sleep and is therefore
not allowed in atomic context. According to the trace, it is reached from
vmci_dispatch_dgs(), which runs as a tasklet (softirq context).
I think "CONFIG_DEBUG_ATOMIC_SLEEP=y" is the one that triggers the warning
below, which indicates a deadlock is possible.
The splat below (after decoding) was seen on Linux 5.19. Let me know if you
need me to open a bug in Bugzilla, or whether this issue is already known.
[ 22.629691] BUG: sleeping function called from invalid context at
drivers/misc/vmw_vmci/vmci_guest.c:145
[ 22.633894] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 775,
name: cloud-init
[ 22.638232] preempt_count: 100, expected: 0
[ 22.641887] RCU nest depth: 0, expected: 0
[ 22.645461] 1 lock held by cloud-init/775:
[ 22.649013] #0: ffff88810e057200 (&type->i_mutex_dir_key#6){++++}-{3:3}, at:
iterate_dir (fs/readdir.c:46)
[ 22.653012] Preemption disabled at:
[ 22.653017] __do_softirq (kernel/softirq.c:504 kernel/softirq.c:548)
[ 22.660264] CPU: 3 PID: 775 Comm: cloud-init Not tainted 5.19.0+ #3
[ 22.664004] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference
Platform, BIOS VMW201.00V.20253199.B64.2208081742 08/08/2022
[ 22.671600] Call Trace:
[ 22.675165] <IRQ>
[ 22.678681] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 22.682303] dump_stack (lib/dump_stack.c:114)
[ 22.685883] __might_resched.cold (kernel/sched/core.c:9822)
[ 22.689500] __might_sleep (kernel/sched/core.c:9751 (discriminator 14))
[ 22.692961] vmci_read_data (./include/linux/kernel.h:110
drivers/misc/vmw_vmci/vmci_guest.c:145) vmw_vmci
[ 22.696461] ? vmci_interrupt_bm (drivers/misc/vmw_vmci/vmci_guest.c:121)
vmw_vmci
[ 22.699920] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67)
[ 22.703305] ? wake_up_var (./include/linux/list.h:292
./include/linux/wait.h:129 kernel/sched/wait_bit.c:125
kernel/sched/wait_bit.c:193)
[ 22.706526] ? cpuusage_read (kernel/sched/wait_bit.c:192)
[ 22.709682] ? mark_held_locks (kernel/locking/lockdep.c:4234)
[ 22.712779] vmci_dispatch_dgs (drivers/misc/vmw_vmci/vmci_guest.c:332)
vmw_vmci
[ 22.715923] tasklet_action_common.constprop.0 (kernel/softirq.c:799)
[ 22.719008] ? vmci_read_data (drivers/misc/vmw_vmci/vmci_guest.c:308)
vmw_vmci
[ 22.722018] tasklet_action (kernel/softirq.c:819)
[ 22.724865] __do_softirq (kernel/softirq.c:571)
[ 22.727650] __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
[ 22.730348] irq_exit_rcu (kernel/softirq.c:664)
[ 22.732947] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
[ 22.735513] </IRQ>
[ 22.737879] <TASK>
[ 22.740141] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
[ 22.742498] RIP: 0010:stack_trace_consume_entry (kernel/stacktrace.c:83)
[ 22.744891] Code: be 80 01 00 00 48 c7 c7 40 82 cd 82 48 89 e5 e8 7d 38 53 00
5d c3 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 e5 <41> 55 49 89
f5 41 54 53 48 89 fb 48 83 c7 10 e8 23 e0 36 00 48 8d
All code
========
0: be 80 01 00 00 mov $0x180,%esi
5: 48 c7 c7 40 82 cd 82 mov $0xffffffff82cd8240,%rdi
c: 48 89 e5 mov %rsp,%rbp
f: e8 7d 38 53 00 call 0x533891
14: 5d pop %rbp
15: c3 ret
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: cc int3
20: cc int3
21: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
26: 55 push %rbp
27: 48 89 e5 mov %rsp,%rbp
2a:* 41 55 push %r13 <-- trapping instruction
2c: 49 89 f5 mov %rsi,%r13
2f: 41 54 push %r12
31: 53 push %rbx
32: 48 89 fb mov %rdi,%rbx
35: 48 83 c7 10 add $0x10,%rdi
39: e8 23 e0 36 00 call 0x36e061
3e: 48 rex.W
3f: 8d .byte 0x8d
Code starting with the faulting instruction
===========================================
0: 41 55 push %r13
2: 49 89 f5 mov %rsi,%r13
5: 41 54 push %r12
7: 53 push %rbx
8: 48 89 fb mov %rdi,%rbx
b: 48 83 c7 10 add $0x10,%rdi
f: e8 23 e0 36 00 call 0x36e037
14: 48 rex.W
15: 8d .byte 0x8d
[ 22.750370] RSP: 0018:ffff8881250674d0 EFLAGS: 00000286
[ 22.752906] RAX: ffffffff81676155 RBX: ffffffff81269600 RCX: ffffffff810e2106
[ 22.755572] RDX: dffffc0000000000 RSI: ffffffff81676155 RDI: ffff8881250675a8
[ 22.758217] RBP: ffff8881250674d0 R08: ffffffff810e20d4 R09: ffff88812f1a4000
[ 22.760877] R10: ffff8881250674e0 R11: 0000000000000001 R12: ffff8881250675a8
[ 22.763513] R13: 0000000000000000 R14: ffff88812f1a4000 R15: ffff88810f33c180
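
For reference, a small user-space sketch of how I read "preempt_count: 100"
in the splat above. As far as I can tell the value is printed in hex, so it
is 0x100, i.e. only the softirq field is set. The masks follow the layout
documented in include/linux/preempt.h; the struct and helper names
(preempt_fields, decode_preempt_count, is_serving_softirq) are made up for
illustration and are not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Field layout as documented in include/linux/preempt.h. */
#define PREEMPT_MASK  0x000000ffu
#define SOFTIRQ_MASK  0x0000ff00u
#define HARDIRQ_MASK  0x000f0000u
#define NMI_MASK      0x00f00000u

#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16
#define NMI_SHIFT     20

/* Lowest bit of the softirq field: set while a softirq is being served. */
#define SOFTIRQ_OFFSET (1u << SOFTIRQ_SHIFT)

/* Hypothetical helper type: the individual fields of a preempt count. */
struct preempt_fields {
	unsigned int preempt; /* preempt_disable() nesting depth */
	unsigned int softirq; /* softirq field (serving / bh disabled) */
	unsigned int hardirq; /* hard IRQ nesting */
	unsigned int nmi;     /* NMI nesting */
};

/* Split a raw preempt count value into its fields. */
static struct preempt_fields decode_preempt_count(uint32_t pc)
{
	struct preempt_fields f;

	f.preempt = pc & PREEMPT_MASK;
	f.softirq = (pc & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
	f.hardirq = (pc & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
	f.nmi     = (pc & NMI_MASK) >> NMI_SHIFT;
	return f;
}

/* Non-zero if the count says a softirq (e.g. a tasklet) is running. */
static int is_serving_softirq(uint32_t pc)
{
	return (pc & SOFTIRQ_OFFSET) != 0;
}

/* Print a one-line summary of a raw preempt count value. */
static void print_preempt_count(uint32_t pc)
{
	struct preempt_fields f = decode_preempt_count(pc);

	printf("preempt=%u softirq=%u hardirq=%u nmi=%u serving_softirq=%d\n",
	       f.preempt, f.softirq, f.hardirq, f.nmi,
	       is_serving_softirq(pc));
}
```

Calling print_preempt_count(0x100) from a main() reports a softirq field of 1
and all other fields 0, which matches the tasklet_action frames in the call
trace and "irqs_disabled(): 0" in the header of the splat.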