During my development, I enabled some Linux kernel checkers, specifically
the “sleep in atomic” checker.

I ran into an unrelated issue that appears to be a result of commit
463713eb6164b6 ("VMCI: dma dg: add support for DMA datagrams receive").
IIUC, vmci_read_data() calls wait_event(), which may sleep and is therefore
not allowed in atomic context. According to the trace below, it is reached
from vmci_dispatch_dgs(), which runs as a tasklet in softirq context.

I think CONFIG_DEBUG_ATOMIC_SLEEP=y is the option that triggers the warning
below, which indicates that a deadlock is possible.
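
For reference, the "preempt_count: 100" line in the splat is printed in hex
and can be decoded by hand. Below is a minimal user-space sketch, assuming
the field layout described in include/linux/preempt.h (8 preempt bits, then
8 softirq bits, then 4 hardirq bits). The struct and helper names are
hypothetical and only serve the illustration; they are not kernel APIs.

```c
#include <stdio.h>

/* Assumed layout, mirroring include/linux/preempt.h (not the kernel header). */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   4

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define FIELD_MASK(bits) ((1u << (bits)) - 1u)

/* Assumed to match the kernel's SOFTIRQ_OFFSET: set while a softirq runs. */
#define SOFTIRQ_OFFSET (1u << SOFTIRQ_SHIFT)

/* Hypothetical container for the decoded counters. */
struct pc_fields {
	unsigned int preempt;  /* preempt_disable() nesting depth */
	unsigned int softirq;  /* softirq field (serving / bh-disabled) */
	unsigned int hardirq;  /* hard interrupt nesting depth */
};

/* Split a raw preempt_count value into its three counters. */
static struct pc_fields decode_preempt_count(unsigned int pc)
{
	struct pc_fields f;

	f.preempt = (pc >> PREEMPT_SHIFT) & FIELD_MASK(PREEMPT_BITS);
	f.softirq = (pc >> SOFTIRQ_SHIFT) & FIELD_MASK(SOFTIRQ_BITS);
	f.hardirq = (pc >> HARDIRQ_SHIFT) & FIELD_MASK(HARDIRQ_BITS);
	return f;
}

/* Non-zero if the value says a softirq handler is currently executing. */
static int serving_softirq(unsigned int pc)
{
	return (pc & SOFTIRQ_OFFSET) != 0;
}

/* Print the decoded fields for a raw value taken from a splat. */
static void print_preempt_count(unsigned int pc)
{
	struct pc_fields f = decode_preempt_count(pc);

	printf("preempt=%u softirq=%u hardirq=%u serving_softirq=%d\n",
	       f.preempt, f.softirq, f.hardirq, serving_softirq(pc));
}
```

Under these assumptions, decoding 0x100 gives preempt=0, softirq=1 and
hardirq=0, i.e. the check fired while a softirq was being served. That
matches the tasklet frames in the call trace and "irqs_disabled(): 0".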

I hit the splat below (shown after decoding) on Linux 5.19. Let me know
whether you need me to open a bug in Bugzilla or whether this issue is
already known.


[   22.629691] BUG: sleeping function called from invalid context at drivers/misc/vmw_vmci/vmci_guest.c:145
[   22.633894] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 775, name: cloud-init
[   22.638232] preempt_count: 100, expected: 0
[   22.641887] RCU nest depth: 0, expected: 0
[   22.645461] 1 lock held by cloud-init/775:
[   22.649013] #0: ffff88810e057200 (&type->i_mutex_dir_key#6){++++}-{3:3}, at: iterate_dir (fs/readdir.c:46)
[   22.653012] Preemption disabled at:
[   22.653017] __do_softirq (kernel/softirq.c:504 kernel/softirq.c:548) 
[   22.660264] CPU: 3 PID: 775 Comm: cloud-init Not tainted 5.19.0+ #3
[   22.664004] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20253199.B64.2208081742 08/08/2022
[   22.671600] Call Trace:
[   22.675165]  <IRQ>
[   22.678681] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[   22.682303] dump_stack (lib/dump_stack.c:114) 
[   22.685883] __might_resched.cold (kernel/sched/core.c:9822) 
[   22.689500] __might_sleep (kernel/sched/core.c:9751 (discriminator 14)) 
[   22.692961] vmci_read_data (./include/linux/kernel.h:110 drivers/misc/vmw_vmci/vmci_guest.c:145) vmw_vmci
[   22.696461] ? vmci_interrupt_bm (drivers/misc/vmw_vmci/vmci_guest.c:121) vmw_vmci
[   22.699920] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67)
[   22.703305] ? wake_up_var (./include/linux/list.h:292 ./include/linux/wait.h:129 kernel/sched/wait_bit.c:125 kernel/sched/wait_bit.c:193)
[   22.706526] ? cpuusage_read (kernel/sched/wait_bit.c:192) 
[   22.709682] ? mark_held_locks (kernel/locking/lockdep.c:4234) 
[   22.712779] vmci_dispatch_dgs (drivers/misc/vmw_vmci/vmci_guest.c:332) vmw_vmci
[   22.715923] tasklet_action_common.constprop.0 (kernel/softirq.c:799)
[   22.719008] ? vmci_read_data (drivers/misc/vmw_vmci/vmci_guest.c:308) vmw_vmci
[   22.722018] tasklet_action (kernel/softirq.c:819) 
[   22.724865] __do_softirq (kernel/softirq.c:571) 
[   22.727650] __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650) 
[   22.730348] irq_exit_rcu (kernel/softirq.c:664) 
[   22.732947] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14)) 
[   22.735513]  </IRQ>
[   22.737879]  <TASK>
[   22.740141] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640) 
[   22.742498] RIP: 0010:stack_trace_consume_entry (kernel/stacktrace.c:83) 
[   22.744891] Code: be 80 01 00 00 48 c7 c7 40 82 cd 82 48 89 e5 e8 7d 38 53 00 5d c3 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 e5 <41> 55 49 89 f5 41 54 53 48 89 fb 48 83 c7 10 e8 23 e0 36 00 48 8d
All code
========
   0:   be 80 01 00 00          mov    $0x180,%esi
   5:   48 c7 c7 40 82 cd 82    mov    $0xffffffff82cd8240,%rdi
   c:   48 89 e5                mov    %rsp,%rbp
   f:   e8 7d 38 53 00          call   0x533891
  14:   5d                      pop    %rbp
  15:   c3                      ret    
  16:   cc                      int3   
  17:   cc                      int3   
  18:   cc                      int3   
  19:   cc                      int3   
  1a:   cc                      int3   
  1b:   cc                      int3   
  1c:   cc                      int3   
  1d:   cc                      int3   
  1e:   cc                      int3   
  1f:   cc                      int3   
  20:   cc                      int3   
  21:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  26:   55                      push   %rbp
  27:   48 89 e5                mov    %rsp,%rbp
  2a:*  41 55                   push   %r13             <-- trapping instruction
  2c:   49 89 f5                mov    %rsi,%r13
  2f:   41 54                   push   %r12
  31:   53                      push   %rbx
  32:   48 89 fb                mov    %rdi,%rbx
  35:   48 83 c7 10             add    $0x10,%rdi
  39:   e8 23 e0 36 00          call   0x36e061
  3e:   48                      rex.W
  3f:   8d                      .byte 0x8d

Code starting with the faulting instruction
===========================================
   0:   41 55                   push   %r13
   2:   49 89 f5                mov    %rsi,%r13
   5:   41 54                   push   %r12
   7:   53                      push   %rbx
   8:   48 89 fb                mov    %rdi,%rbx
   b:   48 83 c7 10             add    $0x10,%rdi
   f:   e8 23 e0 36 00          call   0x36e037
  14:   48                      rex.W
  15:   8d                      .byte 0x8d
[   22.750370] RSP: 0018:ffff8881250674d0 EFLAGS: 00000286
[   22.752906] RAX: ffffffff81676155 RBX: ffffffff81269600 RCX: ffffffff810e2106
[   22.755572] RDX: dffffc0000000000 RSI: ffffffff81676155 RDI: ffff8881250675a8
[   22.758217] RBP: ffff8881250674d0 R08: ffffffff810e20d4 R09: ffff88812f1a4000
[   22.760877] R10: ffff8881250674e0 R11: 0000000000000001 R12: ffff8881250675a8
[   22.763513] R13: 0000000000000000 R14: ffff88812f1a4000 R15: ffff88810f33c180
_______________________________________________
Virtualization mailing list
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
