>Synopsis: kernel deadlock
>Environment:
System : OpenBSD 6.1
Details : OpenBSD 6.1-current (GENERIC) #14: Mon Apr 24 09:32:13
MDT 2017
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Setup:
fast tcp sender(s)>relayd>slow backend
If senders send more than backend can handle it will slowly accumulate in the
openbsd
box running relayd. Eventually the box runs out of memory and swap and
deadlocks.
Unfortunately its still able to send carp hellos and remains carp master. I'm
able to break
into ddb. One of the relayd processes sits either in flt_noram1 or flt_noram5.
It will never
resolve by itself even if I stop the senders. In the production setup I have 1G
of memory.
It happens with and without socket splicing.
ddb> tr
Debugger() at Debugger+0x9
internal_command() at internal_command+0x155
wskbd_translate() at wskbd_translate+0x90
wskbd_input() at wskbd_input+0x59
pckbd_input() at pckbd_input+0xb4
pckbcintr_internal() at pckbcintr_internal+0x8e
intr_handler() at intr_handler+0x28
Xintr_ioapic_edge1() at Xintr_ioapic_edge1+0xc9
--- interrupt ---
pmap_test_attrs() at pmap_test_attrs+0xd
uvmpd_scan_inactive() at uvmpd_scan_inactive+0xfd
uvmpd_scan() at uvmpd_scan+0x228
uvm_pageout() at uvm_pageout+0x6b
end trace frame: 0x0, count: -12
ddb> ps
PID TID PPID UID S FLAGS WAIT COMMAND
18719 1258 0 0 3 0x14200 bored sosplice
7448 115595 1 0 2 0x100083 getty
92498 379625 1 0 3 0x100083 ttyin getty
4520 143232 1 0 3 0x100083 ttyin getty
39305 310613 1 0 3 0x100083 ttyin getty
21297 227528 1 0 3 0x100083 ttyin getty
37915 337975 1 0 3 0x100083 ttyin getty
65468 508956 1 0 3 0x100098 poll cron
47792 194899 52742 0 3 0x100080 netio tcpdump
52742 468181 3282 76 2 0x100012 tcpdump
3282 255291 1 0 3 0x80 piperd perl
50646 227272 1 110 3 0x100090 poll sndiod
44710 113903 1 99 3 0x100090 poll sndiod
86206 425755 1 0 3 0x100080 kqread httpd
59150 213389 1 67 3 0x100092 kqread httpd
91591 456586 1 67 3 0x100092 kqread httpd
73088 84908 1 67 3 0x100092 kqread httpd
40756 141226 1 67 3 0x100092 kqread httpd
2509 326497 28065 95 3 0x100092 kqread smtpd
94398 422696 28065 103 3 0x100092 kqread smtpd
72362 413118 28065 95 3 0x100092 kqread smtpd
21002 233627 28065 95 3 0x100092 kqread smtpd
96593 426294 28065 95 3 0x100092 kqread smtpd
17571 429923 28065 95 3 0x100092 kqread smtpd
28065 265739 1 0 3 0x100080 kqread smtpd
54616 144829 1 0 3 0x80 kqread relayd
3030 324195 1 89 3 0x100092 kqread relayd
71847 347768 1 89 3 0x100092 kqread relayd
79236 455595 1 89 3 0x100092 kqread relayd
26126 339624 1 89 3 0x100092 kqread relayd
79991 357923 1 89 3 0x100092 kqread relayd
26908 50243 1 89 3 0x100092 kqread relayd
39372 476224 1 89 3 0x100092 kqread relayd
6798 425753 1 89 3 0x100092 kqread relayd
24266 6018 1 89 3 0x100092 kqread relayd
90612 10267 1 89 3 0x100092 kqread relayd
74770 24617 1 89 3 0x100092 kqread relayd
16191 7266 1 89 3 0x100092 kqread relayd
48641 91345 1 89 3 0x100092 kqread relayd
11724 492172 1 89 3 0x100092 kqread relayd
76212 506506 1 89 3 0x100092 kqread relayd
97858 156934 1 89 3 0x100092 kqread relayd
30127 166905 1 89 3 0x100092 kqread relayd
42359 374245 1 89 3 0x100092 kqread relayd
46692 453103 1 89 3 0x100092 kqread relayd
85725 63743 1 89 3 0x100092 kqread relayd
26947 119020 1 89 3 0x100012 flt_noram5 relayd
31527 446625 1 89 2 0x100012 relayd
9481 307364 1 89 2 0x100012 relayd
28627 457378 1 89 2 0x100012 relayd
20393 160917 1 89 2 0x100012 relayd
37217 203020 1 89 2 0x100012 relayd
81784 344990 1 89 2 0x100012 relayd
55474 165735 1 89 2 0x100012 relayd
75308 298087 1 89 2 0x100012 relayd
63207 267659 1 89 2 0x100012 relayd
62744 101175 1 89 2 0x100012 relayd
44185 31553 1 89 2 0x100012 relayd
25683 193151 1 89 2 0x100012 relayd
68522 355887 1 89 2 0x100012 relayd
65481 59946 1 89 2 0x100012 relayd
58086 15299 1 89 2 0x100012 relayd
21528 229769 1 89 2 0x100012 relayd
46044 485924 1 89 2 0x100012 relayd
16225 486879 1 89 2 0x100012 relayd
46087 30050 1 89 2 0x100012 relayd
92929 218662 1 89 3 0x100092 kqread relayd
63821 320603 1 89 3 0x100092 kqread relayd
51579 318273 84588 75 2 0x100492 bgpd
90700 335378 84588 75 3 0x100092 poll bgpd
84588 28750 1 0 2 0x80 bgpd
87112 506584 1 0 3 0x80 kqread snmpd
91230 93464 1 91 3 0x92 kqread snmpd
89920 159381 1 91 2 0x92 snmpd
7220 98830 1 0 3 0x80 select sshd
27241 420362 1 0 3 0x100080 poll ntpd
44141 514243 8284 83 3 0x100092 poll ntpd
8284 145894 1 83 2 0x100492 ntpd
97598 219876 67025 74 2 0x100490 pflogd
67025 518783 1 0 3 0x80 netio pflogd
62641 259012 51899 73 3 0x100090 kqread syslogd
51899 308670 1 0 3 0x100082 netio syslogd
74861 361044 0 0 2 0x14200 zerothread
40078 4840 0 0 3 0x14200 aiodoned aiodoned
11396 394316 0 0 2 0x14200 update
95110 498998 0 0 3 0x14200 cleaner cleaner
78284 374918 0 0 3 0x14200 reaper reaper
*92804 338862 0 0 7 0x14200 pagedaemon
9723 206392 0 0 3 0x14200 bored crynlk
19328 289759 0 0 3 0x14200 bored crypto
10531 115280 0 0 2 0x14600 pfpurge
49817 508545 0 0 3 0x14200 bored viomb
81860 106767 0 0 3 0x14200 usbtsk usbtask
24646 211498 0 0 3 0x14200 usbatsk usbatsk
93940 99018 0 0 3 0x40014200 acpi0 acpi0
67453 88354 0 0 2 0x14200 softnet
52383 373249 0 0 2 0x14200 systqmp
51985 389922 0 0 2 0x14200 systq
29482 267706 0 0 3 0x40014200 bored softclock
41470 14506 0 0 3 0x40014200 idle0
1 86322 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb> tr /p 0t119020
sleep_finish() at sleep_finish+0x86
msleep() at msleep+0x10f
uvm_fault() at uvm_fault+0xd9e
trap() at trap+0x283
--- trap (number 6) ---
end of kernel
end trace frame: 0xf6fd83a71f0, count: -4
0xf6fd866b5e4:
I have also seen:
ddb> tr
Debugger() at Debugger+0x9
internal_command() at internal_command+0x155
wskbd_translate() at wskbd_translate+0x90
wskbd_input() at wskbd_input+0x59
pckbd_input() at pckbd_input+0xb4
pckbcintr_internal() at pckbcintr_internal+0x8e
intr_handler() at intr_handler+0x28
Xintr_ioapic_edge1() at Xintr_ioapic_edge1+0xc9
--- interrupt ---
Xspllower() at Xspllower+0xc
__mtx_leave() at __mtx_leave+0x34
extent_do_alloc() at extent_do_alloc+0x60e
extent_alloc_subregion() at extent_alloc_subregion+0xa5
uvm_swap_alloc() at uvm_swap_alloc+0xb3
uvmpd_scan_inactive() at uvmpd_scan_inactive+0x54b
uvmpd_scan() at uvmpd_scan+0x29e
uvm_pageout() at uvm_pageout+0x6b
end trace frame: 0x0, count: -16
ddb> ps
PID TID PPID UID S FLAGS WAIT COMMAND
98510 454082 0 0 3 0x14200 bored sosplice
61020 167295 6510 0 3 0x100080 netio tcpdump
23052 453771 1 0 2 0x100083 getty
99310 304541 1 0 3 0x100083 ttyin getty
59992 46375 1 0 3 0x100083 ttyin getty
94180 48977 1 0 3 0x100083 ttyin getty
31867 368727 1 0 3 0x100083 ttyin getty
65827 183581 1 0 3 0x100083 ttyin getty
28126 292282 1 0 2 0x100010 cron
6510 87605 59875 76 2 0x100012 tcpdump
59875 37080 1 0 3 0x80 piperd perl
48125 216072 1 110 3 0x100090 poll sndiod
50613 296553 1 99 3 0x100090 poll sndiod
18652 448828 1 0 3 0x100080 kqread httpd
18945 502467 1 67 3 0x100092 kqread httpd
4132 6807 1 67 3 0x100092 kqread httpd
98138 255433 1 67 3 0x100092 kqread httpd
36567 323506 1 67 3 0x100092 kqread httpd
42835 234930 37277 95 3 0x100092 kqread smtpd
7470 157304 37277 103 3 0x100092 kqread smtpd
93610 470331 37277 95 3 0x100092 kqread smtpd
52521 493948 37277 95 3 0x100092 kqread smtpd
92601 131987 37277 95 3 0x100092 kqread smtpd
32371 229192 37277 95 3 0x100092 kqread smtpd
37277 266758 1 0 3 0x100080 kqread smtpd
85154 144921 1 0 3 0x80 kqread relayd
73872 403967 1 89 3 0x100092 kqread relayd
83871 323853 1 89 3 0x100092 kqread relayd
72192 107185 1 89 3 0x100092 kqread relayd
63728 434713 1 89 3 0x100092 kqread relayd
7463 237568 1 89 3 0x100092 kqread relayd
8573 10618 1 89 3 0x100092 kqread relayd
95560 222242 1 89 3 0x100092 kqread relayd
93731 388665 1 89 3 0x100092 kqread relayd
92056 453261 1 89 3 0x100092 kqread relayd
59991 293357 1 89 3 0x100092 kqread relayd
86473 71592 1 89 3 0x100092 kqread relayd
32379 28918 1 89 3 0x100092 kqread relayd
79429 244555 1 89 3 0x100092 kqread relayd
63847 15559 1 89 3 0x100092 kqread relayd
50075 349175 1 89 3 0x100092 kqread relayd
6058 159484 1 89 3 0x100092 kqread relayd
16880 367020 1 89 3 0x100092 kqread relayd
136 140946 1 89 3 0x100092 kqread relayd
13867 140645 1 89 3 0x100092 kqread relayd
70441 51568 1 89 3 0x100092 kqread relayd
55683 2953 1 89 2 0x100012 relayd
94827 327279 1 89 2 0x100012 relayd
83117 363844 1 89 2 0x100012 relayd
49903 68611 1 89 2 0x100012 relayd
46080 166396 1 89 2 0x100012 relayd
29597 175697 1 89 2 0x100012 relayd
39496 362805 1 89 2 0x100012 relayd
19019 294153 1 89 2 0x100012 relayd
45967 90936 1 89 2 0x100012 relayd
26992 149427 1 89 2 0x100012 relayd
60364 206396 1 89 2 0x100012 relayd
49833 356433 1 89 2 0x100012 relayd
79112 503654 1 89 2 0x100012 relayd
81078 319693 1 89 2 0x100012 relayd
78058 140027 1 89 2 0x100012 relayd
91182 184681 1 89 2 0x100012 relayd
37959 222788 1 89 2 0x100012 relayd
7050 276440 1 89 2 0x100012 relayd
54548 85994 1 89 2 0x100012 relayd
34773 213423 1 89 2 0x100012 relayd
3552 125399 1 89 3 0x100092 kqread relayd
89897 452096 1 89 3 0x100092 kqread relayd
35273 402937 90827 75 2 0x100012 bgpd
24240 293596 90827 75 3 0x100092 poll bgpd
90827 4256 1 0 2 0x80 bgpd
72809 171423 1 0 3 0x80 kqread snmpd
92764 212816 1 91 3 0x92 kqread snmpd
30459 414910 1 91 3 0x12 flt_noram1 snmpd
98342 18762 1 0 3 0x80 select sshd
91361 6408 1 0 3 0x100080 poll ntpd
92053 373229 95928 83 3 0x100092 poll ntpd
95928 207794 1 83 2 0x100492 ntpd
27466 156798 83367 74 2 0x100490 pflogd
83367 198766 1 0 3 0x80 netio pflogd
7678 511218 31604 73 3 0x100090 kqread syslogd
31604 252824 1 0 3 0x100082 netio syslogd
23988 322338 0 0 3 0x14200 pgzero zerothread
24605 125985 0 0 3 0x14200 aiodoned aiodoned
73480 399390 0 0 2 0x14200 update
41800 367817 0 0 3 0x14200 cleaner cleaner
81146 69342 0 0 3 0x14200 reaper reaper
*25140 16722 0 0 7 0x14200 pagedaemon
38872 493720 0 0 3 0x14200 bored crynlk
83120 104525 0 0 3 0x14200 bored crypto
44202 60584 0 0 2 0x14600 pfpurge
42782 359699 0 0 3 0x14200 bored viomb
79739 304953 0 0 3 0x14200 usbtsk usbtask
89868 403768 0 0 3 0x14200 usbatsk usbatsk
35580 285075 0 0 3 0x40014200 acpi0 acpi0
78992 92301 0 0 2 0x14200 softnet
14178 471898 0 0 2 0x14200 systqmp
86701 48006 0 0 2 0x14200 systq
75335 15065 0 0 3 0x40014200 bored softclock
16772 484410 0 0 3 0x40014200 idle0
1 192841 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb> tr /p 0t414910
sleep_finish() at sleep_finish+0x86
msleep() at msleep+0x10f
uvmfault_anonget() at uvmfault_anonget+0x1d3
uvm_fault() at uvm_fault+0x468
trap() at trap+0x283
--- trap (number 6) ---
end of kernel
end trace frame: 0x5e3d2983000, count: -5
0x5e3750d73b6:
>How-To-Repeat:
I have created VM with smaller ammount of memory to reproduce it faster.
/etc/relayd.conf:
prefork 20
log updates
interval 3
timeout 2000
table <test2> { 10.33.192.66 }
relay test2 {
listen on 10.33.200.17 port 6000
forward to <test2> port 6000
}
ruby test client:
require 'socket'
threads = []
str = "plah" * 10000
200.times do |t|
threads << Thread.new(t) do |th|
loop do
begin
socket = TCPSocket.new "10.33.200.17 ", 6000
socket.write str
socket.close
rescue
end
end
end
end
threads.each { |th| th.join }
ruby test backend server:
require 'socket'
begin
acceptor = TCPServer.open("0.0.0.0", 6000)
fds = [acceptor]
while true
if ios = select(fds, [], [], 10)
reads = ios.first
reads.each do |client|
if client == acceptor
client, sockaddr = acceptor.accept
fds << client
elsif client.eof?
fds.delete(client)
client.close
else
sleep 1
client.gets
end
end
end
end
ensure
fds.each {|c| c.close}
end
>Fix:
Not known.
dmesg:
OpenBSD 6.1-current (GENERIC) #14: Mon Apr 24 09:32:13 MDT 2017
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 137203712 (130MB)
avail mem = 128581632 (122MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf69d0 (9 entries)
bios0: vendor SeaBIOS version "1.9.1-1.fc24" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S5
acpi0: tables DSDT FACP APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Core Processor (Haswell, no TSX), 2594.03 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,ARAT
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
"PNP0303" at acpi0 not configured
"PNP0F13" at acpi0 not configured
"PNP0700" at acpi0 not configured
"PNP0501" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"PNP0A06" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0
wired to compatibility, channel 1 wired to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <QEMU, QEMU DVD-ROM, 2.5+> ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
pciide0: channel 1 disabled (no drives)
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Red Hat QXL Video" rev 0x04
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio0: address 52:54:00:df:65:cc
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
vio1 at virtio1: address 52:54:00:6c:c5:79
virtio1: msix shared
azalia0 at pci0 dev 5 function 0 "Intel 82801FB HD Audio" rev 0x01: apic 0 int
10
azalia0: No codecs found
virtio2 at pci0 dev 6 function 0 "Qumranet Virtio Console" rev 0x00
virtio2: no matching child driver; not configured
uhci0 at pci0 dev 7 function 0 "Intel 82801I USB" rev 0x03: apic 0 int 11
uhci1 at pci0 dev 7 function 1 "Intel 82801I USB" rev 0x03: apic 0 int 11
uhci2 at pci0 dev 7 function 2 "Intel 82801I USB" rev 0x03: apic 0 int 10
ehci0 at pci0 dev 7 function 7 "Intel 82801I USB" rev 0x03: apic 0 int 10
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00
addr 1
virtio3 at pci0 dev 8 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio3
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd0: 4096MB, 512 bytes/sector, 8388608 sectors
virtio3: msix shared
virtio4 at pci0 dev 9 function 0 "Qumranet Virtio Memory" rev 0x00
viomb0 at virtio4
virtio4: apic 0 int 10
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 1: density unknown
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00
addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00
addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00
addr 1
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (b3164b5c560a5158.a) swap on sd0b dump on sd0b
Rivo