On May 22, 2023, at 2:27 AM, Claudio Jeker <[email protected]> wrote:
> I have seen these WITNESS warnings on other systems as well. I doubt this
> is the problem. IIRC this warning is because sys_mount() is doing it wrong
> but it is not really an issue since sys_mount is not called often.
Yup. I see that now that I have tested witness on several arches. They all
show this lock order reversal right after booting the system. I guess this
means what I am seeing isn’t something that witness detects.
On -current with my T4-1, I can reliably reproduce the issues I am seeing.
While the problem is intermittent I can’t get very far into the jdk build
without
tripping it. Instructions for reproducing the issue are:
Add wxallowed to /usr/local/ and /usr/ports (or wherever WRKOBJDIR has
been changed to)
doas pkg_add jdk zip unzip cups-libs bash gmake libiconv giflib
cd /usr/ports/devel/jdk/1.8
FLAVOR=native_bootstrap make
There are two stages to the problem. A java command (or javac or javah)
gets stuck making forward progress and nearly all of its cpu time is in
sys time category. You can see this in top as 1500-3000% CPU time on
the java process. ktrace of the process in this state shows endless
sched_yield() calls. Debugging shows many threads in
pthread_cond_wait(3). The condition vars are configured to use
CLOCK_MONOTONIC.
The second stage of the problem is when things lock up. While java is
spinning in this sched_yield() state, if you display the process arguments in
top (pressing the right arrow) you trip the lockups. top stops responding.
getty will reprompt if enter is pressed, but locks up if a username is entered.
Most processes lock up when doing anything after this point. ddb ps at this
stage shows top waiting on vmmaplk and the rest of the stuck processes
waiting on sysctllk (sshd, systat, login).
I tried bisecting when this was introduced but as I go back in time with
kernels it becomes more intermittent and I didn’t notice that so I would need
to redo the bisecting. I can say I have seen the problem reproduce as far
back as Feb 16th kernel. When I updated the jdk in late January I didn’t notice
it but it could have been a lucky build as I tend to only do one
native_bootstrap
build of the jdk when updating as a way to test the resulting package.
Here is some sample output of top, systat and ddb ps output on -current in
my last reproduction of the problem.
load averages: 15.27, 4.25, 1.68
oracle.intricatesoftware.com 16:13:10
64 processes: 62 idle, 2 on processor
up 0
days 00:04:58
64 CPUs: 0.0% user, 1.8% nice, 55.5% sys, 0.4% spin, 0.1% intr, 42.2% idle
Memory: Real: 143M/2676M act/tot Free: 13G Cache: 2334M Swap: 0K/16G
PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
38582 _pbuild 10 5 290M 64M onproc/19 fsleep 17:16 2892.14% javac
3 users Load 15.27 4.25 1.68 oracle.intricatesof 16:13:09
memory totals (in KB) PAGING SWAPPING Interrupts
real virtual free in out in out 12777 total
Active 146752 146752 13198032 ops vcons0
All 2740160 2740160 29713104 pages 5 vpci0:0
2 mpii0
Proc:r d s w Csw Trp Sys Int Sof Flt forks mpii1
18 148 78089 18666 8090 94097 18891 fkppw 2 em0
fksvm 31 ehci0
0.1%Int 0.3%Spn 55.1%Sys 1.8%Usr 42.7%Idle pwait 12737 clock
| | | | | | | | | | | relck
============================> rlkok
noram
Namei Sys-cache Proc-cache No-cache ndcpy
Calls hits % hits % miss % fltcp
zfod
cow
Disks cd0 sd0 sd1 sd2 cd1 sd3 66409 fmin
seeks 88545 ftarg
xfers 3 itarg
speed 18K 2 wired
sec 0.0 pdfre
pdscn
pzidl 1 IPKTS
13 kmape 1 OPKTS
ddb{0}> ps
PID TID PPID UID S FLAGS WAIT COMMAND
94135 299634 9252 0 3 0x82 sysctllk sshd
38582 227559 44647 55 3 0x200083 fsleep javac
38582 430054 44647 55 3 0x4200083 fsleep javac
38582 454002 44647 55 7 0x4200003 javac
38582 92205 44647 55 3 0x4200083 fsleep javac
38582 391595 44647 55 7 0x4200003 javac
38582 306671 44647 55 7 0x4200003 javac
38582 444915 44647 55 7 0x4200003 javac
38582 68842 44647 55 7 0x4200003 javac
38582 210759 44647 55 7 0x4200003 javac
38582 295002 44647 55 3 0x4200003 uobjlk javac
38582 20089 44647 55 7 0x4200003 javac
38582 426972 44647 55 7 0x4200003 javac
38582 485651 44647 55 7 0x4200003 javac
38582 198526 44647 55 7 0x4200003 javac
38582 140392 44647 55 7 0x4200003 javac
38582 426536 44647 55 7 0x4200003 javac
38582 237845 44647 55 3 0x4200003 vmmaplk javac
38582 11519 44647 55 7 0x4200003 javac
38582 493801 44647 55 7 0x4200003 javac
38582 431112 44647 55 7 0x4200003 javac
38582 5954 44647 55 3 0x4200003 uobjlk javac
38582 334521 44647 55 7 0x4200003 javac
38582 63717 44647 55 7 0x4200003 javac
38582 278840 44647 55 3 0x4200003 uobjlk javac
38582 288942 44647 55 3 0x4200003 vmmaplk javac
38582 358539 44647 55 3 0x4200003 vmmaplk javac
38582 292341 44647 55 3 0x4200003 uobjlk javac
38582 71353 44647 55 7 0x4200003 javac
38582 173462 44647 55 7 0x4200003 javac
38582 369162 44647 55 7 0x4200003 javac
38582 103608 44647 55 3 0x4200003 uobjlk javac
38582 283692 44647 55 7 0x4200003 javac
38582 73296 44647 55 7 0x4200003 javac
38582 107123 44647 55 7 0x4200003 javac
38582 190345 44647 55 7 0x4200003 javac
38582 519032 44647 55 7 0x4200003 javac
38582 512468 44647 55 7 0x4200003 javac
38582 30652 44647 55 7 0x4200003 javac
38582 379236 44647 55 3 0x4200003 uobjlk javac
38582 269178 44647 55 7 0x4200003 javac
38582 268232 44647 55 3 0x4200003 uobjlk javac
38582 352770 44647 55 7 0x4200003 javac
38582 377632 44647 55 7 0x4200003 javac
38582 244819 44647 55 7 0x4200003 javac
38582 72610 44647 55 7 0x4200003 javac
38582 469953 44647 55 3 0x4200083 fsleep javac
38582 371383 44647 55 3 0x4200083 fsleep javac
38582 158857 44647 55 3 0x4200083 fsleep javac
38582 11735 44647 55 3 0x4200083 fsleep javac
38582 84102 44647 55 3 0x4200083 fsleep javac
38582 75248 44647 55 3 0x4200083 fsleep javac
38582 375420 44647 55 3 0x4200083 fsleep javac
38582 518610 44647 55 3 0x4200083 fsleep javac
38582 127230 44647 55 3 0x4200003 vmmaplk javac
38582 83423 44647 55 3 0x4200083 fsleep javac
38582 159603 44647 55 3 0x4200083 fsleep javac
38582 31091 44647 55 3 0x4200083 fsleep javac
38582 390888 44647 55 3 0x4200083 fsleep javac
38582 166886 44647 55 3 0x4200083 fsleep javac
38582 377933 44647 55 3 0x4200083 fsleep javac
38582 376867 44647 55 3 0x4200083 fsleep javac
38582 110280 44647 55 3 0x4200083 fsleep javac
38582 396501 44647 55 3 0x4200083 fsleep javac
38582 337157 44647 55 3 0x4200083 fsleep javac
38582 312539 44647 55 3 0x4200083 fsleep javac
38582 439725 44647 55 3 0x4200083 fsleep javac
38582 174728 44647 55 3 0x4200083 fsleep javac
38582 350117 44647 55 3 0x4200083 fsleep javac
38582 336583 44647 55 3 0x4200083 fsleep javac
44647 53285 6389 55 3 0x100089 sigsusp sh
6389 9627 3586 55 3 0x10008b sigsusp sh
3586 519556 31484 55 3 0x83 wait gmake
29401 282226 87417 55 3 0x100083 piperd tee
31484 384291 87417 55 3 0x81 wait bash
10686 220950 84161 55 3 0x100083 piperd tee
87417 52900 84161 55 3 0x81 wait bash
84161 354717 90894 55 3 0x83 wait bash
90894 299418 54844 55 3 0x100089 sigsusp sh
54844 38202 23322 55 3 0x10008b sigsusp sh
23322 499463 12410 55 3 0x83 wait gmake
2137 306483 5971 1000 3 0x83 sysctllk systat
96354 374280 1 1000 3 0x1000b0 kqread ssh-agent
5971 326285 53265 1000 3 0x10008b sigsusp ksh
53265 95696 81543 1000 3 0x98 kqread sshd
81543 262878 9252 0 3 0x82 kqread sshd
12410 25695 53684 1000 3 0x10008b sigsusp make
53684 389550 2559 1000 3 0x10008b sigsusp sh
91269 101622 48618 1000 3 0x100003 vmmaplk top
26730 50864 1 1000 3 0x1000b0 kqread ssh-agent
48618 194268 38192 1000 3 0x10008b sigsusp ksh
38192 94599 15809 1000 3 0x98 kqread sshd
15809 420302 9252 0 3 0x82 kqread sshd
2559 172546 11253 1000 3 0x10008b sigsusp make
48347 451749 1 1000 3 0x1000b0 kqread ssh-agent
11253 329919 75474 1000 3 0x10008b sigsusp ksh
75474 434695 24864 1000 3 0x98 kqread sshd
24864 460675 9252 0 3 0x82 kqread sshd
80339 457654 1 0 3 0x83 sysctllk login
2783 471804 1 0 3 0x100098 kqread cron
13986 329572 1 99 3 0x1100090 kqread sndiod
17161 216329 1 110 3 0x100090 kqread sndiod
7125 459497 81420 95 3 0x1100092 kqread smtpd
34588 306264 81420 103 3 0x1100092 kqread smtpd
35539 431962 81420 95 3 0x1100092 kqread smtpd
95046 110434 81420 95 3 0x100092 kqread smtpd
75832 395854 81420 95 3 0x1100092 kqread smtpd
90905 131444 81420 95 3 0x1100092 kqread smtpd
81420 40917 1 0 3 0x100080 kqread smtpd
9252 412804 1 0 3 0x88 kqread sshd
77215 300303 47896 0 3 0x80 nfsd nfsd
39163 328218 47896 0 3 0x80 nfsd nfsd
18940 88416 47896 0 3 0x80 nfsd nfsd
30770 448875 47896 0 3 0x80 nfsd nfsd
47896 35491 1 0 3 0x80 netcon nfsd
69834 381834 1172 0 3 0x80 kqread mountd
1172 8898 1 0 3 0x100080 kqread mountd
34849 159509 1 28 3 0x1100090 kqread portmap
99111 374653 1 0 3 0x100080 kqread ntpd
96514 463552 69056 83 3 0x100092 sysctllk ntpd
69056 185686 1 83 3 0x1100092 sysctllk ntpd
76215 96020 37258 74 3 0x1100092 bpf pflogd
37258 158509 1 0 3 0x80 netio pflogd
74134 93758 35337 73 3 0x1100090 kqread syslogd
35337 436947 1 0 3 0x100082 netio syslogd
49461 134918 1 0 3 0x100080 kqread resolvd
15802 138340 35598 77 3 0x100092 kqread dhcpleased
85374 145907 35598 77 3 0x100092 kqread dhcpleased
35598 69070 1 0 3 0x80 kqread dhcpleased
78965 485143 22355 115 3 0x100092 kqread slaacd
49372 473630 22355 115 3 0x100092 kqread slaacd
22355 252711 1 0 3 0x100080 kqread slaacd
77472 293410 0 0 3 0x14200 bored smr
96287 2177 0 0 3 0x14200 pgzero zerothread
22397 479660 0 0 3 0x14200 aiodoned aiodoned
40550 268537 0 0 3 0x14200 syncer update
30711 491254 0 0 3 0x14200 cleaner cleaner
19845 282487 0 0 3 0x14200 reaper reaper
85959 130144 0 0 3 0x14200 pgdaemon pagedaemon
70483 182566 0 0 3 0x14200 usbtsk usbtask
51296 152380 0 0 3 0x14200 usbatsk usbatsk
16787 444929 0 0 3 0x14200 bored sensors
63429 82409 0 0 7 0x40014200 idle63
92645 493403 0 0 7 0x40014200 idle62
20350 375563 0 0 7 0x40014200 idle61
35222 74521 0 0 7 0x40014200 idle60
33426 375581 0 0 7 0x40014200 idle59
22679 177213 0 0 7 0x40014200 idle58
14639 386227 0 0 7 0x40014200 idle57
39400 120262 0 0 7 0x40014200 idle56
12654 7800 0 0 7 0x40014200 idle55
48608 174405 0 0 3 0x40014200 idle54
94678 113021 0 0 7 0x40014200 idle53
6170 359112 0 0 3 0x40014200 idle52
92056 451703 0 0 7 0x40014200 idle51
94062 149790 0 0 3 0x40014200 idle50
7490 352645 0 0 7 0x40014200 idle49
58618 321162 0 0 3 0x40014200 idle48
34596 30496 0 0 3 0x40014200 idle47
42019 493114 0 0 7 0x40014200 idle46
19753 346089 0 0 3 0x40014200 idle45
35289 255668 0 0 7 0x40014200 idle44
71157 135665 0 0 7 0x40014200 idle43
2489 156561 0 0 7 0x40014200 idle42
69267 449921 0 0 3 0x40014200 idle41
17078 23764 0 0 3 0x40014200 idle40
18986 140559 0 0 7 0x40014200 idle39
31565 508859 0 0 3 0x40014200 idle38
83892 503397 0 0 3 0x40014200 idle37
3971 218332 0 0 3 0x40014200 idle36
56087 426177 0 0 3 0x40014200 idle35
8255 404604 0 0 3 0x40014200 idle34
88769 308362 0 0 7 0x40014200 idle33
17805 369954 0 0 7 0x40014200 idle32
44764 388232 0 0 7 0x40014200 idle31
6184 164272 0 0 7 0x40014200 idle30
57793 246870 0 0 3 0x40014200 idle29
94847 282657 0 0 3 0x40014200 idle28
20730 274685 0 0 7 0x40014200 idle27
49977 358582 0 0 3 0x40014200 idle26
76662 491808 0 0 3 0x40014200 idle25
339 381654 0 0 7 0x40014200 idle24
95798 435838 0 0 7 0x40014200 idle23
17729 220014 0 0 3 0x40014200 idle22
32933 457100 0 0 3 0x40014200 idle21
93991 300506 0 0 7 0x40014200 idle20
83270 35155 0 0 3 0x40014200 idle19
113 190949 0 0 7 0x40014200 idle18
47202 13764 0 0 3 0x40014200 idle17
26572 32690 0 0 7 0x40014200 idle16
79156 94305 0 0 3 0x40014200 idle15
91078 227738 0 0 3 0x40014200 idle14
92007 190955 0 0 3 0x40014200 idle13
30258 465800 0 0 7 0x40014200 idle12
21174 286794 0 0 7 0x40014200 idle11
38572 384863 0 0 3 0x40014200 idle10
39753 266722 0 0 7 0x40014200 idle9
2402 328673 0 0 3 0x40014200 idle8
7204 135132 0 0 3 0x40014200 idle7
50857 147720 0 0 3 0x40014200 idle6
98253 405788 0 0 3 0x40014200 idle5
45666 511079 0 0 3 0x40014200 idle4
31437 507896 0 0 3 0x40014200 idle3
22223 410682 0 0 7 0x40014200 idle2
66020 259824 0 0 3 0x40014200 idle1
1702 438104 0 0 3 0x14200 bored softnet3
79408 470282 0 0 3 0x14200 bored softnet2
65184 187165 0 0 3 0x14200 bored softnet1
19984 214261 0 0 3 0x14200 bored softnet0
96155 318872 0 0 3 0x14200 bored systqmp
56400 116597 0 0 3 0x14200 bored systq
90431 390796 0 0 3 0x40014200 bored softclock
*64111 191058 0 0 7 0x40014200 idle0
18862 308472 0 0 3 0x14200 kmalloc kmthread
1 421752 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper