On 18.08.2023 18:34, Dag-Erling Smørgrav wrote:
Dag-Erling Smørgrav <d...@freebsd.org> writes:
Plot twist: c47116e909 _without_ the patches also appears to be working
fine. The last kernel I know for sure deadlocks is b36f469a15, so I'm
going to test cd25b0f740 and 28d2e3b5de.
c47116e909 with cd25b0f740 and 28d2e3b5de reverted deadlocks, see
attached ddb.txt. I'm going to see if reverting only 28d2e3b5de but not
cd25b0f740 changes anything.
Yes, it looks like ZIL-related deadlock:
ZFS sync thread in zil_sync() is waiting for allocated ZIL zios to complete:
Tracing command zfskern pid 5 tid 101124 td 0xfffffe0408cbb020
sched_switch() at sched_switch+0x5da/frame 0xfffffe04090f7900
mi_switch() at mi_switch+0x173/frame 0xfffffe04090f7920
sleepq_switch() at sleepq_switch+0x104/frame 0xfffffe04090f7960
_cv_wait() at _cv_wait+0x165/frame 0xfffffe04090f79c0
zil_sync() at zil_sync+0x9b/frame 0xfffffe04090f7aa0
dmu_objset_sync() at dmu_objset_sync+0x51b/frame 0xfffffe04090f7b70
dsl_pool_sync() at dsl_pool_sync+0x11d/frame 0xfffffe04090f7bf0
spa_sync() at spa_sync+0xc68/frame 0xfffffe04090f7e20
txg_sync_thread() at txg_sync_thread+0x2eb/frame 0xfffffe04090f7ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe04090f7f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04090f7f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Some thread requested fsync(), allocated zil zio, but stuck waiting for
z_teardown_inactive_lock in attempt to get data to be written into zil,
so zios were never even issued:
Tracing command blacklistd pid 521 tid 101136 td 0xfffffe040d08d000
sched_switch() at sched_switch+0x5da/frame 0xfffffe040c25c710
mi_switch() at mi_switch+0x173/frame 0xfffffe040c25c730
sleepq_switch() at sleepq_switch+0x104/frame 0xfffffe040c25c770
_sleep() at _sleep+0x2d6/frame 0xfffffe040c25c810
rms_rlock_fallback() at rms_rlock_fallback+0xd0/frame 0xfffffe040c25c850
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x2b/frame 0xfffffe040c25c880
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x35/frame 0xfffffe040c25c8a0
vgonel() at vgonel+0x3a9/frame 0xfffffe040c25c910
vnlru_free_impl() at vnlru_free_impl+0x371/frame 0xfffffe040c25c990
vn_alloc_hard() at vn_alloc_hard+0xd3/frame 0xfffffe040c25c9b0
getnewvnode_reserve() at getnewvnode_reserve+0xa0/frame 0xfffffe040c25c9d0
zfs_zget() at zfs_zget+0x1f/frame 0xfffffe040c25ca80
zfs_get_data() at zfs_get_data+0x62/frame 0xfffffe040c25cb20
zil_lwb_commit() at zil_lwb_commit+0x32f/frame 0xfffffe040c25cb70
zil_lwb_write_issue() at zil_lwb_write_issue+0x4e/frame 0xfffffe040c25cbb0
zil_commit_impl() at zil_commit_impl+0x943/frame 0xfffffe040c25cd40
zfs_fsync() at zfs_fsync+0x8f/frame 0xfffffe040c25cd80
kern_fsync() at kern_fsync+0x18a/frame 0xfffffe040c25ce00
amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe040c25cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe040c25cf30
--- syscall (95, FreeBSD ELF64, fsync), rip = 0x3f979eeb074a, rsp =
0x3f979a449c38, rbp = 0x3f979a449c50 ---
Third thread doing zfs rollback while holding z_teardown_inactive_lock
tries to wait for transaction commit, causing deadlock:
Tracing command zfs pid 65636 tid 138109 td 0xfffffe0438d721e0
sched_switch() at sched_switch+0x5da/frame 0xfffffe0439b2b950
mi_switch() at mi_switch+0x173/frame 0xfffffe0439b2b970
sleepq_switch() at sleepq_switch+0x104/frame 0xfffffe0439b2b9b0
_cv_wait() at _cv_wait+0x165/frame 0xfffffe0439b2ba10
txg_wait_synced_impl() at txg_wait_synced_impl+0xeb/frame
0xfffffe0439b2ba50
txg_wait_synced() at txg_wait_synced+0xb/frame 0xfffffe0439b2ba60
zfsvfs_teardown() at zfsvfs_teardown+0x203/frame 0xfffffe0439b2bab0
zfs_ioc_rollback() at zfs_ioc_rollback+0x12f/frame 0xfffffe0439b2bb00
zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame
0xfffffe0439b2bbc0
zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfffffe0439b2bbf0
devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfffffe0439b2bc40
vn_ioctl() at vn_ioctl+0xc2/frame 0xfffffe0439b2bcb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe0439b2bcd0
kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe0439b2bd30
sys_ioctl() at sys_ioctl+0x152/frame 0xfffffe0439b2be00
amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe0439b2bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame
0xfffffe0439b2bf30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1afaddea3aaa, rsp =
0x1afad4058328, rbp = 0x1afad40583a0 ---
Unfortunately I think the current code in main should still suffer from
this specific deadlock. cd25b0f740 fixes some deadlocks in this area,
may be that is why you are getting issues less often, but I don't
believe it fixes this specific one, may be you was just lucky. Only
https://github.com/openzfs/zfs/pull/15122 I believe should fix them.
--
Alexander Motin