Just for posterity, I was able to reproduce the problem on illumos as well
(running a kernel without Justin's fix):
> ::panicinfo
cpu 0
thread ffffff0004819c40
message assertion failed: avl_is_empty(&dn->dn_dbufs), file: .../../common/fs/zfs/dnode_sync.c, line: 495
> $C
ffffff0004819770 vpanic()
ffffff00048197a0 0xfffffffffbdf37d8()
ffffff00048197e0 dnode_sync_free+0x278(ffffff01cb3b6528, ffffff0166429a00)
ffffff0004819860 dnode_sync+0x68b(ffffff01cb3b6528, ffffff0166429a00)
ffffff00048198b0 dmu_objset_sync_dnodes+0x93(ffffff014af3d330, 0, ffffff0166429a00)
ffffff00048199b0 dmu_objset_sync+0x1bc(ffffff014af3d080, ffffff014b1ff560, ffffff0166429a00)
ffffff00048199f0 dsl_pool_sync_mos+0x42(ffffff01495d7e80, ffffff0166429a00)
ffffff0004819a80 dsl_pool_sync+0x2fe(ffffff01495d7e80, 233d6)
ffffff0004819b50 spa_sync+0x27e(ffffff014c2c0000, 233d6)
ffffff0004819c20 txg_sync_thread+0x260(ffffff01495d7e80)
ffffff0004819c30 thread_start+8()
On 28/09/2015 17:25, Andriy Gapon wrote:
>
> There are several reports from FreeBSD users about getting a panic because of
> the avl_is_empty(&dn->dn_dbufs) assertion in dnode_sync_free(). I was also
> able to reproduce the problem with ZFS on Linux 0.6.5. There do not seem to
> be any reports from illumos users.
>
> I think that the following stack traces demonstrate the problem rather well
> (they are a little unusual as they come from Linux's crash utility, but
> should be legible):
> crash> foreach UN bt
> PID: 703 TASK: ffff88003b8a4440 CPU: 0 COMMAND: "txg_sync"
> #0 [ffff880039fa3848] __schedule at ffffffff8160918d
> #1 [ffff880039fa38b0] schedule at ffffffff816096e9
> #2 [ffff880039fa38c0] spl_panic at ffffffffa0012645 [spl]
> #3 [ffff880039fa3a48] dnode_sync at ffffffffa062b7cf [zfs]
> #4 [ffff880039fa3b38] dmu_objset_sync_dnodes at ffffffffa0612dd7 [zfs]
> #5 [ffff880039fa3b78] dmu_objset_sync at ffffffffa06130d5 [zfs]
> #6 [ffff880039fa3c50] dsl_pool_sync at ffffffffa0641a8a [zfs]
> #7 [ffff880039fa3cd0] spa_sync at ffffffffa0664408 [zfs]
> #8 [ffff880039fa3da0] txg_sync_thread at ffffffffa067b970 [zfs]
> #9 [ffff880039fa3e98] thread_generic_wrapper at ffffffffa000e18a [spl]
> #10 [ffff880039fa3ec8] kthread at ffffffff8109726f
> #11 [ffff880039fa3f50] ret_from_fork at ffffffff81614198
>
> PID: 716 TASK: ffff88003b8a6660 CPU: 0 COMMAND: "trial"
> #0 [ffff88003c68f738] __schedule at ffffffff8160918d
> #1 [ffff88003c68f7a0] schedule at ffffffff816096e9
> #2 [ffff88003c68f7b0] cv_wait_common at ffffffffa0014d15 [spl]
> #3 [ffff88003c68f818] __cv_wait at ffffffffa0014e65 [spl]
> #4 [ffff88003c68f828] txg_wait_synced at ffffffffa067a70f [zfs]
> #5 [ffff88003c68f868] dsl_sync_task at ffffffffa064b017 [zfs]
> #6 [ffff88003c68f928] dsl_destroy_head at ffffffffa06eee62 [zfs]
> #7 [ffff88003c68f978] dmu_recv_cleanup_ds at ffffffffa06194ed [zfs]
> #8 [ffff88003c68fa98] dmu_recv_stream at ffffffffa061a992 [zfs]
> #9 [ffff88003c68fc20] zfs_ioc_recv at ffffffffa06b1bad [zfs]
> #10 [ffff88003c68fe50] zfsdev_ioctl at ffffffffa06b3c86 [zfs]
> #11 [ffff88003c68feb8] do_vfs_ioctl at ffffffff811d9ca5
> #12 [ffff88003c68ff30] sys_ioctl at ffffffff811d9f21
> #13 [ffff88003c68ff80] system_call_fastpath at ffffffff81614249
> RIP: 00007ff39d5c0257 RSP: 00007ff38e5c2008 RFLAGS: 00010206
> RAX: 0000000000000010 RBX: ffffffff81614249 RCX: 0000000000000024
> RDX: 00007ff38e5c21d0 RSI: 0000000000005a1b RDI: 0000000000000004
> RBP: 00007ff38e5c57b0 R8: 342d663438372d62 R9: 636430382d646335
> R10: 643266636131612d R11: 0000000000000246 R12: 0000000000000060
> R13: 00007ff38e5c3200 R14: 00007ff3880080a0 R15: 00007ff38e5c21d0
> ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
>
> PID: 31758 TASK: ffff88003b332d80 CPU: 0 COMMAND: "dbu_evict"
> #0 [ffff88003b723ca0] __schedule at ffffffff8160918d
> #1 [ffff88003b723d08] schedule_preempt_disabled at ffffffff8160a8d9
> #2 [ffff88003b723d18] __mutex_lock_slowpath at ffffffff81608625
> #3 [ffff88003b723d78] mutex_lock at ffffffff81607a8f
> #4 [ffff88003b723d90] dbuf_rele at ffffffffa05fd290 [zfs]
> #5 [ffff88003b723db0] dmu_buf_rele at ffffffffa05fe57e [zfs]
> #6 [ffff88003b723dc0] bpobj_close at ffffffffa05f78ed [zfs]
> #7 [ffff88003b723dd8] dsl_deadlist_close at ffffffffa0636e19 [zfs]
> #8 [ffff88003b723e10] dsl_dataset_evict at ffffffffa062d78b [zfs]
> #9 [ffff88003b723e28] taskq_thread at ffffffffa000f912 [spl]
> #10 [ffff88003b723ec8] kthread at ffffffff8109726f
> #11 [ffff88003b723f50] ret_from_fork at ffffffff81614198
>
> In 100% of the cases where I hit the assertion, it was with DMU_OT_BPOBJ
> dnodes. Justin thinks that the situation is harmless and that the assertion
> can be removed. I agree with him.
> But on the other hand, I wonder if something could be done in the DSL to
> avoid the described situation.
> I mean, it seems that bpo_cached_dbuf is a rare (the only?) case where a
> dbuf can be held beyond the lifetime of its dnode...
>
--
Andriy Gapon
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer