======== ANALYSIS ========

********

 #0 [ffff880036a73b38] schedule at ffffffff8175e320
 #1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
 #2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
 #3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
 #4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]

Analyzing the stack trace, we can see that during an xfs_fs_mount the only
possible way to get into "xfs_log_unmount" is through the out_log_dealloc
error path, which means the XFS filesystem is considered CORRUPTED.

    xfs_mountfs(
            xfs_mount_t     *mp)
    {
            xfs_sb_t        *sbp = &(mp->m_sb);
            xfs_inode_t     *rip;
    ...
            /*
             * Get and sanity-check the root inode.
             * Save the pointer to it in the mount structure.
             */
            error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
            if (error) {
                    xfs_warn(mp, "failed to read root inode");
                    goto out_log_dealloc;
            }
    ...
     out_log_dealloc:
            xfs_log_unmount(mp);
    ...

So we DO KNOW your XFS is considered to be CORRUPTED (by the XFS function
xfs_iget(), called for the root inode as a sanity check).

********

Either way, let's continue debugging to make sure we understand why XFS
didn't give us an error about the filesystem being corrupted.

Following the stack:

 #2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
 #3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]

We can see that xfs_log_quiesce calls

 #1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]

which is the function responsible for pushing ALL of the AIL to disk (for
unmount and freeze purposes).

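To illustrate the error path above: the jump to out_log_dealloc is the usual
kernel goto-unwind idiom, where a late failure tears down whatever was set up
earlier. Below is a minimal user-space sketch of that idiom, not the XFS code
itself. The names mountfs_sketch, log_unmount_sketch, enum step and
cleanup_calls are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the mount steps; none of these are XFS APIs. */
enum step { STEP_LOG_MOUNT, STEP_READ_ROOT_INODE, STEP_DONE };

static int cleanup_calls;   /* counts how often the log teardown ran */

/* Stand-in for xfs_log_unmount(): here it only records that it was called. */
static void log_unmount_sketch(void)
{
    cleanup_calls++;
}

/*
 * Returns 0 on success, or a negative value when the step named by
 * fail_at is made to fail. STEP_DONE means "nothing fails".
 */
static int mountfs_sketch(enum step fail_at)
{
    int error;

    assert(fail_at <= STEP_DONE);

    error = (fail_at == STEP_LOG_MOUNT) ? -1 : 0;
    if (error)
        goto out;               /* log never came up: nothing to tear down */

    error = (fail_at == STEP_READ_ROOT_INODE) ? -2 : 0;
    if (error) {
        fprintf(stderr, "failed to read root inode\n");
        goto out_log_dealloc;   /* log is up, so it has to be torn down */
    }

    return 0;

out_log_dealloc:
    log_unmount_sketch();
out:
    return error;
}
```

Calling mountfs_sketch(STEP_READ_ROOT_INODE) returns -2 and runs the teardown
once, while mountfs_sketch(STEP_DONE) returns 0 and never reaches it. In the
real trace the teardown is xfs_log_unmount, which leads through
xfs_log_quiesce into xfs_ail_push_all_sync.
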
xfs_ail_push_all_sync has simple code:

            struct xfs_log_item     *lip;
            DEFINE_WAIT(wait);

            spin_lock(&ailp->xa_lock);
            while ((lip = xfs_ail_max(ailp)) != NULL) {
                    prepare_to_wait(&ailp->xa_empty, &wait, TASK_UNINTERRUPTIBLE);
                    ailp->xa_target = lip->li_lsn;
                    wake_up_process(ailp->xa_task);
                    spin_unlock(&ailp->xa_lock);
                    schedule();
                    spin_lock(&ailp->xa_lock);
            }
            spin_unlock(&ailp->xa_lock);

            finish_wait(&ailp->xa_empty, &wait);

As long as there are "xfs_log_items" left on the AIL doubly linked list, it
sets the push target to the last item's LSN, wakes the task responsible for
committing these log items to disk, and sleeps:

                    wake_up_process(ailp->xa_task);

POSSIBLE XFS BUG:

XFS can be stuck inside this loop because of something happening in the AIL
worker task (xa_task), which is responsible for committing the xfs log items.
And this, of course, makes the "mount" process hang in the UNINTERRUPTIBLE
state (since it's not safe to let userland kill this process).

OBS: Waiting for the core file so I can stack-trace this XFS worker to check
whether it is hung and what the causes are.

OBS: If XFS hangs on mount for a corrupted XFS filesystem, this behavior is a
bug, since the "mount" command should have returned saying the filesystem
didn't pass the root-inode sanity check.

Working on this...

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)

** Tags added: cts

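The hang described in the analysis of xfs_ail_push_all_sync can be modelled
in user space. The following is a single-threaded sketch under loud
assumptions: the AIL is reduced to a counter, the worker is a plain function
pointer instead of a kernel thread, and a max_sleeps cap (which the kernel
loop does not have) is added so the model can report the hang instead of
spinning forever. All names (ail_sketch, worker_ok, worker_stuck,
push_all_sync_sketch) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy AIL: just a count of log items still waiting to reach disk. */
struct ail_sketch {
    size_t pending;                         /* items still on the list */
    void (*worker)(struct ail_sketch *);    /* stand-in for the xa_task thread */
};

/* A healthy worker writes back one item per wakeup. */
static void worker_ok(struct ail_sketch *ail)
{
    if (ail->pending > 0)
        ail->pending--;
}

/* A stuck worker (for example, I/O that never completes) retires nothing. */
static void worker_stuck(struct ail_sketch *ail)
{
    (void)ail;
}

/*
 * Model of the push-all-sync loop: wake the worker, sleep, re-check.
 * Returns the number of sleeps taken, or -1 if max_sleeps was reached
 * with items still pending. The kernel loop has no such cap.
 */
static long push_all_sync_sketch(struct ail_sketch *ail, long max_sleeps)
{
    long sleeps = 0;

    assert(ail != NULL && ail->worker != NULL);

    while (ail->pending > 0) {
        if (sleeps == max_sleeps)
            return -1;          /* the real caller would still be asleep */
        ail->worker(ail);       /* models wake_up_process(ailp->xa_task) */
        sleeps++;               /* models schedule() */
    }
    return sleeps;
}
```

With worker_ok and three pending items the function returns 3 and leaves the
list empty. With worker_stuck it returns -1 and the list is unchanged, which
is the situation the "mount" process would be in, except that the real loop
keeps sleeping in the uninterruptible state instead of giving up.
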
https://bugs.launchpad.net/bugs/1382801

Title:
  XFS: mount hangs for corrupted filesystem

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  This situation was brought to my attention:

  --------

  mount hangs at the following stack:

  crash> bt 2882
  PID: 2882   TASK: ffff88084e75c800  CPU: 7   COMMAND: "mount"
   #0 [ffff880036a73b38] schedule at ffffffff8175e320
   #1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
   #2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
   #3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
   #4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]
   #5 [ffff880036a73ce0] xfs_fs_fill_super at ffffffffa0296707 [xfs]
   #6 [ffff880036a73d20] mount_bdev at ffffffff811cd4a9
   #7 [ffff880036a73db0] xfs_fs_mount at ffffffffa02946f5 [xfs]
   #8 [ffff880036a73dc0] mount_fs at ffffffff811ce123
   #9 [ffff880036a73e10] vfs_kern_mount at ffffffff811e9bf6
  #10 [ffff880036a73e60] do_new_mount at ffffffff811eb3a4
  #11 [ffff880036a73ec0] do_mount at ffffffff811ec706
  #12 [ffff880036a73f20] sys_mount at ffffffff811ecad0
  #13 [ffff880036a73f80] system_call_fastpath at ffffffff8176ae2d
      RIP: 00007f2340eb6c2a  RSP: 00007fff25675368  RFLAGS: 00010206
      RAX: 00000000000000a5  RBX: ffffffff8176ae2d  RCX: 0000000000000026
      RDX: 0000000000b04c20  RSI: 0000000000b04bf0  RDI: 0000000000b04bd0
      RBP: 00000000c0ed0400   R8: 0000000000b04c70   R9: 0000000000000001
      R10: ffffffffc0ed0400  R11: 0000000000000202  R12: 0000000000b04bf0
      R13: 0000000000b04b50  R14: 0000000000000400  R15: 0000000000000000
      ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

  The corresponding disk is /dev/sdd1. Any I/O on it (xfs_check, etc.) also
  hangs and goes into "D" state.

  This is reproducible with both the 3.11 and 3.13 kernels.

  The storage node is out of service because of this problem.

  --------

  I'm still asking for more data (sosreport and kernel dump).
