Explanatory note on bionic-proposed (see comment #17): The flaky fail set had a long tail (tests that failed just a few times).
Symptom: --- The logs show the failure cause is output mismatch, because '/target' failed to umount (busy), and it was logged; e.g., btrfs/027 [failed, exit status 1]- output mismatch (see /home/ubuntu/xfstests-dev/results//btrfs/027.out.bad) --- tests/btrfs/027.out 2020-10-26 22:23:54.626341276 +0000 +++ /home/ubuntu/xfstests-dev/results//btrfs/027.out.bad 2020-11-17 06:34:12.623548481 +0000 @@ -1,2 +1,3 @@ QA output created by 027 -Silence is golden +umount: /mnt/scratch: target is busy. +failed to unmount /dev/loop2 ... The number of failed tests increases with the number of umount failures. Comparing both numbers per log file: (line breaks added for clarity.) $ (grep ^Failed xfstests.log.2020-11-1*; \ grep -c 'umount: /mnt/scratch: target is busy' xfstests.log.2020-11-1* | grep -v :0) \ | sort xfstests.log.2020-11-16-21-03-32:Failed 31 of 823 tests // 30ish is normal failure count xfstests.log.2020-11-16-22-16-38:Failed 34 of 823 tests xfstests.log.2020-11-16-23-25-56:Failed 34 of 823 tests xfstests.log.2020-11-17-00-36-27:Failed 36 of 823 tests xfstests.log.2020-11-17-01-44-04:Failed 37 of 823 tests xfstests.log.2020-11-17-02-52-41:Failed 37 of 823 tests // things go bad after this. umount hung here. xfstests.log.2020-11-17-04-07-11:15 xfstests.log.2020-11-17-04-07-11:Failed 68 of 823 tests // umount hung here too. xfstests.log.2020-11-17-05-21-34:76 xfstests.log.2020-11-17-05-21-34:Failed 158 of 823 tests xfstests.log.2020-11-17-06-29-11:148 xfstests.log.2020-11-17-06-29-11:Failed 278 of 823 tests xfstests.log.2020-11-17-07-29-57:31 xfstests.log.2020-11-17-07-29-57:Failed 104 of 823 tests xfstests.log.2020-11-17-08-37-16:33 xfstests.log.2020-11-17-08-37-16:Failed 87 of 823 tests xfstests.log.2020-11-17-09-50-43:36 xfstests.log.2020-11-17-09-50-43:Failed 101 of 823 tests xfstests.log.2020-11-17-11-04-16:43 xfstests.log.2020-11-17-11-04-16:Failed 111 of 823 tests xfstests.log.2020-11-17-12-08-14:68 xfstests.log.2020-11-17-12-08-14:Failed 143 of 823 tests xfstests.log.2020-11-17-13-14-01:93 xfstests.log.2020-11-17-13-14-01:Failed 200 of 823 tests And the kernel log files show that 'umount' hung in the timestamps where the increase started. $ sudo grep umount kern.log.for.2020-11-1* kern.log.for.2020-11-17-02-52-41:Nov 17 03:52:55 mfo-s390x-bionic kernel: [24748.572259] CPU: 4 PID: 10862 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu kern.log.for.2020-11-17-02-52-41:Nov 17 03:53:10 mfo-s390x-bionic kernel: [24762.822089] CPU: 4 PID: 11705 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu kern.log.for.2020-11-17-02-52-41:Nov 17 03:53:16 mfo-s390x-bionic kernel: [24768.881926] CPU: 4 PID: 12110 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu kern.log.for.2020-11-17-04-07-11:Nov 17 03:52:55 mfo-s390x-bionic kernel: [24748.572259] CPU: 4 PID: 10862 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu kern.log.for.2020-11-17-04-07-11:Nov 17 03:53:10 mfo-s390x-bionic kernel: [24762.822089] CPU: 4 PID: 11705 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu kern.log.for.2020-11-17-04-07-11:Nov 17 03:53:16 mfo-s390x-bionic kernel: [24768.881926] CPU: 4 PID: 12110 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu This is the stack trace (s390x kernel pointer addresses.) 24748.572081] ------------[ cut here ]------------ [24748.572083] BTRFS: Transaction aborted (error -28) // i.e., -ENOSPC [24748.572217] WARNING: CPU: 4 PID: 10862 at /build/linux-9nQaV0/linux-4.15.0/fs/btrfs/extent-tree.c:3097 btrfs_run_delayed_refs+0x208/0x260 [btrfs] ... [24748.572259] CPU: 4 PID: 10862 Comm: umount Tainted: G W 4.15.0-125-generic #128-Ubuntu [24748.572259] Hardware name: IBM 2964 N63 400 (KVM/Linux) ... [24748.572297] Call Trace: [24748.572309] ([<000003ff8039710c>] btrfs_run_delayed_refs+0x204/0x260 [btrfs]) [24748.572322] [<000003ff803b0b46>] btrfs_commit_transaction+0x66/0x9e8 [btrfs] [24748.572330] [<00000000003cdf9e>] __sync_filesystem+0x56/0x80 [24748.572333] [<000000000038f088>] generic_shutdown_super+0x48/0x168 [24748.572334] [<000000000038f50e>] kill_anon_super+0x2e/0x40 [24748.572344] [<000003ff80375b60>] btrfs_kill_super+0x30/0x148 [btrfs] [24748.572346] [<000000000038fa90>] deactivate_locked_super+0x70/0xa0 [24748.572348] [<00000000003b6b94>] cleanup_mnt+0x64/0xa8 [24748.572351] [<000000000019a542>] task_work_run+0xba/0x100 [24748.572361] [<0000000000108a6e>] do_notify_resume+0x4e/0x60 [24748.572365] [<00000000008ff3be>] system_call+0xe2/0x2c8 After these errors, the failure rate increased; but not all times/tests; since it seems that at some point the umount finishes, unblocking tests. Cause: --- Since this issue didn't happen with the test/patched kernel, it would seem to originate from other/new patches. There are indeed 2 new ones not used in the test kernel: 6f64bf0873eb btrfs: qgroup: fix data leak caused by race between writeback and truncate <<< NEW d14eef17dcf2 btrfs: don't force read-only after error in drop snapshot <<< NEW $ ls -1r *.patch # test kernel on top of Ubuntu-4.15.0-122.124 0017-btrfs-ctree-check-key-order-before-merging-tree-bloc.patch 0016-btrfs-extent-tree-kill-the-BUG_ON-in-insert_inline_e.patch 0015-btrfs-extent-tree-kill-BUG_ON-in-__btrfs_free_extent.patch 0014-btrfs-extent_io-do-extra-check-for-extent-buffer-rea.patch 0013-btrfs-drop-unnecessary-offset_in_page-in-extent-buff.patch 0012-btrfs-use-BUG-instead-of-BUG_ON-1.patch 0011-btrfs-use-offset_in_page-instead-of-open-coding-it.patch 0010-btrfs-fix-wrong-address-when-faulting-in-pages-in-th.patch 0009-btrfs-fix-lockdep-splat-in-add_missing_dev.patch 0008-btrfs-require-only-sector-size-alignment-for-parent-.patch 0007-btrfs-fix-potential-deadlock-in-the-search-ioctl.patch 0006-uaccess-Add-non-pagefault-user-space-write-function.patch 0005-uaccess-Add-non-pagefault-user-space-read-functions.patch 0004-btrfs-set-the-lockdep-class-for-log-tree-extent-buff.patch 0003-btrfs-Remove-extraneous-extent_buffer_get-from-tree_.patch 0002-btrfs-Remove-redundant-extent_buffer_get-in-get_old_.patch 0001-btrfs-drop-path-before-adding-new-uuid-tree-entry.patch $ git log --oneline Ubuntu-4.15.0-122.124..Ubuntu-4.15.0-125.128 -- fs/btrfs/ a0a951aaa63f btrfs: ctree: check key order before merging tree blocks 0db62fbc1d08 btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref() 4f0e0482c42a btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent() 36b1133f57e3 btrfs: extent_io: do extra check for extent buffer read write functions 82e64a4e1ff8 btrfs: drop unnecessary offset_in_page in extent buffer helpers 3393e02f4100 btrfs: use BUG() instead of BUG_ON(1) 10679c69e0a8 btrfs: use offset_in_page instead of open-coding it 6f64bf0873eb btrfs: qgroup: fix data leak caused by race between writeback and truncate <<< NEW d14eef17dcf2 btrfs: don't force read-only after error in drop snapshot <<< NEW 2f54a800e2ae btrfs: fix wrong address when faulting in pages in the search ioctl 71ef83d6ab86 btrfs: fix lockdep splat in add_missing_dev cb1483e19a40 btrfs: require only sector size alignment for parent eb bytenr d738658f503d btrfs: fix potential deadlock in the search ioctl cca53f849a53 btrfs: set the lockdep class for log tree extent buffers bb11dbfbc405 btrfs: Remove extraneous extent_buffer_get from tree_mod_log_rewind 9a4e81ea6a44 btrfs: Remove redundant extent_buffer_get in get_old_root 725ce0e39e6d btrfs: drop path before adding new uuid tree entry The commit message in 6f64bf0873eb explicitly describes the umount path (now in return to userspace, but still.) So it does look like a match: commit 6f64bf0873ebc15a4e710cb04c7630576637a845 Author: Qu Wenruo <w...@suse.com> Date: Fri Jul 17 15:12:05 2020 +0800 btrfs: qgroup: fix data leak caused by race between writeback and truncate BugLink: https://bugs.launchpad.net/bugs/1901257 [ Upstream commit fa91e4aa1716004ea8096d5185ec0451e206aea0 ] [BUG] When running tests like generic/013 on test device with btrfs quota enabled, it can normally lead to data leak, detected at unmount time: BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 ------------[ cut here ]------------ WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace caf08beafeca2392 ]--- BTRFS error (device dm-3): qgroup reserved space leaked Also, this issue happened with Focal's original kernel from master-next, which happened to have that commit already; and it didn't happen with Focal's 'pre-original' kernel from -updates, which didn't. More evidence. Revert? --- Apparently this doesn't need to be reverted; it doesn't happen everytime, as seen in this test, and with previous tests for Focal: As described above, this was also noticed in Focal's results w/ the original kernel from master-next, which has that commit, but to a lesser extent (see comment #8) so it wasn't noticed as much. Thus the impact seem to vary, and happen more rarely, because the results w/ focal-proposed did NOT hit it a single time in 11 runs, and commit IS there. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1902254 Title: Bionic: btrfs: kernel BUG at /build/linux- eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! Status in linux package in Ubuntu: Fix Committed Status in linux source package in Bionic: Fix Committed Status in linux source package in Focal: Fix Committed Status in linux source package in Groovy: Fix Committed Bug description: [Impact] * Users of btrfs started hitting a kernel BUG() (below) after upgrade from 4.15.0-99.100 to 4.15.0-109.110, which has 55 btrfs changes. kernel BUG at /build/linux-eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! ... Krnl PSW : 00000000be9cb874 00000000ef3786e8 (btrfs_set_item_key_safe+0x152/0x1c0 [btrfs]) ... [...] Call Trace: [...] btrfs_set_item_key_safe+0x11c/0x1c0 [btrfs]) [...] __btrfs_drop_extents+0xb5a/0xda8 [btrfs] [...] btrfs_log_changed_extents+0x35c/0xaf0 [btrfs] [...] btrfs_log_inode+0x9ee/0x1080 [btrfs] [...] btrfs_log_inode_parent+0x224/0xa10 [btrfs] [...] btrfs_log_dentry_safe+0x80/0xa8 [btrfs] [...] btrfs_sync_file+0x392/0x550 [btrfs] [...] do_fsync+0x5e/0x90 [...] SyS_fdatasync+0x32/0x48 [...] system_call+0xd8/0x2c8 $ git log --oneline Ubuntu-4.15.0-99.100..Ubuntu-4.15.0-109.110 -- fs/btrfs/ | wc -l 55 * The error happens at random moments, regardless of a particular activity/load. Workaround is to downgrade. [Fix] * This BUG()/function is addressed in patch 4/4 [1] of series 'btrfs: Enhanced runtime defence against fuzzed images' [2], after issues in the real world, not just crafted fs images: 'one internal report has hit one BUG_ON() with real world fs' kernel BUG at fs/btrfs/ctree.c:3188! ... RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180 * The patch/set [3] is applied in v5.10-rc1 and Ubuntu Unstable: - d16c702fe4f2 btrfs: ctree: check key order before merging tree blocks - 07cce5cf3b48 btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref() - 1c2a07f598d5 btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent() - f98b6215d7d1 btrfs: extent_io: do extra check for extent buffer read write functions [Test Case] * There is working synthetic reproducer for this issue, which is hard to reproduce as reported in commit [4] that introduces debugging for the issue. * Regression tests with xfstests and stress-ng shows no regressions between un/patched kernels. [Other Info] * Trivial backports (only refreshing a few context lines) with 3 more dependency patches on Bionic and 1 on Focal. And Bionic needed one extra hunk to '#include' a header. Groovy all apply cleanly. [1] https://lore.kernel.org/linux-btrfs/20200819063550.62832-5-...@suse.com/ [2] https://lore.kernel.org/linux-btrfs/20200819063550.62832-1-...@suse.com/ [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d16c702fe4f274bd77b47d3ab737eadcf24e0b93 [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7c15d41016dc886cc011e3854d855e219759ae68 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902254/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp