I confirm I'm having this problem too since migrating to Bionic from Xenial.
My setup is a bit analogous to the original bug reporter: I have 5 drives (22 TB) in RAID1, organized in around 10 subvolumes with up to 20 snapshots per subvolume. After some hours running normally, at some point [btrfs-transaction] goes into D state, and everything btrfs-related slowly comes down to a stall, with any program trying to touch it ending up in D state too. The call trace I have also references btrfs_qgroup_trace_extent_post. I'm currently testing 5.0-rc8 from the Ubuntu ppa mainline, to see if the problem is still there. Michael, did you end up reporting the problem upstream? I would be keen do to it on the btrfs mailing-list, as soon as I have the answer to whether this is fixed with 5.0 or not. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1765998 Title: FS access deadlock with btrfs quotas enabled Status in linux package in Ubuntu: Triaged Status in linux source package in Bionic: Triaged Bug description: I'm running into an issue on Ubuntu Bionic (but not Xenial) where shortly after boot, under heavy load from many LXD containers starting at once, access to the btrfs filesystem that the containers are on deadlocks. The issue is quite hard to reproduce on other systems, quite likely related to the size of the filesystem involved (4 devices with a total of 8TB, millions of files, ~20 subvolumes with tens of snapshots each) and the access pattern from many LXD containers at once. It definitely goes away when disabling btrfs quotas though. Another prerequisite to trigger this bug may be the container subvolumes sharing extents (from their parent image or due to deduplication). I can only reliably reproduce it on a production system that I can only do very limited testing on, however I have been able to gather the following information: - Many threads are stuck, trying to aquire locks on various tree roots, which are never released by their current holders. - There always seem to be (at least) two threads executing rmdir syscalls which are creating the circular dependency: One of them is in btrfs_cow_block => ... => btrfs_qgroup_trace_extent_post => ... => find_parent_nodes and wants to acquire a lock that was already aquired by btrfs_search_slot of the other rmdir. - Reverting this patch seems to prevent it from happening: https://patchwork.kernel.org/patch/9573267/ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765998/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp