Hi, I hit this issue on Bionic, Disco and Eoan. Our (server-team) Jenkins nodes often fill up with stale LXD containers that are left behind because of "fails to destroy ZFS filesystem" errors.
Some thoughts and qualitative observations:

0. This is not a corner case: I see the problem all the time.

1. There is probably more than one issue involved here, even if we get similar error messages when trying to delete a container.

2. One issue is about mount namespaces: stray mounts prevent the container from being deleted. This can be worked around by entering the namespace and unmounting; the container can then be deleted. When this happens, retrying `lxc delete` doesn't help. This is described in [0]. I think newer versions of LXD are much less prone to end up in this case.

3. In other cases `lxc delete --force` fails with the "ZFS dataset is busy" error, but the deletion succeeds if it is retried immediately afterwards. In my case I don't even need to wait a single second: the second delete in `lxc delete --force <x> ; lxc delete <x>` already works (a rough sketch of this manual retry follows the list). Stopping and deleting the container as separate operations also works.

4. It has been suggested in [0] that LXD could retry the "delete" operation if it fails. stgraber wrote that LXD *already* retries the operation 20 times over 10 seconds, but the outcome is still a failure. It is not clear to me why retrying manually works while LXD's auto-retrying does not.

5. Some time ago (weeks) the error message changed from "Failed to destroy ZFS filesystem: dataset is busy" to "Failed to destroy ZFS filesystem:" with no other detail. I can't tell which specific upgrade triggered this change.

6. I see this problem with both file-backed and device-backed zpools.

7. I'm not sure system load plays a role: I often hit the problem on my lightly loaded laptop.

8. I don't have steps that reproduce the problem with 100% probability, but in practice I see it happening more often than not. But see the next point.

9. In my experience a system can be in a "bad state" (the problem always happens) or in a "good state" (the problem never happens). When the system is in a "good state" we can `lxc delete` hundreds of containers with no errors. I can't tell what makes a system switch from a good to a bad state. I'm almost certain I also saw systems switching from a bad to a good state.

10. The lxcfs package is not installed on the systems where I hit this issue.

That's it for the moment. Thanks for looking into this!

Paride

[0] https://github.com/lxc/lxd/issues/4656
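For illustration only, the manual retry from points 3-4 boils down to something like the sketch below. This is not a fix, just what I end up typing by hand; the retry count of 3 is arbitrary (in practice the second attempt already succeeds for me):

  #!/bin/sh
  # Rough sketch of the manual workaround from points 3-4, not a proper fix.
  # Usage: ./retry-delete.sh <container>
  name="$1"
  lxc delete --force "$name" && exit 0
  # The forced delete failed ("dataset is busy"); the container appears to be
  # stopped already at this point, so retry a plain delete a few times.
  for i in 1 2 3; do
      lxc delete "$name" && exit 0
      sleep 1
  done
  echo "giving up: $name still not deleted" >&2
  exit 1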
Title:
  lxc 'delete' fails to destroy ZFS filesystem 'dataset is busy'

Status in linux package in Ubuntu:
  Triaged
Status in lxc package in Ubuntu:
  Confirmed
Status in linux source package in Cosmic:
  Triaged
Status in lxc source package in Cosmic:
  Confirmed

Bug description:
  I'm not sure exactly what got me into this state, but I have several
  lxc containers that cannot be deleted.

  $ lxc info
  <snip>
  api_status: stable
  api_version: "1.0"
  auth: trusted
  public: false
  auth_methods:
  - tls
  environment:
    addresses: []
    architectures:
    - x86_64
    - i686
    certificate: |
      -----BEGIN CERTIFICATE-----
      <snip>
      -----END CERTIFICATE-----
    certificate_fingerprint: 3af6f8b8233c5d9e898590a9486ded5c0bec045488384f30ea921afce51f75cb
    driver: lxc
    driver_version: 3.0.1
    kernel: Linux
    kernel_architecture: x86_64
    kernel_version: 4.15.0-23-generic
    server: lxd
    server_pid: 15123
    server_version: "3.2"
    storage: zfs
    storage_version: 0.7.5-1ubuntu15
    server_clustered: false
    server_name: milhouse

  $ lxc delete --force b1
  Error: Failed to destroy ZFS filesystem: cannot destroy 'default/containers/b1': dataset is busy

  Talking in #lxc-dev, stgraber and sforeshee provided a diagnosis:

  | short version is that something unshared a mount namespace, causing
  | them to get a copy of the mount table at the time that dataset was
  | mounted, which then prevents zfs from being able to destroy it

  The workaround provided was:

  | you can unstick this particular issue by doing:
  |   grep default/containers/b1 /proc/*/mountinfo
  | then for any of the hits, do:
  |   nsenter -t PID -m -- umount /var/snap/lxd/common/lxd/storage-pools/default/containers/b1
  | then try the delete again

  ProblemType: Bug
  DistroRelease: Ubuntu 18.10
  Package: linux-image-4.15.0-23-generic 4.15.0-23.25
  ProcVersionSignature: Ubuntu 4.15.0-23.25-generic 4.15.18
  Uname: Linux 4.15.0-23-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.10-0ubuntu3
  Architecture: amd64
  AudioDevicesInUse:
    USER   PID  ACCESS COMMAND
    /dev/snd/controlC1: smoser 31412 F.... pulseaudio
    /dev/snd/controlC2: smoser 31412 F.... pulseaudio
    /dev/snd/controlC0: smoser 31412 F.... pulseaudio
  CurrentDesktop: ubuntu:GNOME
  Date: Thu Jun 28 10:42:45 2018
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2015-07-23 (1071 days ago)
  InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Alpha amd64 (20150722.1)
  ProcEnviron:
    TERM=xterm-256color
    PATH=(custom, no user)
    XDG_RUNTIME_DIR=<set>
    LANG=en_US.UTF-8
    SHELL=/bin/bash
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-23-generic root=UUID=f897b32a-eacf-4191-9717-844918947069 ro quiet splash vt.handoff=1
  RelatedPackageVersions:
    linux-restricted-modules-4.15.0-23-generic N/A
    linux-backports-modules-4.15.0-23-generic  N/A
    linux-firmware                             1.174
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 03/09/2015
  dmi.bios.vendor: Intel Corporation
  dmi.bios.version: RYBDWi35.86A.0246.2015.0309.1355
  dmi.board.name: NUC5i5RYB
  dmi.board.vendor: Intel Corporation
  dmi.board.version: H40999-503
  dmi.chassis.type: 3
  dmi.modalias: dmi:bvnIntelCorporation:bvrRYBDWi35.86A.0246.2015.0309.1355:bd03/09/2015:svn:pn:pvr:rvnIntelCorporation:rnNUC5i5RYB:rvrH40999-503:cvn:ct3:cvr:
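For convenience, the quoted workaround can be scripted roughly as below. This is only an illustrative sketch, not part of the original report; it assumes the snap LXD path layout and the "default" storage pool shown in the output above:

  #!/bin/sh
  # Rough sketch of the workaround quoted above: unmount the container's
  # dataset in every mount namespace that still holds it, then delete.
  # Assumes the snap LXD layout and the "default" storage pool.
  ctr="$1"
  mnt="/var/snap/lxd/common/lxd/storage-pools/default/containers/$ctr"
  for f in $(grep -l "default/containers/$ctr" /proc/*/mountinfo 2>/dev/null); do
      pid=$(echo "$f" | cut -d/ -f3)
      case "$pid" in ''|*[!0-9]*) continue ;; esac   # skip /proc/self and friends
      nsenter -t "$pid" -m -- umount "$mnt"
  done
  lxc delete "$ctr"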