Hi Benjamin,

Thanks for the detailed report.
Does this system show signs of memory pressure? bch_mca_scan() is part of bcache's memory shrinker, and thus should be called when the system is trying to release memory from its various caches.

Also, the bucket lock is used widely in bcache (judging from a quick grep; and more heavily on writes, I'd imagine), so if bch_mca_scan() is waiting on it a lot, i.e., showing lock contention, it would seem the IO load is indeed significant, as you mentioned.

Do you know the IO load profile, or could you reproduce the issue with the fio tool? If we can reproduce this, there _might_ be some heuristics worth considering, such as using a non-blocking trylock instead of a blocking lock, and bailing out rather than waiting if the lock is taken and waiting doesn't seem worth it. (But that would be a development item, not a proper "bug" fix. :)

cheers,
Mauricio

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  Issue with bcache bch_mca_scan causing huge IO wait

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  New

Bug description:
  Hello,

  In short, we faced an issue with a huge IO wait on a bionic Ubuntu
  4.15.0-118.119-generic kernel. This is the full list of processes and
  the kernel functions they were stuck in [0].

  The main issue can probably be summarized by these perf reports:
  * first, we identified that the CPUs are stuck in idle because of something [1];
  * second, we looked at which kernel function seems to be stalling the
    kswapd0 and kswapd1 processes [2].

  We could see that this seems to be the mutex_lock in the bch_mca_scan function [3].
  After running the command:

    sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled"

  the server started to respond normally and the IO wait dropped significantly.

  [0]: https://pastebin.canonical.com/p/wYYKwHdRXk/
  [1]: https://pastebin.canonical.com/p/n2Tw57QyBC/
  [2]: https://pastebin.canonical.com/p/3QqFTfdHhX/
  [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674

  ====================

  $ cat /proc/version_signature
  Ubuntu 4.15.0-118.119-generic 4.15.18

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-118-generic 4.15.0-118.119
  ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18
  Uname: Linux 4.15.0-118-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq
   crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.16
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Tue Oct 6 20:36:18 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: HP ProLiant DL380 G7
  PciMultimedia:
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-118-generic N/A
   linux-backports-modules-4.15.0-118-generic N/A
   linux-firmware 1.173.18
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago)
  dmi.bios.date: 05/05/2011
  dmi.bios.vendor: HP
  dmi.bios.version: P67
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr:
  dmi.product.family: ProLiant
  dmi.product.name: ProLiant DL380 G7
  dmi.sys.vendor: HP

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp