From the comparison graphs, we can see that the impact of the sysfs query is very significant for both read and write workloads. Taking just the "sysfs" test from my previous comment into account:
In the random write tests:
 * For 512b block sizes, we see about an 85% reduction in BW (down to ~2MB/s from ~16MB/s)
 * For 4k block sizes, the BW reduction is also about 85% (~30MB/s compared to ~240MB/s)
 * For 512k block sizes (== bucket size) and higher, BW is reduced by about 64% (~90MB/s vs ~250MB/s)

In the random read tests:
 * For 512b block sizes, BW goes down to ~3MB/s from ~25MB/s
 * For 4k block sizes, the BW reduction is around 90% (~10MB/s compared to ~160MB/s)
 * For 512k block sizes and higher, BW is reduced by about 90% (~150MB/s vs ~1.6GB/s)

The IOPS measurements show similar results, and the latency measurements are also much worse in the "sysfs" test. We observe frequent latency spikes (150ms+) when running fio together with the priority_stats query, and average latency increases by at least 50ms.

Surprisingly, the "mutex" patch didn't improve the test results much. This was the case for both read and write workloads, which suggests that the bucket locking has much less impact on the system than the sorting.

The cond_resched() patch showed great results, even though it makes the sysfs queries take a bit longer. The write throughput of the bcache device is _much_ better with it, and the system no longer stalls (even when pinning processes to the same CPU as the sysfs query). In some cases it brings performance back to values close to the "raw" tests (i.e. without any sysfs queries). This patch seems like the best short-term solution, as a slightly slower sysfs query shouldn't be a problem in most setups, whereas the IO performance degradation and the stalls are much more noticeable.

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840043

Title:
  bcache: Performance degradation when querying priority_stats

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Performance degradation for read/write workloads on bcache devices; occasional system stalls

  [Description]
  In the latest bcache drivers, there's a sysfs attribute that calculates bucket priority statistics in /sys/fs/bcache/*/cache0/priority_stats. Querying this file has a big performance impact on tasks running on the same CPU, and also affects the read/write performance of the bcache device itself. This is due to the way the driver calculates the stats: the bcache buckets are locked and iterated through, collecting information about each individual bucket. An array of nbuckets elements is constructed and sorted afterwards, which can cause very high CPU contention on larger bcache setups.

  From our tests, the sorting step of the priority_stats query causes the most significant performance reduction, as it can hinder tasks that are not doing any bcache IO at all. If a task is "unlucky" enough to be scheduled on the same CPU as the sysfs query, its performance will be harshly reduced as both compete for CPU time. We've had users report system stalls of up to ~6s due to this, caused by monitoring tools that query priority_stats periodically (e.g. the Prometheus Node Exporter from [0]). These system stalls have triggered several other issues, such as ceph-mon re-elections, problems in percona-cluster, and general network stalls, so the impact is not isolated to bcache IO workloads.

  [0] https://github.com/prometheus/node_exporter

  [Test Case]
  Note: As the sorting step has the most noticeable performance impact, the test case below pins a workload and the sysfs query to the same CPU. CPU contention issues still occur without any pinning; pinning just removes the scheduling factor of landing on different CPUs and affecting different tasks.
  1) Start a read/write workload on the bcache device with e.g. fio or dd, pinned to a certain CPU:
     # taskset 0x10 dd if=/dev/zero of=/dev/bcache0 bs=4k status=progress

  2) Start a sysfs query loop for the priority_stats attribute, pinned to the same CPU:
     # for i in {1..100000}; do taskset 0x10 cat /sys/fs/bcache/*/cache0/priority_stats > /dev/null; done

  3) Monitor the read/write workload for any performance impact

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840043/+subscriptions