On Tue 13 Dec 2016 09:02:34 AM CET, Max Reitz wrote:

> I definitely like how simple your approach is, but from a design
> standpoint it is not exactly optimal, because O(n) for a cluster
> lookup is simply worse than O(1).
It's actually not O(n) anymore since I rewrote that code last year.
When I changed the algorithm to use LRU for cache eviction I also
optimized the lookup:

1) qcow2_cache_put() simply divides the table offset by the cluster
   size and immediately obtains the index.

2) qcow2_cache_get() hashes the offset to do the lookup.

In my tests with random I/O and a 2048-entry cache, the average number
of comparisons was 2.5 (430 before the rewrite). The cache was ~13%
faster after the rewrite, but the results in terms of actual disk I/O
improvements were negligible, because that's not where the bottleneck
is. We even discussed how the hash function I was using was weak, but
we decided not to touch it because it was not worth the effort.

>> But leaving that aside, would that improve anything? I don't think
>> the cache itself adds any significant overhead here, IIRC even in
>> your presentation at KVM Forum 2015 qcow2 was comparable to raw as
>> long as all L2 tables were cached in memory.
>
> I haven't compared CPU usage, though. That may have gone up quite a
> bit, I don't know. For large enough images, it may even become a
> bottleneck.

I don't have any data suggesting that there's a problem there. In
particular, after last year's rewrite I think the cache is reasonably
fast, as long as it is big enough; and if it's not big enough, the
problem is going to be the extra I/O, not the lookup.

Berto