24.01.2019 18:39, Kevin Wolf wrote:
> On 24.01.2019 at 15:37, Vladimir Sementsov-Ogievskiy wrote:
>> 23.01.2019 15:04, Vladimir Sementsov-Ogievskiy wrote:
>>> 22.01.2019 21:57, Kevin Wolf wrote:
>>>> On 11.01.2019 at 12:40, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 11.01.2019 13:41, Kevin Wolf wrote:
>>>>>> On 10.01.2019 at 14:20, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> drv_co_block_status digs bs->file for additional, more accurate search
>>>>>>> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
>>>>>>>
>>>>>>> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
>>>>>>> knows, where are holes and where is data. But every block_status
>>>>>>> request calls lseek additionally. Assume a big disk, full of
>>>>>>> data, in any iterative copying block job (or img convert) we'll call
>>>>>>> lseek(HOLE) on every iteration, and each of these lseeks will have to
>>>>>>> iterate through all metadata up to the end of file. It's obviously
>>>>>>> ineffective behavior. And for many scenarios we don't need this lseek
>>>>>>> at all.
>>>>>>>
>>>>>>> So, let's "5daa74a6ebc" by default, leaving an option to return
>>>>>>> previous behavior, which is needed for scenarios with preallocated
>>>>>>> images.
>>>>>>>
>>>>>>> Add iotest illustrating new option semantics.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
>>>>>>
>>>>>> I still think that an option isn't a good solution and we should try use
>>>>>> some heuristics instead.
>>>>>
>>>>> Do you think that heuristics would be better than fair cache for lseek
>>>>> results?
>>>>
>>>> I just played a bit with this (qemu-img convert only), and how much
>>>> caching lseek() results helps depends completely on the image. As it
>>>> happened, my test image was the worst case where caching didn't buy us
>>>> much. Obviously, I can just as easily construct an image where it makes
>>>> a huge difference.
>>>> I think that most real-world images should be able to
>>>> take good advantage of it, though, and it doesn't hurt, so maybe that's
>>>> a first thing that we can do in any case. It might not be the complete
>>>> solution, though.
>>>
>>> Hmm, and one more idea from Den:
>>>
>>> We can detect preallocated image, comparing allocated size of real file with
>>> number of non-zero qcow2 refcounts. So, real allocation is much less than
>>> allocation in qcow2 point of view, we'll enable lseeks, otherwise - not.
>>>
>>
>> Kevin, what do you think?
>
> I'm unsure. I think it requires scanning all refcount blocks in
> qcow2_open(), right? This could be slow on huge images. On the other
> hand, the first cluster allocation will probably do this anyway, so it
> might be reasonable enough.

It seems better to do this on the first block_status call, not on open.

>
> How would you communicate this? Another block_status return flag that
> says "don't bother to ask the protocol layer" and which we would only
> set in qcow2 if the probing came to the conclusion that it's not
> preallocated?

Or a bool flag on BlockDriverState.

>
> Kevin
>

--
Best regards,
Vladimir
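
To make the two ideas above concrete (probe lazily on the first block_status, remember the answer in a per-image flag), here is a rough standalone sketch. It is not QEMU code: `ImageInfo`, `ProbeState`, `looks_preallocated()` and `want_protocol_lseek()` are hypothetical names, the inputs are assumed to have been gathered elsewhere (for example the file's allocated size from fstat() and the count of clusters with non-zero refcount from a refcount scan), and the "much less" threshold of one half is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached probe result, one per image. */
typedef enum {
    PROBE_UNKNOWN = 0,   /* heuristic has not run yet */
    PROBE_SKIP_LSEEK,    /* not preallocated: trust qcow2 metadata */
    PROBE_USE_LSEEK,     /* looks preallocated: ask the protocol layer */
} ProbeState;

/* Hypothetical stand-in for the per-image state discussed above. */
typedef struct {
    uint64_t file_allocated_bytes; /* assumed input, e.g. st_blocks * 512 */
    uint64_t used_clusters;        /* assumed input: non-zero refcounts */
    uint64_t cluster_size;         /* bytes per qcow2 cluster */
    ProbeState state;              /* cached answer */
    unsigned probe_count;          /* how often the heuristic actually ran */
} ImageInfo;

/*
 * Heuristic: the image looks preallocated when the file system has
 * allocated much less than qcow2 believes is in use. "Much less" is
 * assumed here to mean "less than half".
 */
static bool looks_preallocated(uint64_t file_allocated_bytes,
                               uint64_t used_clusters,
                               uint64_t cluster_size)
{
    assert(cluster_size != 0);
    uint64_t qcow2_bytes = used_clusters * cluster_size;
    return file_allocated_bytes < qcow2_bytes / 2;
}

/*
 * Called from the block_status path: runs the heuristic only on the
 * first call and returns the cached answer afterwards.
 */
static bool want_protocol_lseek(ImageInfo *img)
{
    if (img->state == PROBE_UNKNOWN) {
        img->probe_count++;
        img->state = looks_preallocated(img->file_allocated_bytes,
                                        img->used_clusters,
                                        img->cluster_size)
                     ? PROBE_USE_LSEEK : PROBE_SKIP_LSEEK;
    }
    return img->state == PROBE_USE_LSEEK;
}
```

The tri-state is used instead of a plain bool only so that "not probed yet" can be told apart from "probed, skip lseek"; a bool plus a separate "probed" bit would work just as well.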
