On 13.05.2016 at 18:09, Denis V. Lunev wrote:
> oops, the patch gets committed... that is unexpected but great ;)
Oops, that was not intended, I just forgot to remove it from the queue
when I realised that it's not ready yet. Should I stage a revert or do
you prefer to fix it on top?

> >>>>diff --git a/block/qcow2.c b/block/qcow2.c
> >>>>index 470734b..c2474c1 100644
> >>>>--- a/block/qcow2.c
> >>>>+++ b/block/qcow2.c
> >>>>@@ -2411,21 +2411,74 @@ finish:
> >>>>     return ret;
> >>>> }
> >>>>+
> >>>>+static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
> >>>>+{
> >>>>+    BDRVQcow2State *s = bs->opaque;
> >>>>+    int nr;
> >>>>+    BlockDriverState *file;
> >>>>+    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
> >>>>+                                              s->cluster_sectors, &nr,
> >>>>+                                              &file);
> >>>>+    return res >= 0 && ((res & BDRV_BLOCK_ZERO) || !(res & BDRV_BLOCK_DATA));
> >>>
> >>> Why did you add the !(res & BDRV_BLOCK_DATA) condition? This means that
> >>> all unallocated clusters return true, even if the backing file contains
> >>> non-zero data for them.
> >>
> >> this is correct. From my POV this means that this area is unallocated
> >> in the entire backing chain and thus it will be read as zeroes. Thus
> >> we could cover it with zeroes.
> >
> > You're right that I made a mistake, I was thinking of the non-recursive
> > bdrv_get_block_status().
> >
> > However, I still think that we may not assume that !BDRV_BLOCK_DATA
> > means zero data, even though that affects only more obscure cases. We
> > have bdrv_unallocated_blocks_are_zero() to check whether the assumption
> > is true. However, bdrv_co_get_block_status() already checks this
> > internally and sets BDRV_BLOCK_ZERO in this case, so just checking
> > BDRV_BLOCK_ZERO in qcow2 should be good.
> >
> > Did you find a case where you got !DATA, but not ZERO, and assuming
> > zeroes was valid? If so, we may need to fix bdrv_co_get_block_status().
>
> actually we may have the following case (artificial)!:
> - assuming we do not have bdrv_has_zero_init in backing store
> - and qcow2 on top of this file
> - reading from unallocated block should return 0 (no data in both
>   places), qcow2 layer will return 0
>
> It looks like we will have this situation.

qcow2 sets bdi->unallocated_blocks_are_zero = true, so in this case you
should correctly get BDRV_BLOCK_ZERO from bdrv_get_block_status().
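So the fix should just be dropping the !DATA part of the condition.
Roughly this, as an untested sketch on top of the hunk quoted above
(only the return condition changes):

static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
{
    BDRVQcow2State *s = bs->opaque;
    int nr;
    BlockDriverState *file;
    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
                                              s->cluster_sectors, &nr, &file);

    /* Unallocated clusters that are guaranteed to read as zeroes are
     * already reported as BDRV_BLOCK_ZERO by bdrv_co_get_block_status(),
     * so this is the only flag we need to check; !BDRV_BLOCK_DATA alone
     * is not a guarantee. */
    return res >= 0 && (res & BDRV_BLOCK_ZERO);
}

That way a cluster that is unallocated in the whole chain, but sits on a
protocol that doesn't guarantee zeroed storage, is no longer treated as
reading zeroes.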
> > Hm, I see:
> >
> >     if (bs->bl.write_zeroes_alignment
> >         && num > bs->bl.write_zeroes_alignment) {
> >
> > Removing the second part should fix this, i.e. it would split a request
> > into two unaligned halves even if there is no aligned "bulk" in the
> > middle.
> >
> > I think it would match my expectations better, but maybe that's just me.
> > What do you think?
>
> actually the code here will not be significantly better (I presume),
> but I'll make a try

Yes, I agree that it won't make the qcow2 code significantly simpler. I
just think that it would be less surprising semantics.

> >>>>+            cl_end = sector_num + nb_sectors - s->cluster_sectors;
> >>>>+            if (!is_zero_cluster(bs, cl_end)) {
> >>>>+                return -ENOTSUP;
> >>>>+            }
> >>>>+        }
> >>>>+
> >>>>+        qemu_co_mutex_lock(&s->lock);
> >>>>+        /* We can have new write after previous check */
> >>>>+        if (!is_zero_cluster_top_locked(bs, sector_num) ||
> >>>>+            (cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) {
> >>>>+            qemu_co_mutex_unlock(&s->lock);
> >>>>+            return -ENOTSUP;
> >>>>+        }
> >>>
> >>> Just lock the mutex before the check, the possible optimisation for the
> >>> emulation case (which is slow anyway) isn't worth the additional code
> >>> complexity.
> >>
> >> bdrv_get_block_status_above(bs) takes s->lock inside. This lock is not
> >> recursive, thus the code will hang. This is the problem trying to be
> >> addressed with this split of checks.
> >>
> >> Maybe we could make the lock recursive...
> >
> > Maybe your version is not far from the best we can do then. It deserves
> > a comment, though, because it's not completely obvious.
> >
> > The other option that we have and that looks reasonable enough to me is
> > to check is_zero_cluster_top_locked() first and, only if that returns
> > false, to check the block status of the backing chain, starting at
> > bs->backing->bs. This way we would bypass the recursive call and could
> > take the lock from the beginning. If we go that way, it deserves a
> > comment as well.
> >
> > Kevin
>
> OK. I'll send at least improved comments and (maybe)
> removal of "&& num > bs->bl.write_zeroes_alignment"
> as a follow-up.

The most important part is actually fixing is_zero_cluster(), because
that's a real bug that can corrupt data in the !has_zero_init case. This
is also the reason why I would revert the patch if we don't have a fix
for this before my next pull request.

The rest is just making things a bit nicer, so follow-ups are very
welcome, but not as critical.

Kevin
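PS: For what it's worth, this is roughly how I picture the second
option, i.e. taking s->lock first and never going through our own block
status code while holding it. Completely untested sketch, the helper
name is made up, and I've open-coded the top-layer check so that a
cluster that has data in the active layer never falls through to the
backing chain:

/* Hypothetical helper, called with s->lock held.
 *
 * A cluster reads as zeroes if the active layer has it as a zero cluster,
 * or if it is unallocated in the active layer and the backing chain below
 * us reports it as zero. Querying the chain starting at bs->backing->bs
 * means bdrv_get_block_status_above() never re-enters this qcow2 node, so
 * it cannot try to take s->lock again.
 */
static bool is_zero_cluster_locked(BlockDriverState *bs, int64_t start)
{
    BDRVQcow2State *s = bs->opaque;
    int nr = s->cluster_sectors;
    uint64_t off;
    int ret;

    ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
    if (ret < 0) {
        return false;
    } else if (ret == QCOW2_CLUSTER_ZERO) {
        return true;
    } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
        return false;   /* normal or compressed data in the active layer */
    }

    /* Unallocated here, so the answer comes from the backing chain, if any */
    if (!bs->backing) {
        return true;
    } else {
        BlockDriverState *file;
        int64_t res = bdrv_get_block_status_above(bs->backing->bs, NULL,
                                                  start, s->cluster_sectors,
                                                  &nr, &file);
        return res >= 0 && (res & BDRV_BLOCK_ZERO);
    }
}

The write_zeroes path could then take the lock once at the beginning and
call this for the head and tail clusters, instead of the unlocked check
followed by the locked re-check.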