On 24.09.2016 20:32, Alex Bligh wrote:
>> On 24 Sep 2016, at 18:13, Vladimir Sementsov-Ogievskiy
>> <[email protected]> wrote:
>>
>> On 24.09.2016 19:49, Alex Bligh wrote:
>>>> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy
>>>> <[email protected]> wrote:
>>>>
>>>> On 24.09.2016 19:31, Alex Bligh wrote:
>>>>>> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Note: if the disk size is not aligned to X, we will have to send a
>>>>>> request larger than the disk size to clear the whole disk.
>>>>> If you look at the block size extension, the size of the disk must be an
>>>>> exact multiple of the minimum block size. So that would work.
>>
>> This means that this extension could not be used with just any qcow2 disk,
>> as qcow2 may have a size that is not aligned to its cluster size:
>>
>> # qemu-img create -f qcow2 mega 1K
>> Formatting 'mega', fmt=qcow2 size=1024 encryption=off cluster_size=65536
>> lazy_refcounts=off refcount_bits=16
>> # qemu-img info mega
>> image: mega
>> file format: qcow2
>> virtual size: 1.0K (1024 bytes)
>> disk size: 196K
>> cluster_size: 65536
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>>
>> And there is no such restriction in the documentation. Otherwise we would
>> have to consider the sector size (512 bytes) as the block size for qcow2,
>> which is too small for our needs.
>
> If by "this extension" you mean the INFO extension (which reports block
> sizes), that's incorrect.
>
> An NBD server using a qcow2 file as the backend would report the sector size
> as the minimum block size. It might report the cluster size or the sector
> size as the preferred block size, or anything in between.
>
> The qcow2 cluster size essentially determines the allocation unit. NBD is
> not bothered as to the underlying allocation unit. It does not (currently)
> support the concept of making holes visible to the client.
> If you use NBD_CMD_WRITE_ZEROES you get zeroes, which might or might not be
> implemented as one or more holes or 'real' zeroes (save if you specify
> NBD_CMD_FLAG_NO_HOLE, in which case you are guaranteed to get 'real'
> zeroes). If you use NBD_CMD_TRIM, then the trimmed area might or might not
> be written with one or more holes. There is (currently) no way to detect
> the presence of holes separately from zeroes (though a bitmap extension
> was discussed).
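[Editorial illustration of the semantics described above. The backend class and handler below are hypothetical toys, not real nbd-server or qemu code; only the NBD_CMD_FLAG_NO_HOLE name and its meaning come from the WRITE_ZEROES extension draft.]

```python
NBD_CMD_FLAG_NO_HOLE = 1 << 1  # flag bit from the WRITE_ZEROES extension draft


class MemBackend:
    """Toy in-memory export, standing in for a real file or qcow2 backend."""

    def __init__(self, size):
        self.data = bytearray(b"\xff" * size)  # pretend there is old data
        self.holes = []                        # ranges punched as holes

    def write(self, offset, buf):
        self.data[offset:offset + len(buf)] = buf

    def punch_hole(self, offset, length):
        # A hole reads back as zeroes but allocates no storage.
        self.holes.append((offset, length))
        self.data[offset:offset + length] = b"\x00" * length


def handle_write_zeroes(backend, offset, length, flags):
    if flags & NBD_CMD_FLAG_NO_HOLE:
        # Client demanded 'real' zeroes: the server must actually write them.
        backend.write(offset, b"\x00" * length)
    else:
        # Otherwise the server is free to punch a hole instead.
        backend.punch_hole(offset, length)
```

Either way the client reads back zeroes; NBD_CMD_FLAG_NO_HOLE only constrains how the server is allowed to produce them.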
I just wanted to say that if we want the possibility of clearing the whole
disk in one request for qcow2, we have to take 512 bytes as the granularity
for such requests (i.e. X = 9). And this is too small: 1 TB would be the
upper bound for the request.

>>>> But there is no guarantee that disk_size/block_size < INT_MAX..
>>> I think you mean 2^32-1, but yes there is no guarantee of that. In that
>>> case you would need to break the call up into multiple calls.
>>>
>>> However, being able to break the call up into multiple calls seems pretty
>>> sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
>>> time, and a REALLY long time if the server doesn't support trim.
>>>
>>>> Maybe an additional option specifying the shift would be better, with
>>>> the convention that if offset+length exceeds the disk size, length is
>>>> recalculated as disk_size-offset.
>>> I don't think we should do that. We already have clear semantics that
>>> prevent operations beyond the end of the disk. Again, just break the
>>> command up into multiple commands. No great hardship.
>>
>> I agree that requests larger than the disk size are ugly.. But splitting
>> the request brings me again to the idea of having a separate command or
>> flag for clearing the whole disk without that dance. The server could
>> report availability of this command/flag only if the target driver
>> supports fast write_zeroes (qcow2 in our case).
> Why? In the general case you need to break up requests anyway (particularly
> with the INFO extension, where there is a maximum command size), and issuing
> a command over a TCP connection that might take hours or days to complete,
> with no hint of progress and no TCP traffic to keep NAT etc. alive, sounds
> like bad practice. The overhead is tiny.
>
> I would be against this change.

Full backup, for example: 1.
target can do fast write_zeroes: clear the whole disk (great if we can
   do it in one request, without splitting, etc.), then back up all data
   except zero or unallocated clusters (saving a lot of time by skipping
   them). 2. target cannot do fast write_zeroes: just back up all the data;
   we do not need to clear the disk, as clearing would not save us any time.

So here we do not need splitting in general: either clear the whole disk,
or do not clear it at all.

-- 
Best regards,
Vladimir

------------------------------------------------------------------------------
_______________________________________________
Nbd-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nbd-general
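[Editorial note: for reference, the request splitting discussed in the thread can be sketched as follows. This is a hypothetical helper, not real client code; MAX_LEN reflects the 32-bit NBD request length field, and min_block stands for the minimum block size the server would report via the INFO extension.]

```python
MAX_LEN = 0xFFFFFFFF  # upper bound of the 32-bit NBD request length field


def whole_disk_zero_requests(disk_size, min_block=512):
    """Yield (offset, length) WRITE_ZEROES requests covering [0, disk_size)."""
    # Largest per-request length that is still a multiple of min_block.
    chunk = (MAX_LEN // min_block) * min_block
    offset = 0
    while offset < disk_size:
        length = min(chunk, disk_size - offset)
        yield offset, length
        offset += length
```

Note that when disk_size is not a multiple of min_block (the unaligned qcow2 case raised earlier in the thread), the final request is simply shorter, so no request ever extends past the end of the disk.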
