Re: [Qemu-devel] [PATCH V13 1/6] docs: document for add-cow file format

Dong Xu Wang Thu, 18 Oct 2012 19:15:38 -0700

于 10/19/2012 12:10 AM, Eric Blake 写道:
> On 10/18/2012 03:51 AM, Dong Xu Wang wrote:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdon...@linux.vnet.ibm.com>
>> ---
>>   docs/specs/add-cow.txt |  139 
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 files changed, 139 insertions(+), 0 deletions(-)
>>   create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..dc1e107
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,139 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
> 
> s/with raw/with a raw/
> 
Okay.


>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
> 
> s/has been installed/has an installed/
Okay.
> 
>> +    1)  Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of 
>> test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
> 
> Since you call out what all 4096 bytes must be in the Header section
> (all bytes not occupied by a backing file name or format must be NUL),
> do your really need Reserved in this section, or can you just claim that
> the 4096-byte header is directly followed by the COW bitmap?
> 
Okay, I think Header + COW would be enough.
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_size))
>> +    Byte    0 -  7:     magic
>> +                        add-cow magic string ("ADD_COW\xff").
>> +
>> +            8 -  11:    version
>> +                        Version number (only valid value is 1 now).
>> +
>> +            12 - 15:    backing file name offset
>> +                        Offset in the add-cow file at which the backing file
>> +                        name is stored (NB: The string is not 
>> nul-terminated).
>> +                        If backing file name does NOT exist, this field 
>> will be
>> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file 
>> name
>> +                        must be at least 1 byte).
>> +
>> +            16 - 19:    backing file name size
>> +                        Length of the backing file name in bytes. It will 
>> be 0
>> +                        if the backing file name offset is 0. If backing 
>> file
>> +                        name offset is non-zero, then it must be non-zero. 
>> Must
>> +                        be less than [HEADER_SIZE - 80] to fit in the 
>> reserved
>> +                        part of the header.
> 
> More specifically, it must be small enough so that offset+size <=
> HEADER_SIZE.
Okay.

> 
>> +
>> +            20 - 23:    image file name offset
>> +                        Offset in the add-cow file at which the image file 
>> name
>> +                        is stored (NB: The string is not null terminated). 
>> It
>> +                        must be between 80 and [HEADER_SIZE - 2].
>> +
>> +            24 - 27:    image file name size
>> +                        Length of the image file name in bytes.
>> +                        Must be less than [HEADER_SIZE - 80] to fit in the 
>> reserved
>> +                        part of the header.
> 
> More specifically, it must be small enough so that offset+size <=
> HEADER_SIZE.
> 
Okay.

>> +
>> +            28 - 31:    cluster bits
>> +                        Number of bits that are used for addressing an 
>> offset
>> +                        within a cluster (1 << cluster_bits is the cluster 
>> size).
>> +                        Must not be less than 9 (i.e. 512 byte clusters).
>> +
>> +                        Note: qemu as of today has an implementation limit 
>> of 2 MB
>> +                        as the maximum cluster size and won't be able to 
>> open images
>> +                        with larger cluster sizes.
>> +
>> +            32 - 39:    features
>> +                        Bitmask of features. An implementation can safely 
>> ignore
>> +                        any unknown bits that are set.
> 
> Really?  That sounds more like optional features, if an implementation
> can ignore a set bit.  You really want to require that implementations
> reject operations on a file with a feature bit set that they don't
> recognize.
> 
Yep, I should reject the file if un-recognized bits are set.
>> +
>> +                        Bit 0:      All allocated bit.  If this bit is set 
>> then
>> +                                    backing file and COW bitmap will not be 
>> used,
>> +                                    and can read from or write to image 
>> file directly.
> 
> And this particular bit sounds like an optional feature - setting the
> bit is an optimization in speed, but leaving the bit clear or ignoring
> the bit when it is set does not change correctness.
> 
Okay, but if this bit is categorized to optional feature, I will be no
features bit then...

>> +
>> +                        Bits 1-63:  Reserved (set to 0)
>> +
>> +            40 - 47:    optional features
>> +                        Not used now. Reserved for future use. It must be 
>> set to 0.
>> +                        And must be ignored while reading.
> 
> s/. And must be ignored/, and ignored/
> 
Okay.

>> +
>> +            48 - 51:    header size
>> +                        The header field is variable-sized. This field 
>> indicates
>> +                        how many 4096 bytes will be used to store add-cow 
>> header.
>> +                        In add-cow v1, it is fixed to 1, so the header size 
>> will
>> +                        be 4096 * 1 = 4096 bytes.
> 
> So is the value '1' or '4096' in this field?  The wording isn't quite
> clear.  But reading elsewhere, it looks like this should always be '1'
> in version one add-cow.
> 
I mean '1' in this field, I will describe more clearly in future patch.
>> +
>> +            52 - 67:    backing file format
>> +                        Format of backing file. It will be filled with 0 if
>> +                        backing file name offset is 0. If backing file name
>> +                        offset is non-empty, it must be non-empty.
> 
> Are you going to enforce this?  Normally, if backing file format is
> omitted, then qemu knows how to probe backing file format (with the
> caveat that it is a security risk if the probe returns non-raw but the
> backing file really was raw).
> 
To avoid the security risk I add this field in header, can I do anything
to enforce this more?
>> It is coded
>> +                        in free-form ASCII, and is not NUL-terminated. Zero
>> +                        padded on the right.
>> +
>> +            68 - 83:    image file format
>> +                        Format of image file. It must be non-empty. It is 
>> coded
>> +                        in free-form ASCII, and is not NUL-terminated. Zero
>> +                        padded on the right.
> 
> Why do we need this field?  Isn't the image file ALWAYS raw?
> 
In v1, it only supports raw as a image_file, but in future it might
support other formats which do not support COW, so I add image file
format here.

>> +
>> +            84 - [HEADER_SIZE - 1]:
> 
> Elsewhere in your spec, you use 80 rather than 84 for the starting point
> of valid offsets.  Which is it?
Sorry, I add "cluster bits", which is 4 bytes, and I forgot changing 80
to 84. Will correct.
> 
>> +                        It is used to make sure COW bitmap field starts at 
>> the
>> +                        HEADER_SIZE byte, backing file name and image file 
>> name
>> +                        will be stored here. The bytes that is not pointing 
>> to
>> +                        backing file and image file names must be set to 0.
> 
> Not just file names, but also backing file format.
> 
Okay.

>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap 
>> related to
> 
> Shouldn't that be HEADER_SIZE * number of header pages, since you
> dedicated a field in the header for that purpose?
> 
There is one line in this doc:
(#define HEADER_SIZE (4096 * header_size))
HEADER_SIZE and header_size are not good names, will correct.

>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap tracks one cluster's status. For example, if cluster
>> +bit is 16, then each bit tracks one cluster, (1 >> 16) = 65536 bytes. The 
>> size
> 
> s/>>/<</
> 
Okay.

>> +of bitmap is calculated according to virtual size of image file, and it must
>> +be multiple of 65536, the bits not used will be set to 0. Within each byte,
> 
> What must be a multiple of 65536, the image file, or the size of the
> bitmap in the add-cow file?  I think what you want is:
> 
I mean bitmap must be multiple for 65536. Will describe more clearly.

> The image size is rounded up to cluster size (where any bytes in the
> last cluster that do not fit in the image are ignored), then if the
> number of clusters is not a multiple of 8, then remaining bits in the
> bitmap will be set to 0.
> 
> Or do you really want to require that the bitmap is a multiple of 64k
> bytes (at 8 bits per byte, that means the bitmap covers a multiple of
> 512k clusters, and at 512 bytes as the minimum cluster size, that the
> add-cow file format manages a minimum of 256M)?  That is, are you
> requiring that the bitmap end on an aligned boundary, to make the bitmap
> easier to use without having to special case a short-read on the last
> page of the bitmap?
> 
I think my description was wrong. add-cow is using block-cache.c, which
is from qcow2-cache.c. I was not to change qcow2-cache.c to a great
extent, so I use it directly,  and it uses cluster_size as writing unit:
 ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table, 
 s->cluster_size);

So the size of bitmap would be multiple of cluster_size of add-cow, do
you think if it is acceptable?

>> +the least significant bit covers the first cluster. Bit orders in one
>> +byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, 
>> data
>> +should be loaded from backing file while reading; if the bit is 1, 
>> indicates the
> 
> s/indicates/it indicates/ (twice)
> 
Okay.

> If there is no backing file, or if the image file is larger than the
> backing file and the offset is beyond the end of the backing file, then
> the data should be read as all zero bytes instead.
> 
Okay.

>> +related sector has been dirty, should be loaded from image file while 
>> reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond 
>> to
>> +bytes beyond the raw file size in add-cow must be written as 0 and must be
>> +ignored when reading.
>> +
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
> 
> You prevent it when creating add-cow files via qemu-img, but that
> doesn't stop malicious users from creating a file with those properties
> and then trying to get you to parse it as add-cow.  I think this needs
> to instead be a requirement on the consumer of a potentially bad file,
> and not a requirement on the producer to avoid bad files, since you
> can't control all producers, as in:
> If image file name and backing file resolve to the same file, the
> add-cow image must be treated as invalid.

Okay, I will check it while opening add-cow file.
> 

Thank you Eric.

Re: [Qemu-devel] [PATCH V13 1/6] docs: document for add-cow file format

Reply via email to