Nir Soffer <[email protected]> writes:
> When the `read-zeroes` is set, reads produce zeroes, and block status
> return BDRV_BLOCK_ZERO, emulating a sparse image.
>
> If we don't set `read-zeros` we report BDRV_BLOCK_DATA, but image data
> is undefined; posix_memalign, _aligned_malloc, valloc, or memalign do
> not promise to zero allocated memory.
>
> When computing a blkhash of an image via qemu-nbd, we want to test 3
> cases:
>
> 1. Sparse image: skip reading the entire image based on block status
> result, and use a pre-computed zero block hash.
> 2. Image full of zeroes: read the entire image, detect block full of
> zeroes and skip block hash computation.
> 3. Image full of data: read the entire image and compute a hash of all
> blocks.
>
> This change adds `read-pattern` option. If the option is set, reads
> produce the specified pattern. With this option we can emulate an image
> full of zeroes or full of non-zeroes.
>
> The following examples shows how the new option can be used with blksum
> (or nbdcopy --blkhash) to compute a blkhash of an image using the
> null-co driver.
>
> Sparse image - the very fast path:
>
> % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/sparse.sock \
> "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g',
> 'read-zeroes': true}}" &
>
> % time blksum 'nbd+unix:///?socket=/tmp/sparse.sock'
> 300ad1efddb063822fea65ae3174cd35320939d4d0b050613628c6e1e876f8f6
> nbd+unix:///?socket=/tmp/sparse.sock
> blksum 'nbd+unix:///?socket=/tmp/sparse.sock' 0.05s user 0.01s system
> 92% cpu 0.061 total
>
> Image full of zeros - same hash, 268 times slower:
>
> % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/zero.sock \
> "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g',
> 'read-pattern': 0}}" &
>
> % time blksum 'nbd+unix:///?socket=/tmp/zero.sock'
> 300ad1efddb063822fea65ae3174cd35320939d4d0b050613628c6e1e876f8f6
> nbd+unix:///?socket=/tmp/zero.sock
> blksum 'nbd+unix:///?socket=/tmp/zero.sock' 7.45s user 22.57s system
> 183% cpu 16.347 total
>
> Image full of data - difference hash, heavy cpu usage:
>
> % ./qemu-nbd -r -t -e 0 -f raw -k /tmp/data.sock \
> "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'size': '100g',
> 'read-pattern': 255}}" &
>
> % time blksum 'nbd+unix:///?socket=/tmp/data.sock'
> 2c122b3ed28c83ede3c08485659fa9b56ee54ba1751db74d8ba9aa13d9866432
> nbd+unix:///?socket=/tmp/data.sock
> blksum 'nbd+unix:///?socket=/tmp/data.sock' 46.05s user 14.15s system
> 448% cpu 13.414 total
>
> Specifying both `read-zeroes` and `read-pattern` is an error since
> `read-zeroes` implies a sparse image. Example errors:
>
> % ./qemu-img map --output json \
> "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'read-pattern':
> -1}}"
> qemu-img: Could not open 'json:{...}': read_pattern is out of range
> (0-255)
>
> % ./qemu-img map --output json \
> "json:{'driver': 'raw', 'file': {'driver': 'null-co', 'read-pattern':
> 0, 'read-zeroes': true}}"
> qemu-img: Could not open 'json:{...}': The parameters read-zeroes and
> read-pattern are in conflict
>
> Tested on top of
> https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg05096.html.
>
> Signed-off-by: Nir Soffer <[email protected]>
[...]
> diff --git a/docs/devel/secure-coding-practices.rst
> b/docs/devel/secure-coding-practices.rst
> index 0454cc527e..73830684ea 100644
> --- a/docs/devel/secure-coding-practices.rst
> +++ b/docs/devel/secure-coding-practices.rst
> @@ -111,5 +111,6 @@ Use of null-co block drivers
> The ``null-co`` block driver is designed for performance: its read accesses
> are
> not initialized by default. In case this driver has to be used for security
> research, it must be used with the ``read-zeroes=on`` option which fills read
> -buffers with zeroes. Security issues reported with the default
> +buffers with zeroes, or with the ``read-pattern=N`` option which fills read
> +buffers with pattern. Security issues reported with the default
> (``read-zeroes=off``) will be discarded.
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7c95c9e36a..2205ac9758 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3295,13 +3295,23 @@
> #
> # @read-zeroes: if true, emulate a sparse image, and reads from the
> # device produce zeroes; if false, emulate an allocated image but
> -# reads from the device leave the buffer unchanged.
> +# reads from the device leave the buffer unchanged. Mutually
> +# exclusive with @read-pattern.
> # (default: false; since: 4.1)
Correct before the patch: absent behaves just like present and false.
That's no longer the case afterwards. Hmm.
> #
> +# @read-pattern: if set, emulate an allocated image, and reads from the
> +# device produce the specified byte value; if unset, reads from the
> +# device leave the buffer unchanged. Mutually exclusive with
> +# @read-zeroes.
> +# (since: 10.1)
Default? The usual way to document it would be something like (default:
false), but that's again problematic.
> +#
> # Since: 2.9
> ##
> { 'struct': 'BlockdevOptionsNull',
> - 'data': { '*size': 'int', '*latency-ns': 'uint64', '*read-zeroes': 'bool'
> } }
> + 'data': { '*size': 'int',
> + '*latency-ns': 'uint64',
> + '*read-zeroes': 'bool',
> + '*read-pattern': 'uint8' } }
>
> ##
> # @BlockdevOptionsNVMe:
Let's take a step back from the concrete interface and ponder what we're
trying to do. We want three cases:
* Allocated, reads return unspecified crap (security hazard)
* Allocated, reads return specified data
* Sparse, reads return zeroes
How would we do this if we started on a green field?
Here's my try:
bool sparse
uint8 contents
so that
* Allocated, reads return unspecified crap (security hazard)
@sparse is false, and @contents is absent
* Allocated, reads return specified data
@sparse is false, and @contents is present
* Sparse, reads return zeroes
@sparse is true, and @contents must absent or zero
I'd make @sparse either mandatory or default to true, so that we don't
default to security hazard.
Now compare this to your patch: same configuration data (bool × uint8),
better names, cleaner semantics, better defaults.
Unless we want to break compatibility, we're stuck with the name
@read-zeroes for the bool, and its trap-for-the-unwary default value,
but cleaner semantics seem possible.
Thoughts?