Am 26.01.2011 16:40, schrieb Avi Kivity:
> On 01/22/2011 11:29 AM, Stefan Hajnoczi wrote:
>> Converting qcow2 to use coroutines is fairly simple since most of qcow2
>> is synchronous. The synchronous I/O functions likes bdrv_pread() now
>> transparently work when called from a coroutine, so all the synchronous
>> code just works.
>>
>> The explicitly asynchronous code is adjusted to repeatedly call
>> qcow2_aio_read_cb() or qcow2_aio_write_cb() until the request completes.
>> At that point the coroutine will return from its entry function and its
>> resources are freed.
>>
>> The bdrv_aio_readv() and bdrv_aio_writev() user callback is now invoked
>> from a BH. This is necessary since the user callback code does not
>> expect to be executed from a coroutine.
>>
>> This conversion is not completely correct because the safety the
>> synchronous code does not carry over to the coroutine version.
>> Previously, a synchronous code path could assume that it will never be
>> interleaved with another request executing. This is no longer true
>> because bdrv_pread() and bdrv_pwrite() cause the coroutine to yield and
>> other requests can be processed during that time.
>>
>> The solution is to carefully introduce checks so that pending requests
>> do not step on each other's toes. That is left for a future patch...
>
> The way I thought of doing this is:
>
> qcow_aio_write(...)
> {
> execute_in_coroutine {
> co_mutex_lock(&bs->mutex);
> do_qcow_aio_write(...); // original qcow code
> co_mutex_release(&bs->mutex);
> }
> }
>
> (similar changes for the the other callbacks)
>
> if the code happens to be asynchronous (no metadata changes), we'll take
> the mutex and release it immediately after submitting the I/O, so no
> extra serialization happens. If the code does issue a synchronous
> metadata write, we'll lock out all other operations on the same block
> device, but still allow the vcpu to execute, since all the locking
> happens in a coroutine.
>
> Essentially, a mutex becomes the dependency tracking mechnism. A global
> mutex means all synchronous operations are dependent. Later, we can
> convert the metadata cache entry dependency lists to local mutexes
> inside the cache entry structures.
I thought a bit about it since you mentioned it in the call yesterday
and I think this approach makes sense. Even immediately after the
conversion we should be in a better state than with Stefan's approach
because I/O without metadata disk access won't be serialized.
In the other thread you mentioned that you have written some code
independently. Do you have it in some public git repository?
Kevin