On 01/04/2016 15:57, Fam Zheng wrote:
> Using a nested aio_poll() in a coroutine is a bad idea. This patch
> replaces the aio_poll() loop in bdrv_drain() with a BH when it is called
> in a coroutine.
> 
> For example, the bdrv_drain() in mirror.c can hang when a guest-issued
> request is pending on it in qemu_co_mutex_lock().
> 
> The mirror coroutine in this case has just finished a request, and the
> block job is about to complete. It calls bdrv_drain(), which waits for
> the other coroutine to complete. The other coroutine is a scsi-disk
> request. The deadlock happens when the latter is in turn waiting in
> qemu_co_mutex_lock() for the former to yield or terminate. The state
> flow is as follows (assuming a qcow2 image):
> 
>   mirror coroutine               scsi-disk coroutine
>   -------------------------------------------------------------
>   do last write
> 
>     qcow2:qemu_co_mutex_lock()
>     ...
>                                  scsi disk read
> 
>                                    tracked request begin
> 
>                                    qcow2:qemu_co_mutex_lock.enter
> 
>     qcow2:qemu_co_mutex_unlock()
> 
>   bdrv_drain
>     while (has tracked request)
>       aio_poll()
> 
> In the scsi-disk coroutine, qemu_co_mutex_lock() will never return
> because the mirror coroutine is blocked in aio_poll(blocking=true).
> 
> With this patch, the added qemu_coroutine_yield() allows the scsi-disk
> coroutine to make progress as expected:
> 
>   mirror coroutine               scsi-disk coroutine
>   -------------------------------------------------------------
>   do last write
> 
>     qcow2:qemu_co_mutex_lock()
>     ...
>                                  scsi disk read
> 
>                                    tracked request begin
> 
>                                    qcow2:qemu_co_mutex_lock.enter
> 
>     qcow2:qemu_co_mutex_unlock()
> 
>   bdrv_drain.enter
>>   schedule BH
>>   qemu_coroutine_yield()
>>                                  qcow2:qemu_co_mutex_lock.return
>>                                  ...
>                                    tracked request end
>     ...
>     (resumed from BH callback)
>   bdrv_drain.return
>   ...
> 
> Reported-by: Laurent Vivier <[email protected]>
> Suggested-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Fam Zheng <[email protected]>

Tested-by: Laurent Vivier <[email protected]>
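
For anyone following the two state flows above, here is a minimal
standalone sketch of the same scheduling problem. It is a toy model, not
QEMU code: it uses POSIX ucontext (assuming Linux/glibc) instead of QEMU
coroutines, and every name in it (simulate_drain, mirror_co, scsi_co,
scsi_woken, bh_pending, ...) is hypothetical. The nested poll is bounded
so the model reports the hang instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <ucontext.h>

enum { STACK_SIZE = 64 * 1024 };

static ucontext_t main_ctx, mirror_ctx, scsi_ctx;
static char mirror_stack[STACK_SIZE], scsi_stack[STACK_SIZE];

/* Toy stand-ins for the qcow2 CoMutex, the tracked request list and the BH. */
static bool lock_held, scsi_waiting, scsi_woken, bh_pending, use_bh, drained;
static int tracked_requests;

static void yield_to_main(ucontext_t *self)
{
    swapcontext(self, &main_ctx);
}

static void mirror_co(void)
{
    lock_held = true;                /* models qcow2:qemu_co_mutex_lock() */
    yield_to_main(&mirror_ctx);      /* "...": wait for the last write */
    lock_held = false;               /* models qcow2:qemu_co_mutex_unlock() */
    if (scsi_waiting) {
        /* The waiter is only marked runnable; it runs once we yield. */
        scsi_waiting = false;
        scsi_woken = true;
    }

    if (use_bh) {
        bh_pending = true;           /* schedule BH */
        yield_to_main(&mirror_ctx);  /* models qemu_coroutine_yield() */
    } else {
        /* Nested poll: nothing in this loop can run the woken coroutine. */
        for (int i = 0; i < 1000 && tracked_requests > 0; i++) {
        }
    }
    drained = (tracked_requests == 0);
}

static void scsi_co(void)
{
    tracked_requests++;              /* tracked request begin */
    if (lock_held) {
        scsi_waiting = true;         /* qemu_co_mutex_lock.enter */
        yield_to_main(&scsi_ctx);    /* resumed after the wakeup */
    }
    tracked_requests--;              /* tracked request end */
}

static void spawn(ucontext_t *co, char *stack, void (*fn)(void))
{
    int ret = getcontext(co);
    assert(ret == 0);
    co->uc_stack.ss_sp = stack;
    co->uc_stack.ss_size = STACK_SIZE;
    co->uc_link = &main_ctx;         /* return to the main loop when done */
    makecontext(co, fn, 0);
}

/* Returns true if the modelled drain saw all tracked requests finish. */
bool simulate_drain(bool with_bh)
{
    lock_held = scsi_waiting = scsi_woken = bh_pending = drained = false;
    tracked_requests = 0;
    use_bh = with_bh;
    spawn(&mirror_ctx, mirror_stack, mirror_co);
    spawn(&scsi_ctx, scsi_stack, scsi_co);

    swapcontext(&main_ctx, &mirror_ctx);     /* mirror locks, starts write */
    swapcontext(&main_ctx, &scsi_ctx);       /* scsi read blocks on the lock */
    swapcontext(&main_ctx, &mirror_ctx);     /* mirror unlocks, then drains */

    if (scsi_woken) {                        /* main loop runs the waiter */
        scsi_woken = false;
        swapcontext(&main_ctx, &scsi_ctx);
    }
    if (bh_pending) {                        /* BH callback resumes mirror */
        bh_pending = false;
        swapcontext(&main_ctx, &mirror_ctx);
    }
    return drained;
}
```

In this model simulate_drain(true) returns true, because the yield lets
the main loop run the woken scsi coroutine before the BH resumes the
mirror coroutine. simulate_drain(false) returns false, because the poll
loop runs inside the mirror coroutine and the waiter never gets a chance
to run until it is too late.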
