On Thu 12 May 2016 05:28:34 PM CEST, Kevin Wolf wrote:
> Am 12.05.2016 um 17:13 hat Alberto Garcia geschrieben:
>> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
>> > Am 12.05.2016 um 15:47 hat Alberto Garcia geschrieben:
>> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
>> >> > Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben:
>> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
>> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the
>> >> >> >>    same time. I'm wondering if pausing all block jobs between
>> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
>> >> >> >>    trick. Opinions?
>> >> >> >
>> >> >> > I would have to read up the details of the problem again, but I
>> >> >> > think with bdrv_drained_begin/end() we actually have the right
>> >> >> > tool now to fix it properly. We may need to pull up the drain
>> >> >> > (bdrv_drain_all() today) from bdrv_reopen_multiple() to its
>> >> >> > caller and just assert it in the function itself, but there
>> >> >> > shouldn't be much more to it than that.
>> >> >>
>> >> >> I think that's not enough, see point 2) here:
>> >> >>
>> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
>> >> >>
>> >> >> "I've been taking a look at the bdrv_drained_begin/end() API, but
>> >> >>  as I understand it it prevents requests from a different
>> >> >>  AioContext. Since all BDS in the same chain share the same
>> >> >>  context it does not really help here."
>> >> >
>> >> > Yes, that's the part I meant with pulling up the calls.
>> >> >
>> >> > If I understand correctly, the problem is that first
>> >> > bdrv_reopen_queue() queues a few BDSes for reopen, then
>> >> > bdrv_drain_all() completes all running requests and can indirectly
>> >> > trigger a graph modification, and then bdrv_reopen_multiple() uses
>> >> > the queue which doesn't match reality any more.
>> >> >
>> >> > The solution to that should be simply changing the order of things:
>> >> >
>> >> > 1. bdrv_drained_begin()
>> >> > 2. bdrv_reopen_queue()
>> >> > 3. bdrv_reopen_multiple()
>> >> >    * Instead of bdrv_drain_all(), assert that no requests are
>> >> >      pending
>> >> >    * We don't run requests, so we can't complete a block job and
>> >> >      manipulate the graph any more
>> >> > 4. then bdrv_drained_end()
>> >>
>> >> This doesn't work. Here's what happens:
>> >>
>> >> 1) Block job (a) starts (block-stream).
>> >>
>> >> 2) Block job (b) starts (block-stream, or block-commit).
>> >>
>> >> 3) Job (b) calls bdrv_reopen() and does the drain call.
>> >>
>> >> 4) Job (b) creates reopen_queue and calls bdrv_reopen_multiple().
>> >>    There are no pending requests at this point, but job (a) is
>> >>    sleeping.
>> >>
>> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
>> >>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
>> >>    qemu_coroutine_yield().
>> >
>> > I think between here and the next step is what I don't understand.
>> >
>> > bdrv_reopen_multiple() is not called in coroutine context, right? All
>> > block jobs use block_job_defer_to_main_loop() before they call
>> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take
>> > the shortcut, but use a nested event loop.
>>
>> When bdrv_flush() is not called in coroutine context it does
>> qemu_coroutine_create() + qemu_coroutine_enter().
>
> Right, but if the coroutine yields, we jump back to the caller, which
> looks like this:
>
>     co = qemu_coroutine_create(bdrv_flush_co_entry);
>     qemu_coroutine_enter(co, &rwco);
>     while (rwco.ret == NOT_DONE) {
>         aio_poll(aio_context, true);
>     }
>
> So this loops until the flush has completed. The only way I can see
> how something else (job (a)) can run is if aio_poll() calls it.
If I'm not wrong that can happen if job (a) is sleeping. If job (a) is
launched with a rate limit, then the bdrv_drain() call will not drain it
completely but return when the block job hits the I/O limit ->
block_job_sleep_ns() -> co_aio_sleep_ns().

>> > I don't fully understand the problem yet, but as a shot in the
>> > dark, would pausing block jobs in bdrv_drained_begin() help?
>>
>> Yeah, my impression is that pausing all jobs during bdrv_reopen()
>> should be enough.
>
> If you base your patches on top of my queue (which I think you already
> do for the greatest part), the nicest way to implement this would
> probably be that BlockBackends give their users a callback for
> .drained_begin/end and the jobs implement that as pausing themselves.
>
> We could, of course, directly pause block jobs in
> bdrv_drained_begin(), but that would feel a bit hackish. So maybe do
> that for a quick attempt whether it helps, and if it does, we can
> write the real thing then.

I'll try that.

On a related note, bdrv_drain_all() already pauses all block jobs. The
problem is that when it ends it resumes them, so there can again be
pending I/O requests before bdrv_drain_all() even returns. That sounds
like a problem to me, or am I overlooking anything?

Berto
