On Thu 12 May 2016 05:28:34 PM CEST, Kevin Wolf wrote:
> Am 12.05.2016 um 17:13 hat Alberto Garcia geschrieben:
>> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
>> > Am 12.05.2016 um 15:47 hat Alberto Garcia geschrieben:
>> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
>> >> > Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben:
>> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
>> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the
>> >> >> >>    same time. I'm wondering if pausing all block jobs between
>> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
>> >> >> >>    trick. Opinions?
>> >> >> >
>> >> >> > I would have to read up the details of the problem again, but I
>> >> >> > think with bdrv_drained_begin/end() we actually have the right
>> >> >> > tool now to fix it properly. We may need to pull up the drain
>> >> >> > (bdrv_drain_all() today) from bdrv_reopen_multiple() to its
>> >> >> > caller and just assert it in the function itself, but there
>> >> >> > shouldn't be much more to it than that.
>> >> >>
>> >> >> I think that's not enough, see point 2) here:
>> >> >>
>> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
>> >> >>
>> >> >> "I've been taking a look at the bdrv_drained_begin/end() API, but
>> >> >>  as I understand it it prevents requests from a different
>> >> >>  AioContext. Since all BDS in the same chain share the same
>> >> >>  context it does not really help here."
>> >> >
>> >> > Yes, that's the part I meant with pulling up the calls.
>> >> >
>> >> > If I understand correctly, the problem is that first
>> >> > bdrv_reopen_queue() queues a few BDSes for reopen, then
>> >> > bdrv_drain_all() completes all running requests and can indirectly
>> >> > trigger a graph modification, and then bdrv_reopen_multiple() uses
>> >> > the queue which doesn't match reality any more.
>> >> >
>> >> > The solution to that should be simply changing the order of things:
>> >> >
>> >> > 1. bdrv_drained_begin()
>> >> > 2. bdrv_reopen_queue()
>> >> > 3. bdrv_reopen_multiple()
>> >> >    * Instead of bdrv_drain_all(), assert that no requests are
>> >> >      pending
>> >> >    * We don't run requests, so we can't complete a block job and
>> >> >      manipulate the graph any more
>> >> > 4. then bdrv_drained_end()
>> >>
>> >> This doesn't work. Here's what happens:
>> >>
>> >> 1) Block job (a) starts (block-stream).
>> >>
>> >> 2) Block job (b) starts (block-stream, or block-commit).
>> >>
>> >> 3) Job (b) calls bdrv_reopen() and does the drain call.
>> >>
>> >> 4) Job (b) creates reopen_queue and calls bdrv_reopen_multiple().
>> >>    There are no pending requests at this point, but job (a) is
>> >>    sleeping.
>> >>
>> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
>> >>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
>> >>    qemu_coroutine_yield().
>> >
>> > I think between here and the next step is what I don't understand.
>> >
>> > bdrv_reopen_multiple() is not called in coroutine context, right? All
>> > block jobs use block_job_defer_to_main_loop() before they call
>> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take
>> > the shortcut, but use a nested event loop.
>>
>> When bdrv_flush() is not called in coroutine context it does
>> qemu_coroutine_create() + qemu_coroutine_enter().
>
> Right, but if the coroutine yields, we jump back to the caller, which
> looks like this:
>
>     co = qemu_coroutine_create(bdrv_flush_co_entry);
>     qemu_coroutine_enter(co, &rwco);
>     while (rwco.ret == NOT_DONE) {
>         aio_poll(aio_context, true);
>     }
>
> So this loops until the flush has completed. The only way I can see
> how something else (job (a)) can run is if aio_poll() calls it.
If I'm not wrong that can happen if job (a) is sleeping. If job (a) is
launched with a rate limit, then the bdrv_drain() call will not drain it
completely but return when the block job hits the I/O limit ->
block_job_sleep_ns() -> co_aio_sleep_ns().

>> > I don't fully understand the problem yet, but as a shot in the
>> > dark, would pausing block jobs in bdrv_drained_begin() help?
>>
>> Yeah, my impression is that pausing all jobs during bdrv_reopen()
>> should be enough.
>
> If you base your patches on top of my queue (which I think you already
> do for the greatest part), the nicest way to implement this would
> probably be that BlockBackends give their users a callback for
> .drained_begin/end and the jobs implement that as pausing themselves.
>
> We could, of course, directly pause block jobs in
> bdrv_drained_begin(), but that would feel a bit hackish. So maybe do
> that for a quick attempt whether it helps, and if it does, we can
> write the real thing then.

I'll try that.

On a related note, bdrv_drain_all() already pauses all block jobs. The
problem is that when it ends it resumes them, so there can again be
pending I/O requests before bdrv_drain_all() even returns. That sounds
like a problem to me, or am I overlooking anything?

Berto
