Re: [PATCH] block/mirror: add 'write-blocking-after-ready' copy mode

Kevin Wolf Tue, 21 Feb 2023 03:28:44 -0800

On 21.02.2023 at 11:57, Fiona Ebner wrote:
> On 14.02.23 at 17:19, Vladimir Sementsov-Ogievskiy wrote:
> > On 02.02.23 16:27, Fiona Ebner wrote:
> >> On 02.02.23 at 12:34, Kevin Wolf wrote:
> >>> But having to switch the mirror job to sync mode just to avoid doing I/O
> >>> on an inactive device sounds wrong to me. It doesn't fix the root cause
> >>> of that problem, but just papers over it.
> >>
> >> If you say the root cause is "the job not being completed before
> >> switchover", then yes. But if the root cause is "switchover happening
> >> while the drive is not actively synced", then a way to switch modes can
> >> fix the root cause :)
> >>
> >>>
> >>> Why does your management tool not complete the mirror job before it
> >>> does the migration switchover that inactivates images?
> >>
> >> I did talk with my team leader about the possibility, but we decided to
> >> not go for it, because it requires doing the migration in two steps with
> >> pause-before-switchover and has the potential to increase guest downtime
> >> quite a bit. So I went for this approach instead.
> >>
> > 
> > 
> > Interesting point. Maybe we need a way to automatically complete all the
> > jobs before switchover? There seems to be no reason to break the jobs if
> > the user didn't cancel them. (And of course no reason to allow a code
> > path leading to an assertion.)
> > 
> 
> Wouldn't that be a bit unexpected? There could be jobs unrelated to
> migration or jobs at early stages. But sure, being able to trigger the
> assertion is not nice.
> 
> Potential alternatives could be pausing the jobs or failing migration
> with a clean error?


I wonder if the latter is what we would get if child_job.inactivate()
simply returned an error when the job isn't completed yet?

It would potentially result in a mix of active and inactive BDSes,
though, which is a painful state to be in. If you don't 'cont'
afterwards, you're likely to hit another assertion once you try to do
something with an inactive BDS.
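
To make the concern concrete, here is a minimal, self-contained sketch of
that idea. It is a toy model, not QEMU code: MockJob, MockBDS and all the
mock_* functions are hypothetical stand-ins, and -EBUSY is just an assumed
error code. It only illustrates how failing partway through inactivation
could leave some nodes inactive and others still active.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real QEMU structures. */
typedef struct MockJob {
    bool completed;
} MockJob;

typedef struct MockBDS {
    bool inactive;
    MockJob *job;   /* job attached to this node, or NULL */
} MockBDS;

/* Models an inactivate callback that refuses while the job is unfinished. */
static int mock_job_inactivate(const MockJob *job)
{
    assert(job != NULL);
    return job->completed ? 0 : -EBUSY;   /* error code is an assumption */
}

/*
 * Inactivate nodes in order and stop at the first error. Nodes handled
 * before the failure stay inactive, which is the mixed state in question.
 */
static int mock_inactivate_all(MockBDS *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].job) {
            int ret = mock_job_inactivate(nodes[i].job);
            if (ret < 0) {
                return ret;
            }
        }
        nodes[i].inactive = true;
    }
    return 0;
}

/* Counts how many nodes ended up inactive. */
static size_t mock_count_inactive(const MockBDS *nodes, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].inactive) {
            count++;
        }
    }
    return count;
}
```

In this model, with three nodes where only the second has an unfinished
job, the call fails and exactly one node is left inactive. A real
implementation would presumably need some form of rollback, or the
equivalent of 'cont', to get out of that state.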

Pausing the job feels a bit dangerous, because it means that you can
resume it as a user. We'd have to add code to check that the image has
actually been activated again before we allow the job to be resumed.
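
Roughly, such a check might look like the sketch below. Again this is a
toy model rather than QEMU code: MockPausedJob, its fields and
mock_job_resume() are hypothetical names, and -EPERM is an assumed error
code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a job paused at migration switchover. */
typedef struct MockPausedJob {
    bool paused;
    bool image_inactive;   /* true while the underlying image is inactive */
} MockPausedJob;

/*
 * Refuse a user-requested resume while the image is still inactive, so the
 * job cannot start issuing I/O to an inactive node.
 */
static int mock_job_resume(MockPausedJob *job)
{
    assert(job != NULL);
    if (!job->paused) {
        return 0;            /* nothing to do */
    }
    if (job->image_inactive) {
        return -EPERM;       /* error code is an assumption */
    }
    job->paused = false;
    return 0;
}
```

The job stays paused until the image has been activated again, and only
then does a resume request succeed.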

> For us, the former is still best done in combination with a way to
> switch to active (i.e. write-blocking) mode for drive-mirror.

Switching between these modes is a useful thing to have either way.

> The latter would force us to complete the drive-mirror job before
> switchover even with active (i.e. write-blocking) mode, breaking our
> usage of drive-mirror+migration that worked (in almost all cases, but it
> would have been all cases if we had used active mode ;)) for many years now.
> 
> Maybe adding an option for how the jobs should behave upon switchover
> (e.g. complete/pause/cancel/cancel-migration) could help? Either as a
> job-specific option (more flexible) or a migration option?

Kevin
