On Wed, 4 Nov 2020 13:25:40 +0530 Kirti Wankhede <[email protected]> wrote:
> On 11/4/2020 1:57 AM, Alex Williamson wrote:
> > On Wed, 4 Nov 2020 01:18:12 +0530
> > Kirti Wankhede <[email protected]> wrote:
> > 
> >> On 10/30/2020 12:35 AM, Alex Williamson wrote:
> >>> On Thu, 29 Oct 2020 23:11:16 +0530
> >>> Kirti Wankhede <[email protected]> wrote:
> >>>
> >>
> >> <snip>
> >>
> >>>>>> +System memory dirty pages tracking
> >>>>>> +----------------------------------
> >>>>>> +
> >>>>>> +A ``log_sync`` memory listener callback is added to mark system memory pages
> >>>>>
> >>>>> s/is added to mark/marks those/
> >>>>>
> >>>>>> +as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried
> >>>>>
> >>>>> s/by/by the/
> >>>>> s/Dirty/The dirty/
> >>>>>
> >>>>>> +per container. All pages pinned by vendor driver through vfio_pin_pages()
> >>>>>
> >>>>> s/by/by the/
> >>>>>
> >>>>>> +external API have to be marked as dirty during migration. When there are CPU
> >>>>>> +writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
> >>>>>> +by vendor driver can also be written by device. There is currently no device
> >>>>>
> >>>>> s/by/by the/ (x2)
> >>>>>
> >>>>>> +which has hardware support for dirty page tracking. So all pages which are
> >>>>>> +pinned by vendor driver are considered as dirty.
> >>>>>> +Dirty pages are tracked when device is in stop-and-copy phase because if pages
> >>>>>> +are marked dirty during pre-copy phase and content is transfered from source to
> >>>>>> +destination, there is no way to know newly dirtied pages from the point they
> >>>>>> +were copied earlier until device stops. To avoid repeated copy of same content,
> >>>>>> +pinned pages are marked dirty only during stop-and-copy phase.
> >>>>
> >>>>
> >>>>> Let me take a quick stab at rewriting this paragraph (not sure if I
> >>>>> understood it correctly):
> >>>>>
> >>>>> "Dirty pages are tracked when the device is in the stop-and-copy phase.
> >>>>> During the pre-copy phase, it is not possible to distinguish a dirty
> >>>>> page that has been transferred from the source to the destination from
> >>>>> newly dirtied pages, which would lead to repeated copying of the same
> >>>>> content. Therefore, pinned pages are only marked dirty during the
> >>>>> stop-and-copy phase." ?
> >>>>>
> >>>>
> >>>> I think the above rephrase only talks about repeated copying in the
> >>>> pre-copy phase. I used "copied earlier until device stops" to cover
> >>>> both pre-copy and stop-and-copy, up to the point the device stops.
> >>>
> >>>
> >>> Now I'm confused, I thought we had abandoned the idea that we can only
> >>> report pinned pages during stop-and-copy. Doesn't the device need to
> >>> expose its dirty memory footprint during the iterative phase regardless
> >>> of whether that causes repeat copies? If QEMU iterates and sees that
> >>> all memory is still dirty, it may have transferred more data, but it
> >>> can actually predict whether it can achieve its downtime tolerances.
> >>> Which is more important, less data transfer or predictability? Thanks,
> >>>
> >>
> >> Even if QEMU copies and transfers the content of all system memory pages
> >> during pre-copy (the worst case with an IOMMU-backed mdev device whose
> >> vendor driver is not smart enough to pin pages explicitly, so all system
> >> memory pages are marked dirty), its prediction about downtime tolerance
> >> will still not be correct, because during stop-and-copy all pages need
> >> to be copied again, as the device can write to any of those pinned pages.
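
(As a point of reference for the mechanism being documented: the
log_sync path boils down to querying the container's dirty bitmap via
the VFIO_IOMMU_DIRTY_PAGES ioctl and feeding it to QEMU's dirty memory
tracking. The sketch below is compressed and illustrative, not the
actual hw/vfio/common.c code; error handling is elided and
ram_addr_of() is a hypothetical placeholder for the section-to-ram_addr
translation.)

  static void vfio_log_sync_sketch(MemoryListener *listener,
                                   MemoryRegionSection *section)
  {
      VFIOContainer *container = container_of(listener, VFIOContainer,
                                              listener);
      uint64_t iova = section->offset_within_address_space;
      uint64_t size = int128_get64(section->size);
      uint64_t npages = size / qemu_real_host_page_size;
      struct vfio_iommu_type1_dirty_bitmap *dbitmap;
      struct vfio_iommu_type1_dirty_bitmap_get *range;

      dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
      dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
      dbitmap->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
      range = (struct vfio_iommu_type1_dirty_bitmap_get *)&dbitmap->data;

      range->iova = iova;
      range->size = size;
      range->bitmap.pgsize = qemu_real_host_page_size;
      range->bitmap.size = DIV_ROUND_UP(npages, 64) * sizeof(uint64_t);
      range->bitmap.data = g_malloc0(range->bitmap.size);

      /* The kernel sets a bit for every page the device may have
       * written, which includes all pages pinned via vfio_pin_pages(). */
      ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);

      cpu_physical_memory_set_dirty_lebitmap(
          (unsigned long *)range->bitmap.data,
          ram_addr_of(section) /* hypothetical helper */, npages);

      g_free(range->bitmap.data);
      g_free(dbitmap);
  }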
> > 
> > I think you're only reiterating my point. If QEMU copies all of guest
> > memory during the iterative phase and each time it sees that all memory
> > is dirty, such as if CPUs or devices (including assigned devices) are
> > dirtying pages as fast as it copies them (or continuously marks them
> > dirty), then QEMU can predict that downtime will require copying all
> > pages.
> 
> But as of now there is no way to know if the device has dirtied pages
> during the iterative phase.

This claim doesn't make any sense; pinned pages are considered
persistently dirtied, during the iterative phase and while stopped.

> > If instead devices don't mark dirty pages until the VM is
> > stopped, then QEMU might iterate through memory copy and predict a short
> > downtime because not much memory is dirty, only to be surprised that
> > all of memory is suddenly dirty. At that point it's too late, the VM
> > is already stopped, and the predicted short downtime takes far longer
> > than expected. This is exactly why we made the kernel interface mark
> > pinned pages persistently dirty when it was proposed that we only
> > report pinned pages once. Thanks,
> > 
> 
> Since there is no way to know if the device dirtied pages during the
> iterative phase, QEMU should query pinned pages in the stop-and-copy
> phase.

As above, I don't believe this is true.

> Whenever there is hardware support or some software mechanism to report
> pages dirtied by the device, we will add a capability bit to the
> migration capabilities, and based on that capability bit QEMU or the
> user space app should decide whether to query dirty pages in the
> iterative phase.

Yes, we could advertise support for fine-granularity dirty page
tracking, but I completely disagree that we should consider pinned
pages clean until suddenly exposing them as dirty once the VM is
stopped. Thanks,

Alex
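
P.S. To make the kernel-side semantics above concrete: a page pinned
through vfio_pin_pages() is reported dirty on every bitmap query and
immediately re-marked, so it stays dirty in the iterative phase and
after the VM stops, until it is unpinned. An illustrative sketch only
(field names such as dirty_bitmap and pinned_bitmap are hypothetical,
not the actual drivers/vfio/vfio_iommu_type1.c code):

  static int sketch_get_and_rearm_dirty(struct vfio_dma *dma,
                                        u64 __user *ubitmap,
                                        unsigned long npages)
  {
      unsigned long nlongs = BITS_TO_LONGS(npages);

      /* Hand the accumulated dirty state to userspace... */
      if (copy_to_user(ubitmap, dma->dirty_bitmap,
                       nlongs * sizeof(unsigned long)))
          return -EFAULT;

      /* ...clear it for the next iteration... */
      bitmap_zero(dma->dirty_bitmap, npages);

      /* ...and immediately re-arm pinned pages: they are persistently
       * dirty because the device may DMA into them at any time with
       * no notification. */
      bitmap_or(dma->dirty_bitmap, dma->dirty_bitmap,
                dma->pinned_bitmap, npages);

      return 0;
  }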
