FYI. The following data comes from the first ping-pong mlx VF migration after rebooting the host. (A sketch of how such a run can be driven and measured over QMP is appended after the quoted thread at the end of this mail.)
1. Test for multifd=0:

1.1 Outgoing migration:

  VF number:            1 VF            4 VF
  Time elapsed:         10194 ms        10650 ms
  Memory processed:     903.911 MiB     783.698 MiB
  Memory bandwidth:     108.722 MiB/s   101.978 MiB/s
  Iteration:            4               6
  Normal data:          881.297 MiB     747.613 MiB
  Total downtime:       358 ms          518 ms
  Setup time:           52 ms           450 ms

1.2 Incoming migration:

  VF number:            1 VF            4 VF
  Time elapsed:         10161 ms        10569 ms
  Memory processed:     903.881 MiB     785.400 MiB
  Memory bandwidth:     107.952 MiB/s   100.512 MiB/s
  Iteration:            4               7
  Normal data:          881.262 MiB     749.297 MiB
  Total downtime:       315 ms          513 ms
  Setup time:           47 ms           414 ms

2. Test for multifd=1:

2.1 Outgoing migration:

  VF number:            1 VF            1 VF
  Channel number:       4               5
  Time elapsed:         10962 ms        10071 ms
  Memory processed:     908.968 MiB     908.424 MiB
  Memory bandwidth:     108.378 MiB/s   110.109 MiB/s
  Iteration:            4               4
  Normal data:          882.852 MiB     882.566 MiB
  Total downtime:       318 ms          255 ms
  Setup time:           54 ms           43 ms

  VF number:            4 VFs           4 VFs
  Channel number:       8               16
  Time elapsed:         10805 ms        10943 ms
  Memory processed:     786.334 MiB     784.926 MiB
  Memory bandwidth:     109.062 MiB/s   108.610 MiB/s
  Iteration:            5               7
  Normal data:          746.758 MiB     744.938 MiB
  Total downtime:       344 ms          335 ms
  Setup time:           445 ms          463 ms

2.2 Incoming migration:

  VF number:            1 VF            1 VF
  Channel number:       4               5
  Time elapsed:         10064 ms        10072 ms
  Memory processed:     909.786 MiB     923.746 MiB
  Memory bandwidth:     109.997 MiB/s   111.308 MiB/s
  Iteration:            4               4
  Normal data:          883.664 MiB     897.848 MiB
  Total downtime:       313 ms          328 ms
  Setup time:           46 ms           47 ms

  VF number:            4 VFs           4 VFs
  Channel number:       8               16
  Time elapsed:         10126 ms        9941 ms
  Memory processed:     791.308 MiB     779.560 MiB
  Memory bandwidth:     108.876 MiB/s   110.170 MiB/s
  Iteration:            7               5
  Normal data:          751.672 MiB     739.680 MiB
  Total downtime:       304 ms          309 ms
  Setup time:           442 ms          446 ms

Best Regards,
Yanghang Liu


On Fri, Dec 13, 2024 at 1:36 AM Peter Xu <pet...@redhat.com> wrote:
>
> On Wed, Dec 11, 2024 at 12:06:04AM +0100, Maciej S. Szmigiero wrote:
> > On 6.12.2024 23:20, Peter Xu wrote:
> > > On Fri, Dec 06, 2024 at 07:03:36PM +0100, Maciej S. Szmigiero wrote:
> > > > On 4.12.2024 20:10, Peter Xu wrote:
> > > > > On Sun, Nov 17, 2024 at 08:19:55PM +0100, Maciej S. Szmigiero wrote:
> > > > > > Important note:
> > > > > > 4 VF benchmarks were done with commit 5504a8126115
> > > > > > ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
> > > > > > reverted since this seems to improve performance in this VM config if the
> > > > > > multifd transfer is enabled: the downtime performance with this commit
> > > > > > present is 1141 ms enabled / 1730 ms disabled.
> > > > > >
> > > > > > Smaller VF counts actually do seem to benefit from this commit, so it's
> > > > > > likely that in the future adding some kind of a memslot pre-allocation
> > > > > > bit stream message might make sense to avoid this downtime regression for
> > > > > > 4 VF configs (and likely higher VF count too).
> > > > >
> > > > > I'm confused why revert 5504a8126115 could be faster, and it affects as
> > > > > much as 600ms.  Also how that effect differs can relevant to num of VFs.
> > > > >
> > > > > Could you share more on this regression?  Because if that's problematic we
> > > > > need to fix it, or upstream QEMU (after this series merged) will still not
> > > > > work.
> > > > >
> > > >
> > > > The number of memslots that the VM uses seems to differ depending on its
> > > > VF count, each VF using 2 memslots:
> > > > 2 VFs, used slots: 13
> > > > 4 VFs, used slots: 17
> > > > 5 VFs, used slots: 19
> > >
> > > It's still pretty less.
> > >
> > > >
> > > > So I suspect this performance difference is due to these higher counts
> > > > of memslots possibly benefiting from being preallocated on the previous
> > > > QEMU code (before commit 5504a8126115).
> > > >
> > > > I can see that with this commit:
> > > > > #define KVM_MEMSLOTS_NR_ALLOC_DEFAULT 16
> > > >
> > > > So it would explain why the difference is visible on 4 VFs only (and
> > > > possibly higher VF counts, just I don't have an ability to test migrating
> > > > it) since with 4 VF configs we exceed KVM_MEMSLOTS_NR_ALLOC_DEFAULT.
> > >
> > > I suppose it means kvm_slots_grow() is called once, but I don't understand
> > > why it caused 500ms downtime!
> >
> > In this cover letter sentence:
> > > "the downtime performance with this commit present is 1141 ms enabled /
> > > 1730 ms disabled"
> > "enabled" and "disabled" refer to *multifd transfer* being enabled, not
> > your patch being present (sorry for not being 100% clear there).
> >
> > So the difference that the memslot patch makes is 1141 ms - 1095ms = 46 ms
> > extra downtime, not 500 ms.
> >
> > I can guess this is because of extra contention on BQL, with unfortunate
> > timing.
>
> Hmm, I wonder why the address space changed during switchover.  I was
> expecting the whole address space is updated on qemu boots up, and should
> keep as is during migration.  Especially why that matters with mulitifd at
> all..  I could have overlooked something.
>
> > > Not to mention, that patchset should at least reduce downtime OTOH due to
> > > the small num of slots, because some of the dirty sync / clear path would
> > > need to walk the whole slot array (our lookup is pretty slow for now, but
> > > probably no good reason to rework it yet if it's mostly 10-20).
> >
> > With multifd transfer being disabled your memslot patch indeed improves the
> > downtime by 1900 ms - 1730 ms = 170 ms.
>
> That's probably the other side of the change when slots grow, comparing to
> the pure win where the series definitely should speedup the dirty track
> operations quite a bit.
>
> > > In general, I would still expect that dynamic memslot work to speedup
> > > (instead of slowing down) VFIO migrations.
> > >
> > > There's something off here, or something I overlooked.  I suggest we figure
> > > it out.. Even if we need to revert the kvm series on master, but I so far
> > > doubt it.
> > >
> > > Otherwise we should at least report the number with things on the master
> > > branch, and we evaluate merging this series with that real number, because
> > > fundamentally that's the numbers people will get when start using this
> > > feature on master later.
> >
> > Sure, that's why in the cover letter I provided the numbers with your commit
> > present, too.
>
> It seems to me we're not far away from the truth.  Anyway, feel free to
> update if you figure out the reason, or got some news on profiling.
>
> Thanks,
>
> --
> Peter Xu
>
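
P.S. For anyone who wants to reproduce runs like the ones at the top of this
mail, here is a minimal sketch of driving one multifd migration over QMP and
dumping the statistics that the table rows above appear to map to (the report
reads like libvirt domjobinfo output, which is derived from the same
query-migrate fields).  This is an assumption about the harness, not the exact
script used; it relies on the qemu.qmp Python package, and the socket path,
destination URI and channel count are placeholders.

  # Sketch: drive one multifd migration over QMP and print the stats that the
  # tables above report.  Socket path / URI / channel count are placeholders.
  import asyncio
  from qemu.qmp import QMPClient

  async def migrate_and_report(sock_path: str, dest_uri: str, channels: int):
      qmp = QMPClient('src-vm')
      await qmp.connect(sock_path)          # QMP UNIX socket of the source VM

      # Enable multifd and pick the channel count (the "Channel number" row).
      await qmp.execute('migrate-set-capabilities',
                        {'capabilities': [{'capability': 'multifd',
                                           'state': True}]})
      await qmp.execute('migrate-set-parameters',
                        {'multifd-channels': channels})

      await qmp.execute('migrate', {'uri': dest_uri})

      # Poll until the migration finishes, then print the relevant fields.
      while True:
          info = await qmp.execute('query-migrate')
          if info.get('status') in ('completed', 'failed'):
              break
          await asyncio.sleep(1)

      ram = info.get('ram', {})
      print('Time elapsed:  ', info.get('total-time'), 'ms')
      print('Total downtime:', info.get('downtime'), 'ms')
      print('Setup time:    ', info.get('setup-time'), 'ms')
      print('Bandwidth:     ', ram.get('mbps'), 'Mbit/s')   # report shows MiB/s
      print('Iterations:    ', ram.get('dirty-sync-count'))
      print('Normal data:   ', ram.get('normal-bytes'), 'bytes')

      await qmp.disconnect()

  # Example invocation (placeholder values):
  # asyncio.run(migrate_and_report('/tmp/src-qmp.sock', 'tcp:192.0.2.2:4444', 8))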
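
And on the memslot point in the quoted thread: a back-of-the-envelope helper,
using only the used-slot counts quoted there (13/17/19 slots for 2/4/5 VFs,
i.e. roughly 9 base slots plus 2 per VF, where the base count is inferred
rather than measured), shows why only the 4+ VF configs exceed
KVM_MEMSLOTS_NR_ALLOC_DEFAULT (16) and therefore need a one-time memslot array
grow during the run.  This is just the arithmetic, not QEMU's actual
kvm_slots_grow() logic.

  # Which VF counts exceed the default memslot preallocation (16) and thus
  # need a one-time grow of the memslot array?
  # Assumption inferred from the thread: ~9 base memslots plus 2 per VF
  # (13/17/19 used slots for 2/4/5 VFs).
  KVM_MEMSLOTS_NR_ALLOC_DEFAULT = 16
  BASE_SLOTS = 9
  SLOTS_PER_VF = 2

  def used_slots(vf_count: int) -> int:
      return BASE_SLOTS + SLOTS_PER_VF * vf_count

  for vfs in (1, 2, 4, 5):
      used = used_slots(vfs)
      grows = used > KVM_MEMSLOTS_NR_ALLOC_DEFAULT
      print(f"{vfs} VF(s): ~{used} slots -> "
            f"{'exceeds' if grows else 'fits in'} the default 16")
  # Only the 4 and 5 VF configs exceed 16, matching the observation that the
  # downtime delta shows up with 4 VFs.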