On Wed, 1 Mar 2023 17:12:51 -0400 Jason Gunthorpe <[email protected]> wrote:

> On Wed, Mar 01, 2023 at 12:55:59PM -0700, Alex Williamson wrote:
> 
> > So it seems like what we need here is both a preface buffer size and a
> > target device latency.  The QEMU pre-copy algorithm should factor both
> > the remaining data size and the device latency into deciding when to
> > transition to stop-copy, thereby allowing the device to feed actually
> > relevant data into the algorithm rather than dictate its behavior.
> 
> I don't know that we can realistically estimate startup latency,
> especially have the sender estimate latency on the receiver..

Knowing that the target device is compatible with the source is a point
towards making an educated guess.

> I feel like trying to overlap the device start up with the STOP phase
> is an unnecessary optimization?  How do you see it benefits?

If we can't guarantee that there's some time difference between sending
initial bytes immediately at the end of pre-copy vs immediately at the
beginning of stop-copy, does that mean any handling of initial bytes is
an unnecessary optimization?

I'm imagining that completing initial bytes triggers some initialization
sequence in the target host driver which runs in parallel to the
remaining data stream, so in practice, even if sent at the beginning of
stop-copy, the target device gets a head start.

> I've been thinking of this from the perspective that we should always
> ensure device startup is completed, it is time that has to be paid,
> why pay it during STOP?

Creating a policy for QEMU to send initial bytes in a given phase
doesn't ensure startup is complete.  There's no guaranteed time
difference between sending that data and the beginning of stop-copy.

QEMU is trying to achieve a downtime goal, where it estimates network
bandwidth to get a data size threshold, and then polls devices for
remaining data.  That downtime goal might exceed the startup latency of
the target device anyway, where it's then the operator's choice to pay
that time in stop-copy or stalled on the target.

But if we actually want to ensure startup of the target is complete,
then drivers should be able to return both a data size and an estimated
time for the target device to initialize.  That time estimate should be
updated by the driver based on if/when initial_bytes is drained.  The
decision whether to continue iterating pre-copy would then be based on
both the maximum remaining device startup time and the calculated time
based on the remaining data size.

I think this provides a better guarantee than anything based simply on
transferring a given chunk of data in a specific phase of the process.
Thoughts?  Thanks,

Alex
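
P.S. Very roughly, the sort of convergence test I'm imagining, with
completely made-up names (this is not the actual QEMU migration code or
VFIO uAPI, just a sketch of the idea):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device estimate reported by the driver. */
struct precopy_estimate {
    uint64_t remaining_bytes;  /* device data still to transfer */
    uint64_t init_time_ms;     /* estimated target startup time left;
                                * the driver would shrink this once
                                * initial_bytes has been drained */
};

/*
 * Return true if we can stop iterating pre-copy and still expect to
 * meet the downtime limit.  Both the transfer time for the remaining
 * data and the device's startup estimate are considered; assuming
 * startup on the target overlaps the remaining data stream, the larger
 * of the two dominates the expected downtime.
 */
static bool precopy_can_converge(const struct precopy_estimate *est,
                                 uint64_t bandwidth_bytes_per_ms,
                                 uint64_t downtime_limit_ms)
{
    uint64_t xfer_ms, expected_ms;

    if (!bandwidth_bytes_per_ms) {
        return false;
    }

    xfer_ms = est->remaining_bytes / bandwidth_bytes_per_ms;
    expected_ms = xfer_ms > est->init_time_ms ? xfer_ms : est->init_time_ms;

    return expected_ms <= downtime_limit_ms;
}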
