Avihai Horon <[email protected]> writes:
> The VFIO_MIGRATION event notifies users when a VFIO device transitions
> to a new state.
>
> One use case for this event is to prevent timeouts for RDMA connections
> to the migrated device. In this case, an external management application
> (not libvirt) consumes the events and disables the RDMA timeout
> mechanism when receiving the event for PRE_COPY_P2P state, which
> indicates that the device is non-responsive.
>
> This is essential because RDMA connections typically have very low
> timeouts (tens of milliseconds), which can be far below migration
> downtime.
>
> However, under heavy resource utilization, the device transition to
> PRE_COPY_P2P can take hundreds of milliseconds to complete. Since the
> VFIO_MIGRATION event is currently sent only after the transition
> completes, it arrives too late, after RDMA connections have already
> timed out.
>
> To address this, send an additional "prepare" event immediately before
> initiating the PRE_COPY_P2P transition. This guarantees timely event
> delivery regardless of how long the actual state transition takes.
>
> Signed-off-by: Avihai Horon <[email protected]>
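
For illustration only, here is a minimal sketch of how an external
consumer could react to the new state. It is not taken from the patch or
from any existing management application. It assumes the event arrives as
one JSON object per line on the QMP socket and carries "device-id" and
"device-state" members in its "data"; those member names are assumed from
the VFIO_MIGRATION event schema, which is not quoted in this excerpt. The
function name and the set of trigger states are hypothetical.

```python
import json

# Hypothetical policy: states on which the consumer disables the RDMA
# timeout mechanism. The "-prepare" state arrives before the transition
# starts; the plain state arrives only after it completes.
TRIGGER_STATES = {"pre-copy-p2p-prepare", "pre-copy-p2p"}


def device_needing_timeout_disable(qmp_line):
    """Return the device id if this QMP message is a VFIO_MIGRATION
    event for one of the trigger states, otherwise None."""
    msg = json.loads(qmp_line)
    if msg.get("event") != "VFIO_MIGRATION":
        return None
    data = msg.get("data", {})
    if data.get("device-state") in TRIGGER_STATES:
        return data.get("device-id")
    return None
```

Since a prepare event for a state does not guarantee that the device
actually reaches it, a real consumer would presumably also watch the
following events and re-enable the timeouts if the transition fails.
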
[...]
> diff --git a/qapi/vfio.json b/qapi/vfio.json
> index a1a9c5b673..17b6046871 100644
> --- a/qapi/vfio.json
> +++ b/qapi/vfio.json
> @@ -11,7 +11,13 @@
> ##
> # @QapiVfioMigrationState:
> #
> -# An enumeration of the VFIO device migration states.
> +# An enumeration of the VFIO device migration states. In addition to
> +# the regular states, there are prepare states (with 'prepare' suffix)
> +# which indicate that the device is just about to transition to the
> +# corresponding state. Note that seeing a prepare state for state X
> +# doesn't guarantee that the next state will be X, as the state
> +# transition can fail and the device may transition to a different
> +# state instead.
> #
> # @stop: The device is stopped.
> #
> @@ -32,11 +38,14 @@
> # tracking its internal state and its internal state is available
> # for reading.
> #
> +# @pre-copy-p2p-prepare: The device is just about to move to
> +# pre-copy-p2p state. (since 11.0)
> +#
> # Since: 9.1
> ##
> { 'enum': 'QapiVfioMigrationState',
> 'data': [ 'stop', 'running', 'stop-copy', 'resuming', 'running-p2p',
> - 'pre-copy', 'pre-copy-p2p' ] }
> + 'pre-copy', 'pre-copy-p2p', 'pre-copy-p2p-prepare' ] }
>
> ##
> # @VFIO_MIGRATION:
Acked-by: Markus Armbruster <[email protected]>
[...]