On Wed, Jan 05, 2022 at 12:56:42PM -0700, Alex Williamson wrote:
> The VFIO_DEVICE_RESET ioctl might be backed by several different reset
> methods, including a device specific reset (ie. custom reset code in
> kernel), an ACPI reset (ie. custom reset code in firmware), FLR, PM,
> and bus resets. This listing is also the default priority order used
> by the kernel for trying reset methods. Traditionally we've had some
> FUD regarding the PM reset as the extent of a "Soft Reset" is not well
> defined in the PCI specification. Therefore we try to guess what type
> of reset a device might use for the VFIO_DEVICE_RESET and insert a bus
> reset via the vfio hot reset interface if we think it could be a PM
> reset.
>
> This results in a couple odd tests for PM reset in our hot reset code,
> as we assume if we don't detect has_pm_reset support that we can't
> reset the device otherwise. Starting with kernel v5.15, the kernel
> exposes a sysfs attribute for devices that can tell us the priority
> order for device resets, so long term (not implemented here) we no
> longer need to play this guessing game, and if permissions allow we
> could manipulate the order ourselves so that we don't need to inject
> our own hot reset.
>
> In the shorter term, implemented here, let's not assume we're out of
> reset methods if we can't perform a hot reset and the device doesn't
> support PM reset. We can use reset_works as the authority, which
> allows us to generate more comprehensible error messages for the case
> when it actually doesn't work.
>
> The impetus for this change is a result of commit d5daff7d3126 ("pcie:
> implement slot power control for pcie root ports"), where powering off
> a slot now results in a device reset. If the slot is powered off as a
> result of qdev_unplug() via the device request event, that device
> request is potentially the result of an unbind operation in the host.
> That unbind operation holds the kernel device lock, which causes the
> VFIO_DEVICE_RESET ioctl to fail (or in the case of some kernels, has
> cleared the flag indicating support of a device reset function). We
> can then end up with an SR-IOV VF device trying to trigger a hot reset,
> which finds that it needs ownership of the PF group to perform such a
> reset, resulting in confusing log messages.
>
> Ultimately the above commit still introduces a log message that we
> didn't have prior on such an unplug, but it's not unjustified to
> perform such a reset, though it might be considered unnecessary.
> Arguably failure to reset the device should always generate some sort
> of meaningful log message.
>
> Signed-off-by: Alex Williamson <[email protected]>
Looks reasonable. Just an extra idea: do we want to maybe validate the
return code from the ioctl? I assume it's something like EBUSY right?
In any case better than what we have now:
Acked-by: Michael S. Tsirkin <[email protected]>
> ---
> hw/vfio/pci.c | 44 ++++++++++++++++++++++++++++++++------------
> 1 file changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 7b45353ce27f..ea697951556e 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2224,7 +2224,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool
> single)
> ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> if (ret && errno != ENOSPC) {
> ret = -errno;
> - if (!vdev->has_pm_reset) {
> + if (!vdev->vbasedev.reset_works) {
> error_report("vfio: Cannot reset device %s, "
> "no available reset mechanism.",
> vdev->vbasedev.name);
> }
> @@ -2270,7 +2270,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool
> single)
> }
>
> if (!group) {
> - if (!vdev->has_pm_reset) {
> + if (!vdev->vbasedev.reset_works) {
> error_report("vfio: Cannot reset device %s, "
> "depends on group %d which is not owned.",
> vdev->vbasedev.name, devices[i].group_id);
> @@ -3162,6 +3162,8 @@ static void vfio_exitfn(PCIDevice *pdev)
> static void vfio_pci_reset(DeviceState *dev)
> {
> VFIOPCIDevice *vdev = VFIO_PCI(dev);
> + Error *err = NULL;
> + int ret;
>
> trace_vfio_pci_reset(vdev->vbasedev.name);
>
> @@ -3175,26 +3177,44 @@ static void vfio_pci_reset(DeviceState *dev)
> goto post_reset;
> }
>
> - if (vdev->vbasedev.reset_works &&
> - (vdev->has_flr || !vdev->has_pm_reset) &&
> - !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
> - trace_vfio_pci_reset_flr(vdev->vbasedev.name);
> - goto post_reset;
> + if (vdev->vbasedev.reset_works && (vdev->has_flr ||
> !vdev->has_pm_reset)) {
> + if (!ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
> + trace_vfio_pci_reset_flr(vdev->vbasedev.name);
> + goto post_reset;
> + }
> +
> + error_setg_errno(&err, errno, "Unable to reset device");
> }
>
> /* See if we can do our own bus reset */
> - if (!vfio_pci_hot_reset_one(vdev)) {
> + ret = vfio_pci_hot_reset_one(vdev);
> + if (!ret) {
> goto post_reset;
> }
>
> + if (!err) {
> + error_setg_errno(&err, -ret, "Unable to perform bus reset");
> + }
> +
> /* If nothing else works and the device supports PM reset, use it */
> - if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
> - !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
> - trace_vfio_pci_reset_pm(vdev->vbasedev.name);
> - goto post_reset;
> + if (vdev->vbasedev.reset_works && vdev->has_pm_reset) {
> + /* Prefer to report the ioctl failure mode */
> + error_free(err);
> + err = NULL;
> +
> + if (!ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
> + trace_vfio_pci_reset_pm(vdev->vbasedev.name);
> + goto post_reset;
> + }
> +
> + error_setg_errno(&err, errno, "Unable to reset device");
> }
>
> + warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> + err = NULL;
> +
> post_reset:
> + error_free(err);
> vfio_pci_post_reset(vdev);
> }
>
>