On Thu, Dec 20, 2018 at 10:29:14AM +0100, Roger Pau Monné wrote:
>On Thu, Dec 20, 2018 at 10:46:29AM +0800, Chao Gao wrote:
>> On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote:
>> >On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote:
>> >> I find that some pass-through devices no longer work after a guest
>> >> reboot. Assigning the device to another domain hits the same issue, and
>> >> the only way to make it work again is to unbind and rebind it to
>> >> pciback. Someone reported this issue a year ago [1].
>> >>
>> >> If the device's driver doesn't disable MSI-X during shutdown, or if qemu
>> >> is killed or crashes before the domain shuts down, the domain's pirqs
>> >> won't be unmapped. Xen then takes over this work, unmapping all pirqs
>> >> when destroying the guest. But because pciback has already disabled
>> >> memory decoding before Xen unmaps the pirqs, Xen has to set the
>> >> host_maskall flag and the maskall bit to mask an MSI, rather than
>> >> setting the mask bit in the MSI-X table. The call trace of this
>> >> process is:
>> >>
>> >> ->arch_domain_destroy
>> >>   ->free_domain_pirqs
>> >>     ->unmap_domain_pirq (if the pirq isn't unmapped by qemu)
>> >>       ->pirq_guest_force_unbind
>> >>         ->__pirq_guest_unbind
>> >>           ->mask_msi_irq (= desc->handler->disable())
>> >>             ->the warning in msi_set_mask_bit()
>> >>
>> >> The host_maskall flag will prevent guests from clearing the maskall bit
>> >> even after the device is assigned to another guest later. That guest
>> >> then cannot receive MSIs from this device.
>> >>
>> >> To fix this issue, unmap the pirqs before memory decoding is disabled
>> >> by pciback. Specifically, when a device is detached from a guest, all
>> >> established mappings between pirqs and MSIs are destroyed before
>> >> changing the device's ownership.
>> >>
>> >> [1]:
>> >> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>> >>
>> >> Signed-off-by: Chao Gao <[email protected]>
>> >> ---
>> >> Applied this patch, qemu would report the error below:
>> >> [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
>> >> [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
>> >> [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
>> >> [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
>> >>
>> >> Despite these errors, guest shutdown and device hotplug finish smoothly.
>> >> It seems to me that qemu tries to unbind an MSI which has already been
>> >> unbound by the code added by this patch. I am not sure whether it is
>> >> acceptable to leave this error there.
>> >
>> >So QEMU would try to unmap IRQs after unbinding the device? I think
>>
>> It seems to me yes; I don't know the reason right now. Maybe because it
>> is an asynchronous process?
>>
>> >QEMU should be fixed to first unmap the IRQs and then unbind the
>> >device.
>>
>> Yes. Agree.
>>
>> >
>> >As long as this doesn't affect QEMU functionality I guess the Xen side
>> >can be committed, but ideally a QEMU patch to avoid those error
>> >messages should be committed at the same time.
>> >
>> >> ---
>> >>  xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
>> >> xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
>> >> xen/include/xen/iommu.h | 1 +
>> >> 3 files changed, 89 insertions(+), 18 deletions(-)
>> >>
>> >> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> >> index a6eb8a4..56ee1ef 100644
>> >> --- a/xen/drivers/passthrough/io.c
>> >> +++ b/xen/drivers/passthrough/io.c
>> >> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>> >> return 0;
>> >> }
>> >>
>> >> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
>> >> +{
>> >> + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> >> +
>> >> + ASSERT(spin_is_locked(&d->event_lock));
>> >> +
>> >> + if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> >> + list_empty(&pirq_dpci->digl_list) )
>> >> + {
>> >> + pirq_guest_unbind(d, pirq);
>> >> + msixtbl_pt_unregister(d, pirq);
>> >> + if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >> + kill_timer(&pirq_dpci->timer);
>> >> + pirq_dpci->flags = 0;
>> >> + /*
>> >> + * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> >> + * call to pt_pirq_softirq_reset.
>> >> + */
>> >> + pt_pirq_softirq_reset(pirq_dpci);
>> >> +
>> >> + pirq_cleanup_check(pirq, d);
>> >> + }
>> >> +}
>> >> +
>> >> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
>> >> +{
>> >> + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> >> +
>> >> + ASSERT(spin_is_locked(&d->event_lock));
>> >> +
>> >> + if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> >> + pi_update_irte(NULL, pirq, 0);
>> >> +
>> >> + pt_irq_destroy_bind_common(d, pirq);
>> >> +}
>> >> +
>> >> int pt_irq_destroy_bind(
>> >> struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>> >> {
>> >> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>> >> }
>> >> else
>> >> what = "bogus";
>> >> - }
>> >> - else if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> >> - pi_update_irte(NULL, pirq, 0);
>> >> -
>> >> - if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> >> - list_empty(&pirq_dpci->digl_list) )
>> >> - {
>> >> - pirq_guest_unbind(d, pirq);
>> >> - msixtbl_pt_unregister(d, pirq);
>> >> - if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >> - kill_timer(&pirq_dpci->timer);
>> >> - pirq_dpci->flags = 0;
>> >> - /*
>> >> - * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> >> - * call to pt_pirq_softirq_reset.
>> >> - */
>> >> - pt_pirq_softirq_reset(pirq_dpci);
>> >>
>> >> - pirq_cleanup_check(pirq, d);
>> >> + pt_irq_destroy_bind_common(d, pirq);
>> >> }
>> >> + else
>> >> + pt_irq_destroy_bind_msi(d, pirq);
>> >>
>> >> spin_unlock(&d->event_lock);
>> >>
>> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> >> index 1277ce2..88a8007 100644
>> >> --- a/xen/drivers/passthrough/pci.c
>> >> +++ b/xen/drivers/passthrough/pci.c
>> >> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>> >> return NULL;
>> >> }
>> >> spin_lock_init(&msix->table_lock);
>> >> + msix->warned = DOMID_INVALID;
>> >> pdev->msix = msix;
>> >> }
>> >>
>> >> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>> >> return rc;
>> >> }
>> >>
>> >> +/*
>> >> + * Unmap established mappings between domain's pirq and device's MSI.
>> >> + * These mappings were set up by qemu/guest and are expected to be
>> >> + * destroyed when changing the device's ownership.
>> >> + */
>> >> +static void pci_unmap_msi(struct pci_dev *pdev)
>> >> +{
>> >> + struct msi_desc *entry, *tmp;
>> >> +
>> >> + ASSERT(pcidevs_locked());
>> >> +
>> >> + if ( !pdev->domain )
>> >> + return;
>> >> +
>> >> + spin_lock(&pdev->domain->event_lock);
>> >> + list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
>> >
>> >Do you really need the _safe version here? Couldn't you even use:
>>
>> Don't need the _safe version.
>>
>> >
>> >while ( (entry = list_first_entry_or_null(...)) != NULL )
>> >...
>>
>> I think it is the same as list_for_each_entry(). Is there any reason you
>> think this one would be better?
>
>Doesn't 'entry' get freed when you call unmap_domain_pirq? In which
>case using the list pointer from that struct would be a
>use-after-free.
>
>Using list_first_entry_or_null you don't need the previous entry in
>order to get the next one, since you always pick the first one until
>the list is empty.
Agree. The iteration here isn't safe as written, and I will take your advice.
>
>> >
>> >> + {
>> >> + struct pirq *info;
>> >> + struct hvm_pirq_dpci *pirq_dpci;
>> >> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
>> >> +
>> >> + pirq_orig = pirq;
>> >> +
>> >> + if ( !pirq )
>> >> + continue;
>> >> +
>> >> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
>> >> + if ( pirq < 0)
>> >> + pirq = -pirq;
>> >
>> >I'm not sure I follow, the pirq hasn't been unmapped at this point
>> >yet?
>>
>> Qemu (i.e. a compromised qemu) has the ability to do this, right? We
>> can't assert that the pirq hasn't been unmapped here.
>
>If the PIRQ is unmapped then the 'entry' would also be gone (freed)
>AFAICT (see unmap_domain_pirq which calls msi_free_irq)?
>
>I think that any entry in pdev->msi_list will always have entry->irq
> >= 0, but maybe I'm missing something. AFAICT map_domain_pirq will not
>add an entry with irq < 0.
Yes, you are right. I will add a "WARN_ON" when pirq < 0.
Thanks
Chao
_______________________________________________
Xen-devel mailing list
[email protected]
https://lists.xenproject.org/mailman/listinfo/xen-devel