On Wed, Jul 25, 2018 at 05:19:03PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel [mailto:[email protected]] On Behalf
> > Of Roger Pau Monné
> > Sent: 25 July 2018 15:12
> > To: [email protected]
> > Cc: xen-devel <[email protected]>; David Woodhouse
> > <[email protected]>; Jan Beulich <[email protected]>;
> > [email protected]
> > Subject: Re: [Xen-devel] PVH dom0 creation fails - the system freezes
> >
> > On Wed, Jul 25, 2018 at 04:57:23PM +0300, [email protected] wrote:
> > > On 07/25/2018 04:35 PM, Roger Pau Monné wrote:
> > > > On Wed, Jul 25, 2018 at 01:06:43PM +0300, [email protected]
> > wrote:
> > > > > On 07/24/2018 12:54 PM, Jan Beulich wrote:
> > > > > > > > > On 23.07.18 at 13:50, <[email protected]> wrote:
> > > > > > > For the last few days, I have been trying to get a PVH dom0
> > > > > > > running,
> > > > > > > however I encountered the following problem: the system seems
> > to
> > > > > > > freeze after the hypervisor boots, the screen goes black. I have
> > tried to
> > > > > > > debug it via a serial console (using Minicom) and managed to get
> > some
> > > > > > > more Xen output, after the screen turns black.
> > > > > > >
> > > > > > > I mention that I have tried to boot the PVH dom0 using different
> > kernel
> > > > > > > images (from 4.9.0 to 4.18-rc3), different Xen versions (4.10,
> > > > > > > 4.11,
> > 4.12).
> > > > > > >
> > > > > > > Below I attached my system / hypervisor configuration, as well as
> > the
> > > > > > > output captured through the serial console, corresponding to the
> > latest
> > > > > > > versions for Xen and the Linux Kernel (Xen staging and Kernel from
> > the
> > > > > > > xen/tip tree).
> > > > > > > [...]
> > > > > > > (XEN) [VT-D]iommu.c:919: iommu_fault_status: Fault Overflow
> > > > > > > (XEN) [VT-D]iommu.c:921: iommu_fault_status: Primary Pending
> > Fault
> > > > > > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:14.0] fault
> > addr 8deb3000, iommu reg = ffff82c00021b000
> > > > Can you figure out which PCI device is 00:14.0?
> > > This is the output of lspci -vvv for device 00:14.0:
> > >
> > > 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI
> > > Controller (rev 31) (prog-if 30 [XHCI])
> > > Subsystem: Intel Corporation Sunrise Point-H USB 3.0 xHCI
> > > Controller
> > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr+
> > > Stepping- SERR+ FastB2B- DisINTx+
> > > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> > > <TAbort- <MAbort+ >SERR- <PERR- INTx-
> > > Latency: 0
> > > Interrupt: pin A routed to IRQ 178
> > > Region 0: Memory at a2e00000 (64-bit, non-prefetchable) [size=64K]
> > > Capabilities: [70] Power Management version 2
> > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
> > > PME(D0-,D1-,D2-,D3hot+,D3cold+)
> > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
> > > Address: 00000000fee0e000 Data: 4021
> > > Kernel driver in use: xhci_hcd
> > > Kernel modules: xhci_pci
> >
> > I'm afraid your USB controller is missing RMRR entries in the DMAR
> > ACPI tables, thus causing the IOMMU faults and not working properly.
> >
> > You could try to manually add some extra rmrr regions by appending:
> >
> > rmrr=0x8deb3=0:0:14.0
> >
> > To the Xen command line, and keep adding any address that pops up in
> > the iommu faults. This is of course quite cumbersome, but there's no
> > way to get the required memory addresses if the data in RMRR is
> > wrong/incomplete.
> >
>
> You could just add all E820 reserved regions in there. That will almost
> certainly cover it.

I have a prototype patch for this that attempts to identity map all
reserved regions below 4GB into the p2m. It's still a WIP, but if you
could give it a try, that would help me figure out whether it fixes
your issues and whether it is indeed something worth having.

I don't really like the patch as-is because it doesn't check whether
the reserved regions added to the p2m overlap with, for example, the
LAPIC page or the PCIe MCFG regions. I will continue to work on a safer
version.

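
The missing check mentioned above could look roughly like the following.
This is only a minimal standalone sketch, not part of the patch: struct
pfn_range and both helpers are hypothetical names, and the caller is
assumed to have already collected the LAPIC page and the MCFG windows
into an array of inclusive PFN ranges.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of page frame numbers (hypothetical helper type). */
struct pfn_range {
    unsigned long start;
    unsigned long end;
};

/* True if pfn lies within the inclusive range r. */
static bool pfn_in_range(unsigned long pfn, const struct pfn_range *r)
{
    return pfn >= r->start && pfn <= r->end;
}

/*
 * True if pfn falls inside any of the n ranges that must not be
 * identity mapped (e.g. the LAPIC page or a PCIe MCFG window).
 */
static bool pfn_is_excluded(unsigned long pfn,
                            const struct pfn_range *excl, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
        if ( pfn_in_range(pfn, &excl[i]) )
            return true;

    return false;
}
```

In the mapping loop, a reserved pfn would then be skipped whenever
pfn_is_excluded() returns true, in the same way the loop already skips
pfns for which xen_in_range() is true.
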
If you can give this a shot, please remove any rmrr options from the
command line and use iommu=debug in order to catch any issues.

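
For reference when reading any faults that still show up with
iommu=debug: the "fault addr" in the log is a byte address, while the
rmrr= option quoted earlier takes a page frame number, so 8deb3000
corresponds to frame 0x8deb3. A small standalone sketch of that
conversion follows; fault_addr_to_pfn() and format_rmrr() are
hypothetical helper names, and 4K pages are assumed.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT_4K 12

/* Convert a faulting byte address into a 4K page frame number. */
static uint64_t fault_addr_to_pfn(uint64_t addr)
{
    return addr >> PAGE_SHIFT_4K;
}

/*
 * Build an option string of the form used earlier in this thread,
 * e.g. "rmrr=0x8deb3=0:0:14.0", for a single faulting page.
 * Returns the snprintf() result.
 */
static int format_rmrr(char *buf, size_t len, uint64_t fault_addr,
                       unsigned int seg, unsigned int bus,
                       unsigned int dev, unsigned int fn)
{
    return snprintf(buf, len, "rmrr=0x%llx=%x:%x:%x.%x",
                    (unsigned long long)fault_addr_to_pfn(fault_addr),
                    seg, bus, dev, fn);
}
```

Calling format_rmrr() with the fault address 0x8deb3000 and device
0000:00:14.0 yields the same string that was suggested above.
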
Thanks, Roger.
---8<---
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 2c44fabf99..76a1fd6681 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -21,6 +21,8 @@
#include <xen/keyhandler.h>
#include <xsm/xsm.h>
+#include <asm/setup.h>
+
static int parse_iommu_param(const char *s);
static void iommu_dump_p2m_table(unsigned char key);
@@ -47,6 +49,8 @@ integer_param("iommu_dev_iotlb_timeout", iommu_dev_iotlb_timeout);
* no-igfx Disable VT-d for IGD devices (insecure)
* no-amd-iommu-perdev-intremap Don't use per-device interrupt remapping
* tables (insecure)
+ * inclusive Include any memory ranges below 4GB not used
+ * by Xen or unusable to the iommu page tables.
*/
custom_param("iommu", parse_iommu_param);
bool_t __initdata iommu_enable = 1;
@@ -60,6 +64,7 @@ bool_t __read_mostly iommu_passthrough;
bool_t __read_mostly iommu_snoop = 1;
bool_t __read_mostly iommu_qinval = 1;
bool_t __read_mostly iommu_intremap = 1;
+bool __read_mostly iommu_inclusive = true;
/*
* In the current implementation of VT-d posted interrupts, in some extreme
@@ -126,6 +131,8 @@ static int __init parse_iommu_param(const char *s)
iommu_dom0_strict = val;
else if ( !strncmp(s, "sharept", ss - s) )
iommu_hap_pt_share = val;
+ else if ( !strncmp(s, "inclusive", ss - s) )
+ iommu_inclusive = val;
else
rc = -EINVAL;
@@ -165,6 +172,85 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d)
iommu_dom0_strict = 1;
}
+static void __hwdom_init setup_inclusive_mappings(struct domain *d)
+{
+ unsigned long i, j, tmp, top, max_pfn;
+
+ BUG_ON(!is_hardware_domain(d));
+
+ max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
+ top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
+
+ for ( i = 0; i < top; i++ )
+ {
+ unsigned long pfn = pdx_to_pfn(i);
+ bool map;
+ int rc = 0;
+
+ /*
+ * Set up 1:1 mapping for dom0. Default to include only
+ * conventional RAM areas and let RMRRs include needed reserved
+ * regions. When set, the inclusive mapping additionally maps in
+ * every pfn up to 4GB except those that fall in unusable ranges.
+ */
+ if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
+ continue;
+
+ if ( is_pv_domain(d) && iommu_inclusive && pfn <= max_pfn )
+ map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
+ else if ( is_hvm_domain(d) && iommu_inclusive )
+ map = page_is_ram_type(pfn, RAM_TYPE_RESERVED);
+ else
+ map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
+
+ if ( !map )
+ continue;
+
+ /* Exclude Xen bits */
+ if ( xen_in_range(pfn) )
+ continue;
+
+ /*
+ * If dom0-strict mode is enabled or guest type is HVM/PVH then exclude
+ * conventional RAM and let the common code map dom0's pages.
+ */
+ if ( (iommu_dom0_strict || is_hvm_domain(d)) &&
+ page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
+ continue;
+
+ /* For HVM avoid memory below 1MB because that's already mapped. */
+ if ( is_hvm_domain(d) && pfn < PFN_DOWN(MB(1)) )
+ continue;
+
+ tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
+ for ( j = 0; j < tmp; j++ )
+ {
+ int ret;
+
+ if ( iommu_use_hap_pt(d) )
+ {
+ ASSERT(is_hvm_domain(d));
+ ret = set_identity_p2m_entry(d, pfn * tmp + j, p2m_access_rw,
+ 0);
+ }
+ else
+ ret = iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
+ IOMMUF_readable|IOMMUF_writable);
+
+ if ( !rc )
+ rc = ret;
+ }
+
+ if ( rc )
+ printk(XENLOG_WARNING " d%d: IOMMU mapping failed: %d\n",
+ d->domain_id, rc);
+
+ if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
+ process_pending_softirqs();
+ }
+
+}
+
void __hwdom_init iommu_hwdom_init(struct domain *d)
{
const struct domain_iommu *hd = dom_iommu(d);
@@ -207,7 +293,10 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
d->domain_id, rc);
}
- return hd->platform_ops->hwdom_init(d);
+ hd->platform_ops->hwdom_init(d);
+
+ if ( !iommu_passthrough )
+ setup_inclusive_mappings(d);
}
void iommu_teardown(struct domain *d)
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index fb7edfaef9..91cadc602e 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -99,6 +99,4 @@ void pci_vtd_quirk(const struct pci_dev *);
bool_t platform_supports_intremap(void);
bool_t platform_supports_x2apic(void);
-void vtd_set_hwdom_mapping(struct domain *d);
-
#endif // _VTD_EXTERN_H_
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1710256823..569ec4aec2 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1304,12 +1304,6 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
{
struct acpi_drhd_unit *drhd;
- if ( !iommu_passthrough && is_pv_domain(d) )
- {
- /* Set up 1:1 page table for hardware domain. */
- vtd_set_hwdom_mapping(d);
- }
-
setup_hwdom_pci_devices(d, setup_hwdom_device);
setup_hwdom_rmrr(d);
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index cc2bfea162..9971915349 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -32,11 +32,9 @@
#include "../extern.h"
/*
- * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
- * 1:1 iommu mappings except xen and unusable regions.
+ * iommu_inclusive_mapping: superseded by iommu=inclusive.
*/
-static bool_t __hwdom_initdata iommu_inclusive_mapping = 1;
-boolean_param("iommu_inclusive_mapping", iommu_inclusive_mapping);
+boolean_param("iommu_inclusive_mapping", iommu_inclusive);
void *map_vtd_domain_page(u64 maddr)
{
@@ -107,67 +105,3 @@ void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq)
}
spin_unlock(&d->event_lock);
}
-
-void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
-{
- unsigned long i, j, tmp, top, max_pfn;
-
- BUG_ON(!is_hardware_domain(d));
-
- max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
- top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
-
- for ( i = 0; i < top; i++ )
- {
- unsigned long pfn = pdx_to_pfn(i);
- bool map;
- int rc = 0;
-
- /*
- * Set up 1:1 mapping for dom0. Default to include only
- * conventional RAM areas and let RMRRs include needed reserved
- * regions. When set, the inclusive mapping additionally maps in
- * every pfn up to 4GB except those that fall in unusable ranges.
- */
- if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
- continue;
-
- if ( iommu_inclusive_mapping && pfn <= max_pfn )
- map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
- else
- map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
-
- if ( !map )
- continue;
-
- /* Exclude Xen bits */
- if ( xen_in_range(pfn) )
- continue;
-
- /*
- * If dom0-strict mode is enabled then exclude conventional RAM
- * and let the common code map dom0's pages.
- */
- if ( iommu_dom0_strict &&
- page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
- continue;
-
- tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
- for ( j = 0; j < tmp; j++ )
- {
- int ret = iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
- IOMMUF_readable|IOMMUF_writable);
-
- if ( !rc )
- rc = ret;
- }
-
- if ( rc )
- printk(XENLOG_WARNING VTDPREFIX " d%d: IOMMU mapping failed: %d\n",
- d->domain_id, rc);
-
- if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
- process_pending_softirqs();
- }
-}
-
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 6b42e3b876..15d6584837 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -35,6 +35,7 @@ extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
extern bool_t iommu_hap_pt_share;
extern bool_t iommu_debug;
extern bool_t amd_iommu_perdev_intremap;
+extern bool iommu_inclusive;
extern unsigned int iommu_dev_iotlb_timeout;