On Fri, Feb 25, 2022 at 2:30 AM Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Feb 24, 2022 at 05:54:58PM +0000, Joao Martins wrote:
> > On 2/24/22 17:23, Michael S. Tsirkin wrote:
> > > On Thu, Feb 24, 2022 at 04:07:22PM +0000, Joao Martins wrote:
> > >> On 2/23/22 23:35, Joao Martins wrote:
> > >>> On 2/23/22 21:22, Michael S. Tsirkin wrote:
> > >>>>> +static void x86_update_above_4g_mem_start(PCMachineState *pcms,
> > >>>>> + uint64_t pci_hole64_size)
> > >>>>> +{
> > >>>>> + X86MachineState *x86ms = X86_MACHINE(pcms);
> > >>>>> + uint32_t eax, vendor[3];
> > >>>>> +
> > >>>>> + host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
> > >>>>> + if (!IS_AMD_VENDOR(vendor)) {
> > >>>>> + return;
> > >>>>> + }
> > >>>>
> > >>>> Wait a sec, should this actually be tying things to the host CPU ID?
> > >>>> It's really about what we present to the guest though,
> > >>>> isn't it?
> > >>>
> > >>> Using cpuid was the easier catch-all, without going into
> > >>> Linux UAPI specifics. But it doesn't have to be tied to the host
> > >>> CPU; this is only needed on systems with an IOMMU present.
> > >>>
> > >>>> Also, can't we tie this to whether the AMD IOMMU is present?
> > >>>>
> > >>> I think so, I can add that. Something like an amd_iommu_exists()
> > >>> helper in util/vfio-helpers.c which checks whether there are any
> > >>> sysfs child entries starting with ivhd in /sys/class/iommu/. Given
> > >>> that this HT region has been hardcoded in the IOMMU reserved
> > >>> regions since >= 4.11 (up to latest), I don't think it's even
> > >>> worth checking that the range exists in:
> > >>>
> > >>> /sys/kernel/iommu_groups/0/reserved_regions
> > >>>
> > >>> (Also that sysfs ABI is >= 4.11 only)
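> > >>>
> > >>> For reference, on an AMD host with the IOMMU enabled that file
> > >>> would typically list the HT range as reserved, something along
> > >>> these lines (illustrative output, not from any specific machine):
> > >>>
> > >>> $ cat /sys/kernel/iommu_groups/0/reserved_regions
> > >>> 0x00000000fee00000 0x00000000feefffff msi
> > >>> 0x000000fd00000000 0x000000ffffffffff reserved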
> > >>
> > >> Here's what I have staged in local tree, to address your comment.
> > >>
> > >> Naturally the first chunk is what's affected by this patch; the rest
> > >> is a predecessor patch to introduce qemu_amd_iommu_is_present(). It
> > >> seems to pass all the tests and whatnot.
> > >>
> > >> I am not entirely sure this is the right place to put such a helper;
> > >> open to suggestions. Wrt the naming of the helper, I tried to follow
> > >> the rest of the file's style.
> > >>
> > >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > >> index a9be5d33a291..2ea4430d5dcc 100644
> > >> --- a/hw/i386/pc.c
> > >> +++ b/hw/i386/pc.c
> > >> @@ -868,10 +868,8 @@ static void x86_update_above_4g_mem_start(PCMachineState *pcms,
> > >>                                            uint64_t pci_hole64_size)
> > >>  {
> > >>      X86MachineState *x86ms = X86_MACHINE(pcms);
> > >> -    uint32_t eax, vendor[3];
> > >>
> > >> -    host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
> > >> -    if (!IS_AMD_VENDOR(vendor)) {
> > >> +    if (!qemu_amd_iommu_is_present()) {
> > >>          return;
> > >>      }
> > >>
> > >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > >> index 7bcce3bceb0f..eb4ea071ecec 100644
> > >> --- a/include/qemu/osdep.h
> > >> +++ b/include/qemu/osdep.h
> > >> @@ -637,6 +637,15 @@ char *qemu_get_host_name(Error **errp);
> > >>   */
> > >>  size_t qemu_get_host_physmem(void);
> > >>
> > >> +/**
> > >> + * qemu_amd_iommu_is_present:
> > >> + *
> > >> + * Operating system agnostic way of querying if an AMD IOMMU
> > >> + * is present.
> > >> + *
> > >> + */
> > >> +bool qemu_amd_iommu_is_present(void);
> > >> +
> > >>  /*
> > >>   * Toggle write/execute on the pages marked MAP_JIT
> > >>   * for the current thread.
> > >> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > >> index f2be7321c59f..54cef21217c4 100644
> > >> --- a/util/oslib-posix.c
> > >> +++ b/util/oslib-posix.c
> > >> @@ -982,3 +982,32 @@ size_t qemu_get_host_physmem(void)
> > >>  #endif
> > >>      return 0;
> > >>  }
> > >> +
> > >> +bool qemu_amd_iommu_is_present(void)
> > >> +{
> > >> +    bool found = false;
> > >> +#ifdef CONFIG_LINUX
> > >> +    struct dirent *entry;
> > >> +    char *path;
> > >> +    DIR *dir;
> > >> +
> > >> +    path = g_strdup_printf("/sys/class/iommu");
> > >> +    dir = opendir(path);
> > >> +    if (!dir) {
> > >> +        g_free(path);
> > >> +        return found;
> > >> +    }
> > >> +
> > >> +    do {
> > >> +        entry = readdir(dir);
> > >> +        if (entry && !strncmp(entry->d_name, "ivhd", 4)) {
> > >> +            found = true;
> > >> +            break;
> > >> +        }
> > >> +    } while (entry);
> > >> +
> > >> +    g_free(path);
> > >> +    closedir(dir);
> > >> +#endif
> > >> +    return found;
> > >> +}
> > >
> > > why are we checking whether an AMD IOMMU is present
> > > on the host?
> > > Isn't what we care about whether it's
> > > present in the VM? What we are reserving, after all,
> > > is a range of GPAs, not actual host PAs ...
> > >
> > Right.
> >
> > But any DMA map done by VFIO/IOMMU using those GPA ranges will fail,
> > so we need to avoid mapping that portion of the address space at all
> > and use some other portion of the GPA space instead. This has to do
> > with the host IOMMU, which is where the DMA mapping of guest PAs takes
> > place. So, if you do not have a host IOMMU, you can't service guest
> > DMA/PCIe via VFIO through the host IOMMU, and therefore you don't have
> > this problem.
> >
> > Whether one uses a guest IOMMU or not does not affect the result; that
> > would be more a case of mimicking real hardware than of fixing the
> > issue this series addresses.
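> >
> > To make the failure mode concrete: VFIO maps guest PAs as IOVAs into
> > the host IOMMU, roughly like the minimal (non-QEMU) sketch below, and
> > a map that overlaps a host-reserved range such as the HT hole just
> > below 1Tb gets rejected by the kernel. The fd name and the addresses
> > in the comments are only illustrative.
> >
> > #include <stdint.h>
> > #include <sys/ioctl.h>
> > #include <linux/vfio.h>
> >
> > static int map_guest_ram(int container_fd, void *hva, uint64_t gpa,
> >                          uint64_t size)
> > {
> >     struct vfio_iommu_type1_dma_map map = {
> >         .argsz = sizeof(map),
> >         .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
> >         .vaddr = (uintptr_t)hva, /* host virtual address backing guest RAM */
> >         .iova  = gpa,            /* guest physical address used as IOVA */
> >         .size  = size,
> >     };
> >
> >     /* Fails (e.g. -EINVAL) if [gpa, gpa + size) overlaps a reserved
> >      * region such as 0xfd00000000-0xffffffffff on AMD hosts. */
> >     return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
> > }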
>
>
> Hmm okay ... my first reaction was to say let's put it under VFIO then.
> And ideally move the logic reporting reserved ranges there too.
> However, I think vdpa has the same issue too.
> CC Jason for an opinion.
vDPA is even more complicated than VFIO as it allows the device to
reserve specific IOVA ranges:
static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
{
        struct vdpa_iova_range *range = &v->range;
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;

        if (ops->get_iova_range) {
                *range = ops->get_iova_range(vdpa);
        ...
}
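
Today userspace learns about that range through the vhost-vdpa fd,
roughly as in the sketch below (error handling omitted; "vdpa_fd" is
just an illustrative name for an open /dev/vhost-vdpa-N descriptor):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static void query_iova_range(int vdpa_fd)
{
    struct vhost_vdpa_iova_range range;

    /* Only IOVAs within [range.first, range.last] are usable for maps. */
    if (ioctl(vdpa_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) == 0) {
        printf("usable IOVA range: 0x%llx-0x%llx\n",
               (unsigned long long)range.first,
               (unsigned long long)range.last);
    }
}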
I wonder if we need to expose those via the netlink protocol.
Thanks
> Also, some concerns
> - I think this changes the memory layout for working VMs not using VFIO.
> Better preserve the old layout for old machine types?
>
> - You mention the phys bits issue very briefly, and it's pretty
> concerning. Do we maybe want to also disable the workaround if phys
> bits is too small (see the sketch below)? Also, we'd need to test a
> bunch of old guests to see what happens. Which guests were tested?
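>
> A rough sketch of the phys-bits check I have in mind (names are
> illustrative, not an actual patch):
>
>     uint64_t ht_hole_end = 1ULL << 40;  /* 1Tb, end of the AMD HT hole */
>     uint64_t maxphysaddr = (1ULL << X86_CPU(first_cpu)->phys_bits) - 1;
>     uint64_t maxusedaddr = ht_hole_end + x86ms->above_4g_mem_size +
>                            pci_hole64_size - 1;
>
>     if (maxusedaddr > maxphysaddr) {
>         /* The relocated RAM/PCI hole would not be addressable by the
>          * guest; keep the legacy layout (or refuse to start). */
>         return;
>     }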
>
> --
> MST
>