On 01/10/2021 16:08, Juergen Gross wrote: > On 01.10.21 16:29, Andrew Cooper wrote: >> On 01/10/2021 15:19, Jan Beulich wrote: >>> On 24.09.2021 21:39, Alex Olson wrote: >>>> Inspired by an earlier attempt by Chao Gao <[email protected]>, >>>> this revision aims to put the hypervisor in control of x86 APIC >>>> identifier >>>> definition instead of hard-coding a formula in multiple places >>>> (libxl, hvmloader, hypervisor). >>>> >>>> This is intended as a first step toward exposing/altering CPU topology >>>> seen by guests. >>>> >>>> Changes: >>>> >>>> - Add field to vlapic for holding default ID (on reset) >>>> >>>> - add HVMOP_get_vcpu_topology_id hypercall so libxl (for PVH domains) >>>> can access APIC ids needed for ACPI table definition prior to >>>> domain start. >>>> >>>> - For HVM guests, hvmloader now also uses the same hypercall. >>>> >>>> - Make CPUID code use vlapic ID instead of hard-coded formula >>>> for runtime reporting to guests >>> I'm afraid a primary question from back at the time remains: How is >>> migration of a guest from an old hypervisor to one with this change >>> in place going to work? >> >> I'm afraid its not. >> >> Fixing this is incredibly complicated. I have a vague plan, but it >> needs building on the still-pending libxl cpuid work of Rogers. >> >> Both the toolstack and Xen need to learn about how to describe topology >> correctly (and I'm afraid this patch isn't correct even for a number of >> the simple cases), and know about "every VM booted up until this point >> in time" being wrong. > > What about: > > - adding APIC-Id to the migration stream > - adding an optional translation layer for guest APIC-Id to the > hypervisor > - adding the functionality to set a specific APIC-Id for a vcpu > (will use the translation layer if not the same as preferred > by the hypervisor)
The vCPU APIC IDs are already in the migration stream. They're just too late in the stream for any easy fixup. A second problem we have is that (x)APIC IDs are writeable under Xen, but writeability of the register is a model specific trait to being with. Furthermore, you get potentially differing behaviour depending on whether APICV is enabled or not. I plan to fix this by simply outlawing it - OSes don't renumber the APICs in the first place (just like they don't move the MMIO window), and all they'll do is break things. The main topology problem is that we have no interlink between the CPUID-described data, and the default APIC IDs chosen. There are 5 different algorithms to choose from (vendor and CPU dependent) and we implement 0 of them. The xl config file needs more than just cpuid= data to express the topology correctly, because for non-power-of-two systems, there need to be gaps in the APIC_ID space, and this needs communicating to Xen too. (For old AMD, we also need a slide, but we can probably leave that as an exercise to anyone who cares, which seems to be noone so far). Either way, when the toolstack can reason about topologies correctly, we can extend the xl json in the stream. The absense of the marker serves as "This VM didn't boot with sane topology", which we can use the fallback logic (see libxl__cpuid_legacy() and the soon to exist companion in libxc) to re-synthesize the old pattern for when data is missing in the stream. Any change to the topology algorithms before the toolstack is capable of doing everything else leaves us with two[1] different kinds of VMs that we can't tell apart, and cannot cope with in a compatible way. ~Andrew [1] Actually 3. XenServer still has a revert of ca2eee92df44 in the patchqueue because that broke VMs which migrated across the point. As it's from 2008, pre-and-post VMs aren't something we need to care about, because anyone still running Xen 3.3 has far bigger problems than this to worry about.
