On 08/04/22 00:23, Michael S. Tsirkin wrote: > On Thu, Aug 04, 2022 at 12:08:07AM +0200, Jason A. Donenfeld wrote: >> Hi Michael, >> >> On Wed, Aug 03, 2022 at 06:03:20PM -0400, Michael S. Tsirkin wrote: >>> On Wed, Aug 03, 2022 at 07:07:52PM +0200, Jason A. Donenfeld wrote: >>>> On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote: >>>>> On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote: >>>>>> Thanks for the info. Very helpful. Looking into it now. >>>>> >>>>> So interestingly, this is not a new issue. If you pass any type of setup >>>>> data, OVMF appears to be doing something unusual and passing 0xffffffff >>>>> for all the entries, rather than the actual data. The reason this isn't >>>>> new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the >>>>> same page fault, on all QEMU versions. The thing that passes the DTB is >>>>> the thing that passes the RNG seed. Same mechanism, same bug. >>>>> >>>>> I'm looking into it... >>>> >>>> Fixed with: >>>> https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/ >>>> >>>> Feel free to join into the discussion there. I CC'd you. >>>> >>>> Jason >>> >>> Hmm I don't think this patch will make it in 7.1 given the >>> timeframe. I suspect we should revert the patch for now. >>> >>> Which is where you maybe begin to see why we generally >>> prefer doing it with features - one can then work around >>> bugs by turning the feature on and off. >> >> The bug actually precedes this patch. Just boot with -dtb on any qemu >> version and you'll trigger it.
Yes! That's exactly what I expected, per https://bugzilla.redhat.com/show_bug.cgi?id=2114637#c4 There I wrote that this kind of setup_data chaining was a "first" for OVMF. While the same logic had existed in QEMU with for chaining a dtb, there never had been a reason (that I could imagine) for using that with OVMF guests. So it had to be either a preexistent bug in QEMU, or one in OVMF, that now got triggered (as Jason's patch for chaining the seed closely followed the pattern set by the dtb logic). Now, regarding the patch at <https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/>, including v2 at <https://lore.kernel.org/all/20220804004411.1343158-1-ja...@zx2c4.com/>... I don't think this kind of setup_data block chaining, with raw guest-physical addresses filled in by QEMU in guest RAM, in advance, is appropriate for an edk2 guest *in general*. By the time the firmware loads the kernel (including setup block and kernel block) from fw_cfg, the area in question (with the seed etc) may have been overwritten several times. Edk2 is very careful about memory ownership, but it needs the VMM and the guest OS to play along. There is a only very small set of "well known addresses" that are (a) open-coded in both QEMU board code and edk2 platform code and (b) not specified by industry specs; such addresses are used to set up everything else, and we seek not to introduce more of them. Consider e.g. the end of <https://www.kernel.org/doc/Documentation/x86/boot.rst>, namely the deprecation of the "EFI Handover Protocol". The idea is to use well-specified information channels that don't depend on the placement of the kernel. At least two mechanisms exist for dealing with this; the ACPI linker-loader, and the GUID-ed struct chaining in pflash (mostly used with SEV and I think TDX too). More below. > > Sure but it's still a regression. > >> We're still at rc0; there should be time >> enough for a bug fix. Please do chime in on that thread and maybe we can >> come up with something reasonable fast enough. >> >> Jason > > Maybe. QEMU commit 67f7e426e538 ("hw/i386: pass RNG seed via setup_data entry", 2022-07-22) references <https://git.kernel.org/tip/tip/c/68b8e9713c8>, and the commit message on that begins with: ---------- Currently, the only way x86 can get an early boot RNG seed is via EFI, which is generally always used now for physical machines, but is very rarely used in VMs, especially VMs that are optimized for starting "instantaneously", such as Firecracker's MicroVM. For tiny fast booting VMs, EFI is not something you generally need or want. ---------- So, first, I'd quite disagree with "EFI being rarely used in VMs" (the trend has been the opposite), and I'm not saying that because I'm married to edk2 (I switched to a different project last summer). I do agree with EFI being rarely used in one-shot, fast-booting VMs though. Second, I think this segmentation of use cases is actually great. If you need this particular kind of seed-passing for non-EFI VMs only (i.e., where the UEFI stub in the guest kernel cannot rely on EFI_RNG_PROTOCOL), then implement it in both QEMU and the (guest) kernel for non-EFI only. Both the guest kernel and QEMU can tell whether the guest firmware is UEFI (the guest kernel can tell that precisely, whereas in QEMU, if memory serves, we equate that with a particular pflash setup). Again, I don't think the setup_data chaining, using GPAs stored by QEMU for linkage, can be salvaged at all for UEFI guests. Normally some dedicated (possibly new) information channel would be needed that lets the firmware stay in control, but thankfully, this use case looks explicitly irrelevant for UEFI guests. So "just restrict the whole thing to non-UEFI guests" would be my proposal. ( Yes, the existent DTB linking in QEMU is broken, and has been broken forever, for edk2 (UEFI) guests. It never mattered in practice: edk2 guest firmware very explicitly deals with DTB passing (mostly for arm/aarch64, but maybe Gerd's microvm uses DTB too; I can't tell, I'd not been there), so I've never seen "dtb via setup_data" in practice. There has never been a reason to use that. On the other hand, commit 67f7e426e538 enables setup_data chaining by default, and that seems to break UEFI guests (with direct kernel boot anyway) summarily. It's more serious than one might think at first sight; "virt-install --location" uses direct (= fw_cfg) kernel boot. ) I think that restricting "seed passing via setup_data" to non-UEFI guests should be doable during Hard Feature Freeze too. The guest kernel patch should be possible to do later. Sorry about the wall of text. Laszlo