On Wed, Mar 03, 2021 at 04:26:39PM +0100, Igor Mammedov wrote: > On Wed, 3 Mar 2021 10:49:08 +0100 > David Hildenbrand <[email protected]> wrote: > > > On 02.03.21 19:43, David Hildenbrand wrote: > > >>>> The resizeable memory region that is created for the cmd blob has a > > >>>> maximum > > >>>> size of ACPI_BUILD_ALIGN_SIZE - 4k. This used to be sufficient, > > >>>> however, > > >> > > >> The expression "ACPI_BUILD_ALIGN_SIZE - 4k" makes no sense to me. > > >> ACPI_BUILD_ALIGN_SIZE is #defined in "hw/i386/acpi-build.c" as 0x1000, > > >> so the difference (ACPI_BUILD_ALIGN_SIZE - 4k) is zero. > > >> > > >> (1) Did you mean "ACPI_BUILD_ALIGN_SIZE -- 4k"? IOW, did you mean to > > >> quote the value of the macro? > > >> > > >> If you mean an em dash, then please use an em dash, not a hyphen (or > > >> please use parens). > > > > > > Yes, or rather use ACPI_BUILD_ALIGN_SIZE (4k). > > > > > >> > > >> > > >>>> as we try fitting in additional data (e.g., vmgenid, nvdimm, > > >>>> intel-iommu), > > >>>> we require more than 4k and can crash QEMU when trying to resize the > > >>>> resizeable memory region beyond its maximum size: > > >>>> $ build/qemu-system-x86_64 --enable-kvm \ > > >>>> -machine q35,nvdimm=on \ > > >>>> -smp 1 \ > > >>>> -cpu host \ > > >>>> -m size=2G,slots=8,maxmem=4G \ > > >>>> -object > > >>>> memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \ > > >>>> -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 > > >>>> \ > > >>>> -nodefaults \ > > >>>> -device vmgenid \ > > >>>> -device intel-iommu > > >>>> > > >>>> Results in: > > >>>> Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850: > > >>>> qemu-system-x86_64: Size too large: /rom@etc/table-loader: > > >>>> 0x2000 > 0x1000: Invalid argument > > >>>> > > >>>> We try growing the resizeable memory region (resizeable RAMBlock) > > >>>> beyond > > >>>> its maximum size. Let's increase the maximum size from 4k to 64k, which > > >>>> should be good enough for the near future. > > >> > > >> The existent code calls acpi_align_size(), for resizing the ACPI blobs > > >> (the GArray objects). > > >> > > >> (Unfortunately, the acpi_align_size() function is duplicated between > > >> "hw/i386/acpi-build.c" and "hw/arm/virt-acpi-build.c", which seems > > >> unjustified -- but anyway, I digress.) > > >> > > >> This seems to come from commit 868270f23d8d ("acpi-build: tweak acpi > > >> migration limits", 2014-07-29) and commit 451b157041d2 ("acpi: Align the > > >> size to 128k", 2020-12-08). > > >> > > >> (2) Why is the logic added in those commits insufficient? > > > > > > We are dealing with different blobs here (tables_blob vs. cmd_blob). > > > > > >> > > >> What is the exact call tree that triggers the above error? > > > > > > [...] > > > > > > acpi_build_update()->acpi_build_update()->memory_region_ram_resize()->qemu_ram_resize() > > > > > > A longer calltrace can be found in > > > https://bugzilla.redhat.com/show_bug.cgi?id=1927159. > > > > > >>>> +++ b/hw/i386/acpi-microvm.c > > >>>> @@ -255,7 +255,8 @@ void acpi_setup_microvm(MicrovmMachineState *mms) > > >>>> ACPI_BUILD_TABLE_MAX_SIZE); > > >>>> acpi_add_rom_blob(acpi_build_no_update, NULL, > > >>>> tables.linker->cmd_blob, > > >>>> - "etc/table-loader", 0); > > >>>> + ACPI_BUILD_LOADER_FILE, > > >>>> + ACPI_BUILD_LOADER_MAX_SIZE); > > >>>> acpi_add_rom_blob(acpi_build_no_update, NULL, > > >>>> tables.rsdp, > > >>>> ACPI_BUILD_RSDP_FILE, 0); > > >> > > >> (3) Why are we using a different "tool" here, from the previous > > >> approach? We're no longer setting the GArray sizes; instead, we make the > > >> "rom->romsize" fields diverge from -- put differently, grow beyond -- > > >> "rom->datasize". Why is that useful? What are the consequences? > > > > > > See ACPI_BUILD_TABLE_MAX_SIZE handling just in the acpi_add_rom_blob() > > > above. > > > > > >> > > >> Where is it ensured that data between "rom->datasize" and "rom->romsize" > > >> reads as zeroes? > > > We only expose the current memory_region_size() to our guest, which is > > > always multiples of 4k pages. > > > > > > rom->datasize and rom->romsize will be multiple of 4k AFAIKs. > > > > > > acpi_align_size()-> g_array_set_size() will take care of zeroing out > > > any unused parts within a single 4k page. > > > > > > So all unused, guest-visible part should always be 0 I think. > > > > > >> > > >> > > >>>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h > > >>>> index 380d3e3924..93cdfd4006 100644 > > >>>> --- a/include/hw/acpi/aml-build.h > > >>>> +++ b/include/hw/acpi/aml-build.h > > >>>> @@ -6,6 +6,7 @@ > > >>>> > > >>>> /* Reserve RAM space for tables: add another order of magnitude. */ > > >>>> #define ACPI_BUILD_TABLE_MAX_SIZE 0x200000 > > >>>> +#define ACPI_BUILD_LOADER_MAX_SIZE 0x40000 > > >>>> > > >>>> #define ACPI_BUILD_APPNAME6 "BOCHS " > > >>>> #define ACPI_BUILD_APPNAME8 "BXPC " > > >>> > > >> > > >> The commit message says "Let's increase the maximum size from 4k to > > >> 64k", and I have two problems with that: > > >> > > >> (4a) I have no idea where the current "4k" size comes from. (In case the > > >> 4k refers to ACPI_BUILD_ALIGN_SIZE, then why are we not changing that > > >> macro?) > > > > > > Changing ACPI_BUILD_ALIGN_SIZE would affect the legacy_table_size in > > > acpi_build() - so that can't be right. > > > > > > What would also work is something like (to be improved) > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > > > index 45ad2f9533..49cfedddc8 100644 > > > --- a/hw/i386/acpi-build.c > > > +++ b/hw/i386/acpi-build.c > > > @@ -81,6 +81,8 @@ > > > #define ACPI_BUILD_LEGACY_CPU_AML_SIZE 97 > > > #define ACPI_BUILD_ALIGN_SIZE 0x1000 > > > > > > +#define ACPI_BUILD_LOADER_ALIGN_SIZE 0x2000 > > > + > > > #define ACPI_BUILD_TABLE_SIZE 0x20000 > > > > > > /* #define DEBUG_ACPI_BUILD */ > > > @@ -2613,10 +2615,10 @@ void acpi_build(AcpiBuildTables *tables, > > > MachineState *machine) > > > error_printf("Try removing CPUs, NUMA nodes, memory slots" > > > " or PCI bridges."); > > > } > > > - acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE); > > > + acpi_align_size(tables_blob, ACPI_BUILD_ALIGN_SIZE); > > > } > > > > > > - acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE); > > > + acpi_align_size(tables->linker->cmd_blob, > > > ACPI_BUILD_LOADER_ALIGN_SIZE); > > > > > > > > > At least for hw/i386/acpi-build.c. > > > > > > We will end up creating the resizeable memory region/RAMBlock always with > > > a size=maximum_size=8k. (could also go for 64k here) > > > > > > The only downside is that we might expose a bigger area to the > > > guest than necessary (e.g., 8k instead of 4k) and will e.g., migrate > > > 8k instead of 4k (not that we care). > > > > > > > > > On incoming migration from older QEMU versions, we should be able to just > > > shrink back from 8k to 4k - so migration from older QEMY versions should > > > continue working just fine. > > > > Correction: Older QEMU versions (e.g., before > > 62be4e3a5041e84304aa23637da623a205c53ecc) did not support resizeable RAM > > MemoryRegions / RAMBlocks. This affects ~ < QEMU v2.3.0. > > > > So unconditionally changing the size of the cmd_blob memory region > > (e.g., 4k -> 8k) would most probably break migration from never QEMU to > > older QEMU (v2.2.0.). Not sure if we really care. > > > > @MST, Igor what's your take? > We shouldn't change aligned size (an alignment value), since it's what goes > on migration stream wire. > Changing max should not affect migrations stream directly. > In most cases ping-pong migration should work as both sides will have > the same configuration, in unlikely case newer QEMU goes over current 4k, > it will jump to the next aligned size (8k) and migration will fail cleanly > due size mismatch and it can't be made any more prettier. > (similar to border cases when we switched to resizable regions for main tables > blob)
Right. We can backort the change in the stable tree too. I do think that we should add some kind of entry to the command though when mcfg is disabled so the size doesn't change like that. Will avoid weird failures if there's a convoluted config which overflows again, it will fail cleanly on qemu start. How about a dummy SSDT? -- MST
