"Arnd Bergmann" <a...@arndb.de> writes: > On Sun, Oct 20, 2024, at 17:39, Naresh Kamboju wrote: >> On Fri, 18 Oct 2024 at 12:35, Naresh Kamboju <naresh.kamb...@linaro.org> >> wrote: >>> >>> The QEMU-ARMv7 boot has failed with the Linux next-20241017 tag. >>> The boot log is incomplete, and no kernel crash was detected. >>> However, the system did not proceed far enough to reach the login prompt. >>> > >> Anders bisected this boot regressions and found, >> # first bad commit: >> [efe8419ae78d65e83edc31aad74b605c12e7d60c] >> vdso: Introduce vdso/page.h >> >> We are investigating the reason for boot failure due to this commit. > > Anders and I did the analysis on this, the problem turned out > to be the early_init_dt_add_memory_arch() function in > drivers/of/fdt.c, which does bitwise operations on PAGE_MASK > with a 'u64' instead of phys_addr_t: > > void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size) > { > const u64 phys_offset = MIN_MEMBLOCK_ADDR; > > if (size < PAGE_SIZE - (base & ~PAGE_MASK)) { > pr_warn("Ignoring memory block 0x%llx - 0x%llx\n", > base, base + size); > return; > } > > if (!PAGE_ALIGNED(base)) { > size -= PAGE_SIZE - (base & ~PAGE_MASK); > base = PAGE_ALIGN(base); > } > > On non-LPAE arm32, this broke the existing behavior for > large 32-bit memory sizes. The obvious fix is to change > back the PAGE_MASK definition for 32-bit arm to a signed > number.

Agreed. However, I think we were masking a calling issue, in that:

    /* Actual RAM size depends on initial RAM and device memory settings */
    [VIRT_MEM] =                { GiB, LEGACY_RAMLIMIT_BYTES },

And:

    -m 4G

makes no sense without ARM_LPAE (which the kernel didn't have), but if you
pass -machine virt,gic-version=3,highmem=off (the default changed a while
back) you will get a warning:

    qemu-system-arm: Addressing limited to 32 bits, but memory exceeds it by 1073741824 bytes

but I guess that didn't trigger for some reason before this patch?

> mips32, ppc32 and hexagon had the same definition as
> well, so I think we should change at least those in order
> to restore the previous behavior in case they are affected
> by the same bug (or a different one).
>
> x86-32 and arc got flipped the other way by the patch,
> from unsigned to signed, when CONFIG_ARC_HAS_PAE40
> or CONFIG_X86_PAE are set. I think we should keep
> the 'signed' behavior as this was a bugfix by itself,
> but we may want to change arc and x86-32 with short
> phys_addr_t the same way for consistency.
>
> On csky, m68k, microblaze, nios2, openrisc, parisc32,
> riscv32, sh, sparc32, um and xtensa, we've always used
> the 'unsigned' PAGE_MASK, and there is no 64-bit
> phys_addr_t, so I would lean towards staying with
> 'unsigned' in order to not introduce a regression.
> Alternatively we could choose to go with the 'signed'
> version on all 32-bit architectures unconditionally
> for consistency. Any preferences?
>
>       Arnd

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro