* Gerd Hoffmann ([email protected]) wrote:
> Hi,
>
> > Something somewhere in qemu/ kernel/ firmware is already reading the number
> > of physical bits to determine PCI mapping; if I do:
> >
> > ./x86_64-softmmu/qemu-system-x86_64 -m 4096,slots=16,maxmem=128T
>
> No, it's not the physbits. You add some memory hotplug slots here.
> Qemu will ask seabios to reserve address space for those, which seabios
> promptly does and maps 64bit pci bars above the reserved address space.
Right, that's what I was trying to do - I wanted to see if I could get something
to use the non-existing address space.
> > -vga none -device
> > qxl-vga,bus=pcie.0,ram_size_mb=2048,vram64_size_mb=2048 -vnc 0.0.0.0:0
> > /home/vms/7.2a.qcow2 -chardev stdio,mux=on,id=mon -mon
> > chardev=mon,mode=readline -cpu host,phys-bits=48
> >
> > it will happily map the qxl VRAM right up high, but if I lower
> > the phys-bits down to 46 it won't.
>
> I suspect the linux kernel remaps the bar because the seabios mapping is
> unreachable. Check dmesg.
Right, and that is dependent on physbits; if I run with:
./x86_64-softmmu/qemu-system-x86_64 -machine q35,accel=kvm,usb=off \
    -m 4096,slots=16,maxmem=128T -vga none \
    -device qxl-vga,bus=pcie.0,ram_size_mb=2048,vram64_size_mb=2048 \
    -vnc 0.0.0.0:0 /home/vms/7.2a.qcow2 \
    -chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline \
    -cpu host,phys-bits=48
(on a 46-bit Xeon) it happily maps that 64-bit BAR into somewhere that shouldn't be accessible:
[ 0.266183] pci_bus 0000:00: root bus resource [mem 0x800480000000-0x8004ffffffff]
[ 0.321611] pci 0000:00:02.0: reg 0x20: [mem 0x800480000000-0x8004ffffffff 64bit pref]
[ 0.423257] pci_bus 0000:00: resource 8 [mem 0x800480000000-0x8004ffffffff]
lspci -v:
00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04) (prog-if 00 [VGA controller])
Subsystem: Red Hat, Inc QEMU Virtual Machine
Flags: fast devsel, IRQ 22
Memory at c0000000 (32-bit, non-prefetchable) [size=512M]
Memory at e0000000 (32-bit, non-prefetchable) [size=64M]
Memory at e4070000 (32-bit, non-prefetchable) [size=8K]
I/O ports at c080 [size=32]
Memory at 800480000000 (64-bit, prefetchable) [size=2G]
Expansion ROM at e4060000 [disabled] [size=64K]
Kernel driver in use: qxl
So that BAR is mapped at an address beyond the host's phys-bits: with
maxmem=128T (0x800000000000, i.e. 2^47) the hotplug region pushes the 64-bit
PCI window up to 0x800480000000, which needs 48 physical address bits, while
a 46-bit host tops out at 0x3fffffffffff.
Nothing has failed or crashed yet - but I guess that's only because nothing
is actually using that 2G space?
If I change phys-bits from 48 to 46, the kernel avoids it:
[ 0.414867] acpi PNP0A08:00: host bridge window [0x800480000000-0x8004ffffffff] (ignored, not CPU addressable)
[ 0.683134] pci 0000:00:02.0: can't claim BAR 4 [mem 0x800480000000-0x8004ffffffff 64bit pref]: no compatible bridge window
[ 0.703948] pci 0000:00:02.0: BAR 4: [mem size 0x80000000 64bit pref] conflicts with PCI mem [mem 0x00000000-0x3fffffffffff]
[ 0.703951] pci 0000:00:02.0: BAR 4: failed to assign [mem size 0x80000000 64bit pref]
lspci shows:
Memory at <ignored> (64-bit, prefetchable)
(although, interestingly, qemu's 'info pci' still shows it).
The 'ignored, not CPU addressable' comes from the kernel's
acpi_pci_root_validate_resources() in drivers/acpi/pci_root.c, which
checks against a limit set in arch/x86/kernel/setup.c:
iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
So at least the Linux kernel does sanity-check against the phys_bits value.
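(As a quick self-contained illustration of that check - a user-space model of
the idea only, not the kernel code itself - using the window from the dmesg
above and the host's 46 physical address bits:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned phys_bits = 46;                      /* host's x86_phys_bits          */
    uint64_t iomem_end = (1ULL << phys_bits) - 1; /* as set in setup.c above       */
    uint64_t win_start = 0x800480000000ULL;       /* host bridge window taken from */
    uint64_t win_end   = 0x8004ffffffffULL;       /* the dmesg above               */

    if (win_end > iomem_end)
        printf("host bridge window [0x%llx-0x%llx] (ignored, not CPU addressable)\n",
               (unsigned long long)win_start, (unsigned long long)win_end);
    else
        printf("host bridge window [0x%llx-0x%llx] is CPU addressable\n",
               (unsigned long long)win_start, (unsigned long long)win_end);
    return 0;
}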
Obviously 128T is a bit silly for maxmem at the moment, but I was worried
about what happens on 36/39/40-bit hosts, where it's not unusual to pick a
maxmem of a few TB even if the VMs you're initially creating are only a
handful of GB (oVirt/RHEV seems to use a 4TB default for maxmem).
Still, this only becomes a problem if you hit the combination of:
a) You use large PCI BARs
b) On a 36/39/40-bit host
c) With a large maxmem that forces those PCI BARs up to something silly
   (a rough sketch of the kind of check that would catch this is below).
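For what it's worth, the kind of check I mean would be something like this
(just a sketch, not actual QEMU code; 'maxmem' and 'host_phys_bits' are
placeholder inputs - the real values would have to come from the -m option
and from the host CPU):

#include <stdint.h>
#include <stdio.h>

/* Sketch only, not QEMU code: warn when maxmem (plus anything that has to
 * live above it, like the 64-bit PCI hole) can't be reached with the
 * host's physical address bits. */
static void warn_if_unreachable(uint64_t maxmem, unsigned host_phys_bits)
{
    uint64_t limit = 1ULL << host_phys_bits;   /* first unreachable address */

    if (maxmem >= limit) {
        fprintf(stderr,
                "warning: maxmem 0x%llx needs more than %u host physical "
                "address bits; 64-bit PCI BARs will be placed out of reach\n",
                (unsigned long long)maxmem, host_phys_bits);
    }
}

int main(void)
{
    /* 128T maxmem on a 46-bit host, as in the example above */
    warn_if_unreachable(128ULL << 40, 46);
    return 0;
}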
Dave
>
> cheers,
> Gerd
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK