On 15.07.2016 10:35, David Gibson wrote:
> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>> for main RAM") already made sure that huge page support is not announced
>> to the guest if the normal RAM of non-NUMA configurations is not backed
>> by a huge page filesystem. However, there is one more case that can go
>> wrong: NUMA is enabled, but the RAM of the NUMA nodes is not configured
>> with huge page support (and only the memory of a DIMM is configured with
>> it). When QEMU is started with the following command line for example,
>> the Linux guest currently crashes because it is trying to use huge pages
>> on a memory region that does not support huge pages:
>>
>> qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G \
>>   -object memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>   -numa node,nodeid=0 -numa node,nodeid=1
>>
>> To fix this issue, we've got to make sure to disable huge page support,
>> too, when there is a NUMA node that is not using a memory backend with
>> huge page support.
>>
>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
>> Signed-off-by: Thomas Huth <[email protected]>
>> ---
>> target-ppc/kvm.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index 884d564..7a8f555 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -389,12 +389,16 @@ static long getrampagesize(void)
>>
>> object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
>>
>> - if (hpsize == LONG_MAX) {
>> + if (hpsize == LONG_MAX || hpsize == getpagesize()) {
>> return getpagesize();
>> }
>>
>> - if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
>> - /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
>> + /* If NUMA is disabled or the NUMA nodes are not backed with a
>> + * memory-backend, then there is at least one node using "normal"
>> + * RAM. And since normal RAM has not been configured with "-mem-path"
>> + * (what we've checked earlier here already), we can not use huge pages!
>> + */
>> + if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
>
> Is that second clause sufficient, or do you need to loop through and
> check the memdev of every node?

Checking the first entry should be sufficient. QEMU forces you to specify
either a memory backend for all NUMA nodes (which we should have looked at
during the object_child_foreach() some lines earlier), or you must not
specify a memory backend for any NUMA node at all. You can not mix the
settings, so checking numa_info[0] is enough.

 Thomas
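
PS: For illustration, here is a rough standalone sketch of the decision
logic discussed in this thread. It is a simplified model, not the QEMU code:
model_rampagesize() and its parameters are hypothetical names, the array of
backend page sizes stands in for the object_child_foreach() walk, and it
assumes that the walk yields the smallest page size among all memory
backends (with LONG_MAX meaning "no backend found").

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the patched getrampagesize() logic.
 *
 * base_pagesize:     stand-in for getpagesize()
 * backend_pagesizes: page size of every memory backend object
 *                    (DIMM backends and NUMA node memdevs alike)
 * nb_backends:       number of entries in backend_pagesizes
 * nb_numa_nodes:     number of configured NUMA nodes
 * node0_has_memdev:  stand-in for numa_info[0].node_memdev != NULL
 */
long model_rampagesize(long base_pagesize,
                       const long *backend_pagesizes, size_t nb_backends,
                       size_t nb_numa_nodes, bool node0_has_memdev)
{
    long hpsize = LONG_MAX;
    size_t i;

    /* Assumption: the backend walk computes the minimum page size. */
    for (i = 0; i < nb_backends; i++) {
        if (backend_pagesizes[i] < hpsize) {
            hpsize = backend_pagesizes[i];
        }
    }

    /* No backend at all, or some backend only offers normal pages. */
    if (hpsize == LONG_MAX || hpsize == base_pagesize) {
        return base_pagesize;
    }

    /*
     * NUMA disabled, or node 0 has no memdev. Since memdevs are
     * all-or-nothing across nodes, node 0 without a memdev means
     * every node uses normal RAM, so huge pages are off.
     */
    if (nb_numa_nodes == 0 || !node0_has_memdev) {
        return base_pagesize;
    }

    return hpsize;
}
```

With the command line from the commit message (one huge page backed DIMM,
two NUMA nodes without memdev), this model returns the base page size, so
huge pages would not be announced to the guest.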
