On 13.01.2023 00:15, Julien Grall wrote:
> Hi,
> 
> On 04/01/2023 10:23, Jan Beulich wrote:
>> On 23.12.2022 12:31, Julien Grall wrote:
>>> On 20/12/2022 15:30, Jan Beulich wrote:
>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>> From: Hongyan Xia <[email protected]>
>>>>>
>>>>> This avoids the assumption that boot pages are in the direct map.
>>>>>
>>>>> Signed-off-by: Hongyan Xia <[email protected]>
>>>>> Signed-off-by: Julien Grall <[email protected]>
>>>>
>>>> Reviewed-by: Jan Beulich <[email protected]>
>>>>
>>>> However, ...
>>>>
>>>>> --- a/xen/arch/x86/srat.c
>>>>> +++ b/xen/arch/x86/srat.c
>>>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct 
>>>>> acpi_table_slit *slit)
>>>>>                   return;
>>>>>           }
>>>>>           mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>>>> - acpi_slit = mfn_to_virt(mfn_x(mfn));
>>>>> + acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>>>
>>>> ... with the increased use of vmap space the VA range used will need
>>>> growing. And that's perhaps better done ahead of time than late.
>>>
>>> I will have a look to increase the vmap().
>>>
>>>>
>>>>> + BUG_ON(!acpi_slit);
>>>>
>>>> Similarly relevant for the earlier patch: It would be nice if boot
>>>> failure for optional things like NUMA data could be avoided.
>>>
>>> If you can't map (or allocate the memory), then you are probably in a
>>> very bad situation because both should really not fail at boot.
>>>
>>> So I think this is correct to crash early because the admin will be able
>>> to look what went wrong. Otherwise, it may be missed in the noise.
>>
>> Well, I certainly can see one taking this view. However, at least in
>> principle allocation (or mapping) may fail _because_ of NUMA issues.
> 
> Right. I read this as the user will likely want to add "numa=off" on the 
> command line.
> 
>> At which point it would be better to boot with NUMA support turned off
> I have to disagree with "better" here. This may work for a user with a 
> handful of hosts. But for large scale setup, you will really want a 
> failure early rather than having a host booting with an expected feature 
> disabled (the NUMA issues may be a broken HW).
> 
> It is better to fail and then ask the user to specify "numa=off". At
> least the person made a conscientious decision to turn off the feature.

Yet how would the observing admin make the connection from the BUG_ON()
that triggered and the need to add "numa=off" to the command line,
without knowing Xen internals?

> I am curious to hear the opinion from the others.

So am I.

Jan

Reply via email to