On 19.02.2021 14:18, Jürgen Groß wrote:
> On 19.02.21 14:10, Jan Beulich wrote:
>> On 19.02.2021 13:48, Jürgen Groß wrote:
>>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm observing a Linux PV/PVH guest crash when I resume it from sleep.
>>>>>> I do this with:
>>>>>>
>>>>>>        virsh -c xen dompmsuspend <vmname> mem
>>>>>>        virsh -c xen dompmwakeup <vmname>
>>>>>>
>>>>>> But it's possible to trigger it with plain xl too:
>>>>>>
>>>>>>        xl save -c <vmname> <some-file>
>>>>>>
>>>>>> The same on HVM works fine.
>>>>>>
>>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>>>> relevant here. I can reliably reproduce it.
>>>>>
>>>>> This is already on my list of issues to look at.
>>>>>
>>>>> The problem seems to be related to the XSA-332 patches. You could try
>>>>> the patches I've sent out recently addressing other fallout from XSA-332
>>>>> which _might_ fix this issue, too:
>>>>>
>>>>> https://patchew.org/Xen/[email protected]/
>>>>
>>>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>>>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>>>> handy). If you think there may be a difference with the final 5.11 or
>>>> another branch, please let me know.
>>>>
>>>
>>> Some more tests reveal that this seems to be a hypervisor regression.
>>> I can reproduce the very same problem with a 4.12 kernel from 2019.
>>>
>>> It seems as if the EVTCHNOP_init_control hypercall is returning
>>> -EINVAL when the domain is continuing to run after the suspend
>>> hypercall (in contrast to the case where a new domain has been created
>>> when doing a "xl restore").
>>
>> But when you resume the same domain, the kernel isn't supposed to
>> call EVTCHNOP_init_control, as that's a one time operation (per
>> vCPU, and unless EVTCHNOP_reset was called of course). In the
>> hypervisor map_control_block() has (always had) as its first step
>>
>>      if ( v->evtchn_fifo->control_block )
>>          return -EINVAL;
>>
>> Re-setup is needed only when resuming in a new domain.
> 
> But the same guest will not crash when doing the same on a 4.12
> hypervisor.

Is the kernel perhaps not given the bit of information anymore that
it needs to tell apart the two resume modes?

Jan
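
To make the two resume modes discussed above concrete, here is a toy
model in C. It is only a sketch under stated assumptions, not hypervisor
or kernel code: toy_vcpu, toy_init_control(), toy_resume() and the
"cancelled" flag are hypothetical names. The model mirrors the quoted
map_control_block() check (a second init on the same vCPU fails with
-EINVAL) and assumes the guest receives a flag telling it whether it
continues in the same domain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vCPU event channel FIFO state. */
struct toy_vcpu {
    void *control_block;   /* NULL until the control block is set up */
};

/*
 * Models the one-time nature of the init operation: once a control
 * block is registered for this vCPU, a second attempt is rejected.
 */
static int toy_init_control(struct toy_vcpu *v, void *block)
{
    assert(v != NULL);

    if (v->control_block)
        return -EINVAL;

    v->control_block = block;
    return 0;
}

/*
 * Models the guest's resume path. "cancelled" is an assumed flag:
 * non-zero means the same domain keeps running (nothing to redo),
 * zero means the guest believes it was restored into a new domain
 * and has to set up its control block again.
 */
static int toy_resume(struct toy_vcpu *v, void *block, int cancelled)
{
    if (cancelled)
        return 0;

    return toy_init_control(v, block);
}
```

In this model, resuming an already initialised vCPU with cancelled != 0
succeeds without touching the control block, while the same resume with
cancelled == 0 returns -EINVAL. A freshly created vCPU (the "xl restore"
case) with cancelled == 0 succeeds. That is the failure one would expect
if the guest no longer got the information needed to tell the two modes
apart.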
