On 07/12/2022 09:41, Jan Beulich wrote:
> On 06.12.2022 18:55, Demi Marie Obenour wrote:
>> On Tue, Dec 06, 2022 at 12:06:24PM +0000, Andrew Cooper wrote:
>>> On 06/12/2022 04:33, Demi Marie Obenour wrote:
>>>> @@ -961,13 +1000,24 @@ get_page_from_l1e(
>>>>
>>>> switch ( l1f & PAGE_CACHE_ATTRS )
>>>> {
>>>> - case _PAGE_WB:
>>>> + default:
>>>> +#ifndef NDEBUG
>>>> + printk(XENLOG_G_WARNING
>>>> + "d%d: Guest tried to use bad cachability attribute %u
>>>> for MFN %lx\n",
>>>> + l1e_owner->domain_id, l1f & PAGE_CACHE_ATTRS, mfn);
>>> %pd. You absolutely want to convert the PTE bits to a PAT value before
>>> priniting (decimal on a PTE value is useless), and %PRI_mfn.
>> I’ll have to look at the rest of the Xen tree to see how to use this.
>>
>>>> + pv_inject_hw_exception(TRAP_gp_fault, 0);
>>> As I said on IRC, we do want this, but I'm not certain if we can get
>>> away with just enabling it in debug builds. _PAGE_GNTTAB was ok because
>>> it has always been like that, but there's a non-trivial chance that
>>> there are existing dom0 kernels which violate this constraint.
>> I chose this approach because it is very simple to implement. Anything
>> more complex ought to be in a separate patch, IMO.
>>
>>>> + return -EINVAL;
>>>> +#endif
> As to "complex": Just the "return -EINVAL" would be yet less complex.
> Irrespective of the remark my understanding of Andrew's response is that
> the concern extends to the error return as well.
It's a tradeoff.
From a debugging point of view, #GP is absolutely the way to go, because
you get a full backtrace out in Linux. It doesn't matter in the
slightest which side of the SYSCALL instruction it points at, because
that's not the interesting information to look at.
Returning -EINVAL stops the problematic cache attributes from being
used, but is far more variable in terms of noticeable change inside
Linux. It ranges from hitting a BUG(), to having the return code lost.
In this case, I'd err on the side of a #GP fault because it is a
definite error in the guest needing debugging and fixing. But as I said
originally, it probably needs to be on-by-default with a knob to disable.
~Andrew