On 25.09.2025 22:08, Oleksii Kurochko wrote:
> On 9/20/25 1:36 AM, Jan Beulich wrote:
>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>> +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>> +{
>>> +    unsigned long root_table_indx;
>>> +
>>> +    root_table_indx = gfn_x(gfn) >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);
>>> +    if ( root_table_indx >= P2M_ROOT_PAGES )
>>> +        return NULL;
>>> +
>>> +    /*
>>> +     * The P2M root page table is extended by 2 bits, making its size 16KB
>>> +     * (instead of 4KB for non-root page tables). Therefore, p2m->root is
>>> +     * allocated as four consecutive 4KB pages (since alloc_domheap_pages()
>>> +     * only allocates 4KB pages).
>>> +     *
>>> +     * To determine which of these four 4KB pages the root_table_indx falls
>>> +     * into, we divide root_table_indx by
>>> +     * P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1).
>>> +     */
>>> +    root_table_indx /= P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1);
>> The subtraction of 1 here feels odd: You're after the root table's
>> number of entries, i.e. I'd expect you to pass just P2M_ROOT_LEVEL.
>> And the way P2M_PAGETABLE_ENTRIES() works also suggests so.
> 
> The purpose of this line is to select the page within the root table, which
> consists of 4 consecutive pages. However, 
> P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL)
> returns 2048, so root_table_idx will always be 0 after devision, which is not
> what we want.
> 
> As an alternative, P2M_PAGETABLE_ENTRIES(0) could be used, since it always
> returns 512. Dividing root_table_idx by 512 then yields the index of the page
> within the root table, which is made up of 4 consecutive pages.
> 
> Does it make sense now?

Yes and no. I understand what you're after, but that doesn't make the use of
P2M_PAGETABLE_ENTRIES() (with an arbitrary level as argument) correct. This
calculation wants doing by solely using properties of the top level.

> The problem may occur with DECLARE_OFFSET(), which can produce an incorrect
> index within the root page table. Since the index is in the range [0, 2047],
> it becomes an issue if the value is greater than 511, because DECLARE_OFFSET()
> does not account for the larger range of the root index.
> 
> I am not sure whether it is better to make DECLARE_OFFSET() generic enough
> for both P2M and Xen page tables, or to provide a separate 
> P2M_DECLARE_OFFSET()
> and use it only in P2M-related code.
> Also, it could be an option to move DECLARE_OFFSET() from asm/page.h header
> to riscv/pt.c and define another one DECLARE_OFFSETS in riscv/p2m.c.
> 
> Do you have a preference?

Not really, no. I don't like DECLARE_OFFSETS() anyway.

>>> +static int p2m_set_entry(struct p2m_domain *p2m,
>>> +                           gfn_t gfn,
>>> +                           unsigned long page_order,
>>> +                           mfn_t mfn,
>>> +                           p2m_type_t t)
>> Nit: Indentation.
>>
>>> +{
>>> +    unsigned int level;
>>> +    unsigned int target = page_order / PAGETABLE_ORDER;
>>> +    pte_t *entry, *table, orig_pte;
>>> +    int rc;
>>> +    /*
>>> +     * A mapping is removed only if the MFN is explicitly set to 
>>> INVALID_MFN.
>>> +     * Other MFNs that are considered invalid by mfn_valid() (e.g., MMIO)
>>> +     * are still allowed.
>>> +     */
>>> +    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
>>> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>>> +
>>> +    ASSERT(p2m_is_write_locked(p2m));
>>> +
>>> +    /*
>>> +     * Check if the level target is valid: we only support
>>> +     * 4K - 2M - 1G mapping.
>>> +     */
>>> +    ASSERT(target <= 2);
>>> +
>>> +    table = p2m_get_root_pointer(p2m, gfn);
>>> +    if ( !table )
>>> +        return -EINVAL;
>>> +
>>> +    for ( level = P2M_ROOT_LEVEL; level > target; level-- )
>>> +    {
>>> +        /*
>>> +         * Don't try to allocate intermediate page table if the mapping
>>> +         * is about to be removed.
>>> +         */
>>> +        rc = p2m_next_level(p2m, !removing_mapping,
>>> +                            level, &table, offsets[level]);
>>> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>>> +        {
>>> +            rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
>>> +            /*
>>> +             * We are here because p2m_next_level has failed to map
>>> +             * the intermediate page table (e.g the table does not exist
>>> +             * and they p2m tree is read-only).
>> I thought I commented on this or something similar already: Calling the
>> p2m tree "read-only" is imo misleading.
> 
> I will change then "read-only" to "not allocatable".

That'll be only marginally better: What's "allocatable"? Why not something
like "... does not exist and none should be allocated"? Or maybe simply
omit this part of the comment?

>>> +    /*
>>> +     * Free the entry only if the original pte was valid and the base
>>> +     * is different (to avoid freeing when permission is changed).
>>> +     *
>>> +     * If previously MFN 0 was mapped and it is going to be removed
>>> +     * and considering that during removing MFN 0 is used then `entry`
>>> +     * and `new_entry` will be the same and p2m_free_subtree() won't be
>>> +     * called. This case is handled explicitly.
>>> +     */
>>> +    if ( pte_is_valid(orig_pte) &&
>>> +         (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
>>> +          (removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
>>> +        p2m_free_subtree(p2m, orig_pte, level);
>> I continue to fail to understand why the MFN would matter here.
> 
> My understanding is that if, for the same GFN, the MFN changes fromMFN_1 to
> MFN_2, then we need to update any references on the page referenced by
> |orig_pte| to ensure the proper reference counter is maintained for the page
> pointed to byMFN_1.
> 
>>   Isn't the
>> need to free strictly tied to a VALID -> NOT VALID transition? A permission
>> change simply retains the VALID state of an entry.
> 
> It covers a case when removing happens and probably in this case we don't need
> to check specifically for mfn(0) case "mfn_eq(pte_get_mfn(*entry), _mfn(0))",
> but it would be enough to check that pte_is_valid(entry) instead:
>    ...
>    (removing_mapping && !pte_is_valid(entry)))) )
> 
> Or only check removing_mapping variable as `entry` would be invalided by the
> code above anyway. So we will get:
> +    if ( pte_is_valid(orig_pte) &&
> +         (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) || 
> removing_mapping) )
> +        p2m_free_subtree(p2m, orig_pte, level);
> 
> Does it make sense now?

Not really, sorry. Imo the complicated condition indicates that something is
wrong (or at least inefficient) here.

Jan

Reply via email to