On 05.03.2026 15:54, Bernhard Kaindl wrote:
> Jan Beulich wrote:
>> On 05.03.2026 14:12, Bernhard Kaindl wrote:
>>>
>>> Roger requested the domctl API to allow claiming from multiple nodes in
>>> one go
>>> and he specified that we should focus on getting the implementation for one
>>> node-specific claim done first before we dive into multi-node claims code.
>>>
>>> - Instead of adding/linking an array of claims to struct domain, we can keep
>>>   using d->outstanding_pages for the single-node claim.
>>>
>>> - There are numerous comments and questions for this minimal implementation.
>>>   If we'd add multi-node claims to it, this review may become even
>>>   more complex.
>>>
>>> - The single-node claims backend contains the infrastructure and multi-node
>>>   claims would be an extension on top of that infrastructure.
>>
>> What is at the very least needed is an outline of how multi-node claims are
>> intended to work. This is because what you do here needs to fit that scheme.
>> Which in turn I think is going to be difficult when for a domain more memory
>> is needed than any single node can supply. Hence why I think that you may
>> not be able to get away with just single-node claims, no matter that this
>> of course complicates things.
>>
>> It's also not quite clear to me how multiple successive claims against
>> distinct nodes would work (which isn't all that different from a multi-node
>> claim).
>>
>> Thinking of it, interaction with the existing mem-op also wants clarifying.
>> Imo only one of the two ought to be usable on a single domain.
> 
> Yes, correct. As implemented by Xen in domain_set_outstanding_claims(),
> Xen claims work very differently from something like an allocation:
> 
> For example, when you allocate, you get memory, and when you repeat,
> you have a bigger allocation.
> 
> But Xen claims in domain_set_outstanding_claims() don't work like that:
> 
> - When a domain has a claim, domain_set_outstanding_claims() only allows
>   resetting the claim to 0, nothing else. A second, or changed claim is not
>   possible. I think this was intentional:
> 
>   - domain_set_outstanding_claims() rejects increasing/reducing a claim:
> 
>     A claim is designed to be made by domain build when the size of the
>     domain is known. There is no tweaking it afterwards: The needed pages
>     shall be claimed by the domain builder before the domain is built.
>     
>     Note: The claims are not only consumed when populating guest memory:
>     Claims are also (at least attempted to be) consumed when Xen needs to
>     allocate memory for other resources of the domain. For this reason,
>     the domain builder needs to add some headroom for allocations done by
>     Xen for creating the domain.
> 
>     When the domain builder has finished building the domain, it is expected
>     to reset the claim to release any unconsumed headroom it added.
> 
>   - If a domain already has memory when the domain builder stakes a claim
>     for completing the build of the domain, the outstanding_claims are set
>     to the target value of the claim call, minus domain_tot_pages(d), so
>     already allocated memory does not contribute to a bigger total booking.
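
For reference, the rules quoted above can be modelled in a few lines. This is
only a toy sketch, not the Xen code: struct toy_domain, struct toy_host and
toy_set_claim() are hypothetical names, the error values are assumptions, and
locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the relevant fields of struct domain. */
struct toy_domain {
    unsigned long tot_pages;         /* pages the domain already owns */
    unsigned long outstanding_pages; /* pages still claimed, not yet allocated */
};

/* Hypothetical stand-in for the host-wide accounting. */
struct toy_host {
    unsigned long free_pages;         /* free pages on the host */
    unsigned long outstanding_claims; /* sum of all domains' claims */
};

/*
 * Toy model of the behaviour described above:
 *  - pages == 0 always resets the claim,
 *  - a domain that already holds a claim cannot change it,
 *  - otherwise only (pages - tot_pages) is booked, so memory the domain
 *    already has does not inflate the total booking.
 */
static int toy_set_claim(struct toy_host *h, struct toy_domain *d,
                         unsigned long pages)
{
    unsigned long need;

    if (pages == 0) {
        h->outstanding_claims -= d->outstanding_pages;
        d->outstanding_pages = 0;
        return 0;
    }

    if (d->outstanding_pages != 0)
        return -EINVAL;              /* no second or changed claim */

    if (pages <= d->tot_pages)
        return -EINVAL;              /* nothing left to claim */

    need = pages - d->tot_pages;

    /* Claims never exceed free memory in this model. */
    assert(h->outstanding_claims <= h->free_pages);
    if (need > h->free_pages - h->outstanding_claims)
        return -ENOMEM;

    d->outstanding_pages = need;
    h->outstanding_claims += need;
    return 0;
}
```

With 1000 free pages and a domain already owning 100, claiming 400 books 300
pages; a further non-zero claim is rejected until the claim is reset to 0.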
> 
> For NUMA claims and global host-level claims, it is similar:
> 
> A NUMA node-specific claim is implicitly also added to the global
> host-level outstanding_claims of the host, as node-specific memory
> is also part of the host's memory, so the host-level claims protection
> does not have to also check for node-specific claims:
> 
> A node-level claim therefore also has the effect of a host-level claim.
> 
> When a domain has one kind of claim, it does not make a lot of sense to then
> later add a differently sized claim for another target. As described for
> how domain_set_outstanding_claims() is implemented, a domain builder stakes
> a claim once, then builds the domain, then resets it, and that's all there
> is to it.
> 
> For example, Xapi toolstack and libxenguest have calls to claim memory,
> but in any given configuration, only the first actor to claim memory for
> a domain is the one who defines the claim: No mixing, changing, updating.
> It makes things clear that the initial creator did make the claim.
> 
> Similar for multi-node claims:
> 
> Roger described how he wants this API to work here:
> https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html

Fits my understanding, but doesn't fit you limiting the new sub-op to a
single node. As said, if you introduce the new sub-op this way, I'd still
expect for a single domain to have claims across multiple nodes, and
that (preferably) whatever the caller does to achieve that will continue
to work once the restriction is lifted.

Yet I can't see you describe such a claims-on-multiple-nodes use case
anywhere in your reply above. And indeed to achieve that you'd need data layout
changes, in particular there then couldn't be any single d->claim_node.

>> Ideally, we would need to introduce a new hypercall that allows making
>> claims from multiple nodes in a single locked region, as to ensure
>> success or failure in an atomic way.
> 
> In the locked region (inside heap_lock), we can check the claims requests
> against existing claims and memory of the affected nodes and determine if
> the claim call is a go or a no-go. If it is a go, we update all counters
> which are all protected by the heap_lock and are done.

Yet as per above, afaics you don't even have the needed data layout to
record two (or more) claims against distinct nodes.
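
To illustrate the kind of layout and check I mean, here is a rough sketch
only. Everything in it is hypothetical (TOY_MAX_NODES, struct toy_node,
struct toy_domain, struct toy_claim, toy_claim_nodes()); none of it is
existing Xen code, the error values are assumptions, and the caller is
assumed to hold whatever lock protects the counters (heap_lock in the
description above).

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_NODES 4              /* hypothetical node count */

/* Hypothetical per-node accounting. */
struct toy_node {
    unsigned long free_pages;
    unsigned long outstanding_claims;
};

/* Per-domain record: one claim slot per node instead of one claim_node. */
struct toy_domain {
    unsigned long claims[TOY_MAX_NODES];
};

/* One entry of a multi-node claim request. */
struct toy_claim {
    unsigned int node;
    unsigned long pages;
};

/*
 * All-or-nothing claim across several nodes. Pass 1 only validates and
 * modifies nothing; pass 2 commits. Assumed to run with the lock held.
 */
static int toy_claim_nodes(struct toy_node nodes[TOY_MAX_NODES],
                           struct toy_domain *d,
                           const struct toy_claim *req, unsigned int nr)
{
    unsigned long want[TOY_MAX_NODES] = { 0 };
    unsigned int i;

    /* Pass 1a: sum up the request per node, rejecting bad node IDs. */
    for (i = 0; i < nr; i++) {
        if (req[i].node >= TOY_MAX_NODES)
            return -EINVAL;
        want[req[i].node] += req[i].pages;
    }

    /* Pass 1b: refuse changes to existing claims, check every node. */
    for (i = 0; i < TOY_MAX_NODES; i++) {
        if (d->claims[i] != 0)
            return -EINVAL;
        assert(nodes[i].outstanding_claims <= nodes[i].free_pages);
        if (want[i] > nodes[i].free_pages - nodes[i].outstanding_claims)
            return -ENOMEM;
    }

    /* Pass 2: every node can satisfy its share, so commit all of them. */
    for (i = 0; i < TOY_MAX_NODES; i++) {
        nodes[i].outstanding_claims += want[i];
        d->claims[i] = want[i];
    }
    return 0;
}
```

The point of the two passes is that a request failing on any one node leaves
all counters untouched, which is what "success or failure in an atomic way"
would require.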

Jan
